A regular expression to check for prime numbers

18 March 2007 By Avinash Meetoo 170 Comments

Regular expressions are extremely powerful. This is something I read at least once or twice every day while reading articles and blogs on the Web.

While browsing today, I found this page which thoroughly describes the use of the regular expression /^1?$|^(11+?)\1+$/ in Perl to check if a number is prime or not!!!

/^1?$|^(11+?)\1+$/

To be frank, I was skeptical. The regular expression looks like magic! And I wanted to understand it better. I rewrote it in Ruby using irb and I tested it:

Avinash@noulakaz:~$ irb

irb(main):001:0> def is_prime(n)
irb(main):002:1> (“1” * n) !~ /^1?$|^(11+?)\1+$/
irb(main):003:1> end
=> nil

irb(main):004:0> is_prime(10)
=> false

irb(main):005:0> is_prime(11)
=> true

irb(main):006:0> is_prime(12)
=> false

irb(main):007:0> is_prime(13)
=> true

irb(main):008:0> is_prime(99)
=> false

irb(main):009:0> is_prime(100)
=> false

irb(main):010:0> is_prime(101)
=> true

Great! It also works in Ruby! This means that there is no (Perl) magic going on. The regular expression really works. But how? Let’s try to follow the logic behind it.

Is 7 prime?

To know this, the function first generates “1111111” (from “1” * 7) and tries to see if that string does not match /^1?$|^(11+?)\1+$/. If there is no match, then the number is prime.

Notice that the regular expression has two parts (separated with the vertical bar |).

The first part is /^1?$/ is trivial and matches with beginning of line (^), an optional 1 (1?)
and end of line ($) which implies that it matches either the empty string or “1”. This simply indicates that calling that function with n==0 or n==1 will correctly return false (as the “1” * n will match with the first part of the regular expression)

The second part is where the magic occurs…

/^(11+?)\1+$/ matches with beginning of line (^) then by (11+?) then by \1+ and finally by end of line ($). I guess that you know that \1 is a variable which is bound to whatever was matched previously (in our case by (11+?)).

Let’s proceed slowly…

(11+?) does two things

It matches with “1” followed by one or more ones minimally. This means that it matches with “11” initially (notice that if there was no ? (i.e. (11+) was used instead, the whole string would have matched)
The string obtained (“11” initially) is bound to the variable \1.

\1+ then matches with whatever has been matched above (“11” initially) repeated one or more times. If this match succeeds then the number is not prime.

If you are following, you’ll realise that this eliminates all even numbers except 2 (for example, 8 is “11111111” and therefore (11+?) will match with “11” and \1+ will match with “111111”).

As for odd numbers (in our case 7), the (11+?) matches with “11” initially but \1+$ cannot be true (notice the $) as there are 5 remaining ones. The regular expression engine will backtrack and will make (11+?) match “111” and here also \1+$ won’t be true because there will be 4 remaining ones (and \1+$ will only match with a number of ones which is a multiple of 3 followed by end of line) etc. hence “1111111” will not match the regular expression which implies that 7 will be considered as being prime :-)

When I showed this to Christina this morning (true), she told me that this only checked for a number being odd or not. This is also what I felt at the beginning. But it really works. For instance, let’s try to apply it to 9 (which is obviously not even), “1” * 9 should match the regular expression…

“1” * 9 = “111111111”. (11+?) matches “11” initially. \1+$ cannot match because there are 7 remaining ones. Backtracking occurs. (11+?) now matches “111”. And here \1+$ matches the remaining 6 remaining ones! Hence, 9 is not prime.

Easy… and beautiful at the same time ;-)

[Join the discussion on Hacker News]

Comments

Ketwaroo D. Yaasir says

19 March 2007 at 18:23

unfortunately, I am quite allergic to regular expressions… too many bloody flavors to make any sense in the end.
avinash says

19 March 2007 at 22:48

I fully understand. I was quite allergic to regexpr too. But little by little I have come to like and use them a lot.

Let’s see, I use regexpr when working with vi, grep, sed, awk and, now, ruby…
Ketwaroo D. Yaasir says

19 March 2007 at 23:51

I’ve been trying to write a bbcode parser for wordpress µ to cure it of kses.php while keeping the entries of the bloggers “safe”. So I have to figure out how to use regex.

So i guees i too will have to learn to like it.

It’s like voodoo.
avinash says

20 March 2007 at 09:42

And I suppose you have realised that it is a small programming language… so it’s bound to be cool :-)
curiousEngine says

21 March 2007 at 09:25

Programming Quiz No. 2

A serial file named contains a series of values, each of which is how many prime numbers ther are below a particular value. Write a program to read the number of primes between and values where and are requested interactively and compare it to the value obtained from FUNCT(limit) which is defined as limit / LOG(limit).
avinash says

21 March 2007 at 13:34

Hi,

Seems that some of the text went missing… or you were seriously drunk :-)
VagabondoDigitale says

24 September 2007 at 23:19

Cool.. thx for your post. I like matematics quiz :P
doh says

25 September 2007 at 14:53

nice trick!

I find that the “backtracking” part was equivalence to finding out if the number is prime by brute-forcing

e.g. going from 3 to half the number and if any of that is divisible, its not a prime.

thus the ^(11+?)\1+& trick (anyone got it?)

i believe there are a lot of fast algorithms out there to use instead of generating 1s and parsing them with a regular expression which is kinda slow.

however this is truly magic ^^
ksivasenthil says

29 September 2007 at 07:50

hey avinash, what is the exact input that goes to the regex constructor. I am not a ruby programmer but my attempt to use this regex in javascript failed. The test always failed for both prime and composite numbers. If you know javascript please help show me a sample to use this in javascript pls
someone says

3 October 2007 at 17:53

the trick is to apply the regex to a string with n 1’s, for example, for 5, apply the regex to 11111.
Jay says

7 October 2007 at 10:55

A PHP Version:
!preg_match(‘/^1?$|^(11+?)\1+$/x’, str_repeat(‘1’, $n))

Example:

<?php

$n = 1;
while ($n < 1000000) {
if (!preg_match(‘/^1?$|^(11+?)\1+$/x’, str_repeat(‘1’, $n))) {
echo “$n: Prime\n”;
}
$n++;
}
?>
Skymt says

16 October 2007 at 17:19

C# version:

static void Main(string[] args)
{
int c;
if (args.Length == 1 && int.TryParse(args[0], out c))
Console.WriteLine(!Regex.IsMatch(new String(‘1’, c), @”^1?$|^(11+?)\1+$”));
}
Binks says

19 October 2007 at 04:17

Java Version, nice tip btw, guess I should learn more about regular expressions. Java requires that the \ be \’d out, otherwise it sees it as a smiley face character and problems happen :P.

public static void main(string[] args){
if (args.length >= 1)
{
int c = Integer.parseInt(args[0]);
String s = “”;
while (c > 0) {c–; s+=”1″;}
System.out.println(!Pattern.matches(“^1?$|^(11+?)\\1+$”, s));
}}
Venkatesh says

19 October 2007 at 09:11

This is just awesome…
Steven says

30 January 2008 at 02:35

Hello,
adamsky says

10 February 2008 at 16:19

To be honest – regular expressions are “overused”… And are NOT that good. For example in PHP using just common str_replace(); would be in many cases much faster then preg_replace();

Problem is that people tend to use a lot of “sophisticated” code where it’s not necessary, only because it looks more “professional”. My thought on this is “I don’t care what others think – I just need my code to be fast, clean and as simple as it is possible”.
avinash says

11 February 2008 at 01:32

True.

Sometimes regular expressions are extremely cryptic (like the one above in fact ;-) ) but they are also interesting to understand because they are essentially small programs written in a domain specific language.
Nick says

19 February 2008 at 04:33

The magic is in the backtrack…. if it can find a number a=(11+?) that multiplied N times gives our number, it’s obvious that our number can’t be prime (our number = a*N, i.e. not prime)
roscoe says

26 March 2008 at 02:19

How many numbers does this count up to? whats the largest prime that can be found in *real* time?

I’m interested so mail me please.
avinash says

26 March 2008 at 23:24

Hi Roscoe,

I’ve not done any performance test but I guess that if you use a compiled language like C with a good Regex implementation, then, maybe, you can generate prime numbers real fast.

On the other hand, if you really want to generate primes then have a look at The Sieve of Eratosthene.
Paul says

2 April 2008 at 23:38

Avinash. Super, I’ve been looking around for a Regex similar to this for a while. I’ve modified it to suit.
avinash says

3 April 2008 at 19:32

Hi Paul,

Great to know that you find this regular expression useful. I’d like to point out that I’m not the original author though ;-)
rasmus says

27 April 2008 at 03:22

if primenumber mod 2 is 0 then
isprime = true

wouldnt that be easier and faster? ^^
rasmus says

27 April 2008 at 03:30

sorry, its getting late here heh

if primenumber mod 2 != 0 then
isprime = true
avinash says

27 April 2008 at 07:20

Hi Rasmus,

A prime number is not simply odd. Check this.
Celina says

30 June 2008 at 15:44

I have never been able to understand regular expressions, what is the logic behind them?
avinash says

1 July 2008 at 06:54

Hi Celina,

Regular expressions are not trivial. But they are very powerful. If you want to know more about them, this is a good start.

A lot of people also recommend this book.
Anonymous says

10 July 2008 at 03:56

While it’s neat, it’s not a great idea to use for big numbers since it uses O(N^2) time and O(N) memory where even a trivial solution uses O(sqrt(N)) time and O(1) memory.
avinash says

10 July 2008 at 07:56

True :-)

I wonder if someone has actually done some benchmarking with large numbers?
Anonymous says

10 July 2008 at 17:53

This regular expression does not use O(1) and O(N^2), its actually exponential in the size of the input. If it unpacks the number 7 (32 bit) to 7 1’s, thats 7*(32). So a 32 bit number 2147483648 would take up half a gig of memory.
Kadhirvel says

29 July 2008 at 21:06

Please follow the link below for fact about primes

http://kadiprimality.blogspot.com/
nidhi says

9 January 2009 at 11:59

Pls give me regex to match filename that doesn’t start with a number and have html extension.
avinash says

9 January 2009 at 16:05

Maybe something like “^[^0-9].*html$”

Notice that the two ^ have very different meanings. The first one is an anchor which represents the beginning of the line. The second one is the “inverse” operator i.e. [^0-9] matches with any character excepts a digit.
Skymt says

9 January 2009 at 17:02

You could also try this: “^\D\w+.html”

((^)Start of line – (\D)Not a digit – (\w+)several letters – (.html)end with .html)

Works with the .NET flavor of regex…
rentis3 says

31 January 2009 at 19:41

for calculations with big numbers, try F#
erwin says

31 March 2009 at 21:35

try this: 999999 on the above php regexp it says prime….
it isn’t ….
avinash says

1 April 2009 at 11:11

So it seems that there is a bug lurking somewhere…
cetin says

12 April 2009 at 17:51

I don’t think this is formally a regular expression, because of “?” character. The processing of a real regular expression never requires backtracking (i.e. never requires reading the input multiple times) and hence they are very fast once you generate the deterministic finite automata (DFA) for the given regular expression.

Programming languages usually provide extensions to regular expressions (like ? in this case) for convenience to the programmer. There are only 3 operators in formal regular languages, which are star (“*”), union (“|”) and concatenation. All other operators used in a regular expressions must be representable by using these 3, for example you can represent 0+ as 00* and so on. The operator “?” used in the above expression is something totally different and which in fact violates the rule that machines that process a regular expression can do so with finite amount of memory. Hence this expression is actually not regular and the language generated by it is not a regular language, mathematically.
avinash says

12 April 2009 at 20:35

Mathematically, you may be right (but I am not so sure.) But, from a programming point of view, regular expressions have always allowed backtracking…
Fred says

10 July 2009 at 21:25

php may reject large numbers silently due to recursion depth limit (which I think is 32767 levels), that’s why 999999 fails.
avinash says

12 July 2009 at 22:48

Thanks for pointing this out…
Patrick Atambo says

16 July 2009 at 23:25

WOW, pretty cool! I have to revisit regular expressions it seems. :)
akash says

17 September 2009 at 05:23

a language such that a string is made of binary alphabet( of 0s and 1s) and the number of zeros in the string is a 3 digit prime number…….is this language regular…??
can u pls help me out in this ……its a ques appeared in GATE entrance
perler says

3 November 2009 at 16:14

this was good shit. thanks!
Pras Sarkar says

18 November 2009 at 22:23

This is really intriguing, but I’m unclear about your explanation about how it works (it definitely works). For example, try the number 49 (not prime) – 1111111111111111111111111111111111111111111111111 – neither (11)+ matches nor (111)+ matches, so you would think the regex would classify 49 as prime (it does not). So how does it really work?
Makovec says

4 February 2010 at 20:24

I would rather see pattern like this: “^\d*[13579]$” in my scripts.

It’s clear, easy and I think it’s also faster then the one described. If you want negative numbers it’s easy to change.

Anyway it’s nice work and idea!

David
Makovec says

4 February 2010 at 20:29

Oh, hell – my bloody English. Forgot my previous note :( Shit
Nithin Bekal says

15 May 2010 at 10:20

Wow, that’s impressive. A colleague of mine showed me this regular expression, and I couldn’t really believe it was possible. I’ve never really learned regular expressions, but this has definitely encouraged me to start learning.
avinash says

15 May 2010 at 23:29

@Nithin

Yeah. They are cool. Personally, I practically never use the very complex ones (like the one above) but I routinely use the more usual ones especially when I’m using vi or TextMate on my MacBook.
ap2cu says

17 May 2010 at 20:24

Nice!!
I implemented it on my website: http://ap2cu.com/tools/prime
This is working great
You can also check my REGEX online tool: http://ap2cu.com/tools/regex

Thanks for this post!
Sean Walker says

5 July 2010 at 02:12

@Pras Sarker: For your example 49, it works because it matches (1111111+) and \1+ (7×7). Basically it starts by matching 11+ (2 times something), then 111+ (3 times something) then 1111+ (4 times something). It starts at 2 and looks at each succeeding number to see if it is a factor (by matching it and then matching multiples of it). But like everyone said this method is very very slow, the bigger the number you are checking the worse the performance, by a LOT.

@Makovec: You and several other people don’t seem to understand what a prime number is. A prime number is not an odd number. The number 21 would match your simplified regexp because it is odd (ends in 1, matching your expression) but it should FAIL the prime number check because it has factors 3 and 7 (3×7=21). You can’t simply look for odd numbers.

This is probably the simplest way to check for prime numbers within a single regexp for sure, but because there are so many non-regexp ways to do it (which are MUCH more efficient), this is just a novelty and not actually useful for any real world application.

Still WAY cool though!! :)
avinash says

5 July 2010 at 06:19

Exactly :-)
Paulo Geyer says

5 July 2010 at 08:00

Did you notice that you could do this also:
def is_prime(n)
(100-2)%2.0 != 0
end

it’s just amazing, and really easy
Sachin says

5 July 2010 at 09:22

awesome..cool..another feather to my knowledge…thanks for sharing..
Paulo Geyer says

5 July 2010 at 09:28

wait, my code doesn’t work.
This (11+?)\1+$ is more complex than i thought
avinash says

5 July 2010 at 10:25

You’re right.

The regular expression does not test for an odd number. Instead, it tests all divisors which is way more complex and, dare I say, more beautiful :-)
Vlado says

5 July 2010 at 11:37

Interesting expression, but I think the real power of regular expressions is that they are effective, because they are implemented as finite state automates (FSA). Although this one cannot be implemented like such, because of this backtracking. In theory FSA and regular expressions CANNOT check for prime numbers (check Pumping Lemma ;) ). I think such implementations of RE are wrong, because they obviously don’t guarantee you the O(n) complexity of FSA and can become very inefficient :(
Craig says

5 July 2010 at 12:53

@Paulo:
Assuming you intend to return the result of that equation, you are effectively returning false all the time, as you simply take the modulus of 98 with respect to 2.0.

I’m not really sure where you’re headed with that but it looks like you too have confused primes with odd numbers.
Andi Kalsch says

5 July 2010 at 16:42

JavaScript one:

var n = 5; !/^1?$|^(11+?)\1+$/.test((function(n){s=”;while(n–)s+=’1′;return s;})(n))
bogdan says

5 July 2010 at 17:29

For the love of God … stop calling it a regular expression because it is NOT a regular expression. Awesome as it may be, this is just a recursive function written using (mostly) only symbols that are often associated with regular expressions.

It REALLY is a proven fact that no regular expression can be devised that accepts all and only prime numbers.

I truly love the idea though.
Paulo Geyer says

5 July 2010 at 20:23

@Craig

after i wrote the code, I’ve noticed that it just tests for odd numbers. I should pay more attention before posting.

Actually, I didn’t understand how (11+?)\1+$ really works, there is any math paper about this approach to test the primality of number? I’ve never seen that before
avinash says

6 July 2010 at 06:07

@Paula The key is to understand that the matching is being done with a string of ones of lenght N, the number being tested for primality.
brandon says

8 July 2010 at 21:31

Doesn’t work? http://rubular.com/r/PGckdeL1Gq
avinash says

8 July 2010 at 22:17

The RE should be matched with a string of ones of length N, the number being tested.
brandon says

8 July 2010 at 22:27

ahhh, I wan’t paying attention. That’s what I get for skimming over your blog post…

Great work!
Colin says

20 July 2010 at 18:55

Regular expressions aren’t powerful. http://en.wikipedia.org/wiki/Chomsky_hierarchy#The_hierarchy In fact they’re strictly less powerful than most languages which are turing-complete.
Utkarsh says

21 July 2010 at 08:26

:|

Is there anything regex can’t do?

:|
Chmike says

21 July 2010 at 09:50

I had a hard time understanding the algorithm used. Your article doesn’t make it clear.

The algorithm detects if the input string of 1 can be split in 2 or more shorter strings of same length and with at least two 1.

It will obviously work, but it’s just a curiosity. Not very efficient.
Nick says

21 July 2010 at 10:15

You might also find amuzing my rather silly article on calculating fibonacci number with regexes:

http://nick.zoic.org/2010/06/01/fibonaci-regex-perversity/

Bogdan @72 is right, it’s not really a “Regular Expression” as such, but I think it is still appropriate to call it a Regex because most regex engines support this kind of silliness :-).
thomas says

23 July 2010 at 05:35

This is a terrible idea. Just because you can do it, it does not mean you should. I am sure I can hammer down a nail with my $2000 guitar :)
avinash says

23 July 2010 at 07:17

Yeah. Regular expressions are not very efficient. But you have to agree that they have a great “wow” effect :-)
venki says

23 July 2010 at 10:22

fan++ to this blog. :) Awesome post, i’m recently into extensive usage of reg-ex and this works like a charm :) thank you.
Deepti says

23 July 2010 at 12:39

I tried the Java version(given by Binks) –
public static void main(string[] args){
if (args.length >= 1)
{
int c = Integer.parseInt(args[0]);
String s = â€œâ€;
while (c > 0) {câ€“; s+=â€1â€³;}
System.out.println(!Pattern.matches(â€œ^1?$|^(11+?)\\1+$â€, s));
}}

Doesnt work for me. Can anyone help?
Braden says

23 July 2010 at 19:50

@Ketwaroo

Just use HTMLPurifier.
Joe says

23 July 2010 at 21:01

A very silly, useless algorithm, beautifully expressed. :-p
Loki says

24 July 2010 at 00:05

Try this instead.

public static void main(String[] args){

String[] nums = {“0”, “1”, “3”};

if (nums.length >= 1)
{
int c = Integer.parseInt(nums[2]);
String s = “”;
while (c > 0)
{
c –;
s += “1”;
}
System.out.println(!Pattern.matches(“^1?$|^(11+?)\\1+$”, s));
}
}
Mauro says

27 July 2010 at 10:03

Hi,
I think there’s a lot of confusion about what a Regular Expression is (and what it can do)
I don’t know where to start.

The “magic pattern” shown above is not a Regular Expression, but a “regex”, just an input string that a “specific” pattern-patching-engine can process.

“Regular expressions describe regular languages in formal language theory. They have thus the same expressive power as regular grammars. Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively.” – Wikipedia

“Many features found in modern regular expression libraries provide an expressive power that far exceeds the regular languages. For example, many implementations allow grouping subexpressions with parentheses and recalling the value they match in the same expression (backreferences). This means that a pattern can match strings of repeated words like “papa” or “WikiWiki”, called squares in formal language theory. The pattern for these strings is (.*)\1.
…
Many tools, libraries, and engines that provide such constructions still use the term regular expression for their patterns. This has led to a nomenclature where the term regular expression has different meanings in formal language theory and pattern matching. For this reason, some people have taken to using the term regex or simply pattern to describe the latter” ” – Wikipedia

This is not just a linguistic problem.

Looks at algorithm to evaluate “regexes”:

“The third algorithm is to match the pattern against the input string by backtracking. This algorithm is commonly called NFA, but this terminology can be confusing. Its running time can be exponential, which simple implementations exhibit when matching against expressions like (a|aa)*b that contain both alternation and unbounded quantification and force the algorithm to consider an exponentially increasing number of sub-cases. This behavior can cause a security problem called Regular expression Denial of Service – ReDoS, which might be used by hackers who want to attack a regular expression engine.” – Wikipedia

And our “magic pattern” is (11+?)\1+

Now. because an input string, to be declared as “not matching”, must not match ALL of possible paths, the poor engine must evaluate all of them. Here is the problem.

Imagine now to insert a very low number (around a million).
Poor engine should evaluate a string long a million 1s!! Oh my god…

So please, use this funny regex just for study, or to understand how a backtracking engine works, but again, please, think of your children, don’t try this at home.
avinash says

27 July 2010 at 10:10

You’re 100% right.

This regular expression, or more precisely, this regex is pathetically inefficient yet so beautiful in a geekish sort of way :-)
gphilip says

27 July 2010 at 23:35

WOW, this is awesome!
Willian says

30 July 2010 at 18:22

Amazing, but not fast at all with big numbers =p
Imagine you have to verify a larger number, like 4294967293 =D
Sumeet Chawla says

30 July 2010 at 20:42

Even I am pretty interested in regular expressions. I learned them by using in VI and linux but now I find its use a lot while coding in php.
Adar Wesley says

31 July 2010 at 21:34

Hi,

I didn’t go through all the comments above, but I don’t think anyone actually explained why the regexp works, so I’ll give it a try.

The secret is in the representation of the number. The number is represented as a sequence of ones (‘1’) as long as the number. What the regexp does is try to see if this
sequence of ones “can be broken down into 2 or more equal parts”.
Wait a second, that’s exactly like saying the number is divisible. The dividers are the length of the substring matched by (11+?) and the number of times it occurs in the original string.

By the way, I agree that this is extremely inefficient.
I also agree with @avinash that it’s “beautiful in a geekish sort of way”.
Sterling (Chip) Camden says

31 July 2010 at 21:57

I ran some benchmarks: http://www.chipsquips.com/?p=2418&cpage=1#comment-265857

It’s far too inefficient for practical use, but it’s mathematically brilliant.
Svetlana Wiczer says

1 September 2010 at 22:12

Awesome. I love regex. Your regex makes perfect sense from the start. Have you ever tried to draw a table 7-cells wide and fill it row by row with numbers 1-2-3-4 (to infinity – lol) then circle primes? You’ll see beautiful diagonal patterns.
avinash says

1 September 2010 at 22:35

I’ll do that tomorrow, Svetlana :-)
Mike Bethany says

8 October 2010 at 09:04

Very cool trick and probably useful for small prime number tasks but of course it’s just not feasible for any really heavy math. Neat trick though.
hanicker says

13 October 2010 at 06:04

Everybody loves primes!
Prestaul says

15 December 2010 at 21:48

Better JS version:
function isPrime(val) { return !/^1?$|^(11+?)\1+$/.test((new Array(val + 1)).join('1')); }
kiran pawar says

15 May 2012 at 15:53

# script to check wheather no. is ‘composite no’.
# if nothing print’s then given no. is ‘prime’.

my $no = $ARGV[0];

for(my $i=2; $i<=$no; $i++){

my $result=($no/$i);

if($result=~/^[0-9]+$/ && $result=~/[^1]/){

print"$result \* $i\n";

print"$no is composite number";

exit;
}

}
Avinash Meetoo says

15 May 2012 at 18:10

Thanks for sharing!
Niranjan says

4 February 2013 at 01:23

I think this is cool. But it’s not magic. What the regex matcher is doing is that it’s trying to split a unary number into equal parts. 11+? is like a dividend and \1+ is repeater of the dividend. If no dividend exists for a unary representation of a number then it is prime. So it will always work, although it’s not different from checking whether given number is divisible first by 2 then by 3 and then by 4 and so on.
Asen Bozhilov says

24 April 2013 at 19:37

Almost agree with @cetin, but he misses some points. He is right that the regular language is some language for whom we are able to construct the DFA. Most of today’s regular expression flavours use NFA, but that doesn’t make the irregular since for every NFA we can construct the corresponding DFA. Regarding the backtracking, this is naturally behavior in NFA, since we have to evaluate all the paths of the NFA.

I am a little bit frustrated that only one person mention UVW theorem. Indeed you can use the Pumping lemma to prove that is irregular or regular language: http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages
Avinash Meetoo says

24 April 2013 at 22:15

Thanks Asen for your comment. I don’t understand everything you write but I feel you are right :-)
Stephan Eriksen says

29 April 2013 at 23:46

That is so sexy. OMG.
Ken Ficara says

1 July 2013 at 00:23

I love this, but it’s not a regular expression (not formally, and not even by Perl standards). The regex backtracks and counts, which is not possible for a formal regular expression, and wouldn’t work at all without the implicit loop that turns the number into a string of 1’s.

But a thing of beauty nontheless.

http://blog.kenficara.com/2013/06/30/irregular-language-and-regular-expressions/
Nadim says

11 July 2013 at 22:30

Since I’m a web developer (Front-end and Back-end), I always read articles concerning my field. So I was reading about the “Adventures in Javascript development” by Rebecca Murphey. Her blog was powered by Octopress, a blogging framework for hackers. I went to find out more about Octopress and while going through it’s documentation and plugins, I found the actor in this Noulakaz article, that is the regex, being mentioned + a link to the source article (this one) added in the comments.

Here’s the plugin page: http://octopress.org/docs/plugins/backtick-codeblock/

Congrats Avinash !
Avinash Meetoo says

12 July 2013 at 06:00

Thanks for noticing. This post on regular expressions is definitely the most popular post of this blog. I’m happy I’ve managed to write something people find useful. I’m not the one who found the regular expression though :-)
Tom Lord says

13 February 2015 at 14:34

I’m currently working on a ruby gem to generate *examples* of given regular expression. For example:

/this|is|awesome/.examples # => [“this”, “is”, “awesome”]

So, I tried it out on this “prime number validator” regex:

/1?|(11+?)\1+/.examples
#=> [“”, “1”, “1111”, “111111”, “11111111”]
/1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 100).uniq.sort.map(&:length).first(10)
#=> [0, 1, 4, 6, 8, 9, 10, 12, 14, 15]

require ‘prime’
Array(1..100) – /1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 3000).uniq.sort.map(&:length) == Prime.first(25)
#=> true

Horribly inefficient, but pretty cool! Here’s my gem if case anyone’s interested: https://github.com/tom-lord/regexp-examples
Avinash Meetoo says

13 February 2015 at 20:56

Thanks Tom for sharing.
Robert Clausecker says

29 January 2016 at 21:24

This thing does not describe a regular language. I refuse to call it a regular expression. Backreferences are not part or ordinary regular expressions.
Jaap Hollander says

7 June 2016 at 16:09

The language of strings that have prime length over the alphabet {1} is not regular. This follows directly from the pumping lemma.

This means, that a “regex” that find primes relies on extensions and we cannot expect it to work in general.
David Bruchmann says

19 May 2017 at 16:12

Just want to mention that pseudo-primes are not found, so even the regex is damned slow with larger numbers it seems being reliable.
Nevertheless max recursion-levels make it unreliable in PHP for larger numbers, see the comments above.
David Bruchmann says

19 May 2017 at 16:21

Here is an array of pseudo-primes for those who like to test it:

$pseudoprimes = array(9,15,21,25,27,33,35,39,45,49,51,55,57,63,65,69,75,77,81,85,87,91,93,95,99,105,121,133,185,217,273,301,305,323,325,341,377,481,511,561,585,645,657,671,703,705,781,793,817,949,989,1001,1105,1111,1159,1233,1261,1281,1333,1387,1417,1541,1661,1729,1825,1829,1891,1905,2047,2101,2257,2353,2465,2501,2665,2701,2737,2821,2981,3201,3239,3277,3281,3367,3421,3565,3589,3641,3745,3751,3827,3913,4033,4097,4181,4187,4369,4371,4525,4577,4641,4681,4921,4961,5041,5185,5459,5461,5551,5611,5713,5777,6305,6533,6541,6601,6697,6721,7161,7381,7449,7777,7813,7957,8113,8149,8321,8365,8401,8481,8911,9071,9179,9265,9709,10001,10225,10261,10585,10745,10877,11041,11111,11137,11419,11521,11663,12025,12209,12403,12545,12801,13021,13201,13333,13665,13919,13981,14089,14701,14839,14981,15251,15457,15709,15751,15841,16109,16211,16297,16531,16705,17119,17329,17711,18361,18407,18705,18721,18971,19043,19345,20017,20197,20417,20425,21049,21361,22049,22499,23407,23521,23653,24211,24465,24569,24661,25199,25351,25761,25829,25877,26069,27323,27971,28009,29281,29341,29539,29681,29857,29891,30121,30673,30739,30857,30889,31021,31621,31631,31697,32759,33153,34441,34561,34943,34945,35207,35425,35785,36301,38081,39059,39203,39305,39331,39493,39689,40309,40501,41041,42799,43621,44099,44173,44287,44801,46657,46979,47197,47641,47879,48133,49141,49241,49321,50183,50881,51841,51983,52633,52701,53663,53971,54705,55969,56033,58519,58825,60377,63139,63973,64079,64681,67861,67921,68251,72041,72389,73919,74593,75077,75241,75361,76627,78937,79381,80189,90061,90241,96049,97439,97921,98173,100065,100127,100651,102311,105281,113573,115639,118441,125249,130139,137549,137801,146611,153931,155819,158399,161027,162133,162401,163081,172081,176399,176471,186961,189419,192509,197209,197801,218321,219781,224369,230691,231703,243629,249331,252601,253259,254321,257761,268349,268801,271441,272611,288919,302101,313499,324899,370229,399001,429479,430127,449065,459191,473891,480689,488881,530881,600059,621781,632249,635627,656601,670033,741751,838201,851927,904631,997633,1024651,1033997,1050985,1106327,1149851,1256293,1317121,1388903,1392169,1533601,1615681,1653601,1690501,1697183,1773289,1857241,2113921,2263127,2433601,2435423,2455921,2662277,2704801,3057601,3175883,3218801,3224065,3338221,3399527,3452147,3581761,3664585,3774377,3828001,3900797,3967201,4109363,4226777,4403027,4463641,4828277,4870847,4903921,5481451,7405201,7640137,7678321,8539786,9264097,9439201,9532033,9582145,10237921,12813274,16532714,17340938,24658561,27422714,27664033,31150351,33940178,46672291,64132426,89733106,95173786,102690901,109437751,130944133,139952671,178482151,187473826,196075949,203211098,214038533,234735586,284301751,353686906,383425351,395044651,407282851,417027451,498706651,517697641,545670533,582799951,612816751,799171066,801123451,855073301,903136901,919831058,970355431,1091327579,1133818561,1188287794,1235188597,1389675541,1502682721,1955272906,2059739221,2166139898,2304156469,2309861746,2864860298,2976407809,3273820903,3841324339,3871638242,3924969689,4982970241,5130186571,5242624003,5313594466,5867301826,6335800411,7045248121,7279379941,7825642579);
On request I can send more details about it.
France fermeture says

25 June 2018 at 01:08

Article trÃ¨s instructif… :)
Gerard says

19 October 2018 at 12:28

I get pleasure from, lead to I discovered just what I used to
be taking a look for. You have ended my four day long hunt!

God Bless you man. Have a nice day. Bye
aiden says

15 April 2019 at 18:34

JavaScript version in 1 line
var isPrime=(val)=>!/^1?$|^(11+?)\1+$/.test(‘1’.repeat(val));
isPrime(19)==true
aarogyam 1.3 says

26 July 2021 at 12:10

Very nice patterns collection, I really like it, Very creative.Thanks a lot for sharing !!
Jesse BurstrÃ¶m says

5 March 2022 at 12:23

Fortunately all research is to no pattern matching for primes they are truly random so only quantuum computers can upheaval cryptography.

This is though the opposite of that: prime numbers are truly magical expressions!
seniormars says

17 May 2022 at 22:00

Below is the rust version

“`rust
// regular regex crate does not support backtracking
use fancy_regex::Regex;

/// Tests if a positive integer is a prime or not
///
/// # Notes
/// Extremely inefficient
///
/// # Panics
///
/// Panics if the regex cannot be created
/// Panics if number is extremely large or much greater than recusion limit
pub fn is_prime(number: usize) -> Result {
let re = Regex::new(r”^1?$|^(11+?)\1+$”)?;
let possible_prime_match = “1”.repeat(number);

match re.is_match(&possible_prime_match) {
Ok(is_prime_bool) => Ok(!is_prime_bool),
Err(err) => Err(err),
}
}

#[cfg(test)]
mod tests {
use crate::is_prime;

#[test]
fn blog_examples() {
assert!(!is_prime(10).unwrap(), “10 is not a prime”);
assert!(is_prime(11).unwrap());
assert!(!is_prime(12).unwrap(), “12 is not a prime”);
assert!(is_prime(13).unwrap());
assert!(!is_prime(99).unwrap(), “99 is not a prime”);
assert!(!is_prime(100).unwrap(), “100 is not a prime”);
assert!(is_prime(101).unwrap());
}

#[test]
fn psuedoprimes() {
let psuedoprimes = vec![
9, 15, 21, 25, 27, 33, 35, 39, 45, 49, 51, 55, 57, 63, 65, 69,
];

assert_eq!(
psuedoprimes
.iter()
.map(|x| is_prime(*x).unwrap())
.collect::<Vec>(),
vec![false; psuedoprimes.len()]
)
}

#[test]
fn large_prime() {
// shows how slow this matching process is
let large_prime: usize = 3010349;
let large_non_prime: usize = 3010350;
assert!(is_prime(large_prime).is_err(), “Stack Overflow”);
assert!(is_prime(large_non_prime).is_err(), “Stack Overflow”)
}
}
“`
Avinash Meetoo says

18 May 2022 at 07:26

Thanks seniormars. I always wanted to learn a bit of Rust (and Go to be honest). Why do you like Rust?
seniormars says

19 May 2022 at 07:09

I like Rust because it gives me absolute control when I need it, but at the same time, the language is so high level that I can rapidly create the software I need. In other words, rust is a language that allows me to focus on programming while maintaining the speed you would expect from C and C++ — that’s powerful. Another reason I particularly love rust is the excellent compiler and type system rust has. While the compiler can be seen as a strict nuisance, I believe a better way to see it is a teacher that tries to guide you to write the best code. Additionally, I like that everyone uses cargo — the rust package manager to lint, publish code, run tests, and create documentation. This way, you won’t ever have to deal with the issues we see in make/CMake. Finally, what captures my heart is that Rust features many functional programming aspects. From abstract data structures, lazy evaluation, to upcoming features such as higher-rank traits, rust captures the best part of multiple programming paradigms. At one point, you eventually want to use rust for everything if you can, and that is why it is an extremely loved language — because the community believes in it.

I’m sure that once you start using rust, you will realize the power of a language based on modern programming language theory.
Avinash Meetoo says

19 May 2022 at 08:42

Thanks for your very insightful observations. I did try Rust some time ago and I liked it. I’ve a CS background myself and love type theory (I spent a lot of time with Haskell in the past) and functional programming (for me, Scheme is perfection). I’ll get into Rust again. Thanks.
P V says

17 July 2023 at 15:25

In vim scripts, the expression ‘\v^1?$|^(11{-1,})\1+$’ works.