2007 03 18

Regular expressions are extremely powerful. This is something I read at least once or twice every day while reading articles and blogs on the Web.

While browsing today, I found this page which thoroughly describes the use of the regular expression /^1?$|^(11+?)\1+$/ in Perl to check if a number is prime or not!!!

To be frank, I was skeptical. The regular expression looks like magic! And I wanted to understand it better. I rewrote it in Ruby using irb and I tested it:

Avinash@noulakaz:~$ irb
irb(main):001:0> def is_prime(n)
irb(main):002:1> ("1" * n) !~ /^1?$|^(11+?)\1+$/
irb(main):003:1> end
=> nil
irb(main):004:0> is_prime(10)
=> false
irb(main):005:0> is_prime(11)
=> true
irb(main):006:0> is_prime(12)
=> false
irb(main):007:0> is_prime(13)
=> true
irb(main):008:0> is_prime(99)
=> false
irb(main):009:0> is_prime(100)
=> false
irb(main):010:0> is_prime(101)
=> true

Great! It also works in Ruby! This means that there is no (Perl) magic going on. The regular expression really works. But how? Let’s try to follow the logic behind it.

Is 7 prime?

To know this, the function first generates “1111111″ (from “1″ * 7) and tries to see if that string does not match /^1?$|^(11+?)\1+$/. If there is no match, then the number is prime.

Notice that the regular expression has two parts (separated with the vertical bar |).

The first part is /^1?$/ is trivial and matches with beginning of line (^), an optional 1 (1?)
and end of line ($) which implies that it matches either the empty string or “1″. This simply indicates that calling that function with n==0 or n==1 will correctly return false (as the “1″ * n will match with the first part of the regular expression)

The second part is where the magic occurs…

/^(11+?)\1+$/ matches with beginning of line (^) then by (11+?) then by \1+ and finally by end of line ($). I guess that you know that \1 is a variable which is bound to whatever was matched previously (in our case by (11+?)).

Let’s proceed slowly…

(11+?) does two things

  1. It matches with “1″ followed by one or more ones minimally. This means that it matches with “11″ initially (notice that if there was no ? (i.e. (11+) was used instead, the whole string would have matched)
  2. The string obtained (“11″ initially) is bound to the variable \1.

\1+ then matches with whatever has been matched above (“11″ initially) repeated one or more times. If this match succeeds then the number is not prime.

If you are following, you’ll realise that this eliminates all even numbers except 2 (for example, 8 is “11111111″ and therefore (11+?) will match with “11″ and \1+ will match with “111111″).

As for odd numbers (in our case 7), the (11+?) matches with “11″ initially but \1+$ cannot be true (notice the $) as there are 5 remaining ones. The regular expression engine will backtrack and will make (11+?) match “111″ and here also \1+$ won’t be true because there will be 4 remaining ones (and \1+$ will only match with a number of ones which is a multiple of 3 followed by end of line) etc. hence “1111111″ will not match the regular expression which implies that 7 will be considered as being prime :-)

When I showed this to Christina this morning (true), she told me that this only checked for a number being odd or not. This is also what I felt at the beginning. But it really works. For instance, let’s try to apply it to 9 (which is obviously not even), “1″ * 9 should match the regular expression…

“1″ * 9 = “111111111″. (11+?) matches “11″ initially. \1+$ cannot match because there are 7 remaining ones. Backtracking occurs. (11+?) now matches “111″. And here \1+$ matches the remaining 6 remaining ones! Hence, 9 is not prime.

Easy… and beautiful at the same time ;-)

Popularity: 27% [?]

  • Share/Bookmark
Add comments

written by avinash

58 Responses to “A regular expression to check for prime numbers”

  1. Ketwaroo D. Yaasir says:

    unfortunately, I am quite allergic to regular expressions… too many bloody flavors to make any sense in the end.

  2. avinash says:

    I fully understand. I was quite allergic to regexpr too. But little by little I have come to like and use them a lot.

    Let’s see, I use regexpr when working with vi, grep, sed, awk and, now, ruby…

  3. Ketwaroo D. Yaasir says:

    I’ve been trying to write a bbcode parser for wordpress µ to cure it of kses.php while keeping the entries of the bloggers “safe”. So I have to figure out how to use regex.

    So i guees i too will have to learn to like it.

    It’s like voodoo.

  4. avinash says:

    And I suppose you have realised that it is a small programming language… so it’s bound to be cool :-)

  5. curiousEngine says:

    Programming Quiz No. 2

    A serial file named contains a series of values, each of which is how many prime numbers ther are below a particular value. Write a program to read the number of primes between and values where and are requested interactively and compare it to the value obtained from FUNCT(limit) which is defined as limit / LOG(limit).

  6. avinash says:

    Hi,

    Seems that some of the text went missing… or you were seriously drunk :-)

  7. VagabondoDigitale says:

    Cool.. thx for your post. I like matematics quiz :P

  8. doh says:

    nice trick!

    I find that the “backtracking” part was equivalence to finding out if the number is prime by brute-forcing

    e.g. going from 3 to half the number and if any of that is divisible, its not a prime.

    thus the ^(11+?)\1+& trick (anyone got it?)

    i believe there are a lot of fast algorithms out there to use instead of generating 1s and parsing them with a regular expression which is kinda slow.

    however this is truly magic ^^

  9. ksivasenthil says:

    hey avinash, what is the exact input that goes to the regex constructor. I am not a ruby programmer but my attempt to use this regex in javascript failed. The test always failed for both prime and composite numbers. If you know javascript please help show me a sample to use this in javascript pls

  10. someone says:

    the trick is to apply the regex to a string with n 1’s, for example, for 5, apply the regex to 11111.

  11. Jay says:

    A PHP Version:
    !preg_match(‘/^1?$|^(11+?)\1+$/x’, str_repeat(‘1′, $n))

    Example:

    <?php

    $n = 1;
    while ($n < 1000000) {
    if (!preg_match(‘/^1?$|^(11+?)\1+$/x’, str_repeat(‘1′, $n))) {
    echo “$n: Prime\n”;
    }
    $n++;
    }
    ?>

  12. Skymt says:

    C# version:

    static void Main(string[] args)
    {
    int c;
    if (args.Length == 1 && int.TryParse(args[0], out c))
    Console.WriteLine(!Regex.IsMatch(new String(‘1′, c), @”^1?$|^(11+?)\1+$”));
    }

  13. Binks says:

    Java Version, nice tip btw, guess I should learn more about regular expressions. Java requires that the \ be \’d out, otherwise it sees it as a smiley face character and problems happen :P .

    public static void main(string[] args){
    if (args.length >= 1)
    {
    int c = Integer.parseInt(args[0]);
    String s = “”;
    while (c > 0) {c–; s+=”1″;}
    System.out.println(!Pattern.matches(“^1?$|^(11+?)\\1+$”, s));
    }}

  14. Venkatesh says:

    This is just awesome…

  15. Steven says:

    Hello,

  16. adamsky says:

    To be honest – regular expressions are “overused”… And are NOT that good. For example in PHP using just common str_replace(); would be in many cases much faster then preg_replace();

    Problem is that people tend to use a lot of “sophisticated” code where it’s not necessary, only because it looks more “professional”. My thought on this is “I don’t care what others think – I just need my code to be fast, clean and as simple as it is possible”.

  17. avinash says:

    True.

    Sometimes regular expressions are extremely cryptic (like the one above in fact ;-) ) but they are also interesting to understand because they are essentially small programs written in a domain specific language.

  18. Nick says:

    The magic is in the backtrack…. if it can find a number a=(11+?) that multiplied N times gives our number, it’s obvious that our number can’t be prime (our number = a*N, i.e. not prime)

  19. This must be shared: /^1?$|^(11+?)1+$/ to check for prime numbers at Philipp Meier’s weblog says:

    [...] More details on Avinash Meetoo’s Blog. [...]

  20. roscoe says:

    How many numbers does this count up to? whats the largest prime that can be found in *real* time?

    I’m interested so mail me please.

  21. avinash says:

    Hi Roscoe,

    I’ve not done any performance test but I guess that if you use a compiled language like C with a good Regex implementation, then, maybe, you can generate prime numbers real fast.

    On the other hand, if you really want to generate primes then have a look at The Sieve of Eratosthene.

  22. Paul says:

    Avinash. Super, I’ve been looking around for a Regex similar to this for a while. I’ve modified it to suit.

  23. avinash says:

    Hi Paul,

    Great to know that you find this regular expression useful. I’d like to point out that I’m not the original author though ;-)

  24. rasmus says:

    if primenumber mod 2 is 0 then
    isprime = true

    wouldnt that be easier and faster? ^^

  25. rasmus says:

    sorry, its getting late here heh

    if primenumber mod 2 != 0 then
    isprime = true

  26. avinash says:

    Hi Rasmus,

    A prime number is not simply odd. Check this.

  27. Celina says:

    I have never been able to understand regular expressions, what is the logic behind them?

  28. avinash says:

    Hi Celina,

    Regular expressions are not trivial. But they are very powerful. If you want to know more about them, this is a good start.

    A lot of people also recommend this book.

  29. Anonymous says:

    While it’s neat, it’s not a great idea to use for big numbers since it uses O(N^2) time and O(N) memory where even a trivial solution uses O(sqrt(N)) time and O(1) memory.

  30. avinash says:

    True :-)

    I wonder if someone has actually done some benchmarking with large numbers?

  31. Anonymous says:

    This regular expression does not use O(1) and O(N^2), its actually exponential in the size of the input. If it unpacks the number 7 (32 bit) to 7 1’s, thats 7*(32). So a 32 bit number 2147483648 would take up half a gig of memory.

  32. Kadhirvel says:

    Please follow the link below for fact about primes

    http://kadiprimality.blogspot.com/

  33. rul3z » Blog Archive » Calculating a prime number with a shell script says:

    [...] The script seems to be working, but is not very efficient. While writing this script I found a regular expression that does the same thing, only a couple of hundred times [...]

  34. Blogaholic » RegEx zur Primzahlsuche says:

    [...] Quelle: A regular expression to check for prime numbersTags: merken [...]

  35. nidhi says:

    Pls give me regex to match filename that doesn’t start with a number and have html extension.

  36. avinash says:

    Maybe something like “^[^0-9].*html$”

    Notice that the two ^ have very different meanings. The first one is an anchor which represents the beginning of the line. The second one is the “inverse” operator i.e. [^0-9] matches with any character excepts a digit.

  37. Skymt says:

    You could also try this: “^\D\w+.html”

    ((^)Start of line – (\D)Not a digit – (\w+)several letters – (.html)end with .html)

    Works with the .NET flavor of regex…

  38. links for 2009-01-10 | The Tech Guy says:

    [...] Avinash Meetoo: Blog » A regular expression to check for prime numbers (tags: regexp prime number) Dans la catégorie Del.icio.us « links for 2009-01-09 [...]

  39. Primzahlen mit RegEx bestimmen (the nerdy way) | Webmaster, Security und Technik Blog says:

    [...] habe ich bei Noulakaz einen sehr verwirrenden Weg gefunden eine Zahl auf eine Primzahl hin zu prüfen. Also ob zum [...]

  40. is it normal to be kicked out of irb? | keyongtech says:

    [...] is_prime?(1000000000) [FATAL] failed to allocate memory $ p.s.: is_prime? method taken from http://www.noulakaz.net/weblog/2007/…prime-numbers/ — Posted via [...]

  41. Prime Number Regex! » Code Musings and Such says:

    [...] an nice explanation of how it works. [...]

  42. rentis3 says:

    for calculations with big numbers, try F#

  43. CanoeLabs » Regexp says:

    [...] source. [...]

  44. erwin says:

    try this: 999999 on the above php regexp it says prime….
    it isn’t ….

  45. avinash says:

    So it seems that there is a bug lurking somewhere…

  46. cetin says:

    I don’t think this is formally a regular expression, because of “?” character. The processing of a real regular expression never requires backtracking (i.e. never requires reading the input multiple times) and hence they are very fast once you generate the deterministic finite automata (DFA) for the given regular expression.

    Programming languages usually provide extensions to regular expressions (like ? in this case) for convenience to the programmer. There are only 3 operators in formal regular languages, which are star (“*”), union (“|”) and concatenation. All other operators used in a regular expressions must be representable by using these 3, for example you can represent 0+ as 00* and so on. The operator “?” used in the above expression is something totally different and which in fact violates the rule that machines that process a regular expression can do so with finite amount of memory. Hence this expression is actually not regular and the language generated by it is not a regular language, mathematically.

  47. avinash says:

    Mathematically, you may be right (but I am not so sure.) But, from a programming point of view, regular expressions have always allowed backtracking…

  48. Fred says:

    php may reject large numbers silently due to recursion depth limit (which I think is 32767 levels), that’s why 999999 fails.

  49. avinash says:

    Thanks for pointing this out…

  50. Patrick Atambo says:

    WOW, pretty cool! I have to revisit regular expressions it seems. :)

  51. Using the regular expression backtracking engine for prime number magic | Barklund.org says:

    [...] just saw the most impressive, intelligent use of the regular expression backtracking engine and simply have to re-post it in order to give my own decryption of the infinitely simple [...]

  52. Walking Randomly » Finding prime numbers using voodoo regular expressions says:

    [...] Update (27th August 2009): Someone pointed me to an alternative explanation here. [...]

  53. akash says:

    a language such that a string is made of binary alphabet( of 0s and 1s) and the number of zeros in the string is a 3 digit prime number…….is this language regular…??
    can u pls help me out in this ……its a ques appeared in GATE entrance

  54. perler says:

    this was good shit. thanks!

  55. Pras Sarkar says:

    This is really intriguing, but I’m unclear about your explanation about how it works (it definitely works). For example, try the number 49 (not prime) – 1111111111111111111111111111111111111111111111111 – neither (11)+ matches nor (111)+ matches, so you would think the regex would classify 49 as prime (it does not). So how does it really work?

  56. Weekend miscellany — The Endeavour says:

    [...] A regular expression to check for prime numbers [...]

  57. Makovec says:

    I would rather see pattern like this: “^\d*[13579]$” in my scripts.

    It’s clear, easy and I think it’s also faster then the one described. If you want negative numbers it’s easy to change.

    Anyway it’s nice work and idea!

    David

  58. Makovec says:

    Oh, hell – my bloody English. Forgot my previous note :( Shit

Leave a Reply