this post was submitted on 31 Oct 2024
54 points (96.6% liked)

Programmer Humor


Post funny things about programming here! (Or just rant about your favourite programming language.)


^.?$|^(..+?)\1+$

Matches strings of any character repeated a non-prime number of times

https://www.youtube.com/watch?v=5vbk0TwkokM

top 11 comments
[–] fubo@lemmy.world 10 points 1 hour ago* (last edited 18 minutes ago) (1 children)

The answer given in the spoiler tag is not quite correct!

Test case: According to the spoiler, this shouldn't match "abab", but it does.
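
If you want to check that yourself, here's a minimal sketch. It assumes Python's `re` module rather than CL-PPCRE, and `matches` is just a helper name made up for the example.

```python
import re

# The regex from the post, unchanged.
ORIGINAL = re.compile(r"^.?$|^(..+?)\1+$")

def matches(text: str) -> bool:
    """Return True if the post's regex matches the string."""
    return ORIGINAL.search(text) is not None

print(matches("abab"))  # True: the group captures "ab", then \1 matches the second "ab"
print(matches("aba"))   # False: length 3 is prime, so no repeated unit fits
```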

Corrected regex: This will match what the spoiler says: ^.?$|^((.)\2+?)\1+$

Full workup: Any Perl-compatible regex can be parsed into a syntax tree using the Common Lisp package CL-PPCRE. So if you already know Common Lisp, you don't need to learn regex syntax too!

So let's put the original regex into CL-PPCRE's parser. (Note, we have to add a backslash to escape the backslash in the string.) The parser will turn the regex notation into a nice pretty S-expression.

> (cl-ppcre:parse-string "^.?$|^(..+?)\\1+$")
(:ALTERNATION
 (:SEQUENCE :START-ANCHOR (:GREEDY-REPETITION 0 1 :EVERYTHING) :END-ANCHOR)
 (:SEQUENCE :START-ANCHOR
  (:REGISTER
   (:SEQUENCE :EVERYTHING (:NON-GREEDY-REPETITION 1 NIL :EVERYTHING)))
  (:GREEDY-REPETITION 1 NIL (:BACK-REFERENCE 1)) :END-ANCHOR))

At which point we can tell it's tricky because there's a capturing register using a non-greedy repetition, plus a back-reference to it. (That's the (..+?) group with its +?, and the \1, in the original.)

The top level is an alternation (the | in the original) and the first branch is pretty simple: it's just zero or one of any character.

The second branch is the fun one. It's looking for two or more repetitions of the captured group, which is itself two or more characters. So, for instance, "aaaa", or "ababab", or "abbabb", but not "aaaaa" or "abba".
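
A quick way to see which unit gets captured, sketched in Python's `re` (an assumption; the parse above used CL-PPCRE). `SECOND_BRANCH` and `unit` are names invented for this example.

```python
import re

# Second branch of the post's regex on its own: a group of two or more
# characters, followed by one or more copies of whatever the group captured.
SECOND_BRANCH = re.compile(r"^(..+?)\1+$")

def unit(text):
    """Return the repeated unit the group captured, or None if there is no match."""
    m = SECOND_BRANCH.search(text)
    return m.group(1) if m else None

for s in ["aaaa", "ababab", "abbabb", "aaaaa", "abba", "abbabba"]:
    print(repr(s), "->", unit(s))
# 'aaaa' -> aa
# 'ababab' -> ab
# 'abbabb' -> abb
# 'aaaaa' -> None
# 'abba' -> None
# 'abbabba' -> None
```

Because the +? is non-greedy, the engine tries the shortest unit first, which is why "aaaa" reports "aa" rather than a longer capture.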

So strings that this matches will be of non-prime length: zero, one, or a product of two numbers that are each 2 or greater.
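
The length claim is easy to spot-check for small cases. This is a rough sketch in Python's `re` (assumed here, not what the parse above used); `is_prime` is a throwaway trial-division helper and the cutoff of 30 is arbitrary.

```python
import re

ORIGINAL = re.compile(r"^.?$|^(..+?)\1+$")

def is_prime(n: int) -> bool:
    """Trial division; fine for the small lengths checked here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# Lengths 0..30 of a single repeated character that the regex accepts.
matched_lengths = [n for n in range(31) if ORIGINAL.search("a" * n)]
non_primes = [n for n in range(31) if not is_prime(n)]

print(matched_lengths == non_primes)  # True
```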

But it is not true that it matches only "any character repeated a non-prime number of times" because it also matches composite-length sequences formed by repeating a string of different characters, like "abcabc".

If we actually want what the spoiler says — only non-prime repetitions of a single character — then we need to use a second capturing register inside the first. This gives us:

^.?$|^((.)\2+?)\1+$

Specifically, this replaces (..+?) with ((.)\2+?). The \2 matches the character captured by (.), so the whole regex now needs to see the same character throughout.
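
Side by side, the difference looks like this. Again a sketch in Python's `re` (an assumption), with `compare` as a made-up helper name.

```python
import re

ORIGINAL = re.compile(r"^.?$|^(..+?)\1+$")
CORRECTED = re.compile(r"^.?$|^((.)\2+?)\1+$")

def compare(text):
    """Return (original_matches, corrected_matches) for one string."""
    return (bool(ORIGINAL.search(text)), bool(CORRECTED.search(text)))

for s in ["aaaa", "aaaaa", "abab", "abcabc"]:
    print(repr(s), compare(s))
# 'aaaa' (True, True)
# 'aaaaa' (False, False)
# 'abab' (True, False)
# 'abcabc' (True, False)
```

Both versions agree on single-character runs; they only differ on strings built from a unit of mixed characters.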

[–] ikidd@lemmy.world 2 points 11 minutes ago

I upvoted this because I hate it.

[–] sbv@sh.itjust.works 24 points 3 hours ago
[–] RegalPotoo@lemmy.world 13 points 3 hours ago (1 children)
[–] RegalPotoo@lemmy.world 6 points 2 hours ago* (last edited 2 hours ago) (1 children)

Something like

>!"A line with exactly 0 or 1 characters, or a line with a sequence of 1 or 3 or more characters, repeated at least twice"!<

[–] NateNate60@lemmy.world 2 points 52 minutes ago (1 children)

It's a line with a sequence of two or more characters repeated at least twice.

[–] MummifiedClient5000@feddit.dk 2 points 36 minutes ago

Only the part after the pipe character. The pipe character works as an "or" operator. RegalPotoo is right.
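
To make the "or" concrete, the two sides of the pipe can be tried separately. A small sketch assuming Python's `re`; `which_branch` is a hypothetical helper, not anything from the post.

```python
import re

FIRST = re.compile(r"^.?$")          # left of the pipe: zero or one character
SECOND = re.compile(r"^(..+?)\1+$")  # right of the pipe: a repeated unit

def which_branch(text):
    """Report which side of the alternation accepts the string, if any."""
    if FIRST.search(text):
        return "first"
    if SECOND.search(text):
        return "second"
    return None

for s in ["", "a", "abab", "abc"]:
    print(repr(s), "->", which_branch(s))
# '' -> first
# 'a' -> first
# 'abab' -> second
# 'abc' -> None
```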

[–] muntedcrocodile@lemm.ee 8 points 3 hours ago

Just waiting for the opportunity to hide this in prod.

[–] MummifiedClient5000@feddit.dk -1 points 38 minutes ago

Hot take: You're shit at coding if you can't do regex.

[–] NigelFrobisher@aussie.zone 2 points 2 hours ago

It matches “yo momma”.

[–] RiQuY@lemm.ee 2 points 3 hours ago

Looks like APL to me.