What's the technical reason for "lookbehind assertion MUST be fixed length" in regex?

RegexPcreLookbehind

Regex Problem Overview


For example,the regex below will cause failure reporting lookbehind assertion is not fixed length:

#(?<!(?:(?:src)|(?:href))=["\']?)((?:https?|ftp)://[^\s\'"<>()]+)#S

Such kind of restriction doesn't exist for lookahead.

Regex Solutions


Solution 1 - Regex

Lookahead and lookbehind aren't nearly as similar as their names imply. The lookahead expression works exactly the same as it would if it were a standalone regex, except it's anchored at the current match position and it doesn't consume what it matches.

Lookbehind is a whole different story. Starting at the current match position, it steps backward through the text one character at a time, attempting to match its expression at each position. In cases where no match is possible, the lookbehind has to go all the way to the beginning of the text (one character at a time, remember) before it gives up. Compare that to the lookahead expression, which gets applied exactly once.

This is a gross oversimplification, of course, and not all flavors work that way, but you get the idea. The way lookbehinds are applied is fundamentally different from (and much, much less efficient than) the way lookaheads are applied. It only makes sense to put a limit on how far back the lookbehind has to look.

Solution 2 - Regex

First of all, this isn't true for all regular expression libraries (like .NET).

For PCRE, the reason appears to be:

> The implementation of lookbehind > assertions is, for each alternative, > to temporarily move the current > position back by the fixed width and > then try to match.

(at least, according to http://www.autoitscript.com/autoit3/pcrepattern.html).

Solution 3 - Regex

PCRE doesn't support floating lookbehind because it can cause major performance problems. This is because of the lack of right-to-left matching capability: PCRE can start a branch only from a fixed left, but left of a variable-length lookbehind can not be fixed.

Generally, try to branch your lookbehind part to fixed length patterns if possible. For example instead of:

(?<=(src|href)=")etc.

(1) use this:

(?:(?<=src=")|(?<=href="))etc.

(2) Or with \K:

(src|href)="\Ketc.

Note that \K is not a real lookbehind, because it always starts search at the end of previous match (no potential backstep into the previous match).

(3) In some complex lookbehind-only cases you can search with an "inverted" lookahead expression in a reversed string. Not too elegant but it works:

.cte(?="=(ferh|crs))

Solution 4 - Regex

I had the same issue and fixed it by using (?: subexpression)

> Defines a noncapturing group. such as Write(?:Line)? "WriteLine" in > "Console.WriteLine()" "Write" in "Console.Write(value)"

I had to change the Regex below which is suppose to catch before , or something in the start of string which was giving me lookbehind assertion is not fixed length.

(?<=,|^)

with this,

(?:(?<=,)|^)

Solution 5 - Regex

grep -P '(?<=((three)|(one)) )two' <<< "one two three three two one"
grep: lookbehind assertion is not fixed length

grep -P '((?<=(three) )|(?<=(one) ))two' <<< "one two three three two one"
one two three three two one

For processing efficiency PCRE does not support right-to-left matching or recursion. When doing a lookbehind PCRE searches the end of any previous matching string - implementing variable size matches would require recursion and reduce efficiency. See: Look Behind Assertions

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionwampView Question on Stackoverflow
Solution 1 - RegexAlan MooreView Answer on Stackoverflow
Solution 2 - RegexmbeckishView Answer on Stackoverflow
Solution 3 - RegexDávid HorváthView Answer on Stackoverflow
Solution 4 - RegexMehradView Answer on Stackoverflow
Solution 5 - RegexAdam D.View Answer on Stackoverflow