Sed expression doesn't allow optional grouped string

Regex, Sed

Regex Problem Overview


I'm trying to use the following regex in a sed script but it doesn't work:

sed -n '/\(www\.\)\?teste/p'

sed doesn't seem to apply the ? to the group \(www\.\).

It works if I use the -E option, which switches sed to extended regular expressions, so the syntax becomes:

sed -En '/(www\.)?teste/p'

This works fine, but I want to run this script on a machine whose sed doesn't support the -E option. I'm pretty sure that this is possible and that I'm doing something very stupid.

Regex Solutions


Solution 1 - Regex

Standard sed only understands POSIX Basic Regular Expressions (BRE), not Extended Regular Expressions (ERE), and the ? is a metacharacter in EREs, but not in BREs.
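As a small illustration of that difference, the sketch below feeds two lines to sed with a default (BRE) pattern. The function name bre_literal_question is made up for this example. Because a bare ? is an ordinary character in a BRE, only the line that contains a literal question mark should be printed.

```shell
# Hypothetical helper (name invented for this example).
# In a BRE, a bare ? is an ordinary character, so /a?b/ only matches
# text containing the three characters "a?b", not "ab".
bre_literal_question() {
  printf '%s\n' 'ab' 'a?b' | sed -n '/a?b/p'
}

bre_literal_question
```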

Your version of sed might support EREs if you turn them on. With GNU sed, the relevant options are -r and --regexp-extended, described as "use extended regular expressions in the script".
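One way to find out what a given machine offers is to probe for the option at run time. This is only a sketch: the function name sed_ere_flag is hypothetical, and it assumes that a sed without the option exits with an error or prints nothing for the test input.

```shell
# Hypothetical helper: print the first option (-E or -r) that makes this
# sed treat the script as an ERE, or print nothing and return 1.
sed_ere_flag() {
  for flag in -E -r; do
    # (a)+ only matches "aa" when the pattern is read as an ERE.
    if printf 'aa\n' | sed -n "$flag" '/^(a)+$/p' 2>/dev/null | grep -q '^aa$'; then
      printf '%s\n' "$flag"
      return 0
    fi
  done
  return 1
}

if ere_flag=$(sed_ere_flag); then
  printf 'http://tested.com/\n' | sed -n "$ere_flag" '/(www\.)?teste/p'
else
  echo 'no ERE option found; use a BRE-only pattern instead' >&2
fi
```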

However, if your sed does not support it - quite plausible - then you are stuck. Either import a version of sed that does support them, or redesign your processing. Maybe you should use awk instead.
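For the awk route, a minimal sketch could look like the following. awk patterns are EREs, so the ? applies to the parenthesised group without any extra option. The function name filter_with_awk is invented for this example.

```shell
# Hypothetical helper: print the lines from standard input that match
# the pattern. awk uses EREs, so (www\.)? is an optional group here.
filter_with_awk() {
  awk '/(www\.)?teste/'
}

printf '%s\n' \
  'http://www.tested.com/' \
  'http://tested.com/' \
  'http://www.teased.com/' | filter_with_awk
```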


2014-02-21

I don't know why I didn't mention that even though POSIX BREs have no ? shorthand (GNU sed accepts \? as an extension, but other implementations need not), sed does support counted ranges with \{n,m\}, so you can simulate ? with \{0,1\}:

sed -n '/\(www\.\)\{0,1\}teste/p' << EOF
http://www.tested.com/
http://tested.com/
http://www.teased.com/
EOF

which produces:

http://www.tested.com/
http://tested.com/

Tested on Mac OS X 10.9.1 Mavericks with the standard BSD sed and with GNU sed 4.2.2.
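The optional group matters most when the pattern is anchored or used in a substitution. The sketch below is an assumption about how the pattern might be used, not part of the original answer: it strips the scheme and an optional www. prefix and prints the remaining host name. The function name extract_teste_hosts is hypothetical.

```shell
# Hypothetical helper: for URLs whose host starts with "teste", with or
# without a leading "www.", print the host without that prefix.
# \{0,1\} stands in for ? so the script stays within POSIX BRE syntax.
extract_teste_hosts() {
  sed -n 's|^http://\(www\.\)\{0,1\}\(teste[^/]*\).*|\2|p'
}

extract_teste_hosts << EOF
http://www.tested.com/
http://tested.com/
http://www.teased.com/
EOF
```

With this input, the first two lines should each be reduced to tested.com, and the third should be skipped because its host does not start with teste.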

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Eduardo | View Question on Stack Overflow
Solution 1 - Regex | Jonathan Leffler | View Answer on Stack Overflow