OR condition in Regex

Regex Problem Overview


Let's say I have

1 ABC Street
1 A ABC Street

With \d, it matches 1 (what I expect); with \d \w, it matches 1 A (also expected). When I combine the patterns as \d|\d \w, it matches only the first alternative and never the second.
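The behaviour described above can be reproduced with a short sketch. Python's re module is an assumption here, since the question does not name a regex flavor; the names addresses and naive are only illustrative.

```python
import re

# Sample lines from the question.
addresses = ["1 ABC Street", "1 A ABC Street"]

# Shorter alternative first: \d succeeds immediately,
# so the second alternative \d \w is never tried.
naive = re.compile(r"\d|\d \w")

naive_matches = [naive.search(a).group() for a in addresses]
print(naive_matches)  # ['1', '1']
```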

My question is how to use "or" condition correctly in this particular case?

PS: The number alone should be matched only when it is not followed by a single letter; otherwise both the number and the single letter should be matched.

Example: in 1 ABC Street, match only the number 1; in 1 A ABC Street, match 1 A.

Regex Solutions


Solution 1 - Regex

Try

\d \w |\d

or add a positive lookahead if you don't want to include the trailing space in the match

\d \w(?= )|\d

When you have two alternatives where one is an extension of the other, put the longer one first, otherwise it will have no opportunity to be matched.
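As a rough check of the two patterns above, here is a minimal sketch using Python's re module (an assumption; the answer does not name a flavor). The variable names are illustrative only.

```python
import re

addresses = ["1 ABC Street", "1 A ABC Street"]

# Longer alternative first; the trailing space is part of the match.
with_space = re.compile(r"\d \w |\d")
# Same idea, but the space is only asserted by the lookahead, not consumed.
with_lookahead = re.compile(r"\d \w(?= )|\d")

space_matches = [with_space.search(a).group() for a in addresses]
lookahead_matches = [with_lookahead.search(a).group() for a in addresses]

print(space_matches)      # ['1', '1 A ']
print(lookahead_matches)  # ['1', '1 A']
```

The required space after \w is what stops the longer alternative from matching 1 A inside 1 ABC Street.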

Solution 2 - Regex

A classic "or" would be |. For example, ab|de would match either side of the expression.

However, for something like your case you might want to use the ? quantifier, which matches the preceding expression 0 or 1 times (1 is preferred, i.e. it is a "greedy" match). Another (probably more reliable) alternative would be using custom character classes:

\d+\s+[A-Z\s]+\s+[A-Z][A-Za-z]+

This pattern will match:

  • \d+: One or more digits.
  • \s+: One or more whitespace characters.
  • [A-Z\s]+: One or more uppercase letters or whitespace characters.
  • \s+: One or more whitespace characters.
  • [A-Z][A-Za-z]+: An uppercase letter followed by at least one more letter (uppercase or lowercase).
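The breakdown above can be tried out with a small sketch, assuming Python's re module; the lowercase sample line is an added assumption to show a non-match. Note that this pattern validates the whole address line rather than extracting 1 or 1 A.

```python
import re

# Character-class pattern from the answer, applied to the whole line.
street = re.compile(r"\d+\s+[A-Z\s]+\s+[A-Z][A-Za-z]+")

samples = ["1 ABC Street", "1 A ABC Street", "1 abc street"]
results = {s: bool(street.fullmatch(s)) for s in samples}

print(results)
# {'1 ABC Street': True, '1 A ABC Street': True, '1 abc street': False}
```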

If you'd like a more static check, e.g. to match only ABC and A ABC, you can use a non-capturing group and define the alternatives inside it (to limit their scope):

\d (?:ABC|A ABC) Street

Or another alternative using a quantifier:

\d (?:A )?ABC Street
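Both static patterns should accept the same inputs. A minimal sketch, assuming Python's re module; the third sample line (1 B ABC Street) is a made-up counterexample, not from the question.

```python
import re

# Explicit alternatives inside a non-capturing group.
alternation = re.compile(r"\d (?:ABC|A ABC) Street")
# Equivalent form: the "A " prefix is made optional with ?.
optional = re.compile(r"\d (?:A )?ABC Street")

samples = ["1 ABC Street", "1 A ABC Street", "1 B ABC Street"]
alternation_results = [bool(alternation.fullmatch(s)) for s in samples]
optional_results = [bool(optional.fullmatch(s)) for s in samples]

print(alternation_results)  # [True, True, False]
print(optional_results)     # [True, True, False]
```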

Solution 3 - Regex

I think what you need might be simply:

\d( \w)?

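Note that the alternation in your regex would also have reached the longer option if it was written as \d \w|\d instead of \d|\d \w. Be aware that \w on its own also matches the first letter of a longer word such as ABC, so a trailing space, a lookahead, or \b is needed to require a single letter.

This is because the regex engine tries the alternatives from left to right and accepts the first one that succeeds: once the first option, \d, matches, the second option is never tried at that position.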
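A quick sketch, assuming Python's re module, shows how the optional group behaves on the two sample lines. The \b variant is a possible adjustment added here for illustration; it is not part of the original answer.

```python
import re

addresses = ["1 ABC Street", "1 A ABC Street"]

# Pattern from the answer: the letter is optional, but \w also
# accepts the first letter of a longer word such as "ABC".
optional_letter = re.compile(r"\d( \w)?")
# Hypothetical variant: \b requires the letter to stand alone.
bounded_letter = re.compile(r"\d( \w\b)?")

optional_matches = [optional_letter.search(a).group() for a in addresses]
bounded_matches = [bounded_letter.search(a).group() for a in addresses]

print(optional_matches)  # ['1 A', '1 A']
print(bounded_matches)   # ['1', '1 A']
```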

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Hoan Dang | View Question on Stackoverflow
Solution 1 - Regex | MikeM | View Answer on Stackoverflow
Solution 2 - Regex | Mario | View Answer on Stackoverflow
Solution 3 - Regex | Roney Michael | View Answer on Stackoverflow