How to match hyphens with Regular Expression?

C#, Regex

C# Problem Overview


How can the [a-zA-Z0-9!$* \t\r\n] pattern be rewritten to match a hyphen along with the existing characters?

C# Solutions


Solution 1 - C#

The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.

Thus:

  • [-] matches a hyphen.
  • [abc-] matches a, b, c or a hyphen.
  • [-abc] matches a, b, c or a hyphen.
  • [ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
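The four cases above can be checked directly. This is a minimal sketch, assuming .NET's System.Text.RegularExpressions; the class name HyphenPositionDemo and the helper MatchesOne are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenPositionDemo
{
    // Hypothetical helper: true when the input is exactly one character
    // and that character is accepted by the given character class.
    public static bool MatchesOne(string charClass, string input) =>
        Regex.IsMatch(input, @"\A" + charClass + @"\z");

    public static void Main()
    {
        string[] classes = { "[-]", "[abc-]", "[-abc]", "[ab-d]" };
        string[] inputs = { "a", "c", "d", "-" };

        // Print a small truth table: only [ab-d] treats the hyphen as a range
        // operator, so it accepts "c" and "d" but rejects "-".
        foreach (string cls in classes)
        {
            foreach (string input in inputs)
            {
                Console.WriteLine($"{cls} vs '{input}': {MatchesOne(cls, input)}");
            }
        }
    }
}
```

The \A and \z anchors force the class to account for the whole input, so a true result means that single character is in the class.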

Solution 2 - C#

Escape the hyphen.

[a-zA-Z0-9!$* \t\r\n\-]

UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
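To see that both spellings behave the same for the pattern in the question, here is a small sketch, assuming .NET's Regex class. The names QuestionPatternDemo, Escaped, Trailing and AllAllowed, and the sample inputs, are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuestionPatternDemo
{
    // Hyphen escaped, as in the answer above.
    public const string Escaped = @"[a-zA-Z0-9!$* \t\r\n\-]";

    // Hyphen unescaped but placed last, where it cannot form a range.
    public const string Trailing = @"[a-zA-Z0-9!$* \t\r\n-]";

    // Hypothetical helper: true when every character of the input
    // belongs to the given character class.
    public static bool AllAllowed(string charClass, string input) =>
        Regex.IsMatch(input, @"\A" + charClass + @"+\z");

    public static void Main()
    {
        foreach (string input in new[] { "well-known term", "a_b" })
        {
            Console.WriteLine(
                $"'{input}': escaped={AllAllowed(Escaped, input)}, trailing={AllAllowed(Trailing, input)}");
        }
    }
}
```

Both classes accept "well-known term" and both reject "a_b", because the underscore is not in the set.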

Solution 3 - C#

It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.

But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.

This comparison of regex flavors says that C# can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accommodate all that dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.

All together, that works out to a pattern of [\p{L}\p{Nd}\p{Pd}!$*] to match any one character from that set.

I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
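As a rough illustration of what the Unicode-aware class accepts, the sketch below runs it against a few sample characters. It assumes .NET's default (non-ECMAScript) regex options; the class name UnicodeClassDemo, the helper IsAllowedChar and the sample characters are assumptions chosen for this example. Note that the class as written contains no \s, so whitespace is rejected unless you add it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeClassDemo
{
    // The pattern from the answer above: letters, decimal digits,
    // dash punctuation, and the three literal symbols ! $ *.
    public const string Pattern = @"[\p{L}\p{Nd}\p{Pd}!$*]";

    // Hypothetical helper: true when the input is exactly one
    // character accepted by Pattern.
    public static bool IsAllowedChar(string s) =>
        Regex.IsMatch(s, @"\A" + Pattern + @"\z");

    public static void Main()
    {
        string[] samples =
        {
            "\u00E9", // LATIN SMALL LETTER E WITH ACUTE (category L)
            "\u0663", // ARABIC-INDIC DIGIT THREE (category Nd)
            "\u2013", // EN DASH (category Pd)
            "-",      // HYPHEN-MINUS (category Pd)
            "_",      // LOW LINE (category Pc, not in the set)
            " "       // SPACE (not in the set, since \s was left out)
        };

        foreach (string s in samples)
        {
            Console.WriteLine($"U+{(int)s[0]:X4}: {IsAllowedChar(s)}");
        }
    }
}
```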

Solution 4 - C#

[-a-z0-9]+, [a-z0-9-]+ and [a-z-0-9]+ are all equivalent. A hyphen placed between two ranges is treated as a literal character. [a-z0-9-+()]+ also allows a hyphen.
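The equivalence claimed above can be spot-checked as follows. This sketch assumes .NET's regex engine, where a hyphen directly after a completed range is read as a literal; other engines may treat [a-z-0-9] differently. HyphenBetweenRangesDemo, Classes and WholeMatch are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenBetweenRangesDemo
{
    // The character classes discussed above, without the + quantifier.
    public static readonly string[] Classes =
    {
        "[-a-z0-9]", "[a-z0-9-]", "[a-z-0-9]", "[a-z0-9-+()]"
    };

    // Hypothetical helper: true when the input consists only of
    // characters from the given class.
    public static bool WholeMatch(string charClass, string input) =>
        Regex.IsMatch(input, @"\A" + charClass + @"+\z");

    public static void Main()
    {
        foreach (string cls in Classes)
        {
            // "abc-123" needs a literal hyphen; "a+b" needs a literal plus,
            // which only the last class provides.
            Console.WriteLine(
                $"{cls}: abc-123={WholeMatch(cls, "abc-123")}, a+b={WholeMatch(cls, "a+b")}");
        }
    }
}
```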

Solution 5 - C#

Is this what you are after?

MatchCollection matches = Regex.Matches(mystring, "-");
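Outside a character class the hyphen needs no special handling, so the one-liner above can be wrapped into a complete program. This is only a sketch: the class name HyphenCountDemo, the helper CountHyphens and the sample string are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenCountDemo
{
    // Hypothetical helper: counts literal hyphens in the text.
    public static int CountHyphens(string text) =>
        Regex.Matches(text, "-").Count;

    public static void Main()
    {
        string mystring = "one-two-three";

        MatchCollection matches = Regex.Matches(mystring, "-");
        Console.WriteLine($"Found {matches.Count} hyphen(s)");

        // Each Match records where in the input the hyphen was found.
        foreach (Match match in matches)
        {
            Console.WriteLine($"Hyphen at index {match.Index}");
        }
    }
}
```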

Solution 6 - C#

Use \p{Pd} to match any type of hyphen or dash. The '-' character (HYPHEN-MINUS) is just one of them; it is also the one that has a special meaning inside a regex character class.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Thomas Anderson | View Question on Stack Overflow
Solution 1 - C# | Konrad Rudolph | View Answer on Stack Overflow
Solution 2 - C# | Neil Barnwell | View Answer on Stack Overflow
Solution 3 - C# | tchrist | View Answer on Stack Overflow
Solution 4 - C# | Parimala | View Answer on Stack Overflow
Solution 5 - C# | Aliostad | View Answer on Stack Overflow
Solution 6 - C# | Radu Simionescu | View Answer on Stack Overflow