Regex: I want this AND that AND that... in any order

C#Regex

C# Problem Overview


I'm not even sure if this is possible or not, but here's what I'd like.

String: "NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870"

I have a text box where I type in the search parameters and they are space delimited. Because of this, I want to return a match is string1 is in the string and then string2 is in the string, OR string2 is in the string and then string1 is in the string. I don't care what order the strings are in, but they ALL (will somethings me more than 2) have to be in the string.

So for instance, in the provided string I would want:

"FEB Low"

or

"Low FEB"

...to return as a match.

I'm REALLY new to regex, only read some tutorials on http://perldoc.perl.org/perlretut.html">here</a> but that was a while ago and I need to get this done today. Monday I start a new project which is much more important and can't be distracted with this issue. Is there anyway to do this with regular expressions, or do I have to iterate through each part of the search filter and permutate the order? Any and all help is extremely appreciated. Thanks.

UPDATE: The reason I don't want to iterate through a loop and am looking for the best performance wise is because unfortunately, the dataTable I'm using calls this function on every key press, and I don't want it to bog down.

UPDATE: Thank you everyone for your help, it was much appreciated.

CODE UPDATE:

Ultimately, this is what I went with.

string sSearch = nvc["sSearch"].ToString().Replace(" ", ")(?=.*");
if (sSearch != null && sSearch != "")
{
  Regex r = new Regex("^(?=.*" + sSearch + ").*$", RegexOptions.IgnoreCase);
  _AdminList = _AdminList.Where<IPB>(
                                       delegate(IPB ipb)
                                       {
                                          //Concatenated all elements of IPB into a string
                                          bool returnValue = r.IsMatch(strTest); //strTest is the concatenated string
                                          return returnValue;
                                    }).ToList<IPB>();
                                       }
}

The IPB class has X number of elements and in no one table throughout the site I'm working on are the columns in the same order. Therefore, I needed to any order search and I didn't want to have to write a lot of code to do it. There were other good ideas in here, but I know my boss really likes Regex (preaches them) and therefore I thought it'd be best if I went with that for now. If for whatever reason the site's performance slips (intranet site) then I'll try another way. Thanks everyone.

C# Solutions


Solution 1 - C#

You can use (?=…) positive lookahead; it asserts that a given pattern can be matched. You'd anchor at the beginning of the string, and one by one, in any order, look for a match of each of your patterns.

It'll look something like this:

^(?=.*one)(?=.*two)(?=.*three).*$

This will match a string that contains "one", "two", "three", in any order (as seen on rubular.com).

Depending on the context, you may want to anchor on \A and \Z, and use single-line mode so the dot matches everything.

This is not the most efficient solution to the problem. The best solution would be to parse out the words in your input and putting it into an efficient set representation, etc.


More practical example: password validation

Let's say that we want our password to:

  • Contain between 8 and 15 characters
  • Must contain an uppercase letter
  • Must contain a lowercase letter
  • Must contain a digit
  • Must contain one of special symbols

Then we can write a regex like this:

^(?=.{8,15}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[!@#$%^&*]).*$
 \__________/\_________/\_________/\_________/\______________/
    length      upper      lower      digit        symbol

Solution 2 - C#

Why not just do a simple check for the text since order doesn't matter?

string test = "NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870";
test = test.ToUpper();
bool match = ((test.IndexOf("FEB") >= 0) && (test.IndexOf("LOW") >= 0));

Do you need it to use regex?

Solution 3 - C#

I think the most expedient thing for today will be to string.Split(' ') the search terms and then iterate over the results confirming that sourceString.Contains(searchTerm)

var source = @"NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870".ToLowerInvariant();
var search = "FEB Low";

var terms = search.Split(' ');

bool all_match = !terms.Any(term => !(source.Contains(term.ToLowerInvariant())));

Notice that we use Any() to set up a short-circuit, so if the first term fails to match, we skip checking the second, third, and so forth.


This is not a great use case for RegEx. The string manipulation necessary to take an arbitrary number of search strings and convert that into a pattern almost certainly negates the performance benefit of matching the pattern with the RegEx engine, though this may vary depending on what you're matching against.

You've indicated in some comments that you want to avoid a loop, but RegEx is not a one-pass solution. It is not hard to create horrifically non-performant searches that loop and step character by character, such as the infamous catastrophic backtracking, where a very simple match takes thousands of steps to return false.

Solution 4 - C#

var text = @"NS306Low FEBRUARY 2FEB0078/9/201013B1-9-1Low31 AUGUST 19870";   
var matches = Regex.Matches(text, @"(FEB)|(Low)");
foreach (Match match in matches) {
    Console.WriteLine(match.Value);
}

output:

Low
FEB
FEB
Low

should get you started

Solution 5 - C#

The answer by @polygenelubricants is both complete and perfect but I had a case where I wanted to match a date and something else e.g. a 10-digit number so the lookahead does not match and I cannot do it with just lookaheads so I used named groups:

(?:.*(?P<1>[0-9]{10}).*(?P<2>2[0-9]{3}-(?:0?[0-9]|1[0-2])-(?:[0-2]?[0-9]|3[0-1])).*)+

and this way the number is always group 1 and the date is always group 2. Of course it has a few flaws but it was very useful for me and I just thought I should share it! ( take a look https://www.debuggex.com/r/YULCcpn8XtysHfmE )

Solution 6 - C#

You don't have to test each permutation, just split your search into multiple parts "FEB" and "Low" and make sure each part matches. That will be far easier than trying to come up with a regex which matches the whole thing in one go (which I'm sure is theoretically possible, but probably not practical in reality).

Solution 7 - C#

Use string.Split(). It will return an array of subtrings thata re delimited by a specified string/char. The code will look something like this.

int maximumSize = 100;
string myString = "NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870";
string[] individualString = myString.Split(' ', maximumSize);

For more information http://msdn.microsoft.com/en-us/library/system.string.split.aspx

Edit: If you really wanted to use Regular Expressions this pattern will work. [^ ]* And you will just use Regex.Matches(); The code will be something like this:

string myString = "NS306 FEBRUARY 20078/9/201013B1-9-1Low31 AUGUST 19870";
string pattern = "[^ ]*"; Regex rgx = new Regex(pattern);
foreach(Match match in reg.Matches(s))
{
//do stuff with match.value
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionXstreamINsanityView Question on Stackoverflow
Solution 1 - C#polygenelubricantsView Answer on Stackoverflow
Solution 2 - C#KelseyView Answer on Stackoverflow
Solution 3 - C#JayView Answer on Stackoverflow
Solution 4 - C#RustyView Answer on Stackoverflow
Solution 5 - C#PanaisView Answer on Stackoverflow
Solution 6 - C#Bryce WagnerView Answer on Stackoverflow
Solution 7 - C#RefluxView Answer on Stackoverflow