Regular expression to match string starting with a specific word

Regex

Regex Problem Overview


How do I create a regular expression to match a word at the beginning of a string?

We are looking to match stop at the beginning of a string and anything can follow it.

For example, the expression should match:

stop
stop random
stopping

Regex Solutions


Solution 1 - Regex

If you wish to match only lines beginning with stop, use

^stop

If you wish to match lines beginning with the word stop followed by a space:

^stop\s

Or, if you wish to match lines beginning with the word stop, but followed by either a space or any other non-word character you can use (your regex flavor permitting)

^stop\W

On the other hand, what follows matches a word at the beginning of a string on most regex flavors (in these flavors \w matches the opposite of \W)

^\w

If your flavor does not have the \w shortcut, you can use

^[a-zA-Z0-9]+

Be wary that this second idiom will only match letters and numbers, no symbol whatsoever.

Check your regex flavor manual to know what shortcuts are allowed and what exactly do they match (and how do they deal with Unicode).

Solution 2 - Regex

Try this:

/^stop.*$/

Explanation:

  • / charachters delimit the regular expression (i.e. they are not part of the Regex per se)
  • ^ means match at the beginning of the line
  • . followed by * means match any character (.), any number of times (*)
  • $ means to the end of the line

If you would like to enforce that stop be followed by a whitespace, you could modify the RegEx like so:

/^stop\s+.*$/
  • \s means any whitespace character
  • + following the \s means there has to be at least one whitespace character following after the stop word

Note: Also keep in mind that the RegEx above requires that the stop word be followed by a space! So it wouldn't match a line that only contains: stop

Solution 3 - Regex

If you want to match anything after a word, stop, and not only at the start of the line, you may use: \bstop.*\b - word followed by a line.

Word till the end of string

Or if you want to match the word in the string, use \bstop[a-zA-Z]* - only the words starting with stop.

Only the words starting with stop

Or the start of lines with stop - ^stop[a-zA-Z]* for the word only - first word only. The whole line ^stop.* - first line of the string only.

And if you want to match every string starting with stop, including newlines, use: /^stop.*/s - multiline string starting with stop.

Solution 4 - Regex

Like @SharadHolani said. This won't match every word beginning with "stop"

. Only if it's at the beginning of a line like "stop going". @Waxo gave the right answer:

This one is slightly better, if you want to match any word beginning with "stop" and containing nothing but letters from A to Z.

\bstop[a-zA-Z]*\b

This would match all

> stop (1) > > stop random (2) > > stopping (3) > > want to stop (4) > > please stop (5)

But

/^stop[a-zA-Z]*/

would only match (1) until (3), but not (4) & (5)

Solution 5 - Regex

/stop([a-zA-Z])+/

Will match any stop word (stop, stopped, stopping, etc)

However, if you just want to match "stop" at the start of a string

/^stop/

will do :D

Solution 6 - Regex

If you want to match anything that starts with "stop" including "stop going", "stop" and "stopping" use:

^stop

If you want to match the word stop followed by anything as in "stop going", "stop this", but not "stopped" and not "stopping" use:

^stop\W

Solution 7 - Regex

If you want the word to start with "stop", you can use the following pattern. "^stop.*"

This will match words starting with stop followed by anything.

Solution 8 - Regex

/^stop*$/i

i - in case it is case sensitive.

Solution 9 - Regex

I'd advise against a simple regular expression approach to this problem. There are too many words that are substrings of other unrelated words, and you'll probably drive yourself crazy trying to overadapt the simpler solutions already provided.

You'll want at least a naive stemming algorithm (try the Porter stemmer; there's available, free code in most languages) to process text first. Keep this processed text and the preprocessed text in two separate space-split arrays. Make sure each non-alphabetical character also gets its own index in this array. Whatever list of words you're filtering, stem them also.

The next step would be to find the array indices which match to your list of stemmed 'stop' words. Remove those from the unprocessed array, and then rejoin on spaces.

This is only slightly more complicated, but will be much more reliable an approach. If you've got any doubts on the value of a more NLP-oriented approach, you might want to do some research into http://thedailywtf.com/Articles/The-Clbuttic-Mistake-.aspx">clbuttic mistakes.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSeanView Question on Stackoverflow
Solution 1 - RegexVinko VrsalovicView Answer on Stackoverflow
Solution 2 - RegexMike DinescuView Answer on Stackoverflow
Solution 3 - RegexWaxoView Answer on Stackoverflow
Solution 4 - RegexSedat KilincView Answer on Stackoverflow
Solution 5 - RegexMezView Answer on Stackoverflow
Solution 6 - RegexAlex BView Answer on Stackoverflow
Solution 7 - RegexManisha ChaurasiaView Answer on Stackoverflow
Solution 8 - RegexCharles KasasiraView Answer on Stackoverflow
Solution 9 - RegexRobert ElwellView Answer on Stackoverflow