Regex to match URL end-of-line or "/" character

Regex

Regex Problem Overview


I have a URL, and I'm trying to match it to a regular expression to pull out some groups. The problem I'm having is that the URL can either end or continue with a "/" and more URL text. I'd like to match URLs like this:

But not match something like this:

So, I thought my best bet was something like this:

/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)[/$]

where the character class at the end contained either the "/" or the end-of-line. The character class doesn't seem to be happy with the "$" in there though. How can I best discriminate between these URLs while still pulling back the correct groups?

Regex Solutions


Solution 1 - Regex

To match either / or end of content, use (/|\z)

This only applies if you are not using multi-line matching (i.e. you're matching a single URL, not a newline-delimited list of URLs).


To put that with an updated version of what you had:

/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\z)

Note that I've changed the start to be a non-greedy match for non-whitespace ( \S+? ) rather than matching anything and everything ( .* )

Solution 2 - Regex

You've got a couple regexes now which will do what you want, so that's adequately covered.

What hasn't been mentioned is why your attempt won't work: Inside a character class, $ (as well as ^, ., and /) has no special meaning, so [/$] matches either a literal / or a literal $ rather than terminating the regex (/) or matching end-of-line ($).

Solution 3 - Regex

/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$

1st Capturing Group (.+)

.+ matches any character (except for line terminators)

  • + Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

2nd Capturing Group (\d{4}-\d{2}-\d{2})

\d{4} matches a digit (equal to [0-9])

  • {4} Quantifier — Matches exactly 4 times

- matches the character - literally (case sensitive)

\d{2} matches a digit (equal to [0-9])

  • {2} Quantifier — Matches exactly 2 times

- matches the character - literally (case sensitive)

\d{2} matches a digit (equal to [0-9])

  • {2} Quantifier — Matches exactly 2 times

- matches the character - literally (case sensitive)

3rd Capturing Group (\d+)

\d+ matches a digit (equal to [0-9])

  • + Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

4th Capturing Group (.*)?

? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)

.* matches any character (except for line terminators)

  • * Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

$ asserts position at the end of the string

Solution 4 - Regex

In Ruby and Bash, you can use $ inside parentheses.

/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|$)

(This solution is similar to Pete Boughton's, but preserves the usage of $, which means end of line, rather than using \z, which means end of string.)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionChris FarmerView Question on Stackoverflow
Solution 1 - RegexPeter BoughtonView Answer on Stackoverflow
Solution 2 - RegexDave SherohmanView Answer on Stackoverflow
Solution 3 - RegexAdam TegenView Answer on Stackoverflow
Solution 4 - RegexSparhawkView Answer on Stackoverflow