Regex to match URL end-of-line or "/" character

Regex Problem Overview

I have a URL, and I'm trying to match it to a regular expression to pull out some groups. The problem I'm having is that the URL can either end or continue with a "/" and more URL text. I'd like to match URLs like this:

But not match something like this:

http://server/xyz/2008-10-08-4-1

So, I thought my best bet was something like this:

/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)[/$]

where the character class at the end contained either the "/" or the end-of-line. The character class doesn't seem to be happy with the "$" in there though. How can I best discriminate between these URLs while still pulling back the correct groups?

Regex Solutions

Solution 1 - Regex

To match either / or end of content, use (/|\z)

This only applies if you are not using multi-line matching (i.e. you're matching a single URL, not a newline-delimited list of URLs).

To put that with an updated version of what you had:

/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\z)

Note that I've changed the start to be a non-greedy match for non-whitespace ( \S+? ) rather than matching anything and everything ( .+ ).
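As a rough illustration, here is a minimal Python sketch of this pattern. Two things in it are assumptions rather than part of the answer: Python's re module spells the end-of-string anchor \Z instead of \z, and the two matching URLs are made-up samples modeled on the one in the question. The helper name parse is hypothetical.

```python
import re

# Python's re module writes the end-of-string anchor as \Z (the answer uses \z).
PATTERN = re.compile(r"/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\Z)")


def parse(url):
    """Return (date, number) if the URL matches, else None."""
    m = PATTERN.search(url)
    return (m.group(2), m.group(3)) if m else None


# Hypothetical sample URLs, modeled on the one in the question.
print(parse("http://server/xyz/2008-10-08-4"))           # ends right after the number
print(parse("http://server/xyz/2008-10-08-4/123/more"))  # continues with "/"
print(parse("http://server/xyz/2008-10-08-4-1"))         # number followed by "-1": rejected
```

The first two calls should return the date and number groups, while the last one should return None because the number is followed by neither "/" nor the end of the string.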

Solution 2 - Regex

You've got a couple of regexes now which will do what you want, so that's adequately covered.

What hasn't been mentioned is why your attempt won't work: inside a character class, $ (as well as ., /, and ^ anywhere but the first position, where it negates the class) has no special meaning, so [/$] matches either a literal / or a literal $ rather than terminating the regex (/) or matching end-of-line ($).
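The difference is easy to see in a small Python sketch. The shortened patterns and the sample URL are assumptions chosen only to isolate the character-class behaviour.

```python
import re

# Inside [...], "$" is an ordinary character: this class means "slash or dollar sign".
char_class = re.compile(r"-(\d+)[/$]")
# With alternation, "$" keeps its meaning as an anchor.
alternation = re.compile(r"-(\d+)(/|$)")

url = "http://server/xyz/2008-10-08-4"  # hypothetical URL that simply ends

print(bool(char_class.search(url)))        # nothing after the number for the class to consume
print(bool(char_class.search(url + "$")))  # a literal dollar sign satisfies the class
print(bool(alternation.search(url)))       # "$" anchors at the end of the string
```

The character class always has to consume one real character, which is why it cannot stand in for an end-of-string assertion.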

Solution 3 - Regex

/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$

1st Capturing Group (.+)

.+ matches any character (except for line terminators)

+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

2nd Capturing Group (\d{4}-\d{2}-\d{2})

\d{4} matches a digit (equal to [0-9])

{4} Quantifier — Matches exactly 4 times

- matches the character - literally (case sensitive)

\d{2} matches a digit (equal to [0-9])

{2} Quantifier — Matches exactly 2 times

- matches the character - literally (case sensitive)

\d{2} matches a digit (equal to [0-9])

{2} Quantifier — Matches exactly 2 times

- matches the character - literally (case sensitive)

3rd Capturing Group (\d+)

\d+ matches a digit (equal to [0-9])

+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

4th Capturing Group (/.*)?

? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)

/ matches the character / literally (case sensitive)

.* matches any character (except for line terminators)

* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

$ asserts position at the end of the string
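To see what the four groups hold in practice, here is a short Python sketch that applies this pattern. The helper name groups and the two matching URLs are assumptions, modeled on the example in the question.

```python
import re

PATTERN = re.compile(r"/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$")


def groups(url):
    """Return the tuple of captured groups, or None if the URL does not match."""
    m = PATTERN.search(url)
    return m.groups() if m else None


# Hypothetical sample URLs, modeled on the one in the question.
for url in (
    "http://server/xyz/2008-10-08-4",
    "http://server/xyz/2008-10-08-4/123/more",
    "http://server/xyz/2008-10-08-4-1",
):
    print(url, "->", groups(url))
```

When the URL stops after the number, the optional fourth group does not participate and Python reports it as None. When the URL continues, the fourth group holds the rest of the path, starting with the "/".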

Solution 4 - Regex

In Ruby and Bash, you can use $ inside parentheses.

/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|$)

(This solution is similar to Peter Boughton's, but preserves the usage of $, which means end of line, rather than using \z, which means end of string.)
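The practical difference between the two anchors shows up when the input carries a trailing newline. The sketch below uses Python rather than Ruby or Bash, so treat it as an analogy: without re.MULTILINE, Python's $ matches at the end of the string and also just before a final newline, and its end-of-string anchor is written \Z. The shortened patterns and the sample line are assumptions.

```python
import re

dollar = re.compile(r"-(\d+)(/|$)")   # "$" as used in this solution
strict = re.compile(r"-(\d+)(/|\Z)")  # Python's spelling of the end-of-string anchor

# Hypothetical URL as read from a file, trailing newline included.
line = "http://server/xyz/2008-10-08-4\n"

print(bool(dollar.search(line)))          # "$" also matches just before the final newline
print(bool(strict.search(line)))          # "\Z" matches only at the very end of the string
print(bool(strict.search(line.strip())))  # stripping the newline makes both agree
```

Which anchor is preferable depends on whether a trailing newline should count as the end of the URL.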

Content Type	Original Author	Original Content on Stackoverflow
Question	Chris Farmer	View Question on Stackoverflow
Solution 1 - Regex	Peter Boughton	View Answer on Stackoverflow
Solution 2 - Regex	Dave Sherohman	View Answer on Stackoverflow
Solution 3 - Regex	Adam Tegen	View Answer on Stackoverflow
Solution 4 - Regex	Sparhawk	View Answer on Stackoverflow
