Stack Builders News

A collection of thoughts and notes by our team

Justin Leitgeb

Safe Multi-Line Regular Expressions in Ruby

If you’re programming in Ruby and didn’t know that it has regular expressions that are multi-line by default, chances are you’ve written unsafe code.

Let’s jump into some examples. Does this match?

"foo\nbar\nbaz" =~ /^bar$/

Most programmers with experience of regular expression engines in other languages would say that it doesn’t match, since the input string as a whole is not just “bar” enclosed by the beginning and ending anchors (“^” and “$”). However, the result we get back shows a match at position 4, the starting position of the matching phrase in the middle of the string:

1.9.3p374 :002 > "foo\nbar\nbaz" =~ /^bar$/
 => 4

It matches because the anchors ^ and $ are start-of-line and end-of-line anchors rather than start-of-string and end-of-string anchors. Ruby’s regular expressions are multi-line by default (in languages like Perl you have to turn this on explicitly), which means that a match on any single line of the input counts as a match for the whole input.
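To make the difference between the two kinds of anchors concrete, here is a small sketch that scans the same input with each pair. The variable names are only illustrative.

```ruby
text = "foo\nbar\nbaz"

# ^ and $ match at every line boundary, so each line is found on its own.
line_matches = text.scan(/^\w+$/)
# line_matches is ["foo", "bar", "baz"]

# \A and \z match only at the very start and very end of the whole string,
# and \w+ cannot cross the newlines, so nothing matches.
string_matches = text.scan(/\A\w+\z/)
# string_matches is []
```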

This probably isn’t what you intended. In many cases it can allow unsafe input to enter the system, because a check written with “^” and “$” passes as long as any one line of the input matches. This is why static analysis tools like Brakeman give warnings when you use “^” and “$”.
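As a rough illustration of how this can go wrong, consider a validator that is meant to accept only lowercase letters. The method name and the sample input below are hypothetical, made up for this sketch.

```ruby
# Hypothetical validator, for illustration only: the intent is to accept
# input made up solely of lowercase letters.
def unsafe_valid_username?(input)
  # ^ and $ anchor to lines, not to the whole string.
  !!(input =~ /^[a-z]+$/)
end

malicious = "alice\n<script>alert(1)</script>"

unsafe_valid_username?("alice")    # true, as intended
unsafe_valid_username?(malicious)  # also true: the first line alone satisfies the pattern
```

Everything after the first newline is never checked, so it passes straight through the validation.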

The fix for this is simple. Use the start of string and end of string anchors instead of the line-based ones:

1.9.3p374 :004 > "foo\nbar\nbaz" =~ /\Abar\z/
 => nil
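Applied to an input check, the string anchors might be used along these lines. This is a minimal sketch with a hypothetical method name, not a complete validation strategy.

```ruby
# Hypothetical validator, for illustration only.
def safe_valid_username?(input)
  # \A anchors at the start of the string, \z at the very end.
  !!(input =~ /\A[a-z]+\z/)
end

safe_valid_username?("alice")            # true
safe_valid_username?("alice\n<script>")  # false: the whole string must match
safe_valid_username?("alice\n")          # false: \z does not tolerate a trailing newline

# The uppercase \Z is more lenient: it also matches just before a final newline.
lenient = !!("alice\n" =~ /\A[a-z]+\Z/)  # true
```

Note the lowercase \z. If a trailing newline should also be rejected, \Z is not strict enough.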

Problem solved.
