URI.escape is obsolete. Percent-encoding your query string
Have you encountered this warning in your Ruby 2.7.0 project?
Find out how to fix it!
A little bit of history
Ruby 2.7.0 shows a warning when invoking URI.escape or its alias, URI.encode. It might look like a fresh deprecation, but the fact is, these methods have been marked as obsolete for… over 10 years now! If you’re wondering why you’ve never encountered the warning before, here’s the answer: previously it was displayed only if you ran your script in verbose mode, and this has changed only recently.
So why is URI.escape obsolete, anyway?
The trouble with the concept of “escaping the URI” is that a URI consists of several components (like path or query), and we don’t want to escape them all in the same way. For example, it’s fine for a # character to appear near the end of the URI when it introduces what’s usually called an anchor (or, in URI parlance, a fragment component). But when the same # is part of the user’s input (like in a search query), we want to encode it to ensure correct interpretation. Similarly, if a query string value contains other reserved characters, like = or &, we do want to escape them so that they are not mistaken for the delimiters that separate keys and values.
URI.escape relies on a simple gsub operation over the whole string and doesn’t differentiate between distinct components, so it can’t take into account intricacies like those mentioned above.
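To make the delimiter problem concrete, here is a minimal sketch. The input value is made up for illustration; the point is only to compare how a query string parses with and without escaping the value as a query component.

```ruby
require "uri"

# Hypothetical user input that happens to contain reserved characters.
value = "rock&roll=fun"

# Left unescaped, "&" and "=" are read as delimiters,
# so one value turns into two key/value pairs.
raw_pairs = URI.decode_www_form("q=#{value}")
# => [["q", "rock"], ["roll", "fun"]]

# Escaped as a query component, the value survives the round trip.
safe_query = "q=#{URI.encode_www_form_component(value)}"
# => "q=rock%26roll%3Dfun"
safe_pairs = URI.decode_www_form(safe_query)
# => [["q", "rock&roll=fun"]]
```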
How to fix it?
Since I haven’t found an existing solution that, given a whole-URI string, would interpret the distinct components and apply the right escaping rules to each on its own, my advice is to encode the components separately. The most common (and most sensitive) use case is probably encoding values in the query component, so I’ll focus on this case. Ruby’s URI module itself provides two handy methods that will help us achieve just that!
Percent-encoding your query string
URI.encode_www_form_component(string, enc=nil)
This method percent-encodes every character except *, -, ., 0-9, A-Z, _, a-z, which are left intact. It also substitutes a space with +. Examples:
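A short sketch of how this looks in practice. The input strings and the q and tag parameter names are picked purely for illustration.

```ruby
require "uri"

# Spaces become "+", and "?" is percent-encoded.
question = URI.encode_www_form_component("how to speed up your CI?")
# => "how+to+speed+up+your+CI%3F"

# "&" and "=" are encoded, so they can no longer act as delimiters.
tricky = URI.encode_www_form_component("a&b=c")
# => "a%26b%3Dc"

# *, -, ., _ and alphanumerics are left intact.
untouched = URI.encode_www_form_component("*-._aZ09")
# => "*-._aZ09"

# Building a query string by hand from the escaped values.
query = "q=#{question}&tag=#{tricky}"
# => "q=how+to+speed+up+your+CI%3F&tag=a%26b%3Dc"
```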
As you can see, we can now safely build our query strings with values escaped this way. But if this seems like too much manual work, there is a handy way to handle the whole query for you. Introducing…
URI.encode_www_form(enum, enc=nil)
It has a slightly different interface: it takes an Enumerable (usually a nested Array or a Hash) and builds the query accordingly. It uses .encode_www_form_component internally, so the encoding rules stay the same. The difference lies in the way you use it. Examples:
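A minimal sketch of the different input shapes. The keys, values and the example.com URL are assumptions made for illustration only.

```ruby
require "uri"

# A nested Array of [key, value] pairs.
from_array = URI.encode_www_form([["q", "ruby"], ["lang", "en"]])
# => "q=ruby&lang=en"

# A Hash; reserved characters in values are encoded for you.
from_hash = URI.encode_www_form({ "q" => "a&b=c", "page" => "2" })
# => "q=a%26b%3Dc&page=2"

# An Array value repeats the key once per element.
repeated = URI.encode_www_form({ "q" => ["ruby", "perl"], "lang" => "en" })
# => "q=ruby&q=perl&lang=en"

# Attaching the result to a URI (example.com is a placeholder host).
uri = URI("https://example.com/search")
uri.query = from_hash
full_url = uri.to_s
# => "https://example.com/search?q=a%26b%3Dc&page=2"
```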
Quite straightforward, isn’t it?
Rails project?
Hash#to_query (also aliased as Hash#to_param)
Rails extends the Hash class with this handy method. It returns a query string in the correct format, with its values correctly encoded where needed.
(Notice how it also sorts the keys alphabetically.)
You can also pass an optional namespace to this method, which encloses the key names in the returned string, resulting in a format like search[q]="how to speed up your CI?" (but obviously percent-encoded):
Summing up
I hope this short article cleared up any confusion you might have had about encoding your strings for use in URIs. Please let us know in the comments if you use any other ways of dealing with percent-encoding in your Ruby/Rails projects!
And if your project is struggling with slow CI builds, check out Knapsack Pro, which can help you solve that problem.