
Don’t Forget about URI::Heuristic

Imagine you’ve got some user input that is supposed to be a valid URL, but it’s user input, so you can’t be sure of anything. The data isn’t very consistent, so at a minimum you want to prepend a default scheme whenever one is missing. It’s a fairly common case. Sometimes I see it solved this way:

my $url = 'example.com';
$url = 'http://' . $url unless $url =~ m{http://}i;

This converts example.com to http://example.com, but it can be error-prone. For instance, what if I forgot to make the regex case insensitive? Actually, I’ve already made a mistake. Did you spot it? In my haste I’ve neglected to deal with https URLs. The pattern isn’t anchored either, so it matches http:// anywhere in the string, not just at the start. Not good. URI::Heuristic can help here.

use URI::Heuristic qw(uf_uristr);
my $url = 'example.com';
$url    = uf_uristr( $url );

This gives the same result for example.com as the example above, but I’ve left the logic of checking for an existing scheme to the URI::Heuristic module, so input that already starts with https:// is left alone. If you like this approach but you’d rather get a URI object back, try this:

use feature 'say';
use URI::Heuristic qw(uf_uri);
my $url = 'example.com';
$url    = uf_uri( $url );    # uf_uri returns a URI object, uf_uristr a plain string
say $url->as_string;
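
To get a feel for what the heuristics do with different kinds of input, here is a small sketch. The sample inputs are my own and the %guess hash is just a name I picked for illustration. It assumes the URI distribution is installed.

```perl
use strict;
use warnings;
use URI::Heuristic qw(uf_uristr);

# Illustrative inputs only; example.com is a placeholder domain.
my @inputs = (
    'example.com',            # no scheme, so http:// is prepended
    'www.example.com',        # "www" prefix is treated as http
    'https://example.com',    # already has a scheme, left unchanged
    'ftp.example.com',        # "ftp" prefix is treated as an FTP host
    '/etc/passwd',            # absolute path becomes a file: URL
);

# Collect the module's guess for each raw input.
my %guess;
for my $input (@inputs) {
    $guess{$input} = uf_uristr($input);
}

printf "%-22s => %s\n", $_, $guess{$_} for @inputs;
```

Note that the last two results are not web URLs at all, which matters as soon as the input comes from a user.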

Caveats

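The heuristics are not limited to web addresses. Input that looks like a file path, for example, becomes a file URL:

use URI::Heuristic qw(uf_uri);
my $url = uf_uri('/etc/passwd');      # file:/etc/passwd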

Are we sure this is what we want? Checking the scheme is helpful, and even if we weren’t using this module, we’d probably want to do it anyway.

use List::AllUtils qw( any );
use URI::Heuristic qw(uf_uri);

my $url    = uf_uri('/etc/passwd');
my $scheme = $url->scheme // '';    # scheme is undef when nothing could be guessed
unless ( any { $scheme eq $_ } ('http', 'https') ) {
    die "unsupported scheme: '$scheme'";
}
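
If you do this in more than one place, the guess and the scheme check can be folded into one small helper. This is only a sketch: normalize_web_url is a hypothetical name of mine, not something the module provides, and returning undef for rejected input is just one possible design.

```perl
use strict;
use warnings;
use URI::Heuristic qw(uf_uri);

# Hypothetical helper (not part of URI::Heuristic): returns a URI object
# for input that resolves to http or https, and undef for anything else.
sub normalize_web_url {
    my ($input) = @_;
    return undef unless defined $input && length $input;

    my $uri    = uf_uri($input);
    my $scheme = $uri->scheme;    # undef when no scheme could be guessed

    return undef unless defined $scheme && $scheme =~ /\Ahttps?\z/;
    return $uri;
}

# Example inputs are placeholders chosen for illustration.
for my $input ( 'example.com', 'https://example.com/path', '/etc/passwd' ) {
    my $uri = normalize_web_url($input);
    print "$input: ", ( defined $uri ? $uri->as_string : 'rejected' ), "\n";
}
```

Returning undef instead of dying leaves it up to the caller whether a bad URL is fatal or just something to report back to the user.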

That’s it! This module has been around for almost 18 years now, but it still solves some of today’s problems.