Don’t Forget about URI::Heuristic

Imagine you’ve got some user input that is supposed to be a valid URL, but it’s user input, so you can’t be sure of anything. It’s not very consistent data, so you at least make sure to prepend a default scheme to it. It’s a fairly common case. Sometimes I see it solved this way:

This converts example.com to http://example.com, but it can be error prone. For instance, what if I forgot to make the regex case insensitive? Actually, I’ve already made a mistake. Did you spot it? In my haste I’ve neglected to deal with https URLs. Not good. URI::Heuristic can help here.

This does exactly the same thing as the example above, but I’ve left the logic of checking for an existing scheme to the URI::Heuristic module. If you like this approach, but you’d rather get a URI object back then try this:

Caveats

Are we sure this is what we want? Checking the scheme is helpful and even if we weren’t using this module, we’d probably want to do this anyway.

That’s it! This module has been around for almost 18 years now, but it still solves some of today’s problems.

How to Get a CPAN Module Download URL

Every so often you find yourself requiring the download URL for a CPAN module. You can use the MetaCPAN API to do this quite easily, but depending on your use case, you may not be able to do this in a single query. Well, that’s actually not entirely true. Now that we have v1 of the MetaCPAN API deployed, you can test out the shiny new (experimental) download_url endpoint. This was an endpoint added by Clinton Gormley at the QA Hackathon in Berlin. Its primary purpose is to make it easy for an app like cpanm to figure out which archive to download when a module needs to be installed. MetaCPAN::Client doesn’t support this new endpoint yet, but if you want to take advantage of it, it’s pretty easy.

Now invoke your script:


olaf$ perl download_url.pl Plack
https://cpan.metacpan.org/authors/id/M/MI/MIYAGAWA/Plack-1.0039.tar.gz

Update!

After I originally wrote this post, MICKEY stepped up and actually added the functionality to MetaCPAN::Client. A huge thank you to him for doing this. ūüôā Let’s try this again:

That cuts the lines of code almost in half and is less error prone than crafting the query ourselves. I’d encourage you to use MetaCPAN::Client unless you have a compelling reason not to.

Caveats

This endpoint is experimental.  It might not do what you want in all cases.  See this GitHub issue for reference.  Please add to this issue if you find more cases which need to be addressed.  Having said that, this endpoint should do the right thing for most cases.  Feel free to play with it to see if it suits your needs.