meta::hack Wrap-up Report

meta::hack v1

Earlier this month (Thu, Nov 17 – Sun, Nov 20) I had the pleasure of meeting up with 7 other Perl hackers at ServerCentral’s downtown Chicago offices, in order to hack on MetaCPAN. Before I get started, I’d like to thank our sponsors.

This hackathon wouldn’t have been possible without the overwhelming support of our sponsors. Our platinum sponsors were and cPanel. Our gold sponsors were Elastic, FastMail, and Perl Careers. Our silver sponsors were ActiveState, Perl Services, ServerCentral and Advance Systems. Our bronze sponsors were, Easyname, and the Enlightened Perl Organisation (EPO). Please take a moment to thank them for helping our Perl community.

For the past 2.5 years, we’ve been working off and on at porting MetaCPAN from Elasticsearch 0.20.2 to 1.x and (eventually) 2.x. There were enough breaking changes between the versions to make this a non-trivial task. We had made very good progress over the past two QA hackathons, but the job was just too big to finish in the hours that we had available.

After the QA Hackathon in Rugby, I spoke to Neil Bowers about how we might go about doing some fundraising. Neil was so kind as to offer to help. His offer to help soon evolved into him taking on all of the work (thanks Neil)! Neil worked his magic and got the event fully funded. I know there was a lot of work involved, but he made it look easy. Mark Keating and the Enlightened Perl Organisation kindly took on the financial side of things, invoicing and accepting payment from sponsors. Without EPO and Neil, this event never would have taken place. (Please do take a moment to thank them.)

While this was going on, we began searching for a venue. Joel Berger offered to host us at ServerCentral in Chicago and we immediately took him up on the offer. After that it was just a matter of folks booking plane tickets and getting approval from employers for the time off.

The final list of invitees was:

  • Brad Lhotsky (San Francisco)
  • Doug Bell (Chicago)
  • Graham Knop (Baltimore)
  • Joel Berger (Chicago)
  • Leo Lapworth (London)
  • Mickey Nasriachi (Amsterdam)
  • Olaf Alders (Toronto)
  • Thomas Sibley (Seattle)

The event was invitation only. We did this in order to maximize the amount of work we’d be able to finish at the event. [Insert reference to “The Mythical Man Month”]. Everyone who participated was already up to speed on the internals of the project or had an area of expertise which we needed in order to complete our goal of launching fully with v1 of the API. Because everyone already had a working VM and working knowledge of the project, we were able to tackle the problems at hand right from the first morning.

As far as living space goes, we initially had looked at renting hotel rooms, but the cost would have made it almost prohibitive to meet in Chicago. After doing some research, we booked two apartments (each with 3 bedrooms) on the same floor of a condo building in the Lincoln Park area of Chicago. We booked the accommodations via of course. 🙂 I think we were happy with the housing. Everyone had their own room and we had big enough living rooms for all of us to meet up mornings and some evenings. At the end of the day the rental was a fraction of the price of a Chicago hotel. I’ve also made a mental note not to be the last one to arrive in town. Apparently it also means you get the smallest room.

Each day we took the subway downtown to ServerCentral. We had a dedicated boardroom in the office with a large TV that we could use for sharing presentations, IRC chat or error logs. ServerCentral also sponsored lunch each day of the event. Extra monitors were also available for those who wanted them. (Lots of Roost laptop stands were to be seen. Also lots of people who couldn’t figure out how to open them after having collapsed them for the first time in forever).

After settling in at the office we’d discuss our plans for the day and map out goals for that day. We had breakout discussions where appropriate but the time spent not writing code was minimal. Generally, as a group, we worked well into the evenings. We didn’t get the full Chicago experience, but we got a lot done. We did make it to the Chicago Christkindlmarkt, which was a few blocks from the office and we went out for a breakfast and a dinner as well. Minimal downtime, but the breaks we had were lots of fun.

Day one was spent removing anything which was blocking the API upgrade. Wishlist items were ignored and as a group we worked really well. Lots of pull requests were created, reviewed and merged.

By day two of the hackathon we flipped the switch and went live with the new API. We could have waited a bit longer, but we opted to make the change earlier so that we could troubleshoot any issues as a group and watch the error logs in real time. There were no showstopping bugs and the transition was actually pretty smooth.

Day three was spent squashing some of the bugs which came up after the upgrade. We also started to tackle some wishlist items.

Day four was a slightly shorter day. We wrapped up around 4 PM. Some of us went to check out “the Bean” before flying out while Leo and I headed right for our respective airports.

This list is by no means exhaustive, but over this long weekend we:

  • moved ++ data to v1 of the API.
  • moved to v1 of the API.
  • implemented load balancing via Fastly, our CDN sponsor.
  • reduced noise in the logs by squashing bugs which generated warnings or exceptions.
  • updated our API documentation as well as the metacpan-examples GitHub repository from v0 to v1.
  • published an upgrade document which explains how to upgrade your query syntax and configuration for v1.
  • moved to v1 of the API.
  • began work on streaming logs to Elasticsearch.
  • began moving the query logic that uses over to the API so that other clients can use this same logic.
  • began porting author queries from to the API as well.
  • added a meta::hack event page along with sponsor info to
  • continued work on adding a /permission endpoint which will provide access to the data in 06perms.txt.
  • added more tests for the /download_url endpoint, which translates module names into download URLs. Specifically, this is meant to be used by cpanm.
  • added snapshotting of Elasticsearch indices in v1 so that we can easily restore from backup.

/permission is something I spent a fair bit of my time working on over the last two days. Having 06perms.txt data in the API will mean that we can display a list of all authors who have maint on a module on This will make it easier to track down authors who can release a module, particularly for those who aren’t familiar with the way PAUSE works. I think this branch is probably about 1.5 years old, so I was happy to get the time to try to finish it off. I didn’t quite get there, but that’s okay. It was a wishlist item and it’s actually quite close to being released.

Also of note is the fact that we’ve now officially deprecated the v0 API. There is a 6 month runway to move clients over to v1 and v0 will be taken offline on or after June 1, 2017.

Since now uses v1 of the API, results for v0 are no longer available. If you have a client which uses v0 of the API, please feel free to reach out to us with any concerns you may have about making the switch.

If you rely on updated ++ data, you’ll need to switch to v1 now, as ++ data in v0 is no longer being updated. The indexer is, however, still running on v0, so it will still find and index new CPAN uploads. v0 development is officially closed. Any v0 bugs (barring catastrophic issues) will likely not be addressed. v0 has been around for just over 6 years now. It has served us well, but it’s time to let it go. [Insert musical scene with a talking snowman, an ice queen and her loyal sister.]

Announcing meta::hack

Every so often, someone asks if they can donate money to MetaCPAN. I usually direct them to CPAN Testers, since (due to our generous hosting sponsors) we’ve generally not had a need for money. You can probably see where I’m going with this. Times have changed. We’re no longer turning financial sponsors away.

Back at the QA Hackathon in Rugby, we had a great group of hackers together and we got a lot of work done. However, as we worked together, it became clear that the size of our job meant that we wouldn’t be able to finish everything we had set out to do over that four day period. There are times when there’s no replacement for getting everyone in the same room together.


The first dedicated MetaCPAN hackathon will be held at the offices of ServerCentral in Chicago, from November 17th through 20th. The primary goal for this hackathon is to complete MetaCPAN’s transition to Elasticsearch version 2. This will enable the live service to run on a cluster of machines, greatly improving reliability and performance. The hackathon will also give the core team a chance to plan work for the coming 18 months.

The meta::hack event is a hackathon where we’re bringing together key developers to work on the MetaCPAN search engine and API. This will give core team members time to work together to complete the transition to Elasticsearch version 2, and time to discuss gnarly issues and plan the roadmap beyond the v1 upgrade.

MetaCPAN is now one of the key tools in a Perl developer’s toolbox, so supporting this event is a great way to support the Perl community and raise your company’s profile at the same time. This hackathon is by invitation only. It’s a core group of MetaCPAN hackers. We are keeping the group small in order to maintain focus on the v1 API and maximize the productivity of the group.

Why sponsor the MetaCPAN Hackathon?


• If your company uses Perl in any way, then your developers almost certainly use MetaCPAN to find CPAN modules, and they probably use other tools that are built on the MetaCPAN API.
• The MetaCPAN upgrade will improve the search engine and the API for all Perl developers. As a critical tool, we need it to be always available, and fast. This upgrade is a key step in that direction.
• This is a good way to establish your company as a friend of Perl, for example if you’re hiring.



There will be 8 people taking part, including me. Everyone taking part is an experienced senior-level software engineer, and most of them have already spent a lot of time working on MetaCPAN. As noted above, this is an invitational event with a very specific focus.

What is meta::hack?


MetaCPAN was created in late 2010. Version 0 of the MetaCPAN API was built on a very early version of Elasticsearch. For the first 5 years, most of the work on MetaCPAN focussed on improving the data coverage, and the web interface. In that time Elasticsearch has moved on, and we’re now well behind.

The work to upgrade Elasticsearch began in May of 2014. It continued in early February of 2015. Later, at the 2015 QA Hackathon in Berlin, Clinton Gormley (who works for Elastic) and I worked on moving MetaCPAN to Elasticsearch version 2. This work was continued at the 2016 QA Hackathon in Rugby, and as a result we now have a beta version in live usage.

The primary goal of meta::hack is to complete the port to Elasticsearch version 2, so the public API and search engine can be switched over. There are a number of benefits:

• Switching from a single server to a cluster of 3 servers, giving a more reliable service and improved performance.
• Once we decommission the old service, we’ll be able to set up a second cluster of 3 machines in a second data centre, for further improvements.
• We’ll be able to take advantage of new Elasticsearch features, like search suggesters.
• We’ll be able to use a new endpoint that has been developed specifically to speed up cpanminus lookups. Cpanminus is probably the most widely used CPAN client these days, so improving this will benefit a large percentage of the community.
• If and when is decommissioned, we’ll be able to handle the extra traffic that will bring with it, and we’ll also have the redundancy to do this safely.
• We’ll be able to shift focus back to bug fixes and new MetaCPAN features.

Becoming a Sponsor


Neil Bowers has kindly taken on the task of shepherding the sponsorship process. (He also wrote the sponsorship prospectus from which I cribbed most of this post.) Please contact Neil or contact me for a copy of the meta::hack sponsorship prospectus. It contains most of the information listed above as well as the available sponsorship levels. Thank you for your help in making this event happen. We’re looking forward to getting the key people together in one room again and making this already useful tool even better.

How to Get a CPAN Module Download URL

Every so often you find yourself requiring the download URL for a CPAN module. You can use the MetaCPAN API to do this quite easily, but depending on your use case, you may not be able to do this in a single query. Well, that’s actually not entirely true. Now that we have v1 of the MetaCPAN API deployed, you can test out the shiny new (experimental) download_url endpoint. This was an endpoint added by Clinton Gormley at the QA Hackathon in Berlin. Its primary purpose is to make it easy for an app like cpanm to figure out which archive to download when a module needs to be installed. MetaCPAN::Client doesn’t support this new endpoint yet, but if you want to take advantage of it, it’s pretty easy.
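As a rough illustration of what such a script might look like, here is a minimal sketch that uses only core modules. The file name `download_url.pl`, the base URL and the `download_url` key in the JSON response are assumptions on my part rather than details confirmed above, so please check them against the current API documentation before relying on them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use HTTP::Tiny;
use JSON::PP qw( decode_json );

# Assumed base URL for v1 of the API. Adjust if your setup differs.
my $API_BASE = 'https://fastapi.metacpan.org/v1';

# Build the request URL for a module name such as "Plack".
sub download_url_endpoint {
    my $module = shift;
    return "$API_BASE/download_url/$module";
}

# Fetch and decode the JSON document describing the archive.
# Note: HTTP::Tiny needs IO::Socket::SSL installed for https URLs.
sub lookup {
    my $module = shift;
    my $res    = HTTP::Tiny->new->get( download_url_endpoint($module) );
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return decode_json( $res->{content} );
}

# Only hit the network when a module name is given on the command line.
if ( my $module = shift @ARGV ) {
    my $info = lookup($module);

    # "download_url" is the assumed name of the key holding the archive URL.
    print "$info->{download_url}\n";
}
```

The script takes the module name as its only argument and prints whatever URL the endpoint returns for it.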

Now invoke your script:

olaf$ perl Plack


After I originally wrote this post, MICKEY stepped up and actually added the functionality to MetaCPAN::Client. A huge thank you to him for doing this. 🙂 Let’s try this again:

That cuts the lines of code almost in half and is less error prone than crafting the query ourselves. I’d encourage you to use MetaCPAN::Client unless you have a compelling reason not to.


This endpoint is experimental.  It might not do what you want in all cases.  See this GitHub issue for reference.  Please add to this issue if you find more cases which need to be addressed.  Having said that, this endpoint should do the right thing for most cases.  Feel free to play with it to see if it suits your needs.

MetaCPAN at the 2016 Perl QA Hackathon

Before I start, I’d like to thank our sponsors

This year I once again had the pleasure of attending my 4th Perl QA Hackathon. Before I get into any details, I’d like to thank the organizers: Neil Bowers, Barbie and JJ Allen. They did a fantastic job. It was a very well organized event and really a lot of fun. It was well worth attending and it made a huge difference to the MetaCPAN project.  Thanks also to Wendy for making sure that everyone had what they needed.

Lastly, I’d like to thank all of the sponsors who made this event possible. These companies and individuals understand what makes the QA Hackathon such an important event and I’m happy that they wanted to help make it happen.

The Crew

My focus this year (as always) was working on MetaCPAN, but this time around I had much more help than usual. Leo Lapworth joined us from London for the first two days, working on the sysadmin side. Mickey Nasriachi came in from Amsterdam to work with us on the back end code. Matt Trout spent a lot of his time helping us with debugging and database replication. Sawyer spent a great deal of his time pair programming with us and helping us debug some really thorny issues. Also, what began as a conversation with Joel Berger about a simple MetaCPAN patch resulted in him spending much of his time looking at various issues. He now has a solid understanding of the MetaCPAN stack and we hope he can continue to contribute as we move forward.

We had a really good crew and we were all quite focussed. We removed ourselves from the main hackathon room so that we were able to have our own conversations and be less subject to distracting conversations from other groups. Since we were just outside of the main room we were able to talk with various others as they passed by our table. It was like having a space to ourselves, but we still felt very much a part of the hackathon.

Our main goal was to upgrade MetaCPAN from Elasticsearch 0.20.2 to 2.3.0. I spent a lot of time on this with Clinton Gormley at last year’s hackathon. The upgrade at that time was planned to be from 0.20.2 to a 1.x version. We were optimistic, but it became clear that it was a job that we couldn’t realistically finish. So, we left last year’s hackathon with some good changes, but we weren’t close to being able to deploy them. By this year, Elasticsearch had introduced even more breaking changes as it moved from 1.x to 2.x, so we had to factor those in as well.

For 2016, in the weeks leading up to the hackathon, Leo and I had been pushing a lot of code in preparation for this weekend. Around the same time, Mickey arrived on the scene and really moved things forward with his code changes too. So, we had a small core of developers working on the code well in advance of the hackathon. That’s actually one of the nice things about an event like this. I didn’t just write code when I got here. Having a firm date by which a number of things had to be done forced me to sit down and solve various problems in the weeks leading up to the hackathon.

What did we actually get done?

Elasticsearch Cluster

One criticism of MetaCPAN has been a lack of redundancy. We’ve had a good amount of hardware available to us for some time, but we haven’t had a really good way to take advantage of it. Thanks to some of the work leading up to the hackathon, v1 of the API will run on an Elasticsearch cluster of 3 machines (rather than the single production box which currently serves v0). Having a proper cluster at our disposal should make for faster searching and also greater redundancy if one of these machines needs to take an unscheduled break. On the human side, it will be a lot less stressful to lose one machine on a cluster of three than to lose one machine on a cluster of one. We all know these things happen. It’s just a matter of time. So, we’ll be better prepared for when a machine goes down.


Occasionally we need to re-index everything on CPAN. This takes a very long time. The current incarnation of MetaCPAN (v0) uses a script to do this and it can take 12 hours or more to run. If that script runs into some unhandled exception along the way, you have the rare pleasure of starting it up again manually. It needs some babysitting and it’s far from bulletproof. It’s also a bit hard to scale it.

Rather than trying to speed up our current system, we’ve added a Minion queue to our setup. This means that when we re-index CPAN, we add each upload as an item in our queue. We can then start workers on various boxes on the same network and we can run indexing in parallel. In our experiments we ran 17 workers each on 3 different boxes, giving us 51 workers in total. This gives us more speed and also more insight into which jobs have failed, how far along we are with indexing etc. It’s a huge improvement for us.
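To make the idea a bit more concrete, here is a toy sketch of the bookkeeping a job queue gives us. It is deliberately not Minion's API: it is an in-memory stand-in written with core Perl only, it runs sequentially rather than in parallel, and the function names and the indexing callback are hypothetical.

```perl
use strict;
use warnings;

# Toy in-memory stand-in for a job queue (this is NOT Minion's API).
# Every upload becomes its own job, so one failure cannot stop the run.
sub enqueue_uploads {
    my @uploads = @_;
    return [ map { +{ upload => $_, state => 'inactive' } } @uploads ];
}

# Work through all jobs in the given state. $index_cb is a hypothetical
# indexing routine that dies when an upload cannot be indexed.
# Real workers would run in parallel; this loop is sequential.
sub process_jobs {
    my ( $queue, $state, $index_cb ) = @_;
    for my $job ( grep { $_->{state} eq $state } @$queue ) {
        my $ok = eval { $index_cb->( $job->{upload} ); 1 };
        $job->{state} = $ok ? 'finished' : 'failed';
        $job->{error} = $ok ? undef : $@;
    }
    return $queue;
}

# Count jobs per state so we can see how far along we are.
sub queue_stats {
    my $queue = shift;
    my %count;
    $count{ $_->{state} }++ for @$queue;
    return \%count;
}
```

With this shape, an exception while indexing one upload only marks that job as failed. Calling `process_jobs` again with the state `'failed'` retries just those jobs, which is the part that used to mean restarting the whole script by hand.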


Minion has more than one possible back end. We’ve chosen to go with Postgres. This means that we now have Postgres installed for the first time and also available for other uses. Matt Trout has been working on Postgres replication for us so that we have some redundancy for our queues as well. Once that is available, he can also write a Pg schema which MetaCPAN can use as part of the back end. This means that at some future date we could begin to store our data in both Pg and Elasticsearch. This would give us a hybrid approach, allowing us to use Elasticsearch for the things it does well and a relational database for the kinds of queries which a NoSQL store doesn’t handle well or at all in some cases.

As a historical footnote, the original version of the API first inserted into an SQLite database and then dumped that data into Elasticsearch. We may eventually come full circle and use a similar approach with Postgres.

RAM Disk

As part of Leo’s sysadmin work, he has set up a RAM disk for the indexer to use when unpacking tarballs. Even if this only saves a fraction of a second per archive, when you’re indexing 500,000 archives, those small savings add up.

Elasticsearch Upgrade

Currently production runs on Elasticsearch version 0.20.2. Our work this weekend has pushed us to using 2.3.0. Part of what has been holding us back is the many breaking changes which are involved in this particular upgrade. Much of our effort at the hackathon was directed towards dealing with these various breaking changes. We haven’t quite tackled all of them yet, but we’re very close.

Deploying and Indexing a Beta Cluster

We now have a cluster of machines running our v1 beta.  I will publish the URLs as soon as we are ready for feedback.

Please note that our API versioning does not follow the Elasticsearch versioning. This frees us up to change API endpoints etc outside of the scope of another Elasticsearch upgrade.

CPAN River Integration

Joel Berger submitted a patch to integrate CPAN River statistics into the /distribution endpoint. The actual data will be provided by Neil Bowers. The patch to add this data to the /distribution endpoint has already been merged to the v1 branch and there has been some work done by Barbara on a front end display for the data.

Integration

I had a chance to speak with Paul Johnson about I had initially put together an integration for his site 2 years ago at the QA Hackathon. I thought the integration was fine, but I ran into enough resistance from the MetaCPAN team that this pull request was never merged. We’ve now agreed on a way to move forward with this which will make everybody happy. There are open tickets on both the front and back end of MetaCPAN to address this.

Debian Packaging Information

Book is working on adding some information which can be used to correlate modules with their corresponding Debian packages. Once this is finished, this data can also be added to the distribution endpoint. The integration itself is pretty simple and will work much like the CPAN River.

Changes files

Graham Knop wasn’t able to attend the QA Hackathon, but he did spend some time hacking from home. He has a patch in to alter how changes files are displayed.

Moving Towards Test2::Harness

I spoke with Chad Granum on the evening before the hackathon and I mentioned that we were using Test::Aggregate, one of the few distributions which was not yet playing nicely with Test2. I wasn’t too worried about this, partly because we pin our dependencies via Carton and partly because I’d been hoping to move away from it anyway. I had been thinking about Test::Class::Moose as an alternative, but I didn’t want to go to the trouble of setting up test runners etc. Something simpler would be nice. Chad showed me Test2::Harness, which would give us the same advantages as running under Test::Aggregate. It looks great and should be available shortly. In the meantime I’ve gutted the Test::Aggregate logic from the tests and we’re running everything the old fashioned (slower) way for the time being. A switch to Test2::Harness in the near future should be trivial.


As part of our general cleanup, I released MetaCPAN::Moose. This is a simple bit of code which imports MooseX::StrictConstructor and namespace::autoclean into any class which uses it. After writing the code and the tests, I showed it to Sawyer. He sat down and immediately rewrote it using Import::Into. The result was at least 50% smaller and a lot cleaner. The tests continued to pass, so I was happy to release that to CPAN.

Moving forward we’re going to publish a few more of our internal modules to CPAN. These will serve several purposes:

  • It will be useful to us as a way of sharing code between the various apps which we have. We use Carton to manage various app installs, so sharing code can be tricky. We didn’t want to go the submodule route unless we really had to.
  • Some people may also find this code useful. It’s a good way to showcase our logic as a way of doing things (like setting up your own custom Moose type library). People could learn from it.
  • Alternatively, people might look at it and realize it’s terrible. At this point they’ll hopefully hack on it and send pull requests. Because this code is standalone with its own test suite, the overhead of getting started will be much, much less than it is for hacking on the rest of CPAN.

I don’t think generally publishing internal logic to CPAN is a good idea, but for the above stated reasons, I think the code that we are talking about is well suited for this.

CPANTesters Data

We used to import CPAN Testers data into MetaCPAN using an SQLite database which they provided. At some point this database became unavailable. I’m encouraged to hear that this may not be a permanent state of affairs. If something can be worked out, then MetaCPAN can once again easily import testers data into its API using the database.

Somewhere out there I can hear someone complaining that this isn’t RESTful or whatever, but for the amount of data involved, it’s actually a good fit. I did discuss with Doug what a REST API for this might look like, but to be honest, that would potentially be much more work than just creating the database on some arbitrary schedule and publishing it.

Interesting Things I Learned From Random Conversations:

  • Matt Trout suggests abandoning MooseX::Types and moving our type checking to Type::Tiny. I’m on board with that, but it’s not a priority right now.
  • I learned from Sawyer that a simple speed optimization is switching to a Perl which is compiled without taint. Also he recommended some XS modules for header and cookie handling. The XS part wasn’t news to me, but it’s something I’ll keep in mind for future and certainly something I can make sure we do with MetaCPAN.

    Edit and caveat: As far as compiling Perl without taint mode goes, Sawyer was kind enough to refer me to some relevant p5p messages: Apparently there is some performance to be gained, but whether or not it’s worthwhile for you likely depends very much on the behaviour of your application.

  • I heard (once again) that Devel::Confess is a “better” tool for debugging. I’ve been using it for a while now and am very happy with it. I’m not the only one.
  • From Mickey, I learned about Devel::QuickCover, which sounds like an interesting way to get a first pass at coverage data.
  • I now know how to pronounce Upasana.
  • I learned that I’m not the only person who has no clue how to read a flame graph.
  • After a lengthy conversation with Matt Trout on the Thursday it wasn’t until I said, “hang on, I’ll send you the link on IRC” that he looked at his screen and then looked back up and said, “oh, that’s who you are”. I guess I could have introduced myself formally when he first sat down, but eventually we got there.
  • After seeing the Roost laptop stand in action, I think I need one.

Unrelated to MetaCPAN

Karen Etheridge was able to merge my fix to allow MooseX::Getopt to play nicely with init_arg. It’s a bug that has bitten me on more than one occasion. The fix has now been released.

After a conversation with BINGOS on Sort::Naturally, he got me co-maint on that module so that I can look at addressing an outstanding issue.

In Conclusion

For me, it was a great few days for moving the project along and socially quite fun. I got to see a bit of London on my arrival and spend a few hours at the British Museum, which I last visited about 20 years ago. In the afternoon, Leo was kind enough to drive me up to Rugby. Leo, Mickey and Joel were among the people whom I have spoken with on IRC but had never met in person. Making those real life connections is great.

On a practical level, I mostly started looking the correct way when crossing the street, but I wouldn’t bet anyone else’s safety on my ability to do the right thing there. Most of my ride from the airport to Leo’s office consisted of me feeling quite sick to my stomach as part of me really wanted the driver to switch to the correct right side of the road. London rush hour traffic and narrow streets with two way traffic probably didn’t help.

It was nice to see RJBS get a special show of thanks for his years as pumpking and also to witness the passing of the torch to Sawyer, who will do a fantastic job as he takes over. Also the tradition of publicly thanking the organizers has continued, which is a nice part of the weekend.

I should mention that this year there were no special outings. No video game museum tours, no chance to see how Chartreuse is made. Not even a trip to the set of Downton Abbey. That meant a few extra hours of hacking, bug squashing etc, which is nice too. I’m sure that deep down inside Neil really wanted to take us to a filming of Coronation Street, but he resisted the urge in order to further the goal of productivity.

All in all, I felt it was an extremely productive week for me and for MetaCPAN in general. My sincere thanks go out to the gang for having had me along once again this year.