meta::hack 3 Wrap-up Report

As I mentioned in my meta::hack preview post, for the third year running we have had the privilege of being financially sponsored by Booking.com and also working out of the ServerCentral offices in downtown Chicago in order to hack on MetaCPAN. None of this would have been possible without the support of Mark Keating and the Enlightened Perl Organization. Mark has (as usual) worked tirelessly to ensure that sponsor money moves in the right directions so that we are able to fund meta::hack every year. We are extremely grateful for his help.

I’d also like to thank MaxMind (my employer) for supporting me in attending this event.

This year, five of us worked on improving our corner of the Perl ecosystem.

The Attendee List

Getting There

We all had a great time this year. It’s always nice to spend time with friends and it’s great to see pet projects move ahead. Shawn and I flew in from Toronto on the Wednesday afternoon and had lunch at a German restaurant.

Mickey arrived from Amsterdam a little later in the day. The three of us met up with Doug for drinks and dinner in the evening.

Getting Started

On Thursday morning, Mickey, Shawn and I got up early and got a run in before heading to the office. We ran along the water and through Millennium Park, stopping off at “the bean” to take pictures and have a look around. That was a very nice, if chilly, way to start the day.

Right from day one we had a productive time. We started by discussing what needed to be done vs. what we wanted to do, and I think we got some combination of these things done. In my experience, the trick is to have people generally work on the things they find interesting, while also spending some time on the things that just need to get done. I think we found a good balance. Now, rather than give a chronology of the event, I’m going to cover our “greatest hits”.

The Highlights

  • It turns out our CPAN mirrors were not in sync, causing some hard-to-debug errors. This got fixed.
  • We removed an unused and undocumented endpoint from the API (/search/simple).
  • FastMail came on board as a sponsor. As of this week they are handling our mailboxes. Many thanks to them and to https://metacpan.org/author/RJBS for getting us up and running in a hurry.
  • We are now equipped to delete CPAN authors and releases from MetaCPAN. Previously this was non-trivial, but we can now deal with spammy modules, spammy authors and other issues which require uploads and/or data to be removed.
  • The caching on our Travis CI API tests had stopped working, so we fixed this.
  • Our changes endpoint on the API now recognizes more types of Changes files. This means the search site will now more easily be able to help you find Changes files when you need them.
  • There were previously some cases where the API returned a non-unique list of modules in the provides output. This has now been fixed. If you spot a module which needs to be re-indexed, let us know and we can do this.
  • The search site now includes links to cpancover.com in cases where coverage reports are available. I hope this will bring more visibility to cpancover.com and also be useful to authors and users of CPAN modules. You’ll now be able to find Devel::Cover output for your favourite modules more easily.
  • The /lab/dashboard endpoint of the search site has been broken for a long time. The exception was fixed and regression tests have been added. The content of the page is still mostly broken, but that can be fixed moving forward.
  • A Minion UI has been added for admins, so that we can now view the status of the indexing queue via a web browser.
  • More endpoints have been documented. We have done this using OpenAPI. You can view the beginnings of it at fastapi.metacpan.org/static/index.html.
  • We have partially implemented a user search for admins. This will allow us to more easily troubleshoot issues with users logging in or adding PAUSE, GitHub and other identities to their MetaCPAN logins. This will also soon help us to identify issues with users who are not seeing all of their favourite modules listed.
  • We have partially implemented an admin endpoint for adding CPAN releases for re-indexing. We currently do this at the command line. Having a browser view for this means we can do this without requiring SSH. More importantly, we’ll eventually be able to make this available to end users of MetaCPAN so that people can request re-indexing of releases they care about.

That’s the bulk of what we got done. I haven’t even listed the work that Doug got done on CPAN Testers, mostly because I stayed out of his way and I just don’t have the details. Doug was a gracious host. In addition to helping make sure that we got fed and got into the office when we needed it, he stayed late with us on those evenings when we just didn’t want to leave and was back again early the next morning. We should also thank Joel Berger, who helped to host us during our stay. Working out of the ServerCentral offices for the third year in a row has been really great. We know our way around and we’ve been treated extremely well. It was a real treat to be allowed back.

Joel was not able to join us onsite on the weekend, but he did join us on screen.

Other Fun Stuff

Aside from getting our work done, we got in a bike ride on Friday morning. It was snowing and quite windy along the water where we rode. I lost most of the feeling in my hands. In future I’ll either pack warmer gloves or exercise better judgement. Obviously I can’t count on the others to talk me out of bad ideas. 😉

We also got in more runs on Saturday and Sunday morning. So, while we spent a lot of time sitting in the office, putting calories into our bodies, we did get exercise every day of the conference. That really helped me be productive during the day.

Evenings were fun as well. We had drinks on the 96th floor of the Hancock Building, visited the Chicago Institute of Art and Mickey and I went to a Bulls game on Saturday night. There was only one night where everyone else was too tired to go out. I got myself a table for one at the hotel bar. Wipes tears from eyes. (That was actually not a bad experience either.)

In Conclusion

Our Sunday deployment was a bit of a bumpy ride. In retrospect I would have preferred fewer moving parts to be involved in a single deployment, but it was also interesting to debug the deployment in real time with a few of us troubleshooting via a 60-inch screen.

We left Doug behind in the office, with what was left of his TODO list.

The flight home was uneventful. There were still some deployment issues to sort out on Monday but those eventually got taken care of.

Several of our usual attendees were not able to make it this year and we really missed them. On the whole however, I was quite happy with what we accomplished this time around. Moving forward we’ll be able to continue to build on the work done during meta::hack and continue to improve MetaCPAN.

meta::hack is back!

For the third year running, we have the privilege of working out of the ServerCentral offices in downtown Chicago in order to hack on MetaCPAN. This year, five of us will be working on improving our corner of the Perl ecosystem. The physical attendee list is as follows:

  • Doug Bell
  • Joel Berger
  • Olaf Alders
  • Mickey Nasriachi
  • Shawn Sorichetti

This, of course, would not be possible without the help of our 2018 sponsors: Booking.com and ServerCentral. You’ll notice their logos on the front page of MetaCPAN into 2019 as our way of thanking them for their support.

I’d like briefly to revisit last year’s event, which was quite productive and would not have happened without our 2017 sponsors:

In 2017, we made excellent progress on various fronts. Part of this progress came via an onsite visit from Panopta, our monitoring sponsor. They have a Chicago office and were very responsive when I asked them if they’d be willing to send someone down to chat with us about improving our Panopta configuration. So, Shabbir kindly dropped by, giving us in-person advice on how to configure our Panopta monitoring. One of the most helpful tweaks was setting up an IRC integration, so that outages are now broadcast to #metacpan on irc.perl.org.

Here’s a photo of Shabbir’s visit. (I wasn’t able to attend in person last year, but I was able to do so remotely. I’m the one watching from the wall.)

meta::hack v2 in 2017

From left to right:

Doug Bell, Thomas Sibley, Nicolas Rochelemagne, Graham Knop, Leo Lapworth (front), Joel Berger, Brad Lhotsky, Olaf Alders (pictured on monitor), Shabbir Karimi (from Panopta) and Mickey Nasriachi.

I’ll post more about our progress over the next few days. We’re working on some fixes and something new and shiny too. Wish us luck!

Perl Toolchain Summit 2018 Wrap-up Report

Perl Toolchain Summit 2018 Wrap-up Report

Getting There

This year I had the pleasure of attending the Perl Toolchain Summit in Oslo, Norway. Because of a conflict in my schedule, I initially didn’t think I’d be able to attend. After a lot of mental back and forth, I decided I’d try to make it work. The compromise was that this year I would leave on Sunday morning rather than on Monday. That meant I wasn’t able to participate in the last day of the summit, but I’d still have 3 entire days to get things done.

I left Toronto on Tuesday evening and arrived in Frankfurt on Wednesday morning. I got breakfast in Germany and after a two hour layover I was on a short flight to Oslo. From the Oslo airport I took a train into town.

I got into my tiny, tiny room and changed into my running clothes. I spent the next couple of hours running mostly along the water and making lots of stops along the way to take pictures and hang out. That was enough to burn off any excess energy.

In the evening I met up with everyone else at a bar called RØØR. Here it finally became clear to me why Salve had said alcohol would be expensive. (I hadn’t really believed him when he originally let us know.) I had a great time regardless and it was nice to see a lot of familiar faces and catch up.

Day One

On Thursday morning we had an 8:30 breakfast at the venue and then 9:00 AM standup, where everyone had a chance to introduce themselves and talk about what they intended to work on. Leo and Mickey had claimed a small room for our group, which meant that for the span of the summit we would all be able to sit around the same table and whiteboard. Some folks dropped by early in the morning to talk about a few items on their wish list. We also made a game plan for the summit and divided up various work assignments amongst ourselves. These sorts of face-to-face meetings are what make the summit invaluable. The first morning meeting allowed us to get everyone on the same page and then get to work.

This year we didn’t come into the summit with any one grand project to complete. Rather, we spent some time on smaller items: bug fixes, attending to outstanding issues on GitHub and infrastructure work. Specifically, we’re working towards having a better disaster recovery plan and failover capacity for MetaCPAN. Leo did a lot of work on that, so I’ll let him blog about his contributions. For my part, I picked back up on the work that I had started at meta::hack this past November.

At meta::hack I had spent a good chunk of time upgrading our Debian from Wheezy to Stretch. This is an upgrade of two major releases and it turns out to be non-trivial in our case. A large part of the problem is that with our current setup we are on a version of Puppet (3.8.7) which is no longer supported. Since Stretch ships a newer Puppet (4.8.2) by default, I took this as an opportunity to upgrade Puppet as well. This meant that some of our Puppet config needed updating and that we also needed to update our PuppetForge modules (3rd-party dependencies). This was complicated enough that I was going to need a lot of help from Leo, so I sat myself beside him in order to make bothering him more convenient for the both of us. It turned out that I bothered him a lot. I shudder to think how long it would have taken me to work through this remotely. I can’t say that I enjoyed the work, but it was much more enjoyable to do it face to face and it was a much better use of everyone’s time.

Thursday also happened to be Leo’s birthday so we went out for drinks and various foods made of pork at a bar called Crowbar. Marcus Ramberg and Batman joined us there and eventually a good chunk of the attendees came along as well.

Day Two

I had gotten the bulk of the upgrade done on Thursday. On Friday I got up early and went for a run. Then, at the venue we continued tying up some loose ends on the Debian/Puppet upgrade and getting Leo involved in testing it out. I should add that part of this upgrade also included an overhaul of how we build our Vagrant boxes. Currently if you want to deploy a Vagrant instance you download a pre-built box which we have uploaded to one of our servers. Moving forward you’ll be able to build your own box on demand. It will take slightly longer to get set up, but you’ll have the latest and greatest software right away and you won’t need to rely on any one of us to upload an updated box for you to download. In the evening we all went out to a restaurant where there were speeches, excellent food and great conversations.

Day Three

On Saturday morning, Mickey and I met up at 6:30 AM for a quick run. At 7:30 AM we met up with Leo and Sawyer and rented some bikes. We cycled along the waterfront by the fortress and the Opera House before making it back just in time for standup. By now we were upgrading Debian and deploying the new Puppet on two of the three machines which Bytemark generously donates to us. This turned out to be non-trivial as the first machine we upgraded would not reboot. After some excellent technical support from Bytemark we were able to get past this as well.

I also added a URL mapping for static files to the MetaCPAN API. This is a first step towards self-hosting some API documentation, likely using Swagger in some way. The lack of such documentation is partly what’s preventing the macOS Dash application from having more Perl documentation available.

In the evening most of us hung out at a local hacker/maker space. For me, it was a nice way to wrap up the summit.

In General

While all of this was going on, we also had a good debate about sponsorship of our project. We’ve been getting a lot of interest from potential sponsors over the last year or so and we’ve mostly been approaching it on a case by case basis. Having everyone in on a group discussion allowed us to set a sponsor policy moving forward that we think will be good for both us and our sponsors. (If your company would like to sponsor MetaCPAN or our next meta::hack conference, please don’t hesitate to get in touch with me).

I also fit in the usual code reviews over the 3 days and tried to catch up on some outstanding GitHub issues. It turns out I had missed some entirely. There were issues going back as far as January which had not yet been responded to. 🙁

As part of some of the other work going on, there was progress made towards a tighter integration of the CPAN river data in the MetaCPAN API. Also, we now have access to Neil Bowers’ CPAN river data generator. There’s a plan to have MetaCPAN directly generate this data, rather than having us pull the data from him.

There was also a fair bit of discussion as to where to host the 2018 meta::hack. Our TODO list is long enough that the Perl Toolchain Summit isn’t quite enough for us to make the progress that we think we need to make. So, hopefully, we’ll be able to get together once more this year to continue to make progress.

Additionally, Babs Veloso showed us a proposal for a redesign of the MetaCPAN author pages. It looks great. Now we just need to find someone to implement it. We also spent a fair amount of time on a part of the project which I can’t actually tell you about just yet, mostly because the details haven’t all been finalized. If all goes as expected, we’ll announce that in the coming months.

Getting Home

My flight back early on Sunday was uneventful. I spent an hour in Copenhagen and then was back home on Sunday afternoon in time to take the kids to the park. I got into an Eastern Standard Time sleep schedule by Sunday night. I dropped the kids off at school on Monday morning and then flew off to Boston for meetings until the following Friday. Since I telecommute, I rarely need to travel (or even wear shoes), but for some reason I ended up with back to back trips this year. In the end, it all worked out.

Thanks

On the whole, this year’s PTS was productive, fun and very well organized. I’d like to thank Salve J. Nilsen, Neil Bowers, Stig Palmquist, Philippe Bruhat, Laurent Boivin and anyone else I may have missed. The venue, food, level of organization (and hoodies) were all excellent. It was a smooth experience from start to finish. I feel like I was able to get a lot of good work done and I am particularly pleased that I was able to finish what I had set out to do. Collectively, there was much more work completed on MetaCPAN than what I’ve described.

I would not have been able to make this trip without the support of MaxMind (my employer). MaxMind has sent me to PTS without hesitation each time that I’ve been invited and has also been a sponsor of the event for these past two years. Thanks, MaxMind!

Additionally, I’d like to thank all of our sponsors, without whom this would not have been possible:

MaxMind,
NUUG Foundation,
Teknologihuset,
Booking.com,
cPanel,
FastMail,
Elastic,
ZipRecruiter,
MongoDB,
SureVoIP,
Campus Explorer,
Bytemark,
Infinity Interactive,
OpusVL,
Eligo,
Perl Services,
Oetiker+Partner.

Announcing meta::hack v2

It’s that time of year again. We did a bunch of hacking on MetaCPAN at the Perl Toolchain Summit and we got a lot done, but it wasn’t nearly enough. Our TODO list never gets shorter and there are lots of folks willing to pitch in, so today I’m announcing that meta::hack v2 will take place from Nov 16-19, 2017 at ServerCentral in Chicago. As a reminder of how things went with meta::hack v1, please refer to my wrap-up report from that event.

The attendees this year are:

This group basically represents the MetaCPAN core contributors. We are restricting meta::hack to this core group again purely so that we can focus on getting the maximum amount of work done over the short time that we have together. If you are interested in joining a future meta::hack, now is a good time to start contributing to the project. Our intention is not to exclude willing participants, but to keep the group limited to people who are already up to speed on contributing to the project. Having said that, if any hackers in the Chicago area want to hang out, we’re happy to go out to dinner with you while we are in town.

An event like this does cost money and we are still looking for some more sponsors. Some real estate on the front page of MetaCPAN will be used to recognize sponsors. If you or your company are interested in supporting this event, please contact us so that we can send you a copy of the sponsor prospectus.

meta::hack Wrap-up Report

meta::hack v1

Earlier this month (Thu, Nov 17 – Sun, Nov 20) I had the pleasure of meeting up with 7 other Perl hackers at ServerCentral’s downtown Chicago offices, in order to hack on MetaCPAN. Before I get started, I’d like to thank our sponsors.

This hackathon wouldn’t have been possible without the overwhelming support of our sponsors. Our platinum sponsors were Booking.com and cPanel. Our gold sponsors were Elastic, FastMail, and Perl Careers. Our silver sponsors were ActiveState, Perl Services, ServerCentral and Advance Systems. Our bronze sponsors were Vienna.pm, Easyname, and the Enlightened Perl Organisation (EPO). Please take a moment to thank them for helping our Perl community.

For the past 2.5 years, we’ve been working off and on at porting MetaCPAN from Elasticsearch 0.20.2 to 1.x and (eventually) 2.x. There were enough breaking changes between the versions to make this a non-trivial task. We had made very good progress over the past two QA hackathons, but the job was just too big to finish in the hours that we had available.

After the QA Hackathon in Rugby, I spoke to Neil Bowers about how we might go about doing some fundraising. Neil kindly offered to help, and that offer soon evolved into him taking on all of the work (thanks, Neil!). Neil worked his magic and got the event fully funded. I know there was a lot of work involved, but he made it look easy. Mark Keating and the Enlightened Perl Organization kindly took on the financial side of things, invoicing and accepting payment from sponsors. Without EPO and Neil, this event never would have taken place. (Please do take a moment to thank them.)

While this was going on, we began searching for a venue. Joel Berger offered to host us at ServerCentral in Chicago and we immediately took him up on the offer. After that it was just a matter of folks booking plane tickets and getting approval from employers for the time off.

The final list of invitees was:

  • Brad Lhotsky (San Francisco)
  • Doug Bell (Chicago)
  • Graham Knop (Baltimore)
  • Joel Berger (Chicago)
  • Leo Lapworth (London)
  • Mickey Nasriachi (Amsterdam)
  • Olaf Alders (Toronto)
  • Thomas Sibley (Seattle)

The event was invitation only. We did this in order to maximize the amount of work we’d be able to finish at the event. [Insert reference to “The Mythical Man Month”]. Everyone who participated was already up to speed on the internals of the project or has an area of expertise which we needed in order to complete our goal of launching fully with v1 of the API. Because everyone already had a working VM and working knowledge of the project, we were able to tackle the problems at hand right from the first morning.

As far as living space goes, we initially had looked at renting hotel rooms, but the cost would have made it almost prohibitive to meet in Chicago. After doing some research, we booked two apartments (each with 3 bedrooms) on the same floor of a condo building in the Lincoln Park area of Chicago. We booked the accommodations via booking.com of course. 🙂 I think we were happy with the housing. Everyone had their own room and we had big enough living rooms for all of us to meet up mornings and some evenings. At the end of the day the rental was a fraction of the price of a Chicago hotel. I’ve also made a mental note not to be the last one to arrive in town. Apparently it also means you get the smallest room.

Each day we took the subway downtown to ServerCentral. We had a dedicated boardroom in the office with a large TV that we could use for sharing presentations, IRC chat or error logs. ServerCentral also sponsored lunch each day of the event. Extra monitors were also available for those who wanted them. (Lots of Roost laptop stands were to be seen. Also lots of people who couldn’t figure out how to open them after having collapsed them for the first time in forever).

After settling in at the office we’d discuss our plans for the day and map out goals for that day. We had breakout discussions where appropriate but the time spent not writing code was minimal. Generally, as a group, we worked well into the evenings. We didn’t get the full Chicago experience, but we got a lot done. We did make it to the Chicago Christkindlmarkt, which was a few blocks from the office and we went out for a breakfast and a dinner as well. Minimal downtime, but the breaks we had were lots of fun.

Day one was spent removing anything which was blocking the API upgrade. Wishlist items were ignored and as a group we worked really well. Lots of pull requests were created, reviewed and merged.

By day two of the hackathon we flipped the switch and went live with the new API. We could have waited a bit longer, but we opted to make the change earlier so that we could troubleshoot any issues as a group and watch the error logs in real time. There were no showstopping bugs and the transition was actually pretty smooth.

Day three was spent squashing some of the bugs which came up after the upgrade. We also started to tackle some wishlist items.

Day four was a slightly shorter day. We wrapped around 4 PM. Some of us went to check out “the Bean” before flying out while Leo and I headed right for our respective airports.

This list is by no means exhaustive, but over this long weekend we:

  • moved ++ data to v1 of the API.
  • moved https://metacpan.org to v1 of the API.
  • implemented load balancing via Fastly, our CDN sponsor.
  • reduced noise in the logs by squashing bugs which generated warnings or exceptions.
  • updated our API documentation as well as the metacpan-examples GitHub repository from v0 to v1.
  • published an upgrade document which explains how to upgrade your query syntax and configuration for v1.
  • moved http://explorer.metacpan.org to v1 of the API.
  • began work on streaming logs to Elasticsearch.
  • began moving the query logic that metacpan.org uses over to the API so that other clients can use this same logic.
  • began porting author queries from metacpan.org to the API as well.
  • added a meta::hack event page along with sponsor info to metacpan.org.
  • continued work on adding a /permission endpoint which will provide access to the data in 06perms.txt.
  • added more tests for the /download_url endpoint, which translates module names into download URLs. Specifically, this is meant to be used by cpanm.
  • added snapshotting of Elasticsearch indices in v1 so that we can easily restore from backup.

/permission is something I spent a fair bit of my time working on over the last two days. Having 06perms.txt data in the API will mean that we can display a list of all authors who have maint on a module on metacpan.org. This will make it easier to track down authors who can release a module, particularly for those who aren’t familiar with the way PAUSE works. I think this branch is probably about 1.5 years old, so I was happy to get the time to try to finish it off. I didn’t quite get there, but that’s okay. It was a wishlist item and it’s actually quite close to being released.

Also of note is the fact that we’ve now officially deprecated the v0 API. There is a 6 month runway to move clients over to v1 and v0 will be taken offline on or after June 1, 2017.

Since https://metacpan.org now uses v1 of the API, results for v0 are no longer available. If you have a client which uses v0 of the API, please feel free to reach out to us with any concerns you may have about making the switch.

If you rely on updated ++ data, you’ll need to switch to v1 now, as ++ data in v0 is no longer being updated. The indexer is, however, still running on v0, so it will still find and index new CPAN uploads. v0 development is officially closed. Any v0 bugs (barring catastrophic issues) will likely not be addressed. v0 has been around for just over 6 years now. It has served us well, but it’s time to let it go. [Insert musical scene with a talking snowman, an ice queen and her loyal sister.]

Announcing meta::hack

Every so often, someone asks if they can donate money to MetaCPAN. I usually direct them to CPAN Testers, since (due to our generous hosting sponsors) we’ve generally not had a need for money. You can probably see where I’m going with this. Times have changed. We’re no longer turning financial sponsors away.

Back at the QA Hackathon in Rugby, we had a great group of hackers together and we got a lot of work done. However, as we worked together, it became clear that the size of our job meant that we wouldn’t be able to finish everything we had set out to do over that four day period. There are times when there’s no replacement for getting everyone in the same room together.



The first dedicated MetaCPAN hackathon will be held at the offices of ServerCentral in Chicago, from November 17th through 20th. The primary goal for this hackathon is to complete MetaCPAN’s transition to Elasticsearch version 2. This will enable the live service to run on a cluster of machines, greatly improving reliability and performance. The hackathon will also give the core team a chance to plan work for the coming 18 months.

The meta::hack event is a hackathon where we’re bringing together key developers to work on the MetaCPAN search engine and API. This will give core team members time to work together to complete the transition to Elasticsearch version 2, and time to discuss gnarly issues and plan the roadmap beyond the v1 upgrade.

MetaCPAN is now one of the key tools in a Perl developer’s toolbox, so supporting this event is a great way to support the Perl community and raise your company’s profile at the same time. This hackathon is by invitation only. It’s a core group of MetaCPAN hackers. We are keeping the group small in order to maintain focus on the v1 API and maximize the productivity of the group.

Why sponsor the MetaCPAN Hackathon?

• If your company uses Perl in any way, then your developers almost certainly use MetaCPAN to find CPAN modules, and they probably use other tools that are built on the MetaCPAN API.
• The MetaCPAN upgrade will improve the search engine and the API for all Perl developers. As a critical tool, we need it to be always available, and fast. This upgrade is a key step in that direction.
• This is a good way to establish your company as a friend of Perl, for example if you’re hiring.

Participants

There will be 8 people taking part, including me. Everyone taking part is an experienced senior-level software engineer, and most of them have already spent a lot of time working on MetaCPAN. As noted above, this is an invitational event with a very specific focus.

What is meta::hack?

MetaCPAN was created in late 2010. Version 0 of the MetaCPAN API was built on a very early version of Elasticsearch. For the first 5 years, most of the work on MetaCPAN focussed on improving the data coverage, and the web interface. In that time Elasticsearch has moved on, and we’re now well behind.

The work to upgrade Elasticsearch began in May of 2014. It continued in early Feb of 2015. Later, at the 2015 QA Hackathon in Berlin, Clinton Gormley (who works for Elastic) and I worked on moving MetaCPAN to Elasticsearch version 2. This work was continued at the 2016 QA Hackathon in Rugby, and as a result we now have a beta version in live usage.

The primary goal of meta::hack is to complete the port to Elasticsearch version 2, so the public API and search engine can be switched over. There are a number of benefits:

• Switching from a single server to a cluster of 3 servers, giving a more reliable service and improved performance.
• Once we decommission the old service, we’ll be able to set up a second cluster of 3 machines in a second data centre, for further improvements.
• We’ll be able to take advantage of new Elasticsearch features, like search suggesters.
• We’ll be able to use a new endpoint that has been developed specifically to speed up cpanminus lookups. Cpanminus is probably the most widely used CPAN client these days, so improving this will benefit a large percentage of the community.
• If and when search.cpan.org is decommissioned, we’ll be able to handle the extra traffic that will bring with it, and we’ll also have the redundancy to do this safely.
• We’ll be able to shift focus back to bug fixes and new MetaCPAN features.

Becoming a Sponsor

Neil Bowers has kindly taken on the task of shepherding the sponsorship process. (He also wrote the sponsorship prospectus from which I cribbed most of this post.) Please contact Neil or me for a copy of the meta::hack sponsorship prospectus. It contains most of the information listed above, as well as the various sponsorship levels which are available. Thank you for your help in making this event happen. We’re looking forward to getting the key people together in one room again and making this already useful tool even better.

How to Get a CPAN Module Download URL

Every so often you find yourself requiring the download URL for a CPAN module. You can use the MetaCPAN API to do this quite easily, but depending on your use case, you may not be able to do this in a single query. Well, that’s actually not entirely true. Now that we have v1 of the MetaCPAN API deployed, you can test out the shiny new (experimental) download_url endpoint. This was an endpoint added by Clinton Gormley at the QA Hackathon in Berlin. Its primary purpose is to make it easy for an app like cpanm to figure out which archive to download when a module needs to be installed. MetaCPAN::Client doesn’t support this new endpoint yet, but if you want to take advantage of it, it’s pretty easy.
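Here is a sketch of a script along those lines, querying the endpoint directly with HTTP::Tiny. The host name and the `download_url` response field are assumptions on my part; substitute whichever host is serving v1 of the API.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use HTTP::Tiny;
use JSON::PP ();

# Build the endpoint URL for a module. The host name here is an
# assumption; substitute the host which is serving v1 of the API.
sub download_url_for {
    my ($module) = @_;
    return 'https://fastapi.metacpan.org/v1/download_url/' . $module;
}

my $module = shift @ARGV;
if ( defined $module ) {
    my $res = HTTP::Tiny->new->get( download_url_for($module) );
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};

    # The response is JSON; we assume download_url is the field we want.
    my $data = JSON::PP::decode_json( $res->{content} );
    print $data->{download_url}, "\n";
}
```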

Now invoke your script:


olaf$ perl download_url.pl Plack
https://cpan.metacpan.org/authors/id/M/MI/MIYAGAWA/Plack-1.0039.tar.gz

Update!

After I originally wrote this post, MICKEY stepped up and actually added the functionality to MetaCPAN::Client. A huge thank you to him for doing this. 🙂 Let’s try this again:
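A sketch of the MetaCPAN::Client version might look like this. The method name mirrors the endpoint, but check the MetaCPAN::Client documentation for the exact interface and return value:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use MetaCPAN::Client;

my $module = shift @ARGV or die "usage: $0 Module::Name\n";

# download_url queries the experimental endpoint for us and returns an
# object; we assume its download_url accessor holds the archive URL.
my $result = MetaCPAN::Client->new->download_url($module);
print $result->download_url, "\n";
```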

That cuts the lines of code almost in half and is less error prone than crafting the query ourselves. I’d encourage you to use MetaCPAN::Client unless you have a compelling reason not to.

Caveats

This endpoint is experimental.  It might not do what you want in all cases.  See this GitHub issue for reference.  Please add to this issue if you find more cases which need to be addressed.  Having said that, this endpoint should do the right thing for most cases.  Feel free to play with it to see if it suits your needs.

MetaCPAN at the 2016 Perl QA Hackathon

Before I start, I’d like to thank our sponsors

This year I once again had the pleasure of attending my 4th Perl QA Hackathon. Before I get into any details, I’d like to thank the organizers: Neil Bowers, Barbie and JJ Allen. They did a fantastic job. It was a very well organized event and really a lot of fun. It was well worth attending and it made a huge difference to the MetaCPAN project.  Thanks also to Wendy for making sure that everyone had what they needed.

Lastly, I’d like to thank all of the sponsors who made this event possible. These companies and individuals understand what makes the QA Hackathon such an important event and I’m happy that they wanted to help make it happen.

The Crew

My focus this year (as always) was working on MetaCPAN, but this time around I had much more help than usual. Leo Lapworth joined us from London for the first two days, working on the sysadmin side. Mickey Nasriachi came in from Amsterdam to work with us on the back end code. Matt Trout spent a lot of his time helping us with debugging and database replication. Sawyer spent a great deal of his time pair programming with us and helping us debug some really thorny issues. Also, what began as a conversation with Joel Berger about a simple MetaCPAN patch resulted in him spending much of his time looking at various issues. He now has a solid understanding of the MetaCPAN stack and we hope he can continue to contribute as we move forward.

We had a really good crew and we were all quite focussed. We removed ourselves from the main hackathon room so that we were able to have our own conversations and be less subject to distracting conversations from other groups. Since we were just outside of the main room we were able to talk with various others as they passed by our table. It was like having a space to ourselves, but we still felt very much a part of the hackathon.

Our main goal was to upgrade MetaCPAN from Elasticsearch 0.20.2 to 2.3.0. I spent a lot of time on this with Clinton Gormley at last year’s hackathon. The upgrade at that time was planned to be from 0.20.2 to a 1.x version. We were optimistic, but it became clear that it was a job that we couldn’t realistically finish. So, we left last year’s hackathon with some good changes, but we weren’t close to being able to deploy them. By this year, Elasticsearch had introduced even more breaking changes as it moved from 1.x to 2.x, so we had to factor those in as well.

For 2016, in the weeks running up to the hackathon, Leo and I had been pushing a lot of code in preparation for this weekend. Around the same time, Mickey arrived on the scene and really moved things forward with his code changes too. So, we had a small core of developers working on the code well in advance of the hackathon. That’s actually one of the nice things about an event like this. I didn’t just write code when I got here. Having a firm date by which a number of things had to be done forced me to sit down and solve various problems in the weeks leading up to the hackathon.

What did we actually get done?

Elasticsearch Cluster

One criticism of MetaCPAN has been a lack of redundancy. We’ve had a good amount of hardware available to us for some time, but we haven’t had a really good way to take advantage of it. Thanks to some of the work leading up to the hackathon, v1 of the API will run on an Elasticsearch cluster of 3 machines (rather than the single production box, which currently runs v0). Having a proper cluster at our disposal should make for faster searching and also greater redundancy if one of these machines needs to take an unscheduled break. On the human side, it will be a lot less stressful to lose one machine on a cluster of three than to lose one machine on a cluster of one. We all know these things happen. It’s just a matter of time. So, we’ll be better prepared for when a machine goes down.

Minion

Occasionally we need to re-index everything on CPAN. This takes a very long time. The current incarnation of MetaCPAN (v0) uses a script to do this and it can take 12 hours or more to run. If that script runs into some unhandled exception along the way, you have the rare pleasure of starting it up again manually. It needs some babysitting and it’s far from bulletproof. It’s also a bit hard to scale it.

Rather than trying to speed up our current system, we’ve added a Minion queue to our setup. This means that when we re-index CPAN, we add each upload as an item in our queue. We can then start workers on various boxes on the same network and we can run indexing in parallel. In our experiments we ran 17 workers each on 3 different boxes, giving us 51 workers in total. This gives us more speed and also more insight into which jobs have failed, how far along we are with indexing etc. It’s a huge improvement for us.
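As a rough illustration of the setup described above, a Minion-backed indexer can be wired up something like this. The task name, connection string, and job payload are all illustrative, not the actual MetaCPAN code:

```perl
use strict;
use warnings;

use Minion;

# Share one Postgres-backed queue between all boxes on the network.
my $minion = Minion->new( Pg => 'postgresql://metacpan@queue-db/minion' );

# Each upload becomes one job, so a failed job can be retried on its
# own instead of restarting a 12 hour indexing run from scratch.
$minion->add_task(
    index_release => sub {
        my ( $job, $archive ) = @_;
        # ... unpack the archive and index it into Elasticsearch ...
        $job->finish;
    }
);

# The enqueueing side, once per upload:
# $minion->enqueue( index_release => [$_] ) for @uploads;

# On each worker box, start as many of these as the hardware allows:
# my $worker = $minion->worker;
# $worker->run;
```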

Postgres

Minion has more than one possible back end. We’ve chosen to go with Postgres. This means that we now have Postgres installed for the first time and also available for other uses. Matt Trout has been working on Postgres replication for us so that we have some redundancy for our queues as well. Once that is available, he can also write a Pg schema which MetaCPAN can use as part of the back end. This means that at some future date we could begin to store our data in both Pg and Elasticsearch. This would give us a hybrid approach, allowing us to use Elasticsearch for the things it does well and a relational database for the kinds of queries which a NoSQL store doesn’t handle well or at all in some cases.

As a historical footnote, the original version of the API first inserted into an SQLite database and then dumped that data into Elasticsearch. We may eventually come full circle and use a similar approach with Postgres.

RAM Disk

As part of Leo’s sysadmin work, he has set up a RAM disk for the indexer to use when unpacking tarballs. Even if this only saves a fraction of a second per archive, when you’re indexing 500,000 archives, even a small savings of time can be a win.

Elasticsearch Upgrade

Currently production runs on Elasticsearch version 0.20.2.  Our work this weekend has pushed us to using 2.3.0. Part of what has been holding us back is the many breaking changes which are involved in this particular upgrade. Much of our efforts at the hackathon were directed towards dealing with these various breaking changes. We haven’t quite tackled all of them yet, but we’re very close.

Deploying and Indexing a Beta Cluster

We now have a cluster of machines running our v1 beta.  I will publish the URLs as soon as we are ready for feedback.

Please note that our API versioning does not follow the Elasticsearch versioning. This frees us up to change API endpoints etc outside of the scope of another Elasticsearch upgrade.

CPAN River Integration

Joel Berger submitted a patch to integrate CPAN River statistics into the /distribution endpoint. The actual data will be provided by Neil Bowers. The patch to add this data to the /distribution endpoint has already been merged to the v1 branch, and Barbara has done some work on a front end display for the data.

CPANCover.com Integration

I had a chance to speak with Paul Johnson about cpancover.com. I had initially put together an integration for his site 2 years ago at the QA Hackathon. I thought the integration was fine, but I ran into enough resistance from the MetaCPAN team that this pull request was never merged. We’ve now agreed on a way to move forward with this which will make everybody happy. There are open tickets on both the front and back end of MetaCPAN to address this.

Debian Packaging Information

Book is working on adding some information which can be used to correlate modules with their corresponding Debian packages. Once this is finished, this data can also be added to the distribution endpoint. The integration itself is pretty simple and will work much like the CPAN River.

Changes files

Graham Knopf wasn’t able to attend the QA Hackathon, but he did spend some time hacking from home. He has a patch in to alter how changes files are displayed.

Moving Towards Test2::Harness

I spoke with Chad Granum on the evening before the hackathon and I mentioned that we were using Test::Aggregate, one of the few distributions which was not yet playing nicely with Test2. I wasn’t too worried about this since we pin our dependencies via Carton, but also because I’d been hoping to move away from it. I had been thinking about Test::Class::Moose as an alternative, but I didn’t want to go to the trouble of setting up test runners etc. Something simpler would be nice. Chad showed me Test2::Harness, which would give us the same advantages as running under Test::Aggregate. It looks great and should be available shortly. In the meantime I’ve gutted the Test::Aggregate logic from the tests and we’re running everything the old-fashioned (slower) way for the time being. A switch to Test2::Harness in the near future should be trivial.

MetaCPAN::Moose

As part of our general cleanup, I released MetaCPAN::Moose. This is a simple bit of code which imports MooseX::StrictConstructor and namespace::autoclean into any class which uses it. After writing the code and the tests, I showed it to Sawyer. He sat down and immediately rewrote it using Import::Into. The code was now at least 50% smaller than it previously was and it was a lot cleaner. The tests continued to pass, so I was happy to release that to CPAN.
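The Import::Into approach is roughly this shape. This is a sketch only; the exact import list belongs to the released module, and whether Moose itself is re-exported is my assumption:

```perl
package MetaCPAN::Moose;

use strict;
use warnings;

use Import::Into;
use Moose                     ();
use MooseX::StrictConstructor ();
use namespace::autoclean      ();

sub import {
    my $target = caller;

    # Re-export into whichever class said "use MetaCPAN::Moose".
    Moose->import::into($target);
    MooseX::StrictConstructor->import::into($target);
    namespace::autoclean->import::into($target);
}

1;
```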

Moving forward we’re going to publish a few more of our internal modules to CPAN. These will serve several purposes:

  • It will be useful to us as a way of sharing code between the various apps which we have. We use Carton to manage various app installs, so sharing code can be tricky. We didn’t want to go the submodule route unless we really had to.
  • Some people may also find this code useful. It’s a good way to showcase our logic as a way of doing things (like setting up your own custom Moose type library). People could learn from it.
  • Alternatively, people might look at it and realize it’s terrible. At this point they’ll hopefully hack on it and send pull requests. Because this code is standalone with its own test suite, the overhead of getting started will be much, much less than it is for hacking on the rest of CPAN.

I don’t think generally publishing internal logic to CPAN is a good idea, but for the above stated reasons, I think the code that we are talking about is well suited for this.

CPANTesters Data

We used to import CPAN Testers data into MetaCPAN using an SQLite database which they provided. At some point this database became unavailable. I’m encouraged to hear that this may not be a permanent state of affairs. If something can be worked out, MetaCPAN can once again easily import testers data into its API using the database.

Somewhere out there I can hear someone complaining that this isn’t RESTful or whatever, but for the amount of data involved, it’s actually a good fit. I did discuss with Doug what a REST API for this might look like, but to be honest, that would potentially be much more work than just creating the database on some arbitrary schedule and publishing it.

Interesting Things I Learned From Random Conversations:

  • Matt Trout suggests abandoning MooseX::Types and moving our type checking to Type::Tiny. I’m on board with that, but it’s not a priority right now.
  • I learned from Sawyer that a simple speed optimization is switching to a Perl which is compiled without taint. Also he recommended some XS modules for header and cookie handling. The XS part wasn’t news to me, but it’s something I’ll keep in mind for future and certainly something I can make sure we do with MetaCPAN.

    Edit and caveat: As far as compiling Perl without taint mode goes, Sawyer was kind enough to refer me to some relevant p5p messages: http://nntp.perl.org/group/perl.perl5.porters/193822 and http://nntp.perl.org/group/perl.perl5.porters/194361. Apparently there is some performance to be gained, but whether or not it’s worthwhile for you likely depends very much on the behaviour of your application.

  • I heard (once again) that Devel::Confess is a “better” tool for debugging. I’ve been using it for a while now and am very happy with it. I’m not the only one.
  • From Mickey, I learned about Devel::QuickCover, which sounds like an interesting way to get a first pass at coverage data.
  • I now know how to pronounce Upasana.
  • I learned that I’m not the only person who has no clue how to read a flame graph.
  • After a lengthy conversation with Matt Trout on the Thursday it wasn’t until I said, “hang on, I’ll send you the link on IRC” that he looked at his screen and then looked back up and said, “oh, that’s who you are”. I guess I could have introduced myself formally when he first sat down, but eventually we got there.
  • After seeing the Roost laptop stand in action, I think I need one.

Unrelated to MetaCPAN

Karen Etheridge was able to merge my fix to allow MooseX::Getopt to play nicely with init_arg. It’s a bug that has bitten me on more than one occasion. The fix has now been released.

After a conversation with BINGOS on Sort::Naturally, he got me co-maint on that module so that I can look at addressing an outstanding issue.

In Conclusion

For me, it was a great few days for moving the project along and socially quite fun. I got to see a bit of London on my arrival and spend a few hours at the British Museum, which I last visited about 20 years ago. In the afternoon, Leo was kind enough to drive me up to Rugby. Leo, Mickey and Joel were among the people whom I had spoken with on IRC but had never met in person. Making those real life connections is great.

On a practical level, I mostly started looking the correct way when crossing the street, but I wouldn’t bet anyone else’s safety on my ability to do the right thing there. Most of my ride from the airport to Leo’s office consisted of me feeling quite sick to my stomach as part of me really wanted the driver to switch to the correct right side of the road. London rush hour traffic and narrow streets with two way traffic probably didn’t help.

It was nice to see RJBS get a special show of thanks for his years as pumpking and also to witness the passing of the torch to Sawyer, who will do a fantastic job as he takes over. Also the tradition of publicly thanking the organizers has continued, which is a nice part of the weekend.

I should mention that this year there were no special outings. No video game museum tours, no chance to see how Chartreuse is made. Not even a trip to the set of Downton Abbey. That meant a few extra hours of hacking, bug squashing etc, which is nice too. I’m sure that deep down inside Neil really wanted to take us to a filming of Coronation Street, but he resisted the urge in order to further the goal of productivity.

All in all, I felt it was an extremely productive week for me and for MetaCPAN in general. My sincere thanks go out to the gang for having had me along once again this year.