Saturday, March 31, 2012

MediaWiki: from svn to git & gerrit, and a bit of math

It's been a while since I wrote here. I wanted to discuss a great change that has come to MediaWiki: the adoption of Git and Gerrit in place of our old Subversion system. It has been discussed at length already, but I wanted to talk about the actual switch process and what it meant for me as an individual.

TLDR version: Little time, big switch, Gerrit needs lots of work, more coherent documentation needed, and stay vigilant. Too early to call it good or bad.

Where I'm coming from

First of all, I should clarify that I have already used Git quite a bit. We used it within VideoLAN, and I use it myself almost daily as a wrapper around some of the Subversion repositories I work with. So you could say that using it should not be too troublesome for me. I already know the commands and the principal ideas behind Git and how they differ from other SCM systems. The only new addition is Gerrit...

I have little time on my hands to work on Wikimedia and MediaWiki these days: 3 hours total during the weekdays, 4 hours in a weekend, and that's about it. Most of that time is spent reading Bugzilla, the Village Pump and mailing lists, updating my code, or doing other things required to 'keep up' with current affairs. The rest of that time I tend to fill with smaller bugfixes, which easily fit within an hour or two of available time. So switching repository systems might seem like a small thing, but if you have 3 installations of MediaWiki that need converting, plus some additional software to install on the side before you can submit code again, that basically fills one week's worth of the time I can put into the projects. Since I had quite a bit of time last weekend, I figured I'd better jump on the bandwagon right away, for fear of getting too far behind to catch up any time soon.

Doing the switch

All in all, the process of actually getting the code was easy for someone familiar with Git. The most difficult bit was switching the 3 local installations that I use for testing. I decided to cut one, which left 2. Still, swapping out all the extensions in two installations for their Git variants, and then finding out that some of my installed extensions were NOT switched to Git but are still in Subversion (after already having deleted them, of course), was quite the task. In summary: installing new versions of git and git-review and updating half of my MacPorts-installed software took 1.5 hours; switching the repositories and their extensions, and migrating the stuff I had changed locally but not yet had a chance to commit, took about 2.5 hours before everything was up and running again with patches made of my local changes.

Changing workflow and a bit about math

Math in Wikipedia
Everything fixed, you'd think. Well, sort of. The other issue I immediately recognized was that the more problematic part in the long run would be the change in workflow. For someone doing only about 3 hours of coding a week on a project like this, anything interrupting your workflow takes huge chunks out of those 3 hours. So I decided I'd better get on with it and learn right now, so I could identify blocking issues and see them solved as soon as possible. Where to start... In cases like these I always find it best to pick 'small', identifiable chunks of work that are somewhat isolated from the regular workflow. I picked the Math extension.

The Math extension is basically a LaTeX math parser that can take math entered in, for instance, Wikipedia and show it in an article using either HTML or rendered images. This is needed because the original web really had no way to properly render math: it requires too many symbols that are not part of any regular font, and the positioning of those symbols goes far beyond what is easily possible with HTML. So the extension interprets the LaTeX code and, where possible, uses HTML, but more often renders the text to a PNG image using a LaTeX renderer and includes that in the webpage. There are many downsides to this approach, but for years it was the only way to do it remotely predictably. I'm also a bit familiar with the code, as I had applied several patches to it in the past.
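As a concrete illustration (the formula is my own example, not taken from the extension's documentation), an editor writes plain LaTeX between <math> tags in the article source, and the extension decides whether to render it as HTML or as a PNG image:

```latex
<math>x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}</math>
```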

I love keeping an eye on stuff like this: the stuff that limits the quality Wikipedia can deliver in areas where it really does want to deliver. The same goes for mobile, video, music scores, text with a directionality other than left-to-right, or text for which no fonts exist. Those tickets are often on my Bugzilla watchlist.

Math using MathJax
For math, things have finally progressed after 8 years of status quo. Over the past few years we have seen the maturing of MathML, the standard that is supposed to bring math to the web. We have also seen the arrival of webfont technology and ever more advanced JavaScript. All this has led to the creation of MathJax, a JavaScript library that uses what it knows about your browser to generate readable math equations in HTML or MathML, and provides the reader with the fonts needed to represent them properly, regardless of how well the browser or OS supports math. It has several downsides as well: it is slow as hell and incredibly complicated. But the big plus is that it actually works without requiring images (unless you are on really ancient stuff like IE6).

For years math has been a hot debate within Wikimedia, with people desperately desiring higher quality and better reusability. With the advent and maturing of MathJax, that finally seems possible, and MathJax was an oft-requested feature by the Mathematics project on the English Wikipedia. A while ago, User:Nageh developed a user script to bring MathJax to Wikipedia. When Brion Vibber was looking into how to progress with Bug 24207, he was pointed at Nageh's script and decided to investigate whether MathJax could be properly integrated into the Math extension to finally start improving math rendering in Wikipedia. He came back with some preliminary results pretty quickly.

Committing patches

Brion did the heavy lifting of making it MediaWiki ready, but as always with conversions such as these, there are always loads of 'small things' that need to get done before it can actually be deployed on Wikipedia. That made it a perfect area for me to test the new Git/Gerrit workflow by creating a few patches and submitting them to Gerrit.

My Gerrit dashboard with the changes in question.

My experience:

  • Gerrit is an awful interface. It's like going back to bugzilla if you are used to Jira. Much work will be needed there.
  • One of the downsides I found to Gerrit is creating permanent links to users or lists of changes. I often want to look at the stream of a user's changes, or at a particular type of change, and there is no way to do this other than to keep clicking in Gerrit until you find that list. Basically anything but permalinks to patch submissions and individual patches seems to be hard to get your hands on.
  • Git-review takes away much of the Gerrit trouble, but commands like git fetch ssh:// refs/changes/71/3771/1 && git checkout FETCH_HEAD still tend to be needed in my workflow, and that's just unfriendly and difficult to work with.
  • Gerrit is great for individual patch review.
  • Gerrit is terrible at "huge rewrite review", at first glance.
  • Gerrit is terrible for "many sequential patches building on each other", both for review AND workflow. I want to commit often and get small chunks reviewed if they are useful on their own, without squashing them into one huge patch that I then need to spend hours of review limbo over. But it's almost impossibly disruptive to the workflow as soon as something intermediate needs to be adapted halfway through one of those dependency chains.
  • This will require a lot of getting used to, especially for the newbies. We'd better set up a system to feed Gerrit with patch diffs created using git format-patch; it's much easier for the real n00bies.
  • We need to do more about tracking how a patch changed and how individuals contributed to making the final version. It's subpar in Gerrit in my opinion (most of the history is left in Gerrit instead of in the final merge).
  • I'm not sure yet whether my workflow will be faster or slower, but I suspect slower. On the other hand, I think we will see more submissions of code that we as developers are not really sure about. I have a slew of changes that require further testing and review, but due to lack of setup or feedback from other developers they have basically been in limbo since forever. I plan to just submit them and see what happens. They might rot in Gerrit review limbo, but they won't rot on my HDD anymore. Hopefully having them in review will force someone to pick up the pieces.
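To make the fetch-a-change step above less mysterious, here is a sketch you can run end to end against a local stand-in for Gerrit. The bare repository, file names and user details are invented for illustration, but the refs/changes/NN/NNNN/P ref layout is the one Gerrit actually serves patch sets from:

```shell
# Simulate Gerrit's per-change refs with a local bare repository.
dir=$(mktemp -d)
cd "$dir"
git init -q --bare gerrit.git

# A contributor pushes a proposed patch to a change ref.
git clone -q gerrit.git work 2>/dev/null
cd work
git config user.email dev@example.org
git config user.name Dev
echo fix > f
git add f
git commit -qm "proposed fix"
# refs/changes/<last two digits of change>/<change number>/<patch set>
git push -q origin HEAD:refs/changes/71/3771/1

# A reviewer fetches exactly that patch set and checks it out,
# the local equivalent of:
#   git fetch ssh://... refs/changes/71/3771/1 && git checkout FETCH_HEAD
cd "$dir"
git clone -q gerrit.git review 2>/dev/null
cd review
git fetch -q origin refs/changes/71/3771/1
git checkout -q FETCH_HEAD
git log -1 --format=%s   # -> proposed fix
```

In day-to-day use, git-review's download option wraps this fetch-and-checkout dance for you; the raw form above is what you fall back on when the tooling gets in the way.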
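The dependent-patches pain boils down to this: amending any commit in a chain gives it a new SHA-1, so every later patch in the chain has to be replayed on top of the amended version. A minimal local sketch (repository and commit names invented) of what that rework looks like:

```shell
# A chain of three dependent patches in a throwaway repository.
dir=$(mktemp -d)
cd "$dir"
git init -q chain
cd chain
git config user.email dev@example.org
git config user.name Dev
echo one > f;    git add f; git commit -qm "patch 1"
echo two >> f;   git commit -aqm "patch 2"
echo three >> f; git commit -aqm "patch 3"

# A reviewer asks for a new version of "patch 1". Amending it changes
# its SHA-1, so patches 2 and 3 must be rebased onto the amended commit.
git branch tip
git checkout -q HEAD~2
git commit -q --amend -m "patch 1 (v2)"
git rebase -q --onto HEAD tip~2 tip
git log --format=%s   # -> patch 3 / patch 2 / patch 1 (v2)
```

In Gerrit each replayed commit then becomes a new patch set of its change, which is exactly the disruption described above: one review comment on the bottom of the chain ripples through everything built on it.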

A verdict ?

Is it good, or is it bad? I'm not sure yet. My current guess is that it's a boon for the quality of the source code, and that it will probably speed up the pace of development overall, as well as the workflow of the more experienced developers. I fear, however, for those still learning. The workflow has become considerably more convoluted, adding the risk of people giving up halfway. That's not guaranteed to happen, of course, and I have hope yet that by improving Gerrit and working on our workflow and tooling documentation we can stave that off. I know Sumana, Roan and several others have been working tirelessly on just that. But we are not done by a long shot on that front, I fear.

I also fear that MediaWiki is gonna become more like Wikipedia: a gigantic set of rules you need to satisfy before your article change/patch makes the cut, driving up the requirements we put on our editors/committers and widening the gap between those just getting started and the vets. We need to stay incredibly welcoming and make sure volunteers feel their changes are accepted as they are, instead of overly reworked and rewritten to make the community cut. There is no feeling as rewarding, for those still learning, as seeing your change go straight into a major piece of software. We need to keep a close eye on that as a software development community.

Thank you for reading

That's my story of the switch that I had to go through almost a week ago now. Find more experiences from other developers on the Wikitech mailing list. And a shout-out to the Signpost for once again properly summarizing a complicated topic in last week's edition.