Monday, October 8, 2012

5 years of Article message boxes

Do you recognize these boxes? Most likely you do. These are the very recognizable "amboxes", short for "Article message boxes". They are often visible at the top of articles on Wikipedia and are among the most recognizable elements of those articles.

Today I noticed that these boxes are now just over 5 years (and a month) old. They were first introduced to the general public in September 2007. In short, their features are: a single consistent design, color coding for severity and purpose, a dynamic but consistent width (stackable), IE 5.5 and IE 6.0 compatibility, and a consistent parameter setup for their content.
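For illustration, that consistent parameter setup means a box today is produced with a single template call, roughly like this (a sketch from memory of the current template, so treat the exact parameter names as indicative rather than authoritative):

```wikitext
{{Ambox
| type = style   <!-- selects the color bar, e.g. speedy, delete, content, style, notice -->
| text = This article '''may require cleanup''' to meet Wikipedia's quality standards.
}}
```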
And that is a big deal, because I still remember what it looked like before, when it had none of that. There were dozens of templates with different widths, different colors, different spacing, and they all had different parameters. [I've been trying to find an image from back then, but I haven't been able to find one. Please let me know if you find one.]

It seems like just yesterday that I was one of many people participating in their creation. The main design idea of the color bars at the left side seems to have come from [[User:Flamurai]], who apparently already envisioned this in November 2006, calling it 'Blanca'. It seems he is no longer editing, but I would still like to thank him for this wonderfully simple idea that has been in use, seemingly without much opposition, for so many years now. Most of the implementation was spearheaded by David Göthberg, if memory serves me well.

The revamp led to an entire family of notice templates for different kinds of pages in 2008: {{mbox}}, {{cmbox}}, {{imbox}}, {{tmbox}}, {{ombox}}, {{asbox}} and {{fmbox}}. In the end it was a very collaborative effort, in which three dozen or so active editors made important contributions, including such well-known names as MZMcBride, Anomie, Happy-Melon, David Levy, Quiddity, RockMFR, Remember the Dot, Ilmari Karonen, Father Goose, Ned Scott and many others. A lot of effort, opinion and testing went into these templates back then, and in my opinion that is why they have been so successful for so long.

So to all those involved over 5 years ago in creating the new article message box styles, congratulations and a big thank you. Especially [[User:Flamurai]] and [[User:David Göthberg]].

Sunday, July 29, 2012

Bleeding edge, or is it?

As most people know, Wikipedia usually runs the bleeding-edge code of MediaWiki. Currently new versions are deployed every 2 weeks. This is great, necessary and sometimes annoying for Wikipedians. There is a common complaint that MediaWiki treats Wikipedia as its experimentation grounds.

On the other hand, MediaWiki is overly focused on Wikipedia. Without Wikipedia, I think the default MediaWiki would look a lot more like Wikia than like Wikipedia. In my opinion, if MediaWiki treats Wikipedia as its sandbox, it does so because the only sandbox that compares to Wikipedia is Wikipedia itself. There ARE no other viable experimentation grounds that compare to the distorted reality of Wikipedia.

So how bleeding edge is bleeding edge? Code is deployed almost every 2 weeks, yet HTML5 has been the default for MediaWiki for over 3 years now and has still not made it to Wikipedia, for all sorts of compatibility reasons and to accommodate the volunteer tech community.

HTML5 mode is currently scheduled to be deployed this summer.

Saturday, March 31, 2012

MediaWiki; from svn to git & gerrit and a bit of math

It's been a while since I wrote here. I wanted to discuss a great change that has come to MediaWiki: the adoption of Git and Gerrit in place of our old Subversion system. It has been discussed at length already, but I wanted to cover the actual switch process and what it meant for me as an individual.

TLDR version: little time, big switch, Gerrit needs lots of work, more coherent documentation needed, and stay vigilant. Too early to call it good or bad.

Where I'm coming from

First of all, I should clarify that I have already used Git quite a bit. We used it within VideoLAN, and I use it myself almost daily as a wrapper around some of the Subversion repositories I work with. So you could say that using it should not be too troublesome for me. I already know the commands and the principal ideas behind git and how they differ from other SCM systems. The only new addition is Gerrit...

I have little time on my hands to work on Wikimedia and MediaWiki these days: 3 hours total during the weekdays, 4 hours in a weekend, and that's about it. Most of that time is spent on reading bugzilla, the Village Pump and mailing lists, updating my code, or doing the other things required to 'keep up' with current affairs. The rest of that time I tend to fill with smaller bugfixes, which easily fit within an hour or two of available time. So switching repository systems might seem like a small thing, but if you have 3 installations of MediaWiki, and you need to convert them and install some additional software on the side in order to later submit your code, then that basically fills a whole week's worth of the time I can put into the projects. Since I had quite a bit of time last weekend, I figured I'd better jump on the bandwagon right away, for fear of falling too far behind to catch up any time soon.

Doing the switch

All in all, the process of actually getting the code was easy for someone familiar with git. The most difficult bit was switching the 3 local installations that I use for testing stuff. I decided to cut one, which left 2. But still, switching out all the extensions in two installations for their git variants, and then finding out that some of my installed extensions had NOT actually been switched to git but were still in Subversion (after already having deleted them, of course), was quite the task. In summary: installing new versions of git and git-review and updating half of my MacPorts-installed software took 1.5 hours. Switching the repos and their extensions, and migrating the stuff that I had changed locally but had not yet had a chance to commit, took about 2.5 hours before everything was up and running again and my local changes were saved as patches.

Changing workflow and a bit about math

Math in Wikipedia
Everything fixed, you'd think. Well, sort of. The other issue I immediately recognized was that the more problematic part in the long run would be the change in workflow. For someone doing only about 3 hours of coding a week on a project like this, anything interrupting your workflow takes huge chunks out of those 3 hours. So I decided I'd better get on with it and learn right now, so I could identify blocking issues and see them solved as soon as possible. Where to start... In these cases I always find it best to pick 'small' identifiable chunks of work that are somewhat isolated from the regular workflow. I picked the Math extension.

The Math extension is basically a LaTeX math parser that takes math entered in, for instance, Wikipedia and shows it in an article using either HTML or rendered images. This is needed because the original web really had no way to properly render math: it required too many symbols that were not part of any regular font, and the positioning of those symbols went far beyond what was easily possible with HTML. So the extension interprets the LaTeX code and, where possible, uses HTML, but more often renders the text with a LaTeX renderer to a PNG image that it then includes in the webpage. There are many downsides to this approach, but for years it was the only way to do it remotely predictably. I'm also a bit familiar with the code, as I had applied several patches to it in the past.
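As a small illustration (a textbook example, not taken from any particular article): an editor writes ordinary LaTeX between <math> tags, and the extension decides how to render it.

```latex
% What an editor types between <math>...</math> tags on a wiki page;
% the extension renders it as HTML where possible, or as a PNG image
% produced by the LaTeX toolchain.
\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}
```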

I love keeping an eye on stuff like this: the stuff that limits the quality Wikipedia can deliver in areas where it really does want to deliver. The same goes for mobile, video, music scores, text with a directionality other than left-to-right, and text for which no fonts exist. Those tickets are often on my bugzilla watchlist.

Math using MathJax
For math, things have finally progressed after 8 years of status quo. Over the past few years we have seen the maturing of MathML, the standard that is supposed to bring math to the web. We have also seen the arrival of webfont technology and ever more advanced JavaScript. All this has led to the creation of MathJax, a JavaScript library that uses what it knows about your browser to generate readable math equations in HTML or MathML, and that provides any reader with the fonts needed to represent them properly, regardless of how well your browser or OS supports math. It has several downsides as well: it is slow as hell and incredibly complicated, but the big plus is that it actually works without requiring images (unless you are on really ancient stuff like IE6).

For years math has been a hot debate within Wikimedia, with people desperately desiring higher quality and better reusability. With the advent and maturing of MathJax, that finally seems a possibility, and MathJax was an oft-requested feature of the Mathematics project on the English Wikipedia. A while ago, User:Nageh developed a MathJax user script to bring MathJax to Wikipedia. When Brion Vibber was looking into how to make progress on Bug 24207, he was pointed at Nageh's script and decided to investigate whether it was possible to properly integrate MathJax into the Math extension, to finally start improving math rendering in Wikipedia. He came back with some preliminary results pretty quickly.

Committing patches

Brion did the heavy lifting of making it MediaWiki-ready, but as always with conversions such as these, there are loads of 'small things' that need to get done before it can actually be deployed on Wikipedia. That made it a perfect area for me to test the new Git/Gerrit workflow, by creating a few patches and submitting them to Gerrit.

My Gerrit dashboard with the changes in question.

My experience:

  • Gerrit is an awful interface. It's like going back to bugzilla if you are used to Jira. Much work will be needed there.
  • One of the downsides I found to Gerrit is creating permanent links to users or lists of changes. I often want to look at the stream of a user's changes, or at a particular type of change, and there is no way to do this other than to keep clicking around in Gerrit until you find that list. Basically everything except permalinks to patch submissions and individual patches seems hard to get your hands on.
  • Git-review takes away much of the Gerrit trouble, but commands like git fetch ssh:// refs/changes/71/3771/1 && git checkout FETCH_HEAD still tend to be needed in my workflow, and that's just unfriendly and difficult to work with.
  • Gerrit is great for individual patch review.
  • Gerrit is terrible at "huge rewrite review" at first look.
  • Gerrit is terrible for "many sequential patches building on each other", both in review AND in workflow. I want to commit often and get small chunks reviewed if they are useful on their own, without squashing them into one huge patch that I then need to spend hours in review limbo over. However, it is almost impossibly disruptive to the workflow as soon as something intermediate needs to be adapted halfway through one of those dependency chains.
  • This will require much getting used to, especially for the newbies. We'd better set up a system to feed Gerrit with Git patch diffs created using git format-patch; it's much easier for the real n00bies.
  • We need to do more about the history of changes to a patch and how individuals contributed to making the final version. It's subpar in Gerrit in my opinion (most of the history is left behind in Gerrit instead of ending up in the final merge).
  • I'm not sure yet whether my workflow will be faster or slower, but I suspect slower. On the other hand, I think we will see more submissions of code that we as developers are not really sure about. I have a slew of changes that require further testing and review, but that, due to a lack of setup or of feedback from other developers, have basically been in limbo since forever. I plan to just submit them and see what happens. They might rot in Gerrit review limbo, but they won't rot on my HDD anymore. Hopefully having them in review will force someone to pick up the pieces.

A verdict?

Is it good, or is it bad? I'm not sure yet. My current guess is that it's a boon for the quality of the source code, and that it will probably speed up the pace of development overall, as well as the workflow of the more experienced developers. I fear, however, for those still learning. The workflow has become incredibly more convoluted, adding the risk of people giving up halfway. That's not guaranteed to happen, of course, and I still have hope that by improving Gerrit and working on our workflow and tooling documentation we can stave that off. I know Sumanah, Roan and several others have been working tirelessly on just that. But we are not done by a long shot on that front, I fear.

I also fear that MediaWiki is gonna become more like Wikipedia: a gigantic set of rules you need to satisfy before your article change/patch makes the cut, driving up the requirements we put on our editors/committers and widening the gap between those just getting started and the vets. We need to make sure we stay incredibly welcoming, and that volunteers feel their changes are accepted as they are, instead of being overly reworked and rewritten to make the community cut. There is no feeling as rewarding for those still learning as seeing your change go straight into a major piece of software. We need to keep a close eye on that as a software development community.

Thank you for reading

That's my story of the switch that I had to go through almost a week ago now. You can find more experiences from other developers on the Wikitech mailing list. And a shout-out to the Signpost for once again properly summarizing a complicated topic in last week's edition.

Saturday, September 17, 2011

2011 and the Y2K bug

It has been almost 12 years since we all had to worry about the Y2K bug, right? Well, you'd think. Over the past few weeks I have been bothered by a problem with session management in one of the apps that I'm writing. I couldn't figure out why things were behaving so unexpectedly. At some point the hints became clearer and clearer that, for some reason, the dated cookies of the session were not being expired. The iOS NSURLConnection and the Android HTTP library seemed to continue to send them along to the server after logging out. This was hard to confirm, though, because both platforms hide the Cookie header from you when you make the request, the connection was HTTPS, and I didn't have physical access to the server.

It made no sense, however, that iOS would have a fundamental cookie-management bug. So I built a small server and started testing cookie management on the iPhone. Everything looked just fine. Then I decided to copy the actual cookies the server was sending to the clients. I could get these values because the Set-Cookie headers in the response (unlike the actual Cookie header in the requests) were visible. So I switched the values on my test server to the actual values from the server, and suddenly I was able to reproduce the problem. The Set-Cookie that was supposed to expire the cookie seemed to turn it into an undated cookie (i.e. scoped to the session of the client instance).

I switched back to my old values and everything started working again. Again I copied the original server values. I selected the text and suddenly I noticed it.... Expires=Sat, 01-Jan-00 00:00:00 GMT;  No... that can't be it. Could it? I switched my test server to issue the year 1970 instead. Poof, suddenly it worked. So first of all, 12 years after 2000 there is still a server sending a broken date format. And second, it seems the Y2K parsing support in iOS is broken. Experimentation shows that iOS can only parse double-digit years in cookies between 70 and 99. Any double-digit year before 1970 (the epoch) cannot be converted into an actual year. And what happens if the date cannot be parsed? The date is removed from the cookie altogether, and your cookie becomes a session cookie :D
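For comparison, here is a quick sketch (not the actual iOS code) of the two-digit-year pitfall in Python. Python's strptime follows the POSIX pivot, mapping two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999, so "00" still parses, unlike the iOS parser described above, which apparently only handled 70-99.

```python
from datetime import datetime, timezone

def parse_cookie_expires(value):
    """Parse an old-style cookie date like 'Sat, 01-Jan-00 00:00:00 GMT'."""
    try:
        parsed = datetime.strptime(value, "%a, %d-%b-%y %H:%M:%S %Z")
        return parsed.replace(tzinfo=timezone.utc)
    except ValueError:
        # A parser that gives up here drops the date entirely, turning
        # the cookie into a session cookie -- exactly the bug observed.
        return None

broken = parse_cookie_expires("Sat, 01-Jan-00 00:00:00 GMT")
fixed = parse_cookie_expires("Thu, 01-Jan-70 00:00:00 GMT")
print(broken.year)  # 2000: a date in the past, so the cookie expires
print(fixed.year)   # 1970
```

A parser without a sensible pivot (or fallback) for years below 70 ends up in the except branch instead, which is how an "expire this cookie" header quietly becomes "keep this cookie for the whole session".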

Monday, May 9, 2011

How IE6 is still causing headaches and bothering the rest of us

So there is this well-known security issue called content sniffing in MS IE 6. No one really cares about that anymore, right? Unfortunately, when you are a top-5 website you kinda have to care, since 3.46% of the readers of Wikipedia, a whopping 13.88 million unique monthly visitors, still use Microsoft Internet Explorer 6.

You try to fix this bug. Three times, causing three software releases (1.16.3, 1.16.4, 1.16.5) in 4 weeks. And then, by accident, the fix becomes so strict that it breaks many requests for all Internet Explorer versions, simply because the URL contains a dot. Sigh....

THIS is why you should help all your friends to get rid of IE6.

Thursday, February 17, 2011


We kept running into a kAMDReceiveMessageError in our company when trying to install ad hoc iPhone apps with the iPhone Configuration Utility for Windows. Everything was fine if people installed using Windows iTunes.

After much time, it was tracked down to the addition of UIRequiredDeviceCapabilities to the Info.plist. For some very strange reason, the Windows ICU doesn't seem to like that property at all and fails to install any app that carries it.
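For reference, the property in question is an ordinary key in the app's Info.plist; the armv7 value below is just an illustrative capability, not necessarily the one we used:

```xml
<!-- Fragment of an app's Info.plist. The mere presence of this key
     made the Windows iPhone Configuration Utility fail the install. -->
<key>UIRequiredDeviceCapabilities</key>
<array>
    <string>armv7</string>
</array>
```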

I'm not sure if this will affect App Store submission, since iTunes handles it, and so, it seems, does the Mac version of ICU. But it is at the very least mildly annoying that testers can't install our application using the Windows ICU.

Sunday, January 16, 2011

Dutch 2011 Hack-a-ton a great success

Wikipedia birthday cakes during the celebrations
 in Amsterdam (Derk-Jan Hartman, CC-BY-SA 3.0)
So here we are... one day after the 10-year anniversary of Wikipedia, and I think I'm not the only Wikipedian who will testify that it has been a great couple of days. Lots of online friends meeting in real life at one of the 450 or so events, lots of very nice press attention for our once so humble project, and just all-out fun.

I myself participated in the first Dutch hackathon. The day kicked off on Friday the 14th, at 10 in the morning, in the offices of Kennisland in Amsterdam. Since I was working on Friday, I joined in on the fun at around 18:30, during pizza time. There were about 15 developers, as well as a dozen or so Wikimedians and people from a Wikipedia editing workshop that took place during the day. They assisted in the brainstorming, provided feedback and were kind enough to drink beers with us :D

Several projects had been selected in advance and a great deal of work got done. A quick summary:

  • Husky and Krinkle created PhotoCommons, a plugin for WordPress that makes it easy to search and embed files from Wikimedia Commons into your WordPress website.
  • I myself built WikiSnaps, an iPhone application that allows you to upload photographs from your iPhone camera or image library directly to Wikimedia Commons.
  • Tag-cloud visualizations of the statistics we collect of the usage of the GLAM materials. Making the usage of these GLAM materials in Wikipedia visible to the institutions is very important. (links will follow at a later time)
  • siebrand and RobertL made additions to Pywikipediabot and Europeana.
  • Bryan built a pywikipediabot named fancy-uploader to facilitate uploading of large batches of files to Wikimedia Commons.
  • JanPaul123 presented and further improved his revolutionary Sentence-level editor (demo) for wikitext. This shows great promise to improve the editing experience for many Wikipedians.
  • Groundbreaking work has been done by Roan, Krinkle and Bryan on getting the licenses and attribution information of files into the database. Currently this information is only present in wikitext, which makes it difficult to reuse this information outside of Wikimedia. This will eventually greatly improve the reusability of the Wikimedia Commons materials. 
  • Functionality has been developed for Open Images that allows their videos to be directly imported to Wikimedia Commons with the click of a button.

The projects were presented at the Amsterdam Museum-event on Saturday and they were enthusiastically greeted by the crowd of some 150 people. We evaluated the event during the reception of the 10 year Wikipedia party and quickly concluded that this is definitely worth repeating. The efficiency of working and thinking together in a single room, with short and dedicated projects was clear to all of us.

I want to thank all the people of Wikimedia NL, Kennisland, Amsterdam Museum and Beelden voor de toekomst, who sponsored and helped organize the events. You were all terrific and really organized something special.