Our team is really happy that, after a long gestation, we've managed to roll out a new mobile version of our news site, https://i.stuff.co.nz. In doing this, we had to solve a number of problems, many of which we knew we would face, including transitioning to HTTPS (our homepage is served over HTTPS, though our article pages aren't as yet), integration of third-party content, and meeting the requirements of the various stakeholders (our editorial team, our users, and our advertisers).
However, there were a couple of issues that appeared late in the piece in Safari on iOS devices that we didn't expect, and which caused us significant amounts of concern and frustration.
In this post we'll cover the iOS pipelining bug, and we'll leave our experiences with Apple's application of the upgrade-insecure-requests directive in Content Security Policy for a later post.
Apple's iOS pipelining bug
It turns out there's been an iOS pipelining bug since at least iOS 5, which manifests itself in images being occasionally (and randomly) swapped on the page, and which has been a source of frustration for other developers.
This problem results in two or more images changing places with each other, and is made more insidious in that, when inspecting the rendered HTML, the URL is correct, though the image that appears is not the image that the URL actually resolves to — the images get swapped, but the URLs do not.
Our team had heard of this issue previously, but we'd assumed that, since it was originally noticed in iOS 5 (i.e. five iOS generations ago), it had been solved by now. This turned out not to be so.
In the testing we did internally before launching our new site in controlled beta to 10% of our users, we hadn't seen (or possibly hadn't noticed) this problem, and we launched our new site into beta blissfully unaware that we'd soon encounter this issue.
Within a day of the beta launch, we started to get a small handful of users reporting the images swapping on our home page, and as a result of further investigation we discovered that the iOS pipelining bug was still current, and a long-running thread on Apple's discussion forum contained accounts of people still struggling with this issue.
The image-swapping was a worry to our editorial team, not just because of the possibility that the wrong image would appear against the wrong headline and be aesthetically wrong, but also because, as a news organisation, we have a higher obligation to report stories accurately (in both a journalistic and a legal sense) than non-news sites do.
To compound things, Apple released iOS 10 at about the same time as we launched our beta, and we weren't sure if the problem we were seeing was an existing problem in previous versions of iOS, or if it had just appeared / resurfaced in iOS 10.
The most frustrating kind of bug
If you talk to software developers about bugs, you'll probably find a consensus that the most maddening type of bug is one that is intermittent, and only occasionally appears — we spent a couple of days reloading pages in iOS many hundreds of times, only to find the problem occurring maybe around half a dozen times over the two days.
What we did notice, however, is that when one of us found the problem, another person on the team would often see it too, so it seemed to have something to do with network conditions at the time.
Our homepage is larger than we'd ideally want in terms of delivered size, including over 200 requests, of which around 100 are images (including data URIs). Of these, around 45 are actual editorial content, with the remainder being made up of advertising images, tracking pixels, icons and the like.
So, given the relatively heavy page load (in terms of number of requests, and number of images), we suspect that we're providing the conditions for the iOS pipelining bug to manifest itself. It's possible that Apple hasn't solved this bug previously because not enough sites had encountered it; however our site, though rather large in terms of the number and overall size of requests, is not so atypical when compared to other large news media sites:
total requests: 164; image requests: 76
total requests: 130; image requests: 40
total requests: 151; image requests: 69
The search for a solution
Given that our new mobile site incorporated a number of different techniques that hadn't been employed in the previous version, there are quite a few variables that could possibly have some bearing on making the pipelining bug / image swapping manifest itself, and so we had a few things to try to mitigate / eliminate the bug:
Content Security Policy
Content Security Policy is a mechanism that allows a site to instruct the browser, via an HTTP header, as to what type of security policy to apply to various parts of the page.
There are many directives that can be used to make up the CSP header, as detailed here, and our suspicion was that some of the directives we employed in our CSP header might have been causing problems for Apple.
It turns out that we were actually correct in our assumption, though the problem caused by CSP was not to do with image-swapping — more on this in a later post.
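To make the idea of a CSP header concrete, here is a minimal sketch of how a policy made up of several directives could be assembled into a single header value. The directive values and the images.example.com host are purely illustrative assumptions, not the policy we actually serve.

```python
from typing import Mapping, Sequence


def build_csp(directives: Mapping[str, Sequence[str]]) -> str:
    """Join CSP directives into a single header value.

    Each directive is written as its name followed by its source list;
    directives are separated by "; ". A directive with no sources
    (such as upgrade-insecure-requests) is emitted as the bare name.
    """
    parts = []
    for name, sources in directives.items():
        parts.append(" ".join([name, *sources]))
    return "; ".join(parts)


# Hypothetical policy for illustration only.
EXAMPLE_POLICY = build_csp({
    "default-src": ["'self'"],
    "img-src": ["'self'", "https://images.example.com"],
    "upgrade-insecure-requests": [],
})
```

The resulting string would be sent as the value of the Content-Security-Policy response header.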
As it appeared that the problem was with images being shoved down the same TCP connection, one simple approach to mitigate this would be to shard our images onto different sub-domains, meaning that a given TCP connection would have fewer images pipelined down it.
The problem with this approach, of course, is that sharding the images across more sub-domains means creating more TCP connections, which in turn results in a slower page load, as we'll more often hit the limit of simultaneous TCP connections that a browser can support (remember, we're downloading around 200 resources, so the more re-use we get out of TCP connections the better, if we're aiming to have a reasonably fast-loading page).
Until recently, domain-sharding would have been the likeliest solution, and there's an apocryphal story that another part of our organisation previously considered sharding images across up to ten subdomains in order to defeat this pipelining bug.
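For anyone who hasn't seen domain-sharding before, the sketch below shows one common way of doing it: hash the image path so that a given image always maps to the same sub-domain (which keeps it cacheable). The img0, img1, ... host naming and the example.com domain are assumptions for illustration; this is not something we ended up deploying.

```python
import zlib


def shard_host(image_path: str, shard_count: int,
               base_domain: str = "example.com") -> str:
    """Pick a shard sub-domain for an image path.

    crc32 is used because it is stable across processes and runs,
    so the same path always lands on the same shard.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    index = zlib.crc32(image_path.encode("utf-8")) % shard_count
    return f"img{index}.{base_domain}"


def shard_url(image_path: str, shard_count: int,
              base_domain: str = "example.com") -> str:
    """Build the full URL for an image on its shard."""
    return f"https://{shard_host(image_path, shard_count, base_domain)}{image_path}"
```

Calling shard_url("/photos/lead-story.jpg", 4) returns a URL on one of img0 to img3, and always the same one for that path.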
One suggestion that was made on Apple's discussion forum was to disable Keep-alive, meaning that a given TCP connection could only be used to deliver a single asset. Although this would work, we'd end up needing many more TCP connections (i.e. around 200 in total), which would be a worse solution than domain-sharding our images in that our page load would be even slower, so this wasn't a viable solution.
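In practice, disabling Keep-alive means the server tells the client to close the connection after each response. The sketch below shows that behaviour using Python's standard library; the handler name and the demo helper are hypothetical, and a real site would set this in its web server or CDN configuration rather than in application code.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoKeepAliveHandler(BaseHTTPRequestHandler):
    """Responds over HTTP/1.1 but asks the client to close the connection."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        # "Connection: close" disables connection re-use for this response.
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the demo quiet.
        pass


def demo_connection_header():
    """Start a local server, make one request, return the Connection header."""
    server = HTTPServer(("127.0.0.1", 0), NoKeepAliveHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()
        header = response.getheader("Connection")
        conn.close()
        return header
    finally:
        server.shutdown()
        server.server_close()
```

With every response marked this way, each of the roughly 200 resources on a page like ours would need its own TCP connection.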
One theory that we formed was that, given the iOS pipelining bug had been around for quite a while and probably involved operations performed at a relatively low level of the network stack, it was likely a problem baked into Apple's use of HTTP 1.1.
The newer HTTP 2 protocol offers many benefits over HTTP 1.1, and possibly if we used HTTP 2 to serve images rather than HTTP 1.1, we could find that we'd nicely side-step the bug, as HTTP 2 uses a different network implementation at quite a low level of the network stack.
Thankfully, our hunch (and timing, given that HTTP 2 is now ready for adoption) seemed to be correct, and by serving our images from a single subdomain using HTTP 2, we found that we no longer see the problem in iOS.
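If you want to confirm which protocol an image host actually negotiates, one approach is to offer both protocols during the TLS handshake and see which one the server picks. This is a minimal sketch using Python's ssl module; the function names are our own, and the host you pass in is up to you.

```python
import socket
import ssl
from typing import Optional


def make_alpn_context() -> ssl.SSLContext:
    """Create a TLS context that offers HTTP 2 first, then HTTP 1.1, via ALPN."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def negotiated_protocol(host: str, port: int = 443,
                        timeout: float = 5.0) -> Optional[str]:
    """Return the ALPN protocol the server selected.

    "h2" means HTTP 2, "http/1.1" means HTTP 1.1, and None means the
    server did not take part in ALPN negotiation.
    """
    context = make_alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.selected_alpn_protocol()
```

Calling negotiated_protocol with the hostname of an image sub-domain should return "h2" if that host is serving over HTTP 2.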
HTTP 2 does mean that the page has to be served via HTTPS, but thankfully this is something we'd already decided to do for our home page and section pages; our article pages are still being served via HTTP, but because there are far fewer images being served on an article page, we're unlikely to encounter the image pipelining bug there.
Moving to HTTPS is sensible as it provides better security, and allows use of HTTP 2. However, for news organisations such as ours, it can also be problematic as we rely on third party providers for certain page content such as advertising and analytics, and these third parties may not yet be able to serve their content appropriately over HTTPS — if our page is served over HTTPS, and a third-party provider is loading content onto our page using HTTP, we may end up with mixed-content warnings, meaning that the browser shows that our page is not secure.
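A quick way to get a feel for how much third-party content might trigger mixed-content warnings is to scan a page's HTML for resources loaded over plain HTTP. The sketch below is a rough check under our own assumptions about which tags and attributes matter; it doesn't cover resources injected by scripts, and it treats every link tag's href as a resource, which is cruder than what a browser does.

```python
from html.parser import HTMLParser

# Tags that load sub-resources, and the attribute holding the URL.
# This mapping is an illustrative assumption, not an exhaustive list.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "iframe": "src",
    "audio": "src",
    "video": "src",
    "source": "src",
    "link": "href",
}


class InsecureResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for resources requested over http://."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_insecure_resources(html: str):
    """Return a list of (tag, url) pairs that would be mixed content on an HTTPS page."""
    finder = InsecureResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure
```

Ordinary anchor links to HTTP pages are deliberately ignored, since navigating to an insecure page is not mixed content.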
Wired.com recently wrote a good post about their long journey towards using HTTPS on their site, and covered some of the pain that they encountered with getting content from third party providers to work properly on HTTPS.
Interestingly, after digging a bit deeper into the practices of other news sites, we noticed that some of these, such as the New York Times and The Washington Post, are also serving their images (and in some cases, such as the New York Times, only their images) over HTTP 2, so it's possible that their teams have similarly come across this solution to the same problem.
As you'll gather from the above, we're glad we could use HTTP 2 to solve the iOS pipelining bug.
We'll be looking to make more use of HTTP 2 in future, though it's going to take a little while to ensure that the entirety of our site can be served over HTTPS.
Although we're suitably proud of getting our new mobile site out, the real motivation for writing this post is to help anyone in future struggling with the image-swapping issue. As mentioned above, we are now finding sites out there serving their images over HTTP 2, presumably to solve this issue. When we encountered this issue, we would have been very happy to have found someone who had a similar experience, and had found a solution that worked.