Is undetectable ad blocking possible?

This announcement by Princeton University is making the rounds in the media right now. What the media seems to be most interested in is the promise of ad blocking that websites cannot possibly detect, because the website can only access a fake copy of the page structure in which all ads appear to be visible. The browser, on the other hand, would work with the real page structure where the ads are hidden. This isn’t something the Princeton researchers have implemented yet, but they could, right?

First of all, note that I am saying “hidden” rather than “blocked” here: in order to fake the presence of ads on the page, you have to let the ads download. This means that this approach won’t protect you against any privacy or security threats. But it might potentially protect your eyes and your brain without letting websites detect ad blocker usage.
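
To make the distinction concrete, here is a minimal sketch of the two operations as a browser extension would perform them today. It assumes Manifest V2 extension APIs; the URL pattern and CSS selector are invented examples, not taken from any real filter list:

    // Sketch only; "ads.example.com" and ".ad-banner" are made-up examples.
    declare const chrome: any;  // assumes Manifest V2 extension APIs are available

    // Blocking (background page): cancel the ad request before it hits the network.
    chrome.webRequest.onBeforeRequest.addListener(
      () => ({ cancel: true }),
      { urls: ["*://ads.example.com/*"] },
      ["blocking"]
    );

    // Hiding (content script): let the ad load, merely keep it from being displayed.
    const style = document.createElement("style");
    style.textContent = ".ad-banner { display: none !important; }";
    document.head.appendChild(style);

    // The approach discussed here is limited to the second variant: the ad has to
    // load (and track you) so that its presence can later be faked to the page.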

Can we tell whether this approach is doable in practice? Is a blue pill for the website really possible? The Princeton researchers don’t seem to be aware of it, but something like this has been tried before, probably on a number of occasions even. One such occasion was the history leak via the :visited CSS pseudo-class: this pseudo-class is normally used to make links the user has visited before look different from the ones they haven’t. The problem was that websites could detect such different-looking links and learn which sites the user had visited; there were proof-of-concept websites automatically querying a large number of links in order to extract the user’s browsing history.

One of the proposals back then was having the getComputedStyle() JavaScript API return wrong values to the website, so that visited and unvisited links wouldn’t be distinguishable. If you look at the discussion in the Firefox bug, even implementing this part turned out to be very complicated. But it doesn’t stop there: the same kind of information would leak via a large number of other APIs. In fact, it has been demonstrated that this kind of attack could be performed without any JavaScript at all, by making visited links produce a server request and evaluating these requests on the server side.
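
For illustration, here is a rough sketch of the classic getComputedStyle() probe, written in TypeScript; the URL and the color are made up, and modern browsers deliberately report the unvisited color here, so this no longer works:

    // Assumes a stylesheet containing:  a:visited { color: rgb(255, 0, 0); }
    // Historical sketch only; today's browsers lie in getComputedStyle() on purpose.
    function wasVisited(url: string): boolean {
      const link = document.createElement("a");
      link.href = url;                          // URL to probe (made-up example)
      document.body.appendChild(link);
      const color = getComputedStyle(link).color;
      document.body.removeChild(link);
      return color === "rgb(255, 0, 0)";        // red only if :visited styling applied
    }

    // A proof-of-concept site would simply loop this over thousands of URLs.
    const visitedUrls = ["https://example.com/"].filter(wasVisited);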

Hiding all these side effects was deemed impossible from the very start, and the discussion instead focused on the minimal set of functionality to remove in order to prevent this kind of attack. There was a proposal to allow only same-origin links to be marked as visited. However, the final solution was to limit the CSS properties allowed in a :visited pseudo-class to those merely changing colors and nothing else. Also, the conclusion was that APIs like canvas.drawWindow(), which allowed websites to inspect the rendered page directly, would always have to stay off limits for web content. The whole process from recognizing the issue to the fix being rolled out took 8 (eight!) years. And mind you, this was an issue being addressed at the source, directly in the browser core, not from an extension.

Given this historical experience, it is naive to assume that an extension could present a fake page structure to a website without obvious inconsistencies giving it away. If at all, such a solution would have to be implemented deep in the browser core. I don’t think that anybody would be willing to limit the functionality of the web platform for this scenario, but the search for a solution above was also constrained by performance considerations. If performance implications are ignored, a blue pill for websites becomes doable. In fact, a fake page structure isn’t even necessary and would only make things more complicated. What is really needed is a separate layout calculation.

Here is how it would work:

  • Some built-in ad hiding mechanism would be able to mark page elements as “not for display.”
  • When displaying the page, the browser would treat such page elements as if they had a “visibility:hidden” style applied: all requests and behaviors triggered by these elements should still happen, but nothing would be rendered for them.
  • Whenever the page uses APIs that require access to positions (offsetTop, getBoundingClientRect etc.), the browser would use a second page layout where the “not for display” flag is ignored. JavaScript APIs then produce their results based on that layout rather than the real one.
  • That second layout would necessarily be calculated at the same time as the “real” one, because calculating it on demand would lead to delays that the website could detect: if the page is already visible, yet the first offsetTop access takes unusually long, the website can guess that the browser just calculated a fake layout for it (see the timing sketch after this list).
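
To make the timing concern concrete, here is a rough sketch of the kind of probe a page could run; the element, the thresholds and the comparison strategy are invented for illustration, and a real detector would calibrate against the page’s own typical layout times:

    // Hypothetical probe: compares a cold layout-dependent read against a warm one
    // long after the page has been painted. The factor 20 and the 0.1 ms floor are
    // arbitrary illustration values, not something any real detector is known to use.
    function layoutLooksLazilyComputed(el: HTMLElement): boolean {
      const t0 = performance.now();
      void el.offsetTop;                 // first layout-dependent read (cold)
      const firstRead = performance.now() - t0;

      const t1 = performance.now();
      void el.offsetTop;                 // second read, layout is warm now
      const secondRead = performance.now() - t1;

      // If the cold read is drastically slower than the warm one on an already
      // rendered page, a freshly computed fake layout is one plausible explanation.
      return firstRead > 20 * Math.max(secondRead, 0.1);
    }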

Altogether this means that the cost of layout calculation would double for every page, both in terms of CPU cycles and memory, only because at some point the web page might try to detect ad blocking. Add to this the significant complexity of the solution and the considerable maintenance cost (the approach might have to be adjusted as new APIs are added to the web platform), and I would be very surprised if any browser vendor were interested in implementing it. And let’s not forget that all of this is only about ad hiding.

And that’s where we are with undetectable ad blocking: possible in theory but completely impractical.

Comments

  • Wilfried Surreyns

    … and then we haven’t even talked about the time it takes to fetch all the garbage. If I visit some fairly well-known sites now, they load in a snap; if I disable my ad blocker, the garbage being downloaded makes the browser stall, if not forever, then at least for multiple minutes.

    The web without an ad blocker is one giant cesspool these days, kinda like the halftime show at the Super Bowl: there’s a lot of noise and filler around it, and the actual content leaves a very bad taste in your mouth.