Using a Firefox extension to work around Selenium WebDriver’s limitations

My Google search link fix extension has had a bunch of regressions lately, and I realized that manually testing its impact on the search pages isn’t working: these pages are more complicated than they look, and there are lots of configuration options affecting them. So I decided to look into Selenium WebDriver in order to write integration tests that automate Firefox. All in all, writing the tests is fairly simple once you get used to the rather arcane API. However, the functionality seems to be geared towards very old browsers (think IE6) and some features are nowhere to be found.

One issue: there is no way to focus an element without clicking it. Clicking isn’t always an option, since it might trigger a link for example. That issue turned out to be solved fairly easily:

driver.execute_script("arguments[0].focus()", element)

The ability to pass elements as parameters to WebDriver.execute_script is very useful, so it is surprising that it doesn’t seem to be documented properly anywhere.
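
As a rough illustration of that parameter passing, the sketch below wraps the focus call in a helper and uses a second script to read a value back. The helper name focus_element is made up for this example; it only assumes that execute_script exposes extra parameters as arguments[n] and returns whatever the script returns.

```python
def focus_element(driver, element):
    """Focus an element without clicking it and report whether it worked.

    `driver` is expected to be a Selenium WebDriver instance and `element`
    a WebElement; both are handed to the page script via `arguments`.
    """
    # Elements passed as extra parameters show up as arguments[0], [1], ...
    driver.execute_script("arguments[0].focus()", element)
    # Values returned by the script are passed back to Python.
    focused = driver.execute_script(
        "return document.activeElement === arguments[0];", element
    )
    return bool(focused)
```

A test could then call focus_element(driver, link) and assert on the result before sending keys to the page.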

But what about working with tabs or middle-clicking links? It seems that tabbed browsing wasn’t invented yet back when that API was designed, so it only has a concept of windows — not very useful. So WebDriver will only let you work with the currently selected tab, inactive tabs are off limits. And WebDriver.execute_script isn’t any help here either, it won’t let you run privileged code.

After briefly considering using the send_keys functionality to open the Web Console on about:config and type code into it (yes, it looks like that would actually work), I decided to go with a less crazy solution: install an additional extension that implements the necessary functionality. So if a test wants an element to be middle-clicked, it can trigger a custom event:

driver.execute_script('''
  var event = document.createEvent("Events");
  event.initEvent("testhelper_middleclick", true, false);
  arguments[0].dispatchEvent(event);
''', element)

And the extension listens to that event:

// The fourth argument (wantsUntrusted) lets this privileged listener
// receive events dispatched by unprivileged web content.
window.gBrowser.addEventListener("testhelper_middleclick", function(event)
{
  let utils = event.target.ownerDocument.defaultView
                   .QueryInterface(Ci.nsIInterfaceRequestor)
                   .getInterface(Ci.nsIDOMWindowUtils);
  let rect = event.target.getBoundingClientRect();
  // Arguments: type, x, y, button (1 = middle), click count, modifiers
  utils.sendMouseEvent("mousedown", rect.left + 1, rect.top + 1, 1, 1, 0);
  utils.sendMouseEvent("mouseup", rect.left + 1, rect.top + 1, 1, 1, 0);
}, false, true);
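
On the test side, the event dispatch can be wrapped in a small helper so that individual tests stay readable. This is only a sketch: the names send_helper_event and middle_click are invented here, and it assumes the helper extension listens for events whose names start with "testhelper_", as in the listener above.

```python
# Page script that fires a bubbling custom event on arguments[0];
# the event name is passed in as arguments[1].
DISPATCH_SCRIPT = """
  var event = document.createEvent("Events");
  event.initEvent(arguments[1], true, false);
  arguments[0].dispatchEvent(event);
"""


def send_helper_event(driver, element, name):
    """Dispatch the custom event "testhelper_<name>" on the given element."""
    driver.execute_script(DISPATCH_SCRIPT, element, "testhelper_" + name)


def middle_click(driver, element):
    """Ask the helper extension to middle-click the element."""
    send_helper_event(driver, element, "middleclick")
```

Passing the event name as a script parameter means the same helper could serve further events if the extension grows more listeners.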

This works nicely, but what if you want to get data back? For example, I want to know which URLs were requested at the top level — in particular, whether there was a redirect before the final URL. Selenium only allows you to get notified of URL changes that were initiated by Selenium itself (not very helpful) or poll driver.current_url (doesn’t work). The solution is to have the extension register a progress listener and write all URLs seen to the Browser Console:

window.gBrowser.addTabsProgressListener({
  onStateChange: function(browser, webProgress, request, flags, status)
  {
    if (!(flags & Ci.nsIWebProgressListener.STATE_IS_WINDOW))
      return;
    if (!(flags & Ci.nsIWebProgressListener.STATE_START) && !(flags & Ci.nsIWebProgressListener.STATE_REDIRECTING))
      return;
    if (request instanceof Ci.nsIChannel)
      Cu.reportError("[testhelper] Loading: " + request.URI.spec);
  }
});

You can use driver.get_log("browser") to retrieve the full list of console messages. Each message also has a timestamp, which makes it possible, for example, to extract only the URLs seen after the previous check.
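
Filtering the log could look roughly like the sketch below. It assumes that each log entry is a dictionary with "message" and "timestamp" keys (the timestamp in milliseconds) and that the console may wrap the reported text in additional error information, hence the regular expression. The function name urls_since is made up for this example.

```python
import re

# Matches the marker written by the helper extension and captures the URL
# up to the next whitespace or quote character.
LOADING_RE = re.compile(r'\[testhelper\] Loading: ([^\s"]+)')


def urls_since(entries, since_ms=0):
    """Return the URLs logged by the helper extension after `since_ms`.

    `entries` is a list of log entries such as the one returned by
    driver.get_log("browser").
    """
    urls = []
    for entry in entries:
        if entry.get("timestamp", 0) <= since_ms:
            continue
        match = LOADING_RE.search(entry.get("message", ""))
        if match:
            urls.append(match.group(1))
    return urls
```

A test would remember the timestamp of the last entry it processed and pass it as since_ms on the next call, for instance urls_since(driver.get_log("browser"), last_seen).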

Side note: I first considered using MozMill for this. However, it is geared very much towards Firefox development, and much of the Selenium functionality would have to be reimplemented (locating the installed Firefox instance, default Firefox preferences for a test profile, dismissing alerts on web pages and so on).

Edit (2014-09-08): Another gotcha manifested itself only after a while: Firefox 32 has been released and the Selenium WebDriver extension is no longer compatible! It turns out that it relies on binary XPCOM components for some reason, so the compatibility info has to be adjusted on each Firefox release. There is no WebDriver update so far, but manually editing install.rdf in the webdriver.xpi file works. These XPCOM components no longer work of course, but then again, they weren’t compatible with OS X in the first place, so I have no idea what they are good for.
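
Since an XPI file is a ZIP archive, the manual edit could also be scripted. The following is a minimal sketch, not a tested recipe for any particular Selenium release: it assumes that install.rdf sits at the top level of the archive and declares compatibility through <em:maxVersion> elements, and the function names and the version string in the usage note are placeholders.

```python
import re
import zipfile


def raise_max_version(rdf_text, new_max):
    """Replace the value of every <em:maxVersion> element in install.rdf."""
    return re.sub(
        r"(<em:maxVersion>)[^<]*(</em:maxVersion>)",
        lambda m: m.group(1) + new_max + m.group(2),
        rdf_text,
    )


def patch_xpi(src_path, dst_path, new_max):
    """Copy an XPI archive, rewriting maxVersion in its install.rdf."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == "install.rdf":
                text = raise_max_version(data.decode("utf-8"), new_max)
                data = text.encode("utf-8")
            dst.writestr(name, data)
```

Calling patch_xpi("webdriver.xpi", "webdriver-patched.xpi", "40.*") would then produce a copy with the raised version limit, leaving the original file untouched.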

Comments

  • Andreas Tolfsen

    It seems that tabbed browsing wasn’t invented yet back when that API was designed, so it only has a concept of windows — not very useful. So WebDriver will only let you work with the currently selected tab, inactive tabs are off limits.

    WebDriver’s definition of a window isn’t analogous to the graphical browser window; think of it more as a DOM window. This means that it doesn’t distinguish between the browser’s tabs or (UI) windows.

    To switch to a tab you call the switch_to_window function with that tab’s DOM window’s name (window.name) or the internal window handle which WebDriver has assigned to it. You can list all windows using the get_window_handles command.

    The specification also has some prose around this: https://dvcs.w3.org/hg/webdriver/raw-file/tip/webdriver-spec.html#controlling-windows

    With regard to middle clicking, you’ll be happy to hear that we’re in the process of specifying a new low-level actions API that will not only make it possible to click any mouse button, but which will also support any type of device or input mechanism (touch, stylus, controller, &c.).

    Wladimir Palant

    Yes, I’ve seen the advice to close “windows” in order to close tabs. Not sure about other implementations, but Selenium’s Firefox WebDriver doesn’t work this way. It will always return a single window regardless of how many tabs it contains. I’ve actually checked the source code of their Firefox extension to verify that it simply enumerates all windows of type navigator:browser; it doesn’t look at the tabs at all.

  • Gijs

    I remember having to deal with Selenium at a previous job. It was pretty terrible. I think my favourite issue at the time was how they pretended to support xpath, but then didn’t use the browser’s implementation and only allowed some return values, so if you wanted to look at text inside elements or know the number of matches for a particular query… tough! They were sad times – hopefully at least those bits are better now?

    Wladimir Palant

    So far I have only located elements by CSS selector, and that definitely works via element.querySelector() (verified it in the source code). The XPath locator is more convoluted; the Firefox extension still contains a custom XPath processor. However, my understanding is that it will only be used if there is no predefined document.evaluate() method, meaning never. As to the result type, only elements can be returned; this doesn’t seem to have changed. I guess you can use driver.execute_script() and call document.evaluate() yourself to get other result types.