Can Mozilla be trusted with privacy?

2014-12-19 mozilla 2 mins 16 comments

A year ago I would have certainly answered the question in the title with “yes.” After all, who else if not Mozilla? Mozilla has been living the privacy principles which we took for the Adblock Plus project and called our own. “Limited data” is particularly something that is very hard to implement and defend against the argument of making informed decisions.

But maybe I’ve simply been a Mozilla contributor way too long and don’t see the obvious signs any more. My colleague Felix Dahlke brought my attention to the fact that Mozilla is using Google Analytics and Optimizely (trusted third parties?) on most of their web properties. I cannot really find a good argument why Mozilla couldn’t process this data in-house; insufficient resources certainly isn’t it.

And then there is Firefox Health Report and Telemetry. Maybe I should have been following the discussions, but I simply accepted the prompt when Firefox asked me. My assumption was that it’s anonymous data collection that cannot be used to track the behavior of individual users. So I was all the more surprised to read this blog post explaining how useful unique client IDs are for analyzing data. Mind you, not the slightest sign of concern about the privacy invasion here.

Maybe somebody else actually cared? I opened the bug but the only statement on privacy is far from being conclusive — yes, you can opt out and the user ID will be removed then. However, if you don’t opt out (e.g. because you trust Mozilla) then you will continue sending data that can be connected to a single user (and ultimately you). And then there is this old discussion about the privacy aspects of Firefox Health Reporting, a long and fruitless one it seems.

Am I missing something? Should I be disabling all feedback functionality in Firefox and recommend that everybody else do the same?

Side-note: Am I the only one who is annoyed by the many Mozilla bugs lately which are created without a description and provide zero context information? Are there so many decisions being made behind closed doors or are people simply too lazy to add a link?

Comments

an User 2014-12-19 17:54

Short answer:

No

Long answer:

No, you should never trust a corporation that makes millions every year, management can change, need for more money arises, etc. Not saying that Mozilla is doing bad things right now, but it could change, and at that moment you already gave them too much.

Wladimir Palant

Mozilla isn’t just some corporation – if it were I wouldn’t ask that question. Despite some issues, it is still an open project with people that are passionate about Mozilla’s goals (money helps fund the project but it isn’t one of these goals). These people spent a month discussing this issue at length, I linked to the discussion. I’m just surprised that apparently nothing better came out of it.
Anonymous Coward 2014-12-19 18:21

And let’s not forget the adverts on about:newtab, which now pings Mozilla every single time it gets displayed.

(I can appreciate the desire to collect impression data, but doing so immediately every single time – thus continuously revealing your location and the exact times you opened new tabs – seems a bit excessive when you could simply aggregate the data locally and send it back daily.)
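
The local aggregation suggested above could look roughly like the following. This is only a minimal sketch under stated assumptions: the class name `ImpressionBuffer`, the tile identifiers and the payload layout are all hypothetical and are not taken from Firefox's actual implementation.

```python
from collections import Counter
from datetime import date


class ImpressionBuffer:
    """Hypothetical local buffer: counts impressions per day instead of
    pinging a server on every single new-tab display."""

    def __init__(self):
        self._counts = Counter()
        self._day = None

    def record(self, tile_id, today):
        """Count one impression locally.

        Returns a payload for the previous day only when the day has
        rolled over; otherwise returns None and nothing would be sent.
        """
        payload = None
        if self._day is not None and today != self._day:
            payload = self.flush()
        self._day = today
        self._counts[tile_id] += 1
        return payload

    def flush(self):
        """Build the daily summary (no per-impression timestamps) and reset."""
        payload = {
            "day": self._day.isoformat(),
            "impressions": dict(self._counts),
        }
        self._counts.clear()
        return payload
```

With this shape, the server would learn how often each tile was shown on a given day, but not the exact moments at which new tabs were opened.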
muf nufie 2014-12-19 18:26

Interesting points. I ask myself similar questions. It would be nice to have an answer.
Vladan 2014-12-19 23:40

Summary of my reply:

- Telemetry does not collect privacy-sensitive data
- You do not have to trust us, you can verify what data Telemetry is collecting in Firefox’s about:telemetry page and in aggregate form in our Telemetry dashboards
- Telemetry is an opt-in feature on release channels and a feature you can easily disable on other channels
- The new Telemetry clientID does not track users, it tracks Telemetry performance & feature-usage metrics across sessions

Hi Wladimir,

My name is Vladan, I am a member of the Firefox Performance team that works on Telemetry. I wanted to clear up a few points about Telemetry.

1) First off, Telemetry does NOT collect privacy-sensitive information about your browsing. And it NEVER will.

Telemetry does not collect URLs, nor IPs, nor search queries, nor locations, nor the websites you visited. We do NOT want to collect any privacy-sensitive data.

Examples of data we do collect:

- Performance data: how long it took your Firefox to start up, the animation smoothness of the tab-open animation
- Feature usage: whether you have enabled Electrolysis or not (Electrolysis is multi-process Firefox)
- Hardware configuration: the number of cores your CPU has, the GPU your computer has, etc

This information helps my team understand how well Firefox is performing on different systems and which performance fixes to prioritize. It also helps other teams understand how well their features are working, and which Firefox features are popular with our users and which are not.

2) You do not have to trust us blindly with your Telemetry data! We show you all the measurements we are collecting.

If you have enabled Telemetry, just go to about:telemetry and you’ll see all the metrics we are reporting.
If you want to see that data in aggregate form for the entire Firefox Telemetry population, we built public dashboards and put the aggregated data on http://telemetry.mozilla.org/

3) Telemetry is an opt-in feature. Telemetry reporting is only on by default if you’ve installed a pre-release (e.g. beta) version of Firefox.

We understand many users are not comfortable sending performance and feature-usage data to Mozilla, and that’s fine. So we leave it disabled by default on the versions of Firefox used by the general public.

Telemetry is enabled by default on our Beta, Aurora and Nightly channels because these are pre-release builds and these users have gone out of their way to install a pre-release version of Firefox to get the latest features and help us test it out. These users are also generally more tech-savvy.

4) Everyone at Mozilla cares about user privacy.

You can see Mozilla’s stance on privacy here: http://blog.mozilla.org/privacy/2014/11/11/mozillas-data-privacy-principles-revisited/

I don’t think it’s fair to say Roberto was showing a lack of concern for user privacy in his blog post. He was excited about a new way to remove biases from Telemetry data so he wrote a quick blog post about it for other developers & statisticians.

---

With regard to the UUID we added to Telemetry in bug 1064333:

- This is a randomly-generated UUID tied to a Firefox profile
- You will always be anonymous. The UUID does NOT correspond to an individual or even a computer, it only tracks the Firefox installation. I have over 30 different Telemetry UUIDs on my work laptop because I use different Firefox profiles for testing, and many more on my other computers. These UUIDs are not linked in any way
- The UUID is not used for tracking your behavior! The ID is associated with FHR & Telemetry metrics. We simply do not collect data about your browsing behavior because it would be a privacy violation.
- The client ID is used to correct reporting biases in Telemetry data. As Roberto explained, if we only know about Firefox sessions and not Firefox installations, then Telemetry data from short/frequent sessions is over-represented compared to Telemetry data from very long sessions
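
To illustrate the reporting bias described in the last point, here is a minimal sketch. The ping data, the field names and both helper functions are made up for illustration and do not reflect the real Telemetry schema or analysis pipeline.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pings: (client_id, startup_ms).
# Client "a" restarts often and therefore submits many sessions.
pings = [
    ("a", 900), ("a", 900), ("a", 900), ("a", 900),
    ("b", 300),
]


def per_session_mean(pings):
    """Average over sessions: frequent restarters dominate the result."""
    return mean(ms for _, ms in pings)


def per_installation_mean(pings):
    """Average each installation first, then average the installations."""
    by_client = defaultdict(list)
    for client_id, ms in pings:
        by_client[client_id].append(ms)
    return mean(mean(values) for values in by_client.values())


print(per_session_mean(pings))       # 780
print(per_installation_mean(pings))  # 600
```

Without an identifier that groups sessions, only the first number is available, and it overstates the startup time a typical installation experiences in this made-up data set.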

In summary:

- Telemetry does not collect privacy-sensitive data
- You do not have to trust us, you can verify what data Telemetry is collecting in the about:telemetry page in your browser and in aggregate form in our Telemetry dashboards
- Telemetry is an opt-in feature on release channels and a feature you can easily disable on other channels
- The new Telemetry clientID does not track users, it tracks Telemetry performance & feature-usage metrics across sessions

Wladimir Palant

Vladan, thank you for the detailed answer. It seems however that I didn’t bring my point across.

I didn’t even look at the data being sent there – it doesn’t matter. What matters is a unique identifier associated with my browser profile and regularly being sent to Mozilla. The IP addresses of these requests are enough to get a complete movement profile for my laptop (a scenario Ben Bucksch already brought up in the discussion from 2012). And while you might argue that this profile identifier cannot be associated with me as a person – I don’t think that’s true. If you have enough data samples belonging to the same user, there are always ways to tell who that user is, even if only because I logged in on AMO from the same IP address. So now it’s Google, NSA and Mozilla who can track me.

It’s not that I don’t trust Mozilla, I still assume that people working there wouldn’t allow anything like that (assuming that the servers are never compromised) – but I’d very much prefer that Mozilla wouldn’t even put itself into a position to do that. Having been on the other side of this discussion, I know that there usually is another way – it’s merely harder to come up with. E.g. removing bias can also be done via a “time since last report” field (I think the crash reporter actually does it this way).

Wladimir Palant

Forgot to reply to one of your points: I am aware that Telemetry is opt-in for stable releases, yet FHR isn’t. And my understanding is that FHR is using the same UUID. While I certainly have lots of different profiles on my computer as well, there is still one profile that I use for my main browsing and which is online most of the time (typical setup I assume) – that’s the profile I’m concerned about.

Note that I didn’t really blame Roberto for anything. It is pretty obvious that he wasn’t involved in the development of the feature. It was merely my starting point looking for somebody who actually cared.
Vladan 2014-12-20 00:17

Thank you for your response, Wladimir. There are two points I’d like to respond to:

1) Telemetry does not collect the user’s IP address. We don’t report it in the Telemetry packet and we don’t store it on the Telemetry server side.
2) The “time since last report” alternative you suggested would not allow for the same kinds of analyses. For example, it wouldn’t be able to tell us if a performance issue happens at random or if it affects only a subset of users who share some unusual hardware/OS configuration.

Wladimir Palant

1) Sure, telemetry doesn’t collect the IP address directly. However, getting the IP address isn’t really avoidable as long as the IP protocol is being used for communication. Of course, one could make sure that the IP address isn’t being stored in web server logs, but AFAIK Mozilla doesn’t have any such policy.

2) Sorry but I don’t really get that. Hardware/OS configuration is part of each data packet – you don’t need to reconstruct all submissions for a user to see correlations (again, crash reporter does exactly that). And the “time since last report” field allows estimating how common that particular configuration really is.
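
One possible reading of the “time since last report” idea is sketched below. This is an assumption about how such a field could be used, not a description of how the crash reporter or Telemetry actually works; the function name and the report layout are hypothetical.

```python
from collections import defaultdict


def estimate_config_share(reports):
    """Estimate how common each configuration is without a client ID.

    Each report is (config, days_since_last_report). Weighting a report
    by the interval it covers means an installation that reports daily
    and one that reports monthly contribute the same total weight.
    """
    weights = defaultdict(float)
    for config, days_since_last in reports:
        weights[config] += days_since_last
    total = sum(weights.values())
    return {config: weight / total for config, weight in weights.items()}


# One installation reporting daily for 30 days, one reporting once a month.
reports = [("gpu-x", 1)] * 30 + [("gpu-y", 30)]
print(estimate_config_share(reports))  # {'gpu-x': 0.5, 'gpu-y': 0.5}
```

Counting raw reports would put "gpu-x" at 30 out of 31 submissions, while the interval weighting recovers the even split between the two installations. It does not, however, allow following a single installation over time, which is the kind of analysis Vladan refers to.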
Vladan 2014-12-20 02:02

With respect to your point #2: Telemetry actually doesn’t collect a lot of hardware & OS information, and certainly not all the potentially relevant configuration information — there are simply too many potentially relevant configuration options & system characteristics on a desktop system.
Mook 2014-12-20 03:07

I still trust the people currently at Mozilla, but I can no longer blindly assume their policies are written such that I can trust it by default. Was trying to move to new sync a while back, and reading their policy realized that they have nothing written down about not being able to read your synced passwords. Currently they can’t do so because of an implementation detail, but not due to any sort of policy :(
kats 2014-12-20 04:39

@Vladan: I’m with Wladimir here. There’s a difference between what Mozilla does and what Mozilla can do with any given piece of data. You’re arguing that Mozilla doesn’t do evil, therefore it can be trusted. Wladimir is arguing that Mozilla can potentially be doing evil, therefore it should not be trusted.

Imagine for example an NSA spy infiltrated Mozilla and had root access to all of the datacenters – what could this malicious actor do with all this data? Certainly he could start logging IP addresses of incoming telemetry pings and correlate them with this clientID. And that means the user has to extend their trust not only to Mozilla’s software (easy enough to build yourself), but to the integrity of their corporate network as well (impossible to verify). It is this extension in trust that is objectionable.
Michael Kelly 2014-12-20 09:23

Going to cherry pick a statement that wasn’t necessarily the main thesis because I hear it a lot and disagree:

“…Mozilla is using Google Analytics and Optimizely (trusted third parties?) on most of their web properties. I cannot really find a good argument why Mozilla couldn’t process this data in-house, insufficient resources certainly isn’t it.”

We have contracts with Google and Optimizely that outline how we expect them to use the data they get access to via our use of them. The Google one in particular has a special exception that the data we send them can’t be used by any other accounts except ours (normally, there’s some sort of magic around data across all of GA being sort of shared, maybe for ads or something, that we considered a hard blocker for our use of GA). Not only is it cost-effective for us to use GA over running our own solution (like Piwik or something), but it gives us some leverage to make Google Analytics better for everyone (we were able to get Google to add an opt-out for certain data sharing that they didn’t have previously, and made it available to all GA users).

Improving user privacy on the internet is an important part of our mission, but as with most things, we need to find a balance between extremes to forward the mission. In this case, the people responsible decided that we could do more total good by relying on these services than we could spending resources to process the data in-house. Personally I’m not convinced that not using GA would actually improve user privacy on the internet at all. People would still use GA everywhere and we’d just be spending resources to make ourselves feel good about how pure we are. At least this way we can convince Google to change the service in a positive way, and we can also use the resources saved by relying on them to forward the mission in other ways.
flod 2014-12-20 09:53

Google Analytics: here’s the initial discussion on the adoption; the subject comes up periodically
https://groups.google.com/forum/#!msg/mozilla.governance/9IQvIubDOXU/0tWVVlrUJOQJ

Does it make sense to invest resources in building and maintaining your own internal tool for feature X when there’s an acceptable alternative on the market that complies with Mozilla’s privacy principles? IMO it doesn’t. Note also that it’s not a standard Google Analytics account.
Ben Basson 2014-12-20 12:59

I think Mozilla as an organisation has good intentions and really believes in their privacy commitments, this is what I’ve seen from them consistently over the years and I don’t think much, if anything, has changed in this regard.

What I do have concerns about is their operational competency to maintain the same standards going forward, especially as they’ve got larger in recent years, especially with the more rapid release schedule, and especially as more cloud features emerge.

When all Firefox did in terms of phoning home was update itself, my add-ons and update the “bad websites” list, it was obviously credible to say that your privacy is being respected and protected. Now there’s telemetry (“how I use the browser”), the health report, open tabs and bookmark synchronisation – a lot more is going on, and a lot more is being stored – somewhere, for who knows how long and with what identifying information. This is all great for functionality, but to pretend there is no risk would be crazy.

It’s easy to have a policy about not being evil, and another thing entirely to ensure that all parts of the organisation are not only on board with that policy, but actually carrying it out and protecting the data they’re charged with.

Personally, I chose to opt-out of all of these features apart from reporting crash data.
Jan 2014-12-20 13:46

Thanks for speaking out on this!

1) I wasn’t aware of the ClientID in my telemetry, and firmly believed the collected data was aggregate from the beginning, not just in the final public Telemetry Dashboard. I now know that Mozilla has more data on me than I previously thought. I still trust Mozilla, but this came as a slight shock.

2) This resonates with another privacy concern I have with Mozilla, namely the Firefox Account. I feel that Mozilla’s narrative changed from “Never entrust anyone with your data, why would you do this?” to “Well, you can entrust us with a little data because we’re the good guys!”. I can see the huge benefits of Accounts for the future: easily syncing tabs, bookmarks and history across all my devices, taking Firefox from browser to “universal user agent”, but this was nonetheless a culture change that sounded awkward and was heralded with some dishonesty.
Patrick Cloke 2014-12-20 15:20

Interesting post! I had no idea this level of detail was collected and although I knew Telemetry was opt-out, it would be nice to explain to users (even of beta/aurora/nightly builds) exactly what is collected. Or maybe it’s just been so long since I’ve created a profile…

I’m really responding to re-iterate your “Side-note”: I find this extremely frustrating and would really like to find a way to stop this. I’m unsure of the root cause: is it lazy people? Information Mozilla feels they “can’t” release? New employees not understanding that anonymous people want to be able to read Bugzilla and find context in work that’s occurring?
Merike 2014-12-20 18:37

Regarding your side-note, I have also found that annoying a few times when looking at a bug consisting only of “implement x” and then reviews without any reasoning or links to more context as to why x is needed or why x has the property of y in the implementation. I’m surprised I haven’t seen it brought up on newsgroups yet.
Arpit Kumar 2014-12-20 18:55

@Vladan: Thanks for your detailed answers, agree with you.
Yoric 2014-12-20 21:58

I understand your concern, Wladimir. Thanks for voicing it, because I believe that this deserves a discussion.

Mozilla is now sitting on more data than it used to. We have a number of internal processes and policies to make sure that the data itself doesn’t infringe on user privacy (e.g. the privacy review mechanism), but you are right that, if a high-profile third party (e.g. a government agency, or something nastier) manages to get their hands on the data and cross-reference it with other sources, there is a risk of privacy leak.

This data is critical to Mozilla – without this data, we have considerable difficulties finding out / reproducing problems that impact our users before they become critical. Therefore, I suspect that we are going to proceed with our current policies, or variants thereof.

However, there is a reason why Telemetry requires opt-in (on Firefox Release) or allows opt-out (on Beta, Aurora, Nightly): we want people to be able to say « no » to this risk. So, if you feel that you do not want to share this data with Mozilla, by all means, please go ahead and opt-out/don’t opt-in. Also, if anyone has ideas on how we can further clarify the risk or decrease it, please share these ideas. You will be helping us improve Firefox / Mozilla.

By the way, Wladimir, you remark that FHR doesn’t follow Telemetry’s preferences/policies. If my memory serves, we expect to fix this (among many FHR improvements) by Firefox 39.
