Issue Details

Key: MBS-5709
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Normal
Assignee: Robert Kaye
Reporter: Wieland Hoffmann
Votes: 3
Watchers: 3

MusicBrainz Server

Inclusion of Google Analytics is in violation of the privacy policy

Created: 23/Dec/12 12:12 AM   Updated: 07/Jun/13 02:16 PM   Resolved: 07/Jun/13 02:16 PM
Component/s: Templates
Affects Version/s: None
Fix Version/s: None


Description

http://musicbrainz.org/doc/About/Privacy_Policy says that "The only third-party content loaded by MusicBrainz web pages are the album cover art images [...]", which is not true, since the Google Analytics JS is pulled from Google's servers and loads a file called "__utm.gif" from Google on every page load.



Robert Kaye added a comment - 28/Dec/12 12:15 AM

I would like to hear suggestions on how to fix this. Two obvious ones:

1. Change the privacy policy to allow analytics
2. Stop using analytics

While we under-use analytics right now, I do think it has value. Thoughts?


Ian McEwen added a comment - 28/Dec/12 01:33 AM

I'm not sure what the correct solution is. However, third option:

3. Run our own analytics software (e.g. Piwik – http://piwik.org/ – which I use personally)


nikki added a comment - 28/Dec/12 06:18 AM

I blocked Google Analytics in /etc/hosts a long time ago, so I would vote for #2 or #3.

Note that we also load data from Acoustid on the fingerprints tab and reCAPTCHA on the sign up screen.


PATATE12 added a comment - 30/Dec/12 10:31 PM

Same as nikki.


Wieland Hoffmann added a comment - 04/Jan/13 11:10 AM

Not knowing the value gained by Google Analytics for MusicBrainz (it doesn't seem like much), I vote for either #2 or #3 (or a combination of both).

Regarding reCAPTCHA: Protecting the sign up screen is a good thing to do in light of the recent work-creating spammers. I also doubt Google gains any real tracking value from the sign up screen except "oh, this guy was thinking about creating a MusicBrainz account once".

Regarding Acoustid: The Acoustid server is open source, so we can tell people to just look at that.


Ian McEwen added a comment - 04/Jan/13 08:48 PM

I think our primary decision is whether we care about having analytics at all. I think the consensus is then #3 if we do, #2 if we don't. Personally, I've never run a site where analytics was hugely useful, just personal sites.

I wonder if Rob (who mentioned above he sees value) and Pavan (who IIRC added the analytics stuff we have now) could chime in on what they think we can get from analytics. From what I can see, maybe we care about search engine keywords and referrals, and it's good to know what browsers we're seeing (though we can, of course, get that from logs). Tools like campaigns and goals are less relevant to us, though we could use campaigns to do things like track who follows links from any sort of post at a third-party site where we can suggest what URL to use (e.g. something via one of our board members), or via our Twitter/blog, etc. Piwik does also include site-search tracking, so we could start looking at what people search for using our search box. Finally, we might have some set of custom variables that we care about, which some providers support. Some of these things might be covered by last summer's Splunk work, assuming that ever gets shipped and starts being used.

re: reCaptcha/AcoustID: what Wieland said, with a note that we probably should update the privacy policy to include them. If we can word it, we might include language that allows for the future inclusion of reasonable additions like these. For reCaptcha, something about site security perhaps; I'm less sure about things like AcoustID, given that something like "improves the site experience" can apply to just about anything.


Frederik "Freso" S. Olesen added a comment - 05/Jan/13 03:55 PM - edited

Or we can add a note (if there isn't one already) that the policy is subject to change at any moment without warning and just remember to change it when we "need" to add more 3rd party services?

Edit: Oh, we also pull in Gravatar images btw. IIRC, one can opt out of using one's own Gravatar, but you can't opt out of fetching other editors' Gravatar icons.


Robert Kaye added a comment - 22/Jan/13 09:33 PM

Navap: Can you please chime in on the analytics portion? How much value are we really deriving from this?


Pavan Chander added a comment - 23/Jan/13 04:40 AM

I'm opposed to removing analytics, but unfortunately I don't have a very compelling reason to keep it other than "it's cool to look at" (eg. real time stats) and "we might have something we want to look up in the future".

I do think it's important to have some form of analytics running, whether we're using google or something self hosted because then we can look things up whenever the need may arise. If we don't have anything running then we have no data.

An example of something that analytics can provide is confirmation that English, German, and French are the most prevalent reported languages of our visitors. Also, after the US, Germany, and the UK (yes, in that order), the next top country our visitors come from is the Philippines.

A few other random stats:

  • OS: 65% use Windows, followed by Mac (16%) and Linux (14%)
  • Browser: 38% use Chrome, 33% Firefox, 12% IE, 11% Safari, 4% Opera
  • Mobile: 55% use Apple, 40% use Android
  • Our average visit duration is 9:48
  • The average visit duration from the Philippines is.....wait for it.....45:20!!! There is something seriously fishy about the Philippines
  • Our average number of page views per visit is 10 (it's 45 in the Philippines)
  • A few other regions with high page views and/or high visit duration are: Pakistan, Kenya, Singapore (hi Voice!), Estonia (hi Reo!), Bangladesh, India, Bahrain
  • Google reports exactly 1 user from São Tomé and Príncipe since we started keeping track. That one user spent 22 min and viewed 6 pages

Are we actually using Google Analytics right now for anything? I don't think so. Is any of its data absolutely necessary? Probably not.

But is it interesting? I think so.


Kuno Woudt added a comment - 25/Jan/13 02:17 PM

I would vote for removing third party analytics. Any third party analytics solution used by many sites obviously allows that third party to track visitors, so they seem like a bad idea in general. It is also illegal in Europe unless you ask permission of the users before setting or allowing a third party to install a tracking cookie on the user's computer.

The analytics have limited value for MusicBrainz; we rarely look at them.

The best solution (IMO) would be to just log the bits of information we're interested in ourselves. The only things I care about as a developer are:

  1. browser + version (so IE10, Firefox 17, etc..)
  2. window size / screen resolution (1024x768)

(Browser + version can be had from nginx logs if we log the user agent string, but screen or window size has to be collected through JavaScript.)
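
As a rough illustration of the log-based approach, the following Python sketch tallies browser and major version from access-log lines. It assumes nginx writes its "combined" log format, where the user agent is the last quoted field. The function names, the regular expressions, and the idea of feeding it lines from an access log are hypothetical and not taken from the MusicBrainz setup. The patterns cover only a handful of browsers and would need extending for real traffic.

```python
import re
from collections import Counter

# Assumption: nginx "combined" format, which ends with
#   "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"$')

# Order matters: Chrome user agents also contain "Safari/", so Chrome
# is checked before Safari. Only the major version is captured.
BROWSER_PATTERNS = [
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("IE", re.compile(r"MSIE (\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]


def browser_from_user_agent(ua):
    """Return e.g. 'Firefox 17' for a user agent string, or 'other'."""
    for name, pattern in BROWSER_PATTERNS:
        match = pattern.search(ua)
        if match:
            return f"{name} {match.group(1)}"
    return "other"


def count_browsers(lines):
    """Count browser + major version over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match:  # lines that do not fit the assumed format are skipped
            counts[browser_from_user_agent(match.group("ua"))] += 1
    return counts
```

Calling count_browsers() on an open access-log file object would give a Counter keyed by strings such as "IE 10" or "Firefox 17". Screen or window size is not present in the logs, so it would still need a separate JavaScript-based mechanism.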

Alternatively we could switch to Piwik, but that means we have another thing to host, and Piwik still tracks a bunch of things about our users which we do not need to know (and which are quite probably illegal in Europe without the appropriate opt-in mechanism).


Robert Kaye added a comment - 12/Mar/13 08:49 PM

I'm working on this bug here:

http://wiki.musicbrainz.org/User:RobertKaye/Revised_Privacy_Policy#Third-party_content

The third party content section has now been largely rewritten.


Robert Kaye added a comment - 28/Mar/13 07:15 PM

Robert Kaye added a comment - 07/Jun/13 02:16 PM

Closed and transcluded live on the site.