I recently had a reasonably long conversation with a developer who spends a lot of his time taking clients through the intricacies of Google Analytics and how it can be used to improve their sites’ tracking and workflow. It was very interesting to hear him say that he believes Google Analytics under-reports traffic compared with HTTP server access logs, by up to 50% in some cases.
This is a pretty stunning idea, considering how much decision making in the online industry is currently based on the statistics displayed in Google Analytics.
Having talked to a few others about this issue, I think there may well be a growing question about how much weight Google Analytics should carry in the decision-making process. A case study by Michael Martinez on SEOmoz, while a few years old, asks some very pertinent questions. How often have you questioned your Google Analytics package?
Five or six years ago in the online industry (before Google entered the web analytics vertical through acquisition), webmasters would regularly question anyone who quoted page views or page hits, as it was widely accepted that the stats varied considerably and were often inaccurate.
Now that a company like Google has taken on the mantle of free web analytics and effectively taken over the free market with a sophisticated product, the question just doesn’t get asked as often as it used to. Do too many online professionals blindly trust Google to give accurate data through what is an imperfect tracking medium?
There is an interesting article by Brian Clifton, who took two sites and compared their tracking across Google Analytics, Yahoo Web Analytics and Nielsen SiteCensus. Here is the basic conclusion of his findings:
The methodology of page tagging with JavaScript in order to collect visit data has now been well established over the past 8 years or so. Given a best practice deployment of Google Analytics, Nielsen SiteCensus or Yahoo Web Analytics, high level metrics remain comparable. That is, they can be expected to lie within 10-20% of each other. This is surprisingly close given the plethora of accuracy assumptions that need to be considered when comparing different web analytics tools.
As tracking becomes more detailed, for example the tracking of transactions, custom variables, events and outbound links, the discrepancies in metrics between the web analytics tools will grow.
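To make "within 10-20% of each other" concrete, here is a minimal Python sketch of one way to express the gap between two tools’ counts. It assumes the larger count is the baseline; Clifton’s article may use a different convention (dividing by the mean, for instance, gives somewhat different figures), and the visit counts are invented.

```python
def percent_difference(count_a: float, count_b: float) -> float:
    """Gap between two counts as a percentage of the larger one.

    Assumption: the larger count is the baseline. Other conventions
    (e.g. dividing by the mean of the two) give different figures.
    """
    larger = max(count_a, count_b)
    if larger == 0:
        return 0.0
    return abs(count_a - count_b) * 100 / larger


def within_range(count_a: float, count_b: float, tolerance_pct: float = 20.0) -> bool:
    """True if the two counts lie within tolerance_pct of each other."""
    return percent_difference(count_a, count_b) <= tolerance_pct


# Hypothetical monthly visit counts for the same site from two tools.
ga_visits, other_visits = 4_500, 5_000
print(f"{percent_difference(ga_visits, other_visits):.1f}%")  # 10.0%
print(within_range(ga_visits, other_visits))  # True

# A 50% gap, like the one the developer described, falls well outside that band.
print(within_range(5_000, 10_000))  # False
```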
General discussions with a couple of industry people have suggested using a product like AWStats as a check against the particular product you’re using (e.g. Google Analytics), to make sure you’re not missing part of the bigger picture by relying on a single analytics package.
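AWStats and similar tools work from the server’s access logs. As a rough sketch of that kind of cross-check, the Python below counts page views from log lines in the common Apache/Nginx "combined" format. The regex, the list of asset extensions and the definition of a page view (a successful GET for a non-asset path) are all assumptions for illustration; AWStats applies its own, more elaborate rules.

```python
import re
from typing import Iterable

# Combined log format:
# host ident authuser [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: requests for these extensions are assets, not page views.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg")


def count_page_views(lines: Iterable[str]) -> int:
    """Count successful GET requests for non-asset paths."""
    views = 0
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip malformed or differently formatted lines
        path = m["path"].split("?", 1)[0].lower()  # ignore the query string
        if m["method"] == "GET" and m["status"] == "200" and not path.endswith(ASSET_SUFFIXES):
            views += 1
    return views


sample = [
    '203.0.113.5 - - [16/Nov/2009:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [16/Nov/2009:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 800 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [16/Nov/2009:10:05:00 +0000] "GET /about HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [16/Nov/2009:10:05:02 +0000] "GET /contact?x=1 HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
print(count_page_views(sample))  # 2
```

The resulting figure can then be set against the page views Google Analytics reports for the same period; note that this count still includes any bots in the log.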
Do you have any data on the subject?
#1 by grant on November 16th, 2009
Interesting. Our software can combine data from different sources and automatically compute correlations. I had not thought of trying it against something like Google Analytics, but it would be interesting to see.
With the privacy protection Google imposes, I would not expect a 1-to-1 match between GA and server logs, but a good correlation over a largish volume of traffic. A 50% difference, on the other hand, I find difficult to swallow – but I haven’t done the test yet…
#2 by gary on November 17th, 2009
I agree, 50% sounds inordinately high; however, the developer I was talking to is very talented and deeply involved in online metrics.
My guess is that the 50% level might come up where there is a blanket assumption that does not hold across the two systems, e.g. one analytics tool reporting unique visitors matched across different sub-domains while another treats each sub-domain separately.
#3 by vans on November 18th, 2009
There will be a big difference across sites when comparing server-side counts with client-side ones (Google Analytics). For example, visits from multiple users sharing one IP address are harder for the server to tell apart, whereas analytics packages using cookies can distinguish them.
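The shared-IP point can be illustrated with a minimal Python sketch that counts "unique visitors" two ways over the same hits: by IP address, as a purely log-based count might, and by a cookie identifier, as a JavaScript-tagging tool might. The Hit record and all the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    ip: str
    cookie_id: str  # hypothetical first-party visitor cookie value


def unique_by_ip(hits: Iterable[Hit]) -> int:
    """Distinct IP addresses, roughly what a log-only count sees."""
    return len({h.ip for h in hits})


def unique_by_cookie(hits: Iterable[Hit]) -> int:
    """Distinct cookie IDs, roughly what a cookie-based tag sees."""
    return len({h.cookie_id for h in hits})


hits = [
    Hit("192.0.2.10", "visitor-a"),  # three people behind one office proxy/NAT
    Hit("192.0.2.10", "visitor-b"),
    Hit("192.0.2.10", "visitor-c"),
    Hit("198.51.100.4", "visitor-d"),
    Hit("198.51.100.4", "visitor-d"),  # same person, second page view
]
print(unique_by_ip(hits), unique_by_cookie(hits))  # 2 4
```

The effect can also run the other way: a visitor who deletes cookies or switches browsers is counted more than once by the cookie method, while keeping the same IP.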
#4 by Dee on March 25th, 2010
This is a very pertinent article for us.
We have a product site at http://www.epiphanyrisknetwork.com, and we’re currently disputing the data we are receiving from a major advertising site against the hits recorded at our end by Google Analytics.
The advertising site vendor argues that they are pushing 200-300 hits per month in our direction, yet Google Analytics has recorded only 6 of these hits arriving from their site in the same time period. Obviously, that is a significant discrepancy.
We’re not expecting tens of thousands of hits, as we’re offering risk management software to the enterprise market, but it is the variance between their figures and ours that is confusing here.
The ad site is a major industry player – a site with a mass of daily news articles, white papers, and other big advertisers (professional services etc.). They publish their web stats for users etc. online, and appear very credible.
However, we can’t reconcile how they can report click-throughs from their site to ours some 30-50 times greater than what we are seeing in our Google Analytics stats.
One of the challenges with web advertising is the lack of an independent body to oversee published stats. The newspaper industry is audited and can publish figures that are verified by the auditing body. Most websites, however, can say whatever they want, so how can it be verified?
In this instance, the ad site is very credible, and has been around for a long time. So how do we account for the difference between their figures and ours?
We would be very interested in your views.
#5 by Gary Jensen on March 27th, 2010
That’s a difficult situation to be in.
I see this quite regularly; however, it is usually only a single-digit percentage difference between click tracking systems and Google Analytics referrals.
Coming from the other side of the coin myself (I work as a publisher), I would be telling you that GA is not always 100% correct, that we track every click and that web statistics themselves are an imperfect science.
That argument holds for a discrepancy of a few percentage points, but it doesn’t explain the level of difference you’re looking at.
You need to ask a few questions, both of yourselves and of the ad site:
- Is the click tracking system being used changing where the referral appears to be coming from? This could account for the difference in your stats.
- Are some of the clicks coming from bots, not real humans? Some publishers don’t proactively remove bot traffic from click reports, although respectable publishers will.
- Finally, is your Google Analytics accurate? For instance, are you certain you have the code on all pages?
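One practical way to test the first and last questions is to look at the Referer field in your own server logs for the disputed period and tally arrivals by referring host. Depending on how a click-tracking redirect is implemented, those visits may show up under the ad site, under the tracker’s domain or with no referrer at all. A minimal Python sketch, assuming the Referer values have already been pulled out of your logs; the host names are placeholders.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse


def referrals_by_host(referers: Iterable[str]) -> Counter:
    """Tally requests by the host name in their Referer header."""
    counts: Counter = Counter()
    for ref in referers:
        if not ref or ref == "-":
            counts["(direct / none)"] += 1  # no referrer sent
            continue
        host = (urlparse(ref).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one site
        counts[host or "(unparseable)"] += 1
    return counts


sample = [
    "http://www.adsite.example/news/123",
    "http://adsite.example/",
    "http://tracker.example/click?id=9",
    "-",
    "http://www.adsite.example/whitepapers",
]
print(referrals_by_host(sample)["adsite.example"])  # 3
```

If the server logs show roughly as few arrivals from the ad site as Google Analytics does, the gap is more likely on the publisher’s side than in your tracking code.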
Hopefully that helps.
I’d suggest installing a secondary web statistics program on your site and comparing the results for a month to check that the statistics match.
#6 by Adrian on April 10th, 2010
I would suggest checking out my new free tool, YMMV real web stats, which gives a Google Analytics accuracy figure and also reports Adblock and NoScript information. All you need is PHP! It sounds like it should do the trick.
#7 by Sheldon Nesdale on April 25th, 2010
The inaccuracy only matters if it changes from month to month. If it’s consistent, it’s not a problem. What do we mainly use Google Analytics for? Making decisions from month to month, right? Right.
#8 by Gary Jensen on April 25th, 2010
Hi Sheldon
I agree that the most significant use of Google Analytics is trend analysis, and that’s what it has always been promoted for. However, I’m guessing that you yourself look at specific clicks, specific keywords and specific inbound traffic from referral sites?
This is the interesting thing. All web analytics were once used for trend analysis only; now I don’t think they are.
I know I make business decisions every day that in some part rely on Google Analytics.
We don’t ask often enough whether those decisions could be misguided by relying on these statistics.
#9 by Gil Nmaur on May 1st, 2010
Thanks for this excellent article.
I too am very interested in this subject and have just run part 1 of a test to try to quantify the delta between GA and other tools that measure traffic in similar ways.
How many page-views are you really getting? Part 1 – The WordPress Stats Test
http://www.synaptici.com/2010/how-many-page-views-are-you-really-getting-part-1-the-wordpress-stats-test/
Hope this is of interest and would love your feedback.
Cheers,
Gil
#10 by Harold on July 25th, 2010
I have also read that AWStats, and log-file software in general, is not very reliable because it does not deal well at all with bots disguised as browsers, especially those that accept cookies. And there are many: besides pixscout, websense, munax, cyveilance etc., there are also tons of scrapers, spammers, botnets etc.
So how easy is it, actually, to compare AWStats with any JavaScript-based method, which can generally at least ignore the above-mentioned groups of “fake visitors”?
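Harold’s point is easier to see against the crude filtering many log analysers start from: dropping requests whose user agent contains a known crawler marker. The Python sketch below assumes a tiny illustrative marker list; it is not AWStats’s actual robot list, which is far longer.

```python
from typing import Iterable, Tuple

# Assumption: a small, illustrative set of substrings found in
# self-identifying crawler user agents. Real lists are much longer.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_declared_bot(user_agent: str) -> bool:
    """True if the user agent openly identifies itself as a crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def split_human_and_bot(user_agents: Iterable[str]) -> Tuple[int, int]:
    """Return (apparently human, declared bot) request counts."""
    human = bot = 0
    for ua in user_agents:
        if is_declared_bot(ua):
            bot += 1
        else:
            human += 1
    return human, bot


agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6",
    "Mozilla/5.0 (compatible; Yahoo! Slurp)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)",
]
print(split_human_and_bot(agents))  # (2, 2)
```

This only catches bots that announce themselves. A scraper sending an ordinary browser user agent lands in the "human" count, which is one reason log-based totals can run higher than those from a JavaScript tag that such bots never execute.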
#11 by Harold on July 25th, 2010
We compared GA with StumbleUpon data, for example, and found a difference of 17%, within the 10-20% range the quoted article was talking about.