Google, Click-Fraud, Insufficient Metrics

by Pete Abilla on December 11, 2006

Enjoy this article? Please SUBSCRIBE to receive all the FREE updates!

Google came out today with a click-fraud report, claiming that less than 2% of all clicks on both AdWords and AdSense are considered click-fraud. Shuman Ghosemajumder, Product Manager on Google’s Trust and Safety Team, claims that this is true because of Google’s 4-step filtering system.

According to Ghosemajumder, below is an accurate picture of click-fraud and invalid clicks based on Google’s internal data:

[Figure: click fraud and invalid clicks, per Google (image hosted on shmula.com)]

Google uses a 4-step filtering system, as pictured below:

[Figure: Google’s 4-step click-fraud filtering system (image hosted on shmula.com)]

In Ghosemajumder’s words:

The first layer is purely automatic and is used to filter clicks from both “search” and AdSense partners (contextual ads). This filter is able to detect invalid clicks in real-time, with the goal of removing them before they ever show up in the AdWords console.

The second and third layers are aimed at filtering only AdSense clicks. The second layer is what Google calls its “flagging system” and is an automatic process to remove invalid clicks from the AdWords system. The third layer of filtering is a “manual review” process with more than two dozen Google employees manually reviewing and removing any suspicious clicks.

Google’s goal is to have the first three layers of filtering identify 100% of all invalid and fraudulent clicks. Those clicks that manage to escape Google’s filters are what causes many advertisers to raise concerns and has spawned the growth of many so-called click fraud detection companies. The fourth layer of click fraud detection falls to these advertisers and detection companies and is what Google calls “requested investigations”.

Ghosemajumder goes on to explain that not every suspicious click is bona fide click-fraud; for example, multiple clicks from the same IP address might simply come from a corporate network. He concludes that the current estimates of 20% or more click-fraud are inflated and untrue.

Google & Measurement System Analysis (MSA)

I have several friends working at Google; I believe that they hire smart people. But a blanket percentage for this type of phenomenon is insufficient. Judgments like these are subject to intra-subject and inter-subject variability: the same reviewer may classify the same click differently on different occasions, and different reviewers may disagree with one another. That is, how does one know that a click is valid or invalid? That question alone points to the need for additional metrics such as specificity and sensitivity:

Specificity = [(number of true negatives) / (number of true negatives + number of false positives)]

The specificity metric tells us how reliably the measurement tool recognizes a valid click as valid, that is, how often a click that is not fraudulent is correctly left unflagged. Without that metric, the blanket percentages declared by Google don’t have much meaning.

Another metric that is important to know is the sensitivity of the test:

Sensitivity = [(number of true positives) / (number of true positives + number of false negatives)]

The sensitivity metric tells us how reliably the test catches true click-fraud, that is, what share of fraudulent clicks it actually flags. Again, without this measurement, the blanket percentages reported by Google don’t carry much meaning.
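
To make the two formulas concrete, here is a minimal Python sketch. It assumes we somehow had an audited sample of clicks with known ground truth; the `ClickAudit` class and every count in it are hypothetical and are not Google’s figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickAudit:
    """Hypothetical audit of a click filter against known ground truth.

    "Positive" means the filter flagged the click as invalid.
    """
    true_positives: int   # fraudulent clicks the filter flagged
    false_negatives: int  # fraudulent clicks the filter missed
    true_negatives: int   # valid clicks the filter let through
    false_positives: int  # valid clicks the filter wrongly flagged


def sensitivity(a: ClickAudit) -> float:
    """Share of truly fraudulent clicks that were flagged."""
    return a.true_positives / (a.true_positives + a.false_negatives)


def specificity(a: ClickAudit) -> float:
    """Share of truly valid clicks that were left unflagged."""
    return a.true_negatives / (a.true_negatives + a.false_positives)


# Made-up counts for illustration only.
audit = ClickAudit(true_positives=150, false_negatives=50,
                   true_negatives=9700, false_positives=100)

print(f"sensitivity = {sensitivity(audit):.3f}")
print(f"specificity = {specificity(audit):.3f}")
```

With these invented counts the filter looks very specific but misses a quarter of the fraudulent clicks, which is exactly the kind of detail a single headline percentage hides.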

The two measurements above give rise to two more metrics that will give us a better picture of the true click-fraud rate and the accuracy of the measurement system in question:

False Positive Rate = (Number of False Positives / Number of Negative Instances)

This metric gives us an idea of the proportion of negative instances that were incorrectly reported as positive. On the other side, we can also derive the following:

False Negative Rate = (Number of False Negatives / Number of Positive Instances)

That metric gives us an idea of the proportion of positive instances that were reported as negative. Below is a helpful table for reference:

[Table: Type I and Type II errors (image hosted on shmula.com)]
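
As a rough sketch of how the four measurements relate, the Python function below derives all of them from the same four counts. It takes the false positive rate to be the share of all truly valid clicks that were wrongly flagged, per the description above; the function name and the counts are my own illustration, not anything Google has published.

```python
def error_rates(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Derive the four rates from the counts in a 2x2 table.

    tp / fn: fraudulent clicks that were flagged / missed
    tn / fp: valid clicks that were passed / wrongly flagged
    """
    positives = tp + fn  # clicks that really are fraudulent
    negatives = tn + fp  # clicks that really are valid
    return {
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "false_positive_rate": fp / negatives,  # Type I error rate
        "false_negative_rate": fn / positives,  # Type II error rate
    }


# Made-up counts for illustration only.
rates = error_rates(tp=150, fn=50, tn=9700, fp=100)
for name, value in rates.items():
    print(f"{name:>20}: {value:.4f}")
```

Note that the false positive rate is the complement of specificity and the false negative rate is the complement of sensitivity, so reporting either pair is enough to recover the other.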

Without the four measurements above being reported, it is truly difficult, if not academically impossible, for Google to claim very much.
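
Here is a small Python sketch of why the missing measurements matter. It assumes the reported percentage is the share of clicks a filter flags, and it uses the standard relationship between a flagged rate and the underlying true rate (sometimes called the Rogan-Gladen correction). The sensitivity and specificity values are invented, since Google has not published any.

```python
def reported_rate(true_rate: float, sens: float, spec: float) -> float:
    """Share of clicks a filter would flag, given the true fraud rate."""
    return sens * true_rate + (1.0 - spec) * (1.0 - true_rate)


def implied_true_rate(reported: float, sens: float, spec: float) -> float:
    """Invert reported_rate: the true fraud rate implied by a flagged rate."""
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("filter is no better than chance; "
                         "the true rate cannot be recovered")
    estimate = (reported - (1.0 - spec)) / denom
    return min(1.0, max(0.0, estimate))  # clamp to a valid proportion


REPORTED = 0.02  # the headline figure, treated here as the flagged share

# Hypothetical (sensitivity, specificity) pairs, for illustration only.
for sens, spec in [(0.99, 0.999), (0.50, 0.999), (0.10, 0.999)]:
    implied = implied_true_rate(REPORTED, sens, spec)
    print(f"sens={sens:.2f} spec={spec:.3f} -> implied true rate {implied:.1%}")
```

Under these assumptions the same headline figure is compatible with very different true rates, depending entirely on how sensitive the filter is.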

Google, Click-Fraud, and The Liar’s Paradox

Because Google didn’t present much meaningful data today to help the audience reach a conclusion about the accuracy of the measurement system, or about the true numbers that make up invalid clicks and click-fraud, Google is reduced to the Liar’s Paradox.

Following my previous post on axiomatizing majority rule, I present how Google’s claim today is a Liar’s Paradox:

With the absence of meaningful data, Google claims:

Statement One: 2% of clicks constitute click-fraud.

Statement Two: Statement One is False.

Statement Three: Statement Two is True.

Statement Four: Statement One is both True and False.

Statement Five: Statement Four is a contradiction.

QED

I’m stretching the Liar’s paradox here a little bit, but I do it to demonstrate that Google’s statement today, without the metrics I described above, presented us with nothing meaningful.

+++++


wioota December 12, 2006 at 2:32 am

I read Andy Beal’s post and baulked instantly. Fortunately I can focus on getting some work done tonight as you’ve brought some sober analysis to the table already. In general the blogosphere seems to give numbers levels of meaning previously unseen in other mediums – numbers are regularly quoted [without/incorrectly] even quoting the metric they quantify.
