All processes are subject to some variation — this variation can either be inherent in the process or imposed on the process by some outside force. By observing the variability in the process output and comparing this to statistically calculated limits, objective decisions about when to take action can be made. Without truly understanding the cause of the process variation, resources may be wasted reacting to variation that is normal.
When I say ALL, I really mean ALL. In manufacturing, a process takes inputs and does something to them to form outputs, and there will be variation in that process; the challenge is to determine statistically what is “normal” and when corrective action or more attention is required. In internet processes, Statistical Process Control is also used to detect click-fraud, which I’ll explain shortly. And in any process in any industry, Statistical Process Control can help one better understand whether or not the process is performing as statistically expected. If it is not, that is a signal for corrective action or more attention.
Types of Variation
Common Cause Variation is fluctuation caused by many random factors, resulting in a random distribution of the output around a mean. It is a measure of the process’s potential, or how well the process will perform when all the special cause variation is removed. Common cause variation is also called random variation, noise, non-controllable variation, within-group variation, or inherent variation; a process that shows only common cause variation is said to be in statistical control.
Special Cause Variation is caused by a specific factor that results in a non-random distribution of output. It is variation due to an identifiable, out-of-the-ordinary event that is not a usual part of the process. Special cause variation can cause a shift or trend in the output and can usually be reduced or eliminated through local actions. It is also referred to as “exceptional” or “assignable” variation.
Statistical Process Control Visualization
Control Charts are often used as part of process control systems. They were developed by W. A. Shewhart in 1924 while working for Bell Telephone Laboratories. Control Charts consist of a center line and two boundary lines (the control limits) placed above and below the center line. Control limits are based on the variability within the data. Values are plotted to determine the state of the process. Control Charts tell you how the process is performing; they do not contain Specification Limits.
Control charts can be viewed as a distribution plotted on its side – if you created a histogram of the points, you would expect this to show a normal distribution (assuming the process stays in control). Below is an example:
Given the chart above, if the outputs of a process fall within the control limits, in this case between the Lower Control Limit (LCL) and the Upper Control Limit (UCL), then the process is said to be performing as statistically expected. If the outputs fall below the LCL or above the UCL, the process needs more attention.
Elements of a Control Chart
Control Charts are plots of one or more summary statistics from samples taken sequentially in time (usually sample proportions, or means and ranges). Each chart has a center line at the mean and Upper and Lower Control Limits. Control Limits are set to determine whether or not a particular average (or range) is “within acceptable limits” of random variation; in other words, they try to distinguish between common cause and special cause variation. Control Limits (LCL and UCL) are typically placed at ±3σ from the center line.
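To make the ±3σ idea concrete, here is a minimal Python sketch. It assumes you already have a baseline of measurements taken while the process was behaving normally, and it estimates σ with the plain sample standard deviation. That is a simplification: real control charts often estimate σ from subgroup ranges or moving ranges instead. The function names and the numbers are made up for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lcl, center, ucl) computed from in-control baseline data.

    Assumption: sigma is estimated with the sample standard deviation,
    which is a simplification of how production charts estimate it.
    """
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma


def out_of_control(values, lcl, ucl):
    """Return the positions of values that fall outside [lcl, ucl]."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical baseline measurements from a stable process.
baseline = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 10.0, 9.0, 11.0]
lcl, center, ucl = control_limits(baseline)

# Hypothetical new observations to check against the limits.
new_points = [10.5, 11.2, 15.0, 9.8, 6.0]
flagged = out_of_control(new_points, lcl, ucl)

print(f"LCL={lcl:.2f} center={center:.2f} UCL={ucl:.2f}")
print("positions needing attention:", flagged)
```

With this data the center line is 10.5 and the limits are roughly 7.26 and 13.74, so the observations 15.0 and 6.0 (positions 2 and 4) are the ones flagged.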
Interpreting a Control Chart
Below is a general guideline for interpreting a Control Chart:
An Example: Click-Fraud
Google, Yahoo!, Microsoft, and other cost-per-click (CPC) shops are very concerned with the quality of their ads: the sites that run them, the traffic that sees them, and the population that clicks on them. They are concerned because they have a fiduciary duty to their advertisers, and because they have a financial incentive to keep quality high so that they remain an attractive destination for advertisers.
One way CPC shops can monitor the quality of their ads and detect click-fraud is by implementing a Process Control System.
Suppose a blogger places AdSense ads on her blog. Google monitors the performance of those ads by tracking the IP addresses of the blogger and of anyone who clicks on them. For the “clicking process”, there is a statistically expected number of clicks, and the data will show this. There are, however, anomalies. For example, suppose this blogger writes a great post that gets Slashdotted or reaches the Digg front page. Traffic will come, and most likely ads will get clicked. The data will show this too, and Google will have to classify it as special cause variation, because the spike in traffic is outside of the expected range and hence the clicks will also be outside of the expected range.
Click-fraud, in this example, is detected when the same IP address, or a gang of IP addresses acting together, is seen clicking beyond the control limits. This is treated as evidence of click-fraud, and Google is able to shut down the blogger’s or publisher’s account.
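As a rough illustration of the idea (not how Google or any other CPC shop actually does it), the sketch below applies a c-chart, a common control chart for counts, to daily click totals. The limits are the average count plus or minus three times its square root, with the lower limit floored at zero. The click numbers, day labels, and function names are all hypothetical.

```python
from math import sqrt


def c_chart_limits(baseline_counts):
    """Return (lcl, c_bar, ucl) for a c-chart built from baseline counts.

    Assumption: counts are treated as roughly Poisson, so sigma is
    estimated as the square root of the average count.
    """
    c_bar = sum(baseline_counts) / len(baseline_counts)
    spread = 3 * sqrt(c_bar)
    # A count cannot be negative, so the lower limit is floored at zero.
    return max(0.0, c_bar - spread), c_bar, c_bar + spread


def flag_days(daily_clicks, lcl, ucl):
    """Return the days whose click count falls outside the control limits."""
    return [day for day, clicks in daily_clicks.items()
            if clicks < lcl or clicks > ucl]


# Hypothetical daily ad clicks during a normal week.
baseline = [16, 14, 18, 15, 17, 16, 16]
lcl, c_bar, ucl = c_chart_limits(baseline)

# Hypothetical clicks for the following week.
observed = {"mon": 15, "tue": 19, "wed": 41, "thu": 17, "fri": 2}
flagged = flag_days(observed, lcl, ucl)

print(f"LCL={lcl} center={c_bar} UCL={ucl}")
print("days to investigate:", flagged)
```

Here the limits come out to 4 and 28, so Wednesday (41 clicks) and Friday (2 clicks) are flagged. A flag is only a signal to investigate: a legitimate traffic spike would trip the same limit, so the special cause still has to be identified before calling it fraud.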
Why Should You Care?
Statistical Process Control is used everywhere, behind the scenes. It is a verifiable way to better understand and make sense of variation.
The data types in your process and the sample size will determine the type of Control Chart you should use. In the near future, I’ll explain the types of Control Charts and the science behind them.