In this module, we’ll discuss the following statistical approaches used in Six Sigma.
1. The Purpose of Basic Statistics
Statistics serve the following purposes in Six Sigma:
- Provide a numerical summary of the data being analyzed.
- Provide the basis for making inferences about the future.
- Provide the foundation for assessing process capability.
- Provide a common language to be used throughout the organization.
Statistics is the basic language of Six Sigma. A good understanding of statistics is the foundation on which many of the subsequent tools are built.
1.2. Statistical Notation Cheat Sheet
Don’t bother memorizing any of this; just refer to it as needed.
1.3. Parameters vs Statistics
Let’s go through a few definitions, with the help of the chart below:
- Population: All the items that have the “property of interest” under study.
- Frame: An identifiable subset of the population.
- Sample: A significantly smaller subset of the population used to make an inference.
1.4. Purpose of Sampling
Sampling lets us reach a sufficiently accurate inference with considerably less time, money, and other resources, and it provides the basis for statistical inference. If sampling is done well and on a sufficient scale, we can infer that what we see in the sample is representative of the population.
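To make the idea concrete, here is a minimal Python sketch, assuming a hypothetical population of 1,000 processing times generated at random. It draws a simple random sample of 50 items and compares the sample mean with the population mean. The data, sample size, and seed are illustrative assumptions, not values from this module.

```python
import random
import statistics

# Hypothetical population: 1,000 processing times in minutes (illustrative only).
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.uniform(5.0, 15.0) for _ in range(1000)]

# Draw a simple random sample of 50 items, without replacement.
sample = rng.sample(population, k=50)

population_mean = statistics.mean(population)  # requires examining all 1,000 items
sample_mean = statistics.mean(sample)          # requires examining only 50 items

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
```

With a well-drawn sample, the two means are usually close, even though only 5% of the population was examined.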
A population parameter is a numerical value that summarizes the data for an entire population; a sample has a corresponding numerical value called a statistic.
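For example, under the usual notation convention (assumed here), the mean of a population is a parameter written μ and computed over all N items, while the mean of a sample is the corresponding statistic written x̄ and computed over the n sampled items:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```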
The population is the collection of all the individuals of interest. It must be defined carefully, for example, all the trades completed in 2001. If there are distinct subsets of trades, it may be appropriate to define each as a population in its own right, such as all sub-custodial market trades completed in 2001, or emerging market trades.
Sampling frames are complete lists that should be identical to the population, with every element listed only once. A frame sounds very similar to a population, and it is; the difference lies in how it is used. A sampling frame such as the list of registered voters could be used to represent the population of the adult general public. There may be reasons why this would not be a good sampling frame; perhaps a list of licensed drivers would better represent the general public.
The sampling frame is the source for a sample to be drawn.
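A minimal Python sketch of building a sampling frame and drawing a sample from it, assuming hypothetical trade IDs pulled from two overlapping source systems. Duplicates are removed so that each element is listed only once, and the sample is then drawn from the frame.

```python
import random

# Hypothetical source records: trade IDs from two systems, with some overlap.
system_a = ["T001", "T002", "T003", "T004", "T005"]
system_b = ["T004", "T005", "T006", "T007", "T008"]

# Build the sampling frame: a complete list in which each trade appears only once.
# dict.fromkeys keeps the first occurrence of each ID and preserves order.
frame = list(dict.fromkeys(system_a + system_b))

# Draw the sample from the frame (simple random sampling, without replacement).
rng = random.Random(7)
sample = rng.sample(frame, k=3)

print(frame)  # ['T001', 'T002', 'T003', 'T004', 'T005', 'T006', 'T007', 'T008']
print(sample)
```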
It is important to recognize the difference between a sample and a population, because we are typically working with a sample of the potential population in order to make an inference about it. The formulas for describing samples and populations are slightly different, and in most cases we will be using the formulas for samples.
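One concrete difference is the divisor in the standard deviation: the population formula divides by N, while the sample formula divides by n - 1. The minimal Python sketch below uses a hypothetical data set and checks the hand-written formulas against the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
import statistics

data = [4.0, 7.0, 5.0, 6.0, 8.0]  # hypothetical measurements

mean = sum(data) / len(data)
squared_deviations = sum((x - mean) ** 2 for x in data)

# Population formula: divide by N (the data are treated as the entire population).
population_sd = math.sqrt(squared_deviations / len(data))

# Sample formula: divide by n - 1 (Bessel's correction), since the data are a sample.
sample_sd = math.sqrt(squared_deviations / (len(data) - 1))

# The standard library implements the same two formulas.
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(f"population SD = {population_sd:.3f}, sample SD = {sample_sd:.3f}")
# prints "population SD = 1.414, sample SD = 1.581"
```

The sample formula gives a slightly larger value, which compensates for the sample mean sitting closer to the sampled data than the true population mean does.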
2. Types of Data
The nature of data is important to understand. As we discussed in the video Data Types in Six Sigma, the data type determines which analyses you can use.
2.1. Attribute Data (Qualitative)
Attribute data is always binary – only two possible values.
- Go/No Go
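Because attribute data takes only two values, it is typically summarized as a count or a proportion. A minimal Python sketch, assuming hypothetical go/no-go gauge results for 20 parts:

```python
# Hypothetical go/no-go gauge results for 20 parts: True = Go (pass), False = No Go.
results = [True] * 17 + [False] * 3

# With only two possible values, the data is summarized by counts and proportions.
no_go_count = results.count(False)
proportion_no_go = no_go_count / len(results)

print(f"No Go: {no_go_count} of {len(results)} parts ({proportion_no_go:.0%})")
# prints "No Go: 3 of 20 parts (15%)"
```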
2.2. Variable Data (Quantitative)
Discrete Data is data that can be counted, categorized, and classified based on counts. For example:
- Number of Defects
- Number of Defective Units
- Number of Customer Returns
Continuous Data is data that can be measured on a continuum and can take fractional (decimal) values, for example:
- Time, pressure, conveyor speed, material feed rate
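The distinction can be sketched in Python with hypothetical data: weekly customer-return counts (discrete) and conveyor-speed readings (continuous). The values and the choice of summaries are illustrative assumptions.

```python
import statistics

# Hypothetical discrete data: number of customer returns per week (whole-number counts).
returns_per_week = [3, 0, 2, 5, 1]

# Hypothetical continuous data: conveyor speed in metres per second (any value on a continuum).
conveyor_speed = [1.52, 1.47, 1.55, 1.49, 1.51]

# Counts are non-negative integers and can be totalled.
assert all(isinstance(n, int) and n >= 0 for n in returns_per_week)
total_returns = sum(returns_per_week)

# Measurements take fractional values, so summarize them with a mean and standard deviation.
mean_speed = statistics.mean(conveyor_speed)
sd_speed = statistics.stdev(conveyor_speed)

print(f"Total returns: {total_returns}")
print(f"Conveyor speed: mean {mean_speed:.3f} m/s, SD {sd_speed:.3f} m/s")
```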
Here are several real-world examples that may help you:
2.3. Scaled Data
How you represent data affects which statistical tests are available to you. Here are a few scales to keep in mind:
- Nominal Scale: Data consists of names, labels, or categories. These cannot be arranged in an ordering scheme and no arithmetic operations are performed on this type of data.
- Ordinal Scale: Data is arranged in some order, but differences between data values either cannot be determined or are not meaningful.
- Interval Scale: Data can be arranged in order, and differences between data values are meaningful and can be interpreted.
- Ratio Scale: Data can be ranked, and all arithmetic operations, including division (excluding division by zero), can be performed. Ratio data has an absolute zero, and a value of zero indicates a complete absence of the characteristic of interest.
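The four scales are cumulative: each permits everything the previous scale allows, plus something more. A minimal Python sketch of that hierarchy follows; the operation names (`classify`, `order`, `difference`, `ratio`) and the helper `allowed_operations` are hypothetical labels chosen for this illustration.

```python
# Scales from least to most informative; each inherits the operations of the ones before it.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Operation each scale adds on top of the previous scale (hypothetical labels).
OPERATIONS_ADDED = {
    "nominal": {"classify"},     # names, labels, categories only
    "ordinal": {"order"},        # ranking is meaningful
    "interval": {"difference"},  # differences between values are meaningful
    "ratio": {"ratio"},          # division is meaningful; absolute zero exists
}


def allowed_operations(scale: str) -> set[str]:
    """Return every operation permitted on data measured at the given scale."""
    ops: set[str] = set()
    for level in SCALES[: SCALES.index(scale) + 1]:
        ops |= OPERATIONS_ADDED[level]
    return ops


print(sorted(allowed_operations("ordinal")))      # prints ['classify', 'order']
print("ratio" in allowed_operations("interval"))  # prints False
```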
Now let’s go through several examples of scaled data.
At this point, I recommend reviewing the section on Distributions in Six Sigma to get a better idea of how data can look depending on its type.
In the next section, we’ll introduce you to Z-Values.