Getting Acquainted with Statistics
·
Define basic statistical terminology
·
Recognize the different types of data and their uses
·
Identify different types of sampling
Introduction
It is a common
notion that statistics can be made to support anything; this is true only when statistics are misused. Most introductory
statistics courses focus on how to use statistics rather than on how to avoid misusing them. Textbooks offer recipes for
statistical analyses but often neglect to warn of the dangers of leaving some important ingredient out of the application
of a statistical test.
In statistics,
“garbage in, garbage out” has a significant and real meaning. Among the more common errors in the application of statistical
tests is the failure to use representative data. Representative data accurately reflect some process, whether that process
involves time or quantity, and they must be free of measurement and sampling bias. Applying the wrong statistical test or
tool, or using the right one incorrectly, can lead to gross errors in the interpretation of the data; accurate interpretation
depends upon valid analysis. Data may also be over-interpreted or under-interpreted. One must know the data well enough to
determine whether the results of an analysis are important beyond the level of statistical significance, or to identify the
practical benefits of a statistically significant result.
Basic Terminology
In any practical application, it is important to understand the basic terminology
used in statistics in order to make appropriate decisions about applying and interpreting statistical tests.
·
Observation – A single data point or datum
·
Population – A collection of all possible observations
sharing some common set of characteristics
·
Census – An investigation of all the individual observations
making up a population
·
Sample – A representative subset of a population; a
sample can be the entire population
·
Sampling – The process of collecting a representative
sample from which conclusions about the population may be drawn
·
Parameter – Computation based on all members of a population
·
Statistic – Computation based on a sample of a population
The Sampling Process
In statistics,
one collects a representative sample of a population and bases an interpretation of the population on that sample (or on
several samples). It is far easier and more cost-effective to manage a representative sample of observations than to manage
and manipulate every member of a population. A population can be evaluated more quickly through sampling, and careful sampling
reduces the risk of damage to the population as a whole. It is therefore imperative that the sampling process permit collection
of a representative sample; otherwise the population is misrepresented, and the results of any analysis become invalid and
uninformative.
A sample
is drawn from a sampling frame: a list of elements from which the sample may be selected. A sampling frame should be a complete
and correct list of population members. The source of the sampling frame should be representative of the population so that
it does not contribute any bias to the results of a statistical analysis.
The stages in sample selection
are as follows (Cooper, 2001):
• Define the target population
• Select a sampling frame
• Choose probability or non-probability sampling
method
• Determine sample size
• Choose a data collection technique
• Select sample
Types of Samples
There are many types of
samples, and the decision about which type to collect should be based on the type of analysis that will be performed.
A probability sample is one in which items are selected on the basis of known probabilities.
The advantages of probability sampling are that biases in the data are minimized and that the results tend to be more generalizable.
However, probability sampling may be more costly and time-consuming. Probability samples include simple random samples, stratified
random samples, systematic random samples, and cluster random samples. A non-probability sample is one in which items are
selected without regard to their probability of occurrence. Non-probability sampling is often more convenient because the
sampling sources are more readily available, as in shopping mall surveys or surveys of students on a college campus. However,
sampling in this way invites bias and subjectivity, and the results of an analysis may not be generally applicable.
Non-probability samples include convenience samples, judgment samples, quota samples, snowball samples, and voluntary
samples.
Probability Samples
·
Simple Random Sample – A sampling procedure that ensures
that each element of the population has an equal chance of being included in the sample. Random drawing techniques or a
random number table help to ensure that the data are collected randomly (see the sketch following this list).
·
Systematic Sample – A sample in which every nth
individual is selected from a population. For example, one can systematically select every 25th name from a list of company employees or every 50th
number in the phone book.
·
Stratified Sample – A subsample drawn from samples collected
from different strata that are essentially equal with respect to some characteristic. For example, randomly selecting 20%
of the automobile dealers from 10 randomly-selected states is a stratified sample where the states are different strata.
·
Proportional Stratified Sample – A sample in which the
size of the sample drawn from each stratum is in proportion to the relative population size of that stratum. In the automobile
dealer example, if State A had 15% of the total number of dealers in the ten strata, then 15% of the total sample would
be selected from the State A sampling frame.
·
Cluster Sample – A sample in which the primary sampling
unit is not an individual element in the population but a large cluster of elements. The most common type of cluster sample
is an “area” sample in which the primary sampling unit is a geographic area. One can also randomly select several
small clusters and choose all elements in the clusters.
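As a minimal sketch of how the first two procedures above might be carried out, the following Python fragment draws a simple random sample and a systematic sample from a hypothetical employee roster (the roster, the seed, and the sample sizes are illustrative assumptions, not from the text):

    import random

    # Hypothetical population: a company roster of 500 names (illustrative).
    employees = ["Employee %d" % i for i in range(1, 501)]

    random.seed(42)  # fixed seed only so the sketch is reproducible

    # Simple random sample: every element has an equal chance of selection.
    # random.sample draws without replacement, much like a random number table.
    simple_random = random.sample(employees, k=20)

    # Systematic sample: after a random start, take every 25th individual,
    # as in the employee-list example above.
    n = 25
    start = random.randrange(n)
    systematic = employees[start::n]

    print(simple_random[:3])
    print(systematic[:3])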
Some important distinctions
exist between stratified and cluster samples (Zikmund, 2000). Stratified samples are collected from a population divided
into a few subgroups, each with many elements, and elements are randomly selected from every subgroup. There is homogeneity
within subgroups and heterogeneity among subgroups. Cluster samples are collected from a population divided into many subgroups,
each with few elements. There is heterogeneity within subgroups but homogeneity among subgroups, and subgroups, rather than
elements within subgroups, are randomly selected.
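This distinction can be made concrete with a short sketch. Below, a hypothetical population of automobile dealers is grouped by state; the proportional stratified sample draws from every state in proportion to its size, while the cluster sample randomly selects whole states and takes all of their dealers (the states, dealer counts, and 10% sampling fraction are illustrative assumptions):

    import random

    random.seed(1)
    # Hypothetical dealers grouped by state; states act as strata or clusters.
    dealers = {state: ["%s-dealer-%d" % (state, i) for i in range(random.randint(20, 80))]
               for state in ["A", "B", "C", "D", "E"]}

    # Proportional stratified sample: draw from EVERY stratum, with each
    # stratum's share proportional to its population size.
    fraction = 0.10
    stratified = [d for members in dealers.values()
                  for d in random.sample(members, k=max(1, round(fraction * len(members))))]

    # Cluster sample: randomly select whole subgroups (states), then take
    # ALL elements within the chosen clusters.
    chosen_states = random.sample(sorted(dealers), k=2)
    cluster = [d for state in chosen_states for d in dealers[state]]

    print(len(stratified), "dealers via stratified sampling (every state represented)")
    print(len(cluster), "dealers from clusters", chosen_states)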
Non-Probability Samples
·
Convenience Sample – A sample of items that are most
readily available.
·
Judgment Sample – A sample selected by an experienced
researcher based upon some appropriate characteristic.
·
Quota Sample – A sample that ensures that certain characteristics
of the population are represented to the exact intended extent. For example, a quota sample may require selecting 100
residents of a specified metro area or may specify that 59% of study participants be male. Selection is initially random,
but once a quota is met, further respondents with that characteristic are not included in the sample (a sketch of this
stopping rule follows this list).
·
Snowball Sample – A sample in which initial respondents
are selected using probability methods, and then additional respondents are obtained from information provided by the initial
respondents, such as surveying 10 shoppers selected at random and asking each of them for the names of five friends.
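As described in the quota sample entry above, a quota sample accepts respondents as they arrive but stops admitting a group once its quota is filled. A minimal sketch of that stopping rule, with the quotas and the simulated respondent stream as illustrative assumptions:

    import random

    random.seed(7)
    quotas = {"male": 59, "female": 41}   # e.g., 59% male in a sample of 100
    counts = {"male": 0, "female": 0}

    # Respondents arrive in an uncontrolled order; once a group's quota is
    # met, further respondents from that group are excluded.
    while sum(counts.values()) < sum(quotas.values()):
        respondent = random.choice(["male", "female"])  # simulated arrival
        if counts[respondent] < quotas[respondent]:
            counts[respondent] += 1

    print(counts)  # {'male': 59, 'female': 41}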
Sample Sizes
A sample
does not have to be large to be useful, as long as it is representative of the characteristics of the population. Several
factors should be considered in choosing the size of a sample. Is it a percentage of the population? Is population size a
factor? Is there a magic minimum? According to Dr. George Gallup, “You do not need a large sampling proportion to do a good job if you first stir the pot well.”
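The section leaves the “magic minimum” question open. For reference, a standard textbook formula (an addition here, not taken from this section) sizes a sample for estimating a proportion from a desired confidence level and margin of error, n = z²p(1 − p)/e²; note that for large populations the answer does not depend on population size, which echoes Gallup’s point:

    # Standard sample-size formula for estimating a proportion
    # (a well-known statistics result; the numbers below are illustrative).
    z = 1.96   # z-score for 95% confidence
    p = 0.5    # assumed population proportion; 0.5 is the most conservative
    e = 0.05   # desired margin of error (plus or minus 5 percentage points)

    n = (z ** 2) * p * (1 - p) / e ** 2
    print(round(n))  # about 384, regardless of how large the population is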
Data Collection Technique
The table below can help a researcher decide how to sample a population. Ideally, one should list all factors that might
contribute to the sampling process.
Pitfalls throughout the Sampling Hierarchy
1) At every step of the process of sampling and statistical analysis there are chances for errors that will adversely affect
the outcome of a study. One can make a sampling frame error, in which the sampling frame includes unwanted units or excludes
desired units of the population. For example, a telephone book used to define the sampling frame for residents of a particular
neighborhood may not contain accurate or complete information. Alf Landon was incorrectly predicted to win the 1936 Presidential
election – why? (See page 18 of the textbook for the answer.) One can commit a random sampling error, in which the difference
between the result of a sample and the result of a census is due solely to the chance selection of observations that do not
represent the population. For example, 75% of a selected sample might be male when only 40% of the population is male (a short
simulation at the end of this discussion illustrates the effect). Error may also be caused by sampling bias, in which one tends
to favor certain data over others. Non-response errors are those that make the sample less than representative of the population,
perhaps because a disproportionately large group of males responds to a questionnaire or because respondents are unavailable or
refuse to cooperate; these are among the most serious limitations of surveys. It is important not to confuse response rate with
sample size.
2)
A statistician must be cautious of voluntary samples, as these
types of samples may do irreparable damage to an analysis. Voluntary samples might include
a) 900 number surveys
b) 800 number surveys
c) Opinion sites at malls
d) News / sports polls on Web
e) Talk shows
f) Websites with voting options
Voluntary surveys
may bring large response totals (not the same as response rate), but the larger sample may not be representative
of the population under study, and the size of the sample will not compensate for this bias.
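Random sampling error of the kind described in item 1 (a sample that is 75% male drawn from a population that is only 40% male) can be made tangible with a quick simulation; the population size, sample size, and number of trials below are illustrative assumptions:

    import random

    random.seed(3)
    # Hypothetical population that is 40% male.
    population = ["male"] * 400 + ["female"] * 600

    # Draw many small samples and record each one's proportion of males;
    # chance alone makes some samples stray far from the true 40%.
    proportions = []
    for _ in range(1000):
        sample = random.sample(population, k=20)
        proportions.append(sample.count("male") / 20)

    print("true proportion of males: 0.40")
    print("range observed across 1000 samples of 20: %.2f to %.2f"
          % (min(proportions), max(proportions)))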
Making Sense of Data – Learning to Think Statistically
Respecting Ockham’s Razor
Ockham’s
Razor is the principle in investigative research that when two hypotheses make the same predictions and the data cannot
distinguish between them, the simpler explanation should be prioritized for further investigation, as it is more likely
to be correct. The razor is the sharp edge that cuts excesses out of a hypothesis, and modern statistics is often in need
of a shave! The simplest procedures that can solve a problem are preferred; deliberately complicating a solution is a misuse
of statistics, as it obscures the analysis.
Designing the Process
In all research
involving data collection and analysis, it is important to approach the problem scientifically. Careful consideration must
be given to sampling and data collection. Data sheets are essential for managing data, and researchers need to identify the
factors that must be evaluated in order to make valid interpretations of the data. Causal relationships are often misevaluated
when one attempts to determine the root cause of a problem; to rigorously test causation, the suspected causal agent must be
removed and new data collected and reevaluated. It is important not to accept the results of a statistical analysis blindly.
Contradictions of historically well-supported hypotheses need to be carefully considered and investigated. Look at data
intelligently: identify the source of the data, and always ask whether the data and the results of an analysis make sense.
Follow some basic rules of statistical thinking: beware of incorrect analyses, do not jump to conclusions, and, perhaps
most importantly, check the math!
References
Cooper, D. R., & Schindler, P. S. (2001). Business research methods (7th ed.). Boston: McGraw-Hill Irwin.
Zikmund, W. G. (2000). Business research methods (6th ed.). Ft. Worth, TX: Dryden.