# Sampling Errors

Suppose we are interested in a population parameter whose true value, $\theta$, is unknown. Knowledge about $\theta$ can be obtained either from sample data or from population data. In both cases, there is a possibility of not reaching the true value of the parameter. The difference between the calculated value (from the sample data or from population data) and the true value of the parameter is called an error.

Thus, error is something which cannot be determined accurately if the population is large and the units of the population are to be measured. Suppose we are interested in finding the total production of wheat in Pakistan in a certain year. Sufficient funds and time are at our disposal and we want to get the ‘true’ figure of the production of wheat. The maximum we can do is contact all the farmers, and suppose all the farmers cooperate completely and supply the information as honestly as possible. But the information supplied by the farmers will have errors in most cases, so we may not be able to identify the ‘true’ figure. In spite of all efforts, we shall be in the dark.

The calculated or observed figure may be good for all practical purposes, but we can never claim that the true value of the parameter has been obtained. Only if the study of the units is based on counting can we possibly get the true figure of the population parameter. There are two kinds of errors: (i) sampling errors, or random errors, and (ii) non-sampling errors.

## Sampling Errors

Sampling errors occur due to the nature of sampling. The sample selected from the population is only one of all possible samples. Any value calculated from the sample data is called a sample statistic. The sample statistic may or may not be close to the population parameter. If the statistic is $\widehat \theta$ and the true value of the population parameter is $\theta$, then the difference $\widehat \theta - \theta$ is called the sampling error. It is important to note that a statistic is a random variable: its value changes from one sample to another.

A particular example of sampling error is the difference between the sample mean $\overline X$ and the population mean $\mu$. Because the statistic is random, the sampling error is also a random term. The population parameter is usually not known; therefore the sampling error is estimated from the sample data. The sampling error arises because only a part of the population is included in the sample. Obviously, one part of the population cannot give the true picture of the properties of the whole population. But one should not get the impression that a sample always gives a result which is full of errors. We can design a sample and collect sample data in such a manner that sampling errors are reduced. Sampling errors can be reduced by the following methods: (1) by increasing the size of the sample and (2) by stratification.
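The random nature of the sampling error $\overline X - \mu$ can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population values are made up, and the helper names `sampling_error` and `mean_abs_error` are hypothetical, introduced for this illustration.

```python
import random


def sampling_error(population, n, rng):
    """Return (sample mean - population mean) for one simple random sample."""
    sample = rng.sample(population, n)  # n distinct units, without replacement
    x_bar = sum(sample) / n
    mu = sum(population) / len(population)
    return x_bar - mu


def mean_abs_error(population, n, rng, repeats=200):
    """Average size of the sampling error over repeated samples of size n."""
    total = sum(abs(sampling_error(population, n, rng)) for _ in range(repeats))
    return total / repeats


rng = random.Random(0)  # fixed seed, only so that a run can be repeated

# Hypothetical population: N = 1000 integer-valued measurements.
population = [rng.randint(0, 100) for _ in range(1000)]

# Samples of the same size generally give different sampling errors.
errors = [sampling_error(population, 50, rng) for _ in range(5)]
print(errors)

# The typical size of the error shrinks as n grows and is 0 when n = N.
for n in (10, 100, 1000):
    print(n, mean_abs_error(population, n, rng))
```

In practice $\mu$ is unknown, so the error cannot be computed this way; the simulation knows the whole population only to make the behaviour of the error visible.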

## Reducing Sampling Errors

1. Increasing the size of the sample: The sampling error can be reduced by increasing the sample size. If the sample size $n$ is equal to the population size $N$, then the sampling error is zero.
2. Stratification: When the population contains homogeneous units, a simple random sample is likely to be representative of the population. But if the population contains dissimilar units, a simple random sample may fail to be representative of all kinds of units in the population. To improve the result of the sample, the sample design is modified. The population is divided into different groups containing similar units, and these groups are called strata. From each group (stratum), a sub-sample is selected in a random manner. Thus all groups are represented in the sample and the sampling error is reduced. This method is called stratified random sampling. The size of the sub-sample from each stratum is frequently in proportion to the size of the stratum.

Suppose a population consists of 1000 students, out of which 600 are intelligent and 400 are unintelligent. We are assuming here that this much information about the population is available. A stratified sample of size $n = 100$ is to be selected. The sizes of the strata are denoted by ${N_1}$ and ${N_2}$ respectively, and the sizes of the samples from the strata are denoted by ${n_1}$ and ${n_2}$. The allocation is:

| Stratum # | Size of stratum | Size of sample from each stratum |
|---|---|---|
| 1 | ${N_1} = 600$ | ${n_1} = \frac{{n \times {N_1}}}{N} = \frac{{100 \times 600}}{{1000}} = 60$ |
| 2 | ${N_2} = 400$ | ${n_2} = \frac{{n \times {N_2}}}{N} = \frac{{100 \times 400}}{{1000}} = 40$ |
| Total | ${N_1} + {N_2} = N = 1000$ | ${n_1} + {n_2} = n = 100$ |

The size of the sample from each stratum has been calculated according to the size of the stratum. This is called proportional allocation. In the above sample design, the sampling fraction in the population is $\frac{n}{N} = \frac{{100}}{{1000}} = \frac{1}{{10}}$ and the sampling fraction in both the strata is also $\frac{1}{{10}}$. Thus this design is also said to have a fixed sampling fraction. This modified sample design is frequently used in sample surveys. But the design requires some prior information about the units of the population, because the population is divided into strata on the basis of this information. If the prior information is not available, stratification cannot be applied.
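The proportional allocation above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade routine: the function names `proportional_allocation` and `stratified_sample`, the stratum labels, and the use of integer IDs to stand in for students are all assumptions made for the example.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Sub-sample size for each stratum: n_h = n * N_h / N."""
    N = sum(stratum_sizes.values())
    # round() is an assumption for cases where n * N_h / N is not a whole
    # number; the rounded sizes may then fail to add up to n exactly.
    return {name: round(n * N_h / N) for name, N_h in stratum_sizes.items()}


def stratified_sample(strata, n, rng):
    """Draw a simple random sub-sample from every stratum."""
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {
        name: rng.sample(units, allocation[name])  # without replacement
        for name, units in strata.items()
    }


rng = random.Random(0)  # fixed seed, only so that a run can be repeated

# Hypothetical unit IDs: 600 students in stratum 1, 400 in stratum 2.
strata = {
    "stratum_1": list(range(0, 600)),
    "stratum_2": list(range(600, 1000)),
}

allocation = proportional_allocation({k: len(v) for k, v in strata.items()}, 100)
print(allocation)

sample = stratified_sample(strata, 100, rng)
for name, units in sample.items():
    # Sampling fraction within each stratum, n_h / N_h
    print(name, len(units) / len(strata[name]))
```

With $N_1 = 600$, $N_2 = 400$ and $n = 100$, the allocation is 60 and 40, matching the table, and the sampling fraction in each stratum equals $\frac{1}{10}$.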