INTRODUCTION
Accurate, valid and reliable information about any aspect of life or a population is the key to developing good insight into any phenomenon. Such information ultimately helps us make the most appropriate and practical decisions at the right time. It is gathered through organized studies that follow certain basic principles of statistics. Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data; some experts consider it an independent scientific discipline. In a broad sense, all numerical data fall within the scope of statistics. The best data about any population would obviously come from studying the whole population systematically. However, that is often very difficult and impractical because of the time and resources needed. By focusing only on a subset of the whole population, a sample, the information-gathering process is made more practical, simple, cheap and fast.
Sampling refers to the principles and methods employed in defining and using a sample, drawn from a much larger population, for study. A sample should be defined and selected so that it remains truly representative of the whole universe of phenomena to be studied, technically called a population. As mentioned earlier, the ideal situation would be to study the whole population before coming to a conclusion. However, this is most often practically, logistically and even theoretically impossible. It is theoretically impossible because similar phenomena of the past or the future cannot be accessed at the present time for observation and study; only the present can be studied. Even for the present, practical and logistic considerations most often make it necessary to access only a subset of the whole population for study. Therefore, the principles and process of sampling, which select a predefined, limited number of representative units from a population, are of crucial importance for any study.
The concept of sampling has to be understood against that of the census. Sampling refers to the study of a proportionately small subset of a population, generally for reasons of convenience and cost. This is opposed to a census which attempts to obtain information and data from every unit-member of the population.
KEY TERMS
Population: It refers to all the persons/bodies/units/objects about which the study intends and plans to know something. So, the definition of the population depends on the nature and goals of the planned project or study. A characteristic of the whole population that the study proposes to measure or estimate is known as a parameter.
Sampling Unit: It refers to one member of those individual entities which together constitute the whole population, on which any observation or measurement is to be done. It could be a person, an animal, any object or body.
Sampling Frame: It refers to a subset of the population which is fully accessible, can be completely defined, and from which the sample is to be drawn. This is also referred to as accessible population as against theoretical population which refers to the total population. Total population may quite often be very difficult or almost impossible to outline or define. However, sometimes sampling frame and total population may be the same.
Sample: Sample refers to the final subset of the population, drawn from the sampling frame by either a random or a non-random method, from which data are collected by defined methods of observation. We start with the intended sample based on our inclusion and exclusion criteria but end up with the actual sample from which the desired data are drawn. In between, some attrition takes place because of non-response, non-cooperation, dropouts and several other reasons. These events, if not kept to a minimum, become additional sources of error and bias in the sample.
MAJOR STEPS
Major steps to be taken in the sampling process are as follows:
- Specifying and defining the population to be studied.
- Outlining the ‘sampling frame’ from which the sample is to be drawn.
- Specifying the sample size.
- Deciding the sampling method.
- Executing the sampling plan.
- Making observations, taking measurements, and collecting data.
Before proceeding with all these steps, the research question has to be properly formulated because it has a bearing on every step of sampling. The research question determines the study population as well as the sampling frame, sample size, and sampling method.
TYPES AND SUBTYPES
Broadly, sampling methods can be divided into two types: Probability sampling and Nonprobability sampling.
Probability Sampling refers to methods in which every member of a population has a known, non-zero probability of being selected for inclusion in the sample; in the simplest case this probability is the same for all members. The advantage of this approach is that the sample is much more likely to be truly representative of the whole population, and therefore the conclusions drawn would justifiably and reasonably apply to the whole population.
One of the disadvantages of random sampling is that it requires complete information about the population, that is, its size and a list of the units included in it. Such information may not be available for many planned studies.
Nonprobability Sampling refers to methods of collecting a sample in which units of a population do not have a known probability of being selected. In many situations, nonprobability sampling may be the only available choice. The main criticism of such data is that they cannot be applied to the whole population in a predictable manner, and therefore cannot be used reliably for planning, prediction or extrapolation. The advantage, on the other hand, is that some relevant information becomes available on the issue at hand. It is often said that a nonprobability sample is not representative of the whole population, but this need not be true: it may well be representative, even if only by chance. The problem is that we cannot state how confident we are about its degree of representativeness. With a probability sample, by contrast, that confidence can be quantified; for example, we can say with 95% confidence that a sample estimate lies within a stated margin of error of the population value. Even so, in many areas of applied social research a nonprobability sample may be the only practical, feasible and theoretically sound option. Therefore, nonprobability sampling continues to be relevant in special circumstances.
Subtypes of Probability Sampling
Simple Random Sampling shows the highest degree of randomization because it is designed in such a way that every individual unit of the sampling frame or population has an equal chance of being selected for inclusion in the sample. For this reason, the generalizability of conclusions based on this method of sampling is likely to be very high.
Various methods of randomization (Random Number generation) that can be used are: 1. Blindly selecting numbered balls out of a bag by lottery; 2. Using online random generators like www.random.org/integers; 3. Using Excel RAND and Excel RANDBETWEEN functions.
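The lottery method can also be mimicked in software. The sketch below is a minimal illustration in Python, which is an assumption on our part: the chapter itself mentions only physical lotteries, random.org and Excel. It draws n units without replacement from a hypothetical numbered sampling frame using the standard library's `random.sample`, which gives every unit the same chance of selection. The function name `simple_random_sample` and the frame of 500 units are illustrative only.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the frame size")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame of 500 numbered units.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=42)
print(sorted(sample))
```

Fixing the seed makes the draw reproducible, which helps when the selection has to be documented or audited. Omit the seed to get a fresh draw each time.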
Systematic Random Sampling requires that every member-unit of a sampling frame is listed first. The first unit to be included in the study is then selected by any random method, and after that every kth member is selected for inclusion in the sample, with k decided in advance. An illustrative example is set out below in a stepwise fashion:
- Let the units in the sampling frame be numbered from 1 to N.
- Then the size of the sample, ‘n’, is decided based on considerations of requirements of the study and the resources available.
- The sampling interval, ‘k’, is decided by dividing N by n (k = N/n, rounded down to a whole number if necessary).
- Subsequently, a starting integer, ‘r’, is randomly selected between 1 and k.
- Further, starting from unit r, every kth unit (r, r + k, r + 2k, and so on) is selected; together these units constitute the sample for the study under consideration.
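The five steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation. The function name `systematic_sample` and the frame of 100 numbered units are hypothetical, and k is rounded down when N is not an exact multiple of n.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic random sampling: random start r in 1..k, then every kth unit."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    k = N // n                             # sampling interval, rounded down
    r = random.Random(seed).randint(1, k)  # the only random choice: 1 <= r <= k
    # Units r, r + k, r + 2k, ... (1-based), converted to 0-based list indices.
    selected = [frame[i] for i in range(r - 1, N, k)]
    # If N is not a multiple of n this can yield more than n units, so keep
    # the first n; the last few units of the list then can never be chosen
    # (circular systematic sampling is one common remedy).
    return selected[:n]

frame = list(range(1, 101))  # hypothetical frame of N = 100 numbered units
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

Every selected unit sits exactly k positions after the previous one. For that reason, a hidden cycle in the list whose length matches k would be picked up systematically, which is the bias discussed in the next paragraph.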
An advantage of Systematic Random Sampling is that it is much simpler to execute because only one number has to be selected randomly. This method may also help in situations where Simple Random Sampling is nearly impossible to employ. However, one should ensure that the units in the sampling frame have not been listed according to any overt or covert order, and it should be reasonable to assume that they are in random order. If there is any covert pre-existing order, particularly a pattern that repeats at the same interval as k, it is likely to introduce serious bias into the sample.
Stratified Sampling is used when it is reasonable to assume, based on expert knowledge, that certain sections of the population differ from each other enough to influence the variable to be measured differently. It is also used when a population is thought to be very heterogeneous with regard to the distribution of a variable. For a better and more realistic exposition of the variable under consideration, the whole population is therefore partitioned into relatively homogeneous groups called strata. Then, the subpopulation of each stratum is subjected to simple random sampling to obtain that stratum's share of the sample.
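A minimal sketch of stratified random sampling with proportional allocation is given below. Under proportional allocation, each stratum contributes to the sample in proportion to its size. The strata names and sizes are hypothetical. Proportional allocation is one common choice; other allocations, such as equal numbers per stratum, are also used.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional stratified random sampling.

    strata: dict mapping stratum name -> list of units in that stratum.
    Each stratum's share of n is proportional to its size, and a simple
    random sample of that size is drawn within the stratum. Rounding can
    make the total differ slightly from n for some stratum sizes.
    """
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)  # proportional allocation
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical population of 1,000 people split into three strata.
strata = {
    "urban": [f"U{i}" for i in range(600)],
    "semi-urban": [f"S{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(100)],
}
result = stratified_sample(strata, 50, seed=7)
print({name: len(units) for name, units in result.items()})
# {'urban': 30, 'semi-urban': 15, 'rural': 5}
```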
Cluster Sampling is also known as Area Sampling. This method is used when the population under consideration is scattered over widely disparate areas, which makes it highly inconvenient and resource-intensive to apply simple random sampling to the whole population. In such situations, the whole population is divided into clusters, generally on geographical considerations. Some representative clusters, which form the sampling frame, are then selected, generally in a random manner, depending on the demands of the particular study. From this subpopulation, sample units are selected by simple random sampling, or every member may be studied. The sample is then subjected to the study protocol to obtain the desired data. Ideally, a cluster is a subunit of the population that contains all the representative heterogeneity of that population. Strata, on the other hand, are defined so that they partition the whole population into relatively homogeneous groups. To summarise, the cluster sampling method involves the following steps:
- The total population is divided into clusters, generally along geographical lines, each of which ideally contains all the characteristic heterogeneity of the population.
- Depending on the requirements of the study, a few clusters are selected, generally on a random basis.
- Later, principles of simple random sampling are applied to select the actual sample-units for study, or, more often, all units within a sampled cluster are studied.
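One-stage cluster sampling can be sketched as follows. In this design, whole clusters are chosen at random and every unit inside them is studied. The villages and households are hypothetical, and the sketch assumes that each cluster's list of units is already available.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sampling: pick m whole clusters at random and
    include every unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # sorted names give a reproducible draw
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical: 8 villages, each with its own list of 25 households.
clusters = {f"village_{v}": [f"v{v}_hh{h}" for h in range(25)] for v in range(1, 9)}
picked = cluster_sample(clusters, 3, seed=3)
print(sorted(picked))
print(sum(len(units) for units in picked.values()))  # 75 households studied
```

If the chosen clusters are too large to study completely, a simple random sample can be drawn within each one instead, as the last step above allows.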
Multistage Sampling refers to the method in which more than one of the probability sampling methods described above is applied in successive stages.
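As an illustration, a three-stage design might select districts first, then villages within the selected districts, then households within the selected villages. The sketch below assumes simple random sampling at each stage and uses a hypothetical nested population with hypothetical names. In practice, stratified or systematic selection could be used at any stage instead.

```python
import random

def multistage_sample(population, n_districts, n_villages, n_households, seed=None):
    """Three-stage sampling with simple random sampling at every stage.

    population: dict district -> dict village -> list of households.
    """
    rng = random.Random(seed)
    sample = []
    for district in rng.sample(sorted(population), n_districts):        # stage 1
        villages = population[district]
        for village in rng.sample(sorted(villages), n_villages):         # stage 2
            sample.extend(rng.sample(villages[village], n_households))  # stage 3
    return sample

# Hypothetical population: 5 districts x 6 villages x 40 households.
population = {
    f"D{d}": {f"D{d}V{v}": [f"D{d}V{v}H{h}" for h in range(40)] for v in range(6)}
    for d in range(5)
}
households = multistage_sample(population, n_districts=2, n_villages=3,
                               n_households=10, seed=11)
print(len(households))  # 2 * 3 * 10 = 60
```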
Subtypes of Nonprobability Sampling
Voluntary Sampling is said to have been used when individuals come forward of their own free will in response to an invitation to participate in a study, and allow observations to be made on them or their opinions to be recorded. In this type, the sample is not chosen by the person conducting the study; it is formed by the people who volunteer.
Convenience Sampling is when a sample is selected based on the ease of availability of, and access to, its individual members for making observations or recording their views. An illustrative example would be recording the views of people coming out of a cinema hall on some matter. Here, the most characteristic feature is the ease of availability of sample units.
Purposive Sampling is said to have been done when the researchers start with a set purpose and specified criteria for inclusion. There is no predefined sampling frame. The researchers include anyone who meets the criteria, on a first-found, first-taken basis, until they reach the pre-decided number of units for their study. Such a sample is obviously prone to bias. Nevertheless, it does give some information about the target group within the population. Purposive sampling has various subtypes, some of which are discussed below:
Modal Instance Sampling is a kind of sampling used for informal surveys that are in the nature of a preliminary study. First, based on life experience and common knowledge, we hypothesize about the most typical attributes and characteristics of a sample unit in the population. Then, anyone who meets those criteria is included in the sample. Since ‘mode’ in statistics refers to the most frequently occurring value or characteristic in a distribution, sampling based on the ‘modal’ characteristics of a population is known as ‘Modal Instance Sampling’.
Expert Sampling refers to obtaining the opinion of a panel or group of people with known and acknowledged expertise and knowledge in areas related to the research question under consideration. Their observations and opinions may match the characteristics of the population, but they may also be wrong. Nevertheless, the ease with which this kind of sampling can be executed makes it worth resorting to in certain circumstances. Another advantage is that it can provide guidelines for modal instance sampling.
Quota Sampling can be considered the non-probability equivalent of stratified random sampling. Here, the population is divided into groups based on age, gender, education, race, religion, occupation, etc. A set number of sample units is then selected from every group on the principles of purposive sampling. If the number of units in each group is designed to match that group's proportion in the general population, it is called ‘proportional quota sampling’. If proportional representation is not followed, it is called ‘non-proportional quota sampling’.
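Proportional quota sampling can be sketched in two parts. First, quotas are computed from known population shares. Second, the quotas are filled with whoever turns up first, with no random selection involved. The gender shares and the arrival stream below are hypothetical, and the function names are illustrative.

```python
def proportional_quotas(population_shares, n):
    """Quota for each group, proportional to its share of the population."""
    return {group: round(share * n) for group, share in population_shares.items()}

def fill_quotas(arrivals, quotas):
    """Non-random filling: accept people in the order they turn up until each
    group's quota is met; later arrivals from a full group are turned away."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person_id, group in arrivals:
        if group in quotas and counts[group] < quotas[group]:
            sample.append(person_id)
            counts[group] += 1
        if counts == quotas:  # every quota filled, stop recruiting
            break
    return sample, counts

# Hypothetical population shares and a stream of respondents in arrival order.
shares = {"male": 0.52, "female": 0.48}
quotas = proportional_quotas(shares, 50)  # {'male': 26, 'female': 24}
arrivals = [(i, "male" if i % 3 else "female") for i in range(200)]
sample, counts = fill_quotas(arrivals, quotas)
print(counts)
```

The resulting sample matches the population proportions by group. However, who ends up in it depends entirely on the order of arrival, which is why quota sampling remains a nonprobability method.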
Snowball Sampling begins by identifying a person or unit who meets the criteria for inclusion in the study. This identified first person is then used as a source of information regarding other sample-units who may also meet the criteria for inclusion in the study. For this reason, it is also known as chain referral sampling. At times, this may be the only method feasible to get access to the difficult-to-reach and difficult-to-involve hidden populations who need to be studied. Examples include substance abusers, HIV patients, homeless people, etc. who remain inaccessible because of reasons of stigma and fear of social exclusion or legal consequences. A variant of snowball sampling is respondent-driven sampling.
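Chain referral can be pictured as a breadth-first walk through a network of referrals, starting from one or more initial participants. The sketch below rests on a few assumptions. The referral network is treated as a known dictionary, whereas in a real study it is discovered one interview at a time. People found to be ineligible are not asked for further referrals. The names and the network are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size, is_eligible=lambda person: True):
    """Chain-referral (snowball) sampling as a breadth-first walk.

    seeds: initial eligible participants found by the researcher.
    referrals: dict person -> list of people that person refers.
    """
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        if not is_eligible(person):
            continue  # ineligible: not enrolled, and not asked for referrals
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # never recruit anyone twice
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network among members of a hidden population.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
}
print(snowball_sample(["A"], referrals, max_size=5))  # ['A', 'B', 'C', 'D', 'E']
```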
Heterogeneity Sampling is used when the purpose is to represent the full spectrum of heterogeneity present in a population, without regard to the proportions in which the different elements actually exist in that population. It may be conceptualized as the opposite of modal instance sampling. To achieve this goal, one may have to include people of all diverse shades and varieties of opinion. It is also known as sampling for diversity, maximum variation sampling or maximum heterogeneity sampling. It provides information on the range of variation in a population when other information about that population is not available.
MISCELLANEOUS CONCEPTS
Sample Size: The decision about the size of the sample depends on the nature of the study, the size of the population, and the degree of precision and accuracy demanded by the circumstances. Generally, the larger the sample, the more precise the outcome. A larger sample does not, however, remove bias arising from a faulty sampling method. Several statistical formulae are available to calculate the sample size needed for a target confidence level and margin of error or, when hypotheses are tested, for a chosen level of significance with sufficient statistical power. Other determinants of sample size are the availability of resources in terms of personnel, money, materials and time, which are also very important practical considerations.
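One widely used formula, commonly known as Cochran's formula, gives the sample size needed to estimate a proportion within a chosen margin of error e under simple random sampling: n0 = z²p(1 - p)/e². This can optionally be reduced by the finite population correction, n = n0/(1 + (n0 - 1)/N). The sketch below assumes 95% confidence (z = 1.96) and p = 0.5, the most conservative value when the true proportion is unknown. Other designs, such as comparing groups, estimating means or testing hypotheses with a given power, need different formulae.

```python
import math

def sample_size_for_proportion(margin, p=0.5, z=1.96, population=None):
    """Sample size needed to estimate a proportion within +/- margin.

    Uses n0 = z^2 * p * (1 - p) / margin^2, optionally reduced with the
    finite population correction n = n0 / (1 + (n0 - 1) / N).
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)  # round up so the target margin is met

print(sample_size_for_proportion(0.05))                   # 385
print(sample_size_for_proportion(0.03))                   # 1068
print(sample_size_for_proportion(0.05, population=2000))  # 323
```

The correction matters only when the sample is a sizeable fraction of the population. For a population of 50 lakhs, the corrected figure for a 5% margin stays at 385.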
Margin of Error: Margin of error is a measure of how reliable the findings of a particular sample survey are. It indicates the maximum difference likely between the result of this survey and that of another survey of the same population using similar criteria and methods. It reflects only random sampling variation and says nothing about bias or other sources of error. The margin of error is inversely proportional to the square root of the sample size, so the larger the sample, the lower the margin of error. Provided the population is much larger than the sample, the margin of error is practically independent of population size. In other words, for a sample size of 1000, the margin of error would be about the same whether the population is 50 thousand or 50 lakhs. For a sample size (n) of 1000, a common rule of thumb at 95% confidence gives the margin of error as one divided by the square root of 1000. This is approximately 0.03, or about 3%. Since the margin of error is inversely proportional to the square root of sample size, increasing the sample size four times halves the margin of error. This means that the reliability doubles.
The interpretation of the margin of error is as follows. For samples of a defined size from a defined population, the difference between the sample value and the true population value of a particular parameter will lie within the margin of error at least 95% of the time.
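The rule of thumb 1/√n comes from the usual 95% margin of error for a proportion, z√(p(1 - p)/n). With z = 1.96 and the worst case p = 0.5, this equals 0.98/√n, which is close to 1/√n. The sketch below checks the figures quoted in the text. It assumes simple random sampling and ignores the finite population correction.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def rule_of_thumb(n):
    """Quick approximation used in the text: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

print(round(margin_of_error(1000), 4))  # 0.031
print(round(rule_of_thumb(1000), 4))    # 0.0316
# Quadrupling the sample size halves the margin of error.
print(round(margin_of_error(4000) / margin_of_error(1000), 6))  # 0.5
```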
Table 1: Margin of error for different sample sizes
It is evident from Table 1 that the largest reduction in the margin of error occurs between sample sizes of 200 and 1500, with a correspondingly significant increase in reliability. Beyond a sample size of 1500, the reduction in the margin of error is not in proportion to the increase in sample size. This follows the law of diminishing returns: gains in reliability do not keep pace with the extra time, effort and resources required to achieve them.
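The diminishing return can be seen by computing how much the margin of error shrinks for every 100 additional units. The sample sizes below are illustrative and are not taken from Table 1. The sketch uses the 1/√n rule of thumb.

```python
import math

def margin_table(sizes):
    """(n, margin of error) pairs using the 1/sqrt(n) rule of thumb."""
    return [(n, 1 / math.sqrt(n)) for n in sizes]

def gain_per_100(rows):
    """Reduction in margin of error per 100 extra units between consecutive rows."""
    return [(m0 - m1) / (n1 - n0) * 100
            for (n0, m0), (n1, m1) in zip(rows, rows[1:])]

# Illustrative sample sizes only; these are not the rows of Table 1.
rows = margin_table([200, 500, 1000, 1500, 2000, 3000, 5000])
for ((n0, m0), (n1, m1)), g in zip(zip(rows, rows[1:]), gain_per_100(rows)):
    print(f"{n0:>5} -> {n1:>5}: {m0:.1%} -> {m1:.1%}, "
          f"gain per 100 extra units {g:.3%}")
```

Going from 200 to 500 buys almost 0.9 percentage points per 100 extra units. Going from 3000 to 5000 buys only about 0.02, which is the diminishing return described above.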
External Validity: The concept of external validity refers to the degree to which the conclusions of a study remain true when generalized. It is a measure of how far the conclusions apply to other similar people at a different place and time. External validity can be improved by measures such as truly random sample selection, keeping non-participation and dropout rates to a minimum, and replicating the study with different sets of people at different places and times.
Statistics: Once a sample is defined and determined, it is subjected to observations and measurements as per the methodology set out in the research protocol. The observations, measurements and responses obtained are accurately recorded and systematically organized. They are then subjected to appropriate statistical analysis, with the goal of reaching a valid conclusion that can be generalized to the whole population with confidence. Summary measures calculated from a sample, such as the mean, median and mode, are called statistics. The corresponding values calculated from observations on the entire population are not called statistics; they are referred to as parameters of the population.
Relation between Sample and Population: A sample represents the population, but it does not reflect the population in a 100% fool-proof manner. There is always some difference because of chance and inaccuracies in the method of sampling. The relationship between a sample and a population is understood and represented through several statistical concepts, including the sampling distribution, the standard deviation and the standard error. In the context of sampling, the standard error is also known as the sampling error. The distribution of a statistic across a theoretically infinite number of samples of the same size, taken from the same population, is known as the sampling distribution. The standard deviation measures the spread of individual measurements within a sample, that is, how much the individual measurements in a sample differ from one another collectively. The standard error, on the other hand, is the spread of a large (theoretically infinite) number of means of equal-sized samples drawn from the same population, around the mean of all such means in a sampling distribution. It gives a fair estimate of the precision of a statistic obtained from a single sample. Together, these concepts indicate how closely conclusions drawn from a sample are likely to reflect the population. In real life we cannot study an infinite number of samples from the same population, so the standard error is calculated from the standard deviation of a single sample: the standard error of the mean equals the standard deviation divided by the square root of the sample size. The larger the standard deviation, the larger the standard error is likely to be. This tendency can be countered by increasing the sample size. In other words, the larger the sample, the lower the standard error or sampling error.
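The relationship between standard deviation and standard error can be checked by simulation. The sketch below builds a hypothetical population of normally distributed heights (mean 165 cm, SD 10 cm). It estimates the standard error from a single sample as SD/√n, and compares this with the actual spread of 2,000 sample means. The population parameters and function names are assumptions made for illustration.

```python
import random
import statistics

def sample_means(population, n, repeats, seed=0):
    """Draw `repeats` samples of size n (without replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

# Hypothetical population: 100,000 heights in cm, normal with mean 165 and SD 10.
gen = random.Random(1)
population = [gen.gauss(165, 10) for _ in range(100_000)]

n = 100
one_sample = random.Random(2).sample(population, n)
sd = statistics.stdev(one_sample)  # spread of individuals within one sample
se_estimate = sd / n ** 0.5        # standard error estimated as SD / sqrt(n)
se_observed = statistics.stdev(sample_means(population, n, repeats=2000))
se_observed_400 = statistics.stdev(sample_means(population, 400, repeats=2000))

print(f"SD within one sample (n = 100):    {sd:.2f}")
print(f"SE estimated as SD / sqrt(n):      {se_estimate:.2f}")
print(f"SD of 2,000 sample means, n = 100: {se_observed:.2f}")
print(f"SD of 2,000 sample means, n = 400: {se_observed_400:.2f}")
```

Under these assumptions, the last three figures should come out close to 1.0, 1.0 and 0.5 cm. The single-sample estimate agrees with the simulated sampling distribution, and quadrupling n roughly halves the standard error.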
Many measurements approximately follow a symmetric, bell-shaped (normal) curve, and the distribution of sample means tends towards this shape as the sample size grows. In such a distribution, about 68% of all values lie within one standard deviation above or below the mean. Similarly, about 95% and 99.7% of values lie within two and three standard deviations of the mean, respectively.
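The 68%, 95% and 99.7% figures follow from the normal distribution: the share of values within k standard deviations of the mean is erf(k/√2). A short check using Python's `math.erf`:

```python
import math

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```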
FINAL REMARKS
Well begun is half done. Sampling is the beginning of all investigative processes. A good beginning reflects sound preparation, clarity of vision, confidence, and conviction. Sound sampling techniques help ensure high-quality, accurate and valid outcomes of investigative endeavors. Such outcomes can be used with confidence to improve the quality of life of the masses, plan for the future, and implement projects efficiently. After all, knowledge is power, and sound knowledge depends on applying sound principles and techniques of sampling in the first place.