A Primer of Research, Publication and Presentation Shahul Ameen, Sandeep Grover
INDEX
Page numbers followed by b refer to box, f refer to figure, fc refer to flowchart, and t refer to table.
A
Amisulpride 42
Analyze data 95
descriptive statistics 100
inferential statistics 103
statistics 95
Animal preclinical studies 193
Annual conferences of various medical specialties 209
ANOVA
analysis of variance 109
interpretation, one way 107
output of 108f
two way 108
Article
submitting 215
types of 157
Author
corresponding 133
first 133
ghost 132
Authorship, order of 133
Automatic term mapping 30
Automatically exploding 27
B
Bias
administration 67
attrition 172
confirmation 192
construct 67
detection 172
instrument 68
item 68
method 67
performance 172
publication 172, 192
reporting 172
sample 67
selection 9, 172
types of 67, 172t
understanding 67
Bibliographic databases 18
Body mass index 121
Boolean or logical operators 21
C
Case conference 254
aim of presentation 255
behavior 260
chief complaints 255
demographic details 255
family history 257
general appearance 260
introductory statement 255
mental status examination 259
past history 257
personal history 257
physical examination 259
premorbid personality 258
presentation 255
presenting history 255
Case presentation
common pitfalls during 264
formulation 263
management 263
Case reports 149, 193
abstract 152
concluding statements 153
description of 152
discussion 153
structure of 152
Case series 151
characteristics of 155b
formal 151
informal 151
Categorical data, comparison of 111fc
Categorical variables 109
Central tendency 102
Chi-square test 110, 111
SPSS output in 112f
Citation searching and author searching 24
Clinical practice guideline 193
Cochran's Q test 110
Cochran-Mantel-Haenszel test 110
Coding 128
Coefficient of variation 102
Common journals, impact factor of 212t
Comorbid illnesses 45
Comparing groups 109
Conference, present a case 254
Confounding 9
Consciousness, level of 261
Content 224
Continuous scale 84
Correlation 111
D
Data
analysis (or processing) 122
and variable 96
assessing normality of 97
born digital 126
budget 124
cleaning 127
collection and assessments 10
collection of 86
designing coding plan for 87
documentation or recording 126
element 121
extraction 169
imputation 128
in SPSS, check normality of 100f
information 124
ingestion, transformation, and analysis 122
into software, entering 91
making calculations 92
normality 122
policies 124
pooling of 183
rechecking 91
recoding 92
sharing 128
sheets 125
standardization 128
storage 124
to generate frequency, analyze 104f
types of 178
view, location of 88f
visualization 122
Data entry
double 127
operators 127
Data lifecycle 123
components of 123
analyse 123
assure 123
collect 123
describe 123
discover 123
integrate 123
plan 123
preserve 123
Data management
and statistical analysis 11
plan 124
components of 124t
Dealing with biases 9
Decision letters, examples of 207t
Decision-making ability 262
Depression 223
Diagnostic studies 193
Discrete scale 84
Double-blind 42
study 46
Down's syndrome, search with 22f
Duncan's test 107
Dunnett's test 107
E
Economic evaluation database 165
Economic evaluations 193
Editorial decision 196
Equivalence and bias 66
Ethical issues 264
Ethics in publishing 131
conflict of interest 134
misconduct 135
fabrication 135
falsification 135
plagiarism 135
reporting conflicts of interest 134
Ethics of peer review 136
Evaluating the results 24
Excel sheet 90f
F
Filtered and unfiltered resources 17
Fisher's LSD test 107
Font 222
bad 223
good 222
Foreground and background questions 16
Forest plot 179
interpretation of 186
Formulating clear question 18
Friedman's test 109
G
Getting full-text articles 38
Gift authorship 132
Google Scholar 33
button 33
Google search settings 36
Graphs 225
Greenhouse-Geisser factors 108
H
Halo effects 68
Handouts 229
Health research 1
Heterogeneity 181
assessment of 184
History tab 32
Huynh-Feldt factors 108
I
Illness, severity of 45
Inferential statistics 104t
Information
bias 9
primary, sources of 15
secondary, sources of 15
tertiary, sources of 15
Ingelfinger rule 214
Insight 262
Instruments 65
adaptation 65, 71
methods 71
and scales 75
(non-)disclosure of limitations of 78
administration of 77
ethical issues in use of 76
ethical use of 79
in psychiatric research, use of 75
inappropriate 76
informed consent 78
proxy observations 78
qualification and training of 77
supervision of the research staff 77
validity and reliability of 76
and scales, use of 75, 78
culturally insensitive 76
pilot study 78
back translation 71
cognitive interviewing 71
conceptual equivalence 70
content equivalence 69
criterion equivalence 69
development of 65
establishing cultural equivalence 69
expert panel translation 71
final version 71
forward translation 71
pretesting 71
semantic equivalence 69
technical equivalence 69
translation 65
variety of 65
Intelligence 67, 262
tests of 262
Intelligent character recognition 127
Internet search, carry out proper 15
Interval scale 85
J
Journal
and publish article, select 209
contents of 214, 215t
indexing agencies for 213t
of Australia 150
peer review process 190
Journal club 232
audit 237
choosing articles for discussion 234
critical rules and regulations 232
effective 232
format 235
movie-based 236
research into 237
Skype-based 236
Judgment 262
K
Kendall's tau 114
Keywords, combining 21
Kolmogorov-Smirnov test 100, 101f
Kruskal-Wallis test 109
L
Language difference 68
Layout
bad 220
good 221
Levene's test 105, 106
Limits and specialized filters 24
Line diagrams 226
Literature, grey 17
Logistic regression, SPSS output for 117f
M
Magnetic ink character recognition 127
Man versus machine 230
Mann-Whitney U test 108
Manuscript
in journals, rejection of 203t
review 201
structure of 137
discussion 142
methods 138
results 141
submitted to biomedical journals 214
types of 157
writing good 137
Mapi research institute's methodology 72
Mauchly's test 108
McNemar's test 110
Measurement 75
Media and reporting 136
Medical Council of India 212
Medical Subject Headings (MeSH) 26
Medknow 35
Mentor, finding 210
MeSH database 29
MeSH search 29f
MeSH system, tree structure of 28f
Meta-analysis
basic steps in 187
basic terminologies in 179
carry out 177
choice of effect size 181
combines 177
commonly used effect sizes in 182t
data extraction 178
effect estimate 179
effect size 179
forest plot 179
individual patient data 178
network 178
pooling of effect size 179
precision 179
range of 189
software packages for 187
tabulation 178
traditional 178
types of 178
data 178
Metadata 121, 124
process 122
Metasearch engines 35
Microsoft Academic 33
Microsoft Excel 89
Microsoft Office, part of 89
Microsoft Word 89
Misconduct related to authorship, types of 132
Mood and affect 260
Murphy's law 247
N
NCBI bookshelf 32
Neuroimaging and biomarkers in depression 221
Neurological basis of depression 219
Newman-Keuls test 107
Non-parametric tests 108
O
Observational studies 193
Olanzapine, randomized controlled trial of 42
Open-label studies 46
Optical character recognition 127
Optical mark recognition 127
Oral cancer increases with smoking, risk of 42
Organize data prior to analysis 83
Original article
common mistakes in 147b
how to write 137
Original data 121
Original research, conducting 137
Orthopedics articles on bipolar splint 23f
P
Paid resources 37
ClinicalKey 37
EMBASE 37
PsycINFO 37
ScienceDirect 37
Scopus 37
Web of Science 37
Paper and authors, title of 42
Parametric statistics, assumptions of 108
Pearson correlation
analysis 114
output of 115f
test 119
Pearson's test 110
Peer review 201
closed 191
cycle 190
do's and don'ts in 198t
ethical principles in 197
open 191
training in 199
Perception 261
Phi and Cramér's test 111
Photos and videos 228
Phrase searching 21
PICO question, component of 35f
Pie chart 227
Pisa syndrome, clozapine-associated 152
Planning research 1
Population 53, 54
accessible 54
theoretical 54
PowerPoint 89, 230
advent of 218
presentation 219
layout 220
message of the talk 220
plan your time 219
prepare and make 218
Pre-journal club meeting preparation 233
Preparing talk 228
Presentation skills 263
PRISMA guidelines 173
Probability distributions 99f
Professional publications 210
types of 210t
Protocol 193
outline of 166t
uses of 166t
Proximity operators 23
Psychiatry, statistical tests in 118
Psychomotor activity 260
Publication bias 181
assessment of 185
Publication from same data 135
Publication, duplicate 135
Publication, part 135
PubMed
clinical queries 32
introduction to 25
on automatically exploding search 29f
search
builder 29f
builder box 28
information box 31f
through libraries 32
through vendors 33
Q
Qualitative studies 193
Quality improvement studies 193
R
Randomized trials 193
Ratio scale 85
Read and analyze paper 40
Reading 209
essentials for good 50
professional publications 209
tips for good 51b
Recent activity 32
Recruitment process 10
Regression 115
Relationship between variables 98fc
Reporting guideline 193
Reporting research, guidelines for 193t
Re-review 196
Research data 121
management 121
Research notes 125
Research paper
abstract 144
acknowledgments 145
conflicts of interest 146
elements of the original 144
figures 146
references 144
tables 146
title and title page 145
Research project
data
analysis 122
collection 125
dynamics 125
lifecycle 123
management planning 123
security 129
storage 129
manage data of 121
publication and data sharing 128
Research protocol, developing 3
abbreviations and acronyms 4
aims and objectives 6
appendices 14
ethical aspects 12
finance and resources 13
hypothesis/research questions 5
introduction/background 4
methodology 7
references 13
review of literature 4
title 3
Research topic, selecting 209
Researchers
guideline for 131
tips for junior 216t
Review 158, 190
accepting to 191
article 158, 191
essentials of writing 162
history of 160
steps in publishing 163t
types of 159, 159t
blind 191
double-blind 191
guidelines for 192
hidden agenda 197
mixed-signal 197
report
comments for authors 195
comments for editors 194
structure of 194
systematic 193
Reviewer
comments 202
recognition 199
respond to 201
responsibilities of 197
Rule of 6's 220
S
Sample 54
and population, relation between 63
size and margin of error 62t
Sample t-test
output of
independent 107f
paired 106f
paired 106
Sampling 9, 53
area 57
chain referral 60
cluster 57
context of 63
convenience 58
distribution 63
error 63
expert 59
external validity 62
frame 54
heterogeneity 60
margin of error 61
maximum
heterogeneity 60
variation 60
modal instance 59
multistage 58
nonprobability 55
non-proportional quota 60
probability 55
proportional quota 60
purposive 59
quota 60
simple random 56
size 61
snowball 60
statistics 62
stratified 57
subtypes of
nonprobability 58
probability 56
systematic random 56
technique 53
final remarks 64
key terms 54
major steps 55
miscellaneous concepts 61
subtypes 55
types 55
unit 54
voluntary 58
Saving and recording 25
Scatter diagrams 228
Scatter plot 113f
Scheffé's test 107, 118
Schizophrenia
and genetics, search with 22f
negative symptoms of 42
Scientific article, basic facts
about writing 161
to keep in mind while writing 162t
Scientific paper 210
aims and objectives 44
anatomy of 42
characteristics of good 40b
conclusions 49
ethical aspects 47
instruments used 46
materials 45
methodology 45
methods 45
miscellaneous segments 49
references 49
results 47
sample 45
size 45
technique 45
segment of 48
statistical analysis 46
study design 45
Search
advanced 36
engines 167t
academic 18
terms, try changing order of 36
tips for efficient online 36b
Semantic inference 34
Seminar 239
arriving early 248
audience 246
being with the audience 242
controlling nervousness 250
depth of presentation 241
emphasis in presentations 242
eye contact 249
gathering the literature 242
guidelines for typography, color, and layout 243
handling questions 251
handouts 244
improve delivery 248
making the right speech 245
occasion 246
organization of presentation 241
paying attention 248
to time 249
preparing and delivering 240f
presenting it right 247
purpose 246
rehearsing 248
seeking supervision 241
source of words 247
structuring 240
taking feedback 252
use correct logic 246
Sensitive/taboo topics 68
Shapiro-Wilk test 100, 101f
Single citation matcher 32
Single-blind studies 46
Skewed distribution
negatively 99f
positively 99f
Skills 67
Slides 243
Snapshot verdicts 197
Social desirability 68
Social sciences, statistical
package for 83, 95
program for 126
Sort ascending and sort descending options 92f
Speech 260
SPSS
output for
checking normalcy of data 101f
multiple linear regression 117f
worksheet, variable view in 88f
Standard deviation 102
Statistical analysis software 83, 95
Student's t-test 106
Study
design 7
procedure 10
types 193
Switch to ‘private’ browsing 35
Systematic errors 9
Systematic review 157, 167
characteristics of 160, 161t
commonly extracted data for 170t
critical appraisal of studies 170
data synthesis 172
PICOS model for writing 165t
uses of 160
write 157
T
Tables 224
Template of CONSORT 141fc
Thinking 261
Truncation symbols 21
Tukey's test 107, 118
V
Variable
and data, type of 83
binary 85
comparing different types of 104t
confounding 9
correlations between different types of 105t
dependent 104
dichotomous 85
independent 104
nominal 84
numerical 84
ordinal 85
second type of 84
type of 105
W
Wilcoxon signed-rank test 108, 109
Wildcard symbols 21
World Health Organization 71
Write research protocol 1
Writing article 214
Writing protocol 165
Writing report/review 173


Chapter 4: Sampling Techniques

PK Singh
 
INTRODUCTION
Accurate, valid and reliable information about any aspect of life or of a population is the key to developing good insight into any phenomenon. Such information ultimately helps us to take the most appropriate and practical decisions at the right time. It is gathered through organized studies that follow certain basic principles of statistics. Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data; some experts consider it an independent discipline of science. In a broad sense, all numerical data fall within the scope of statistics. The most complete data about a population would obviously come from systematically studying the whole population. However, this is often very difficult and impractical because of the time and resources needed. By focusing only on a subset of the whole population (a sample), the information-gathering process becomes more practical, simpler, cheaper and faster.
Sampling refers to the principles and methods used to define and select a sample for study from a much larger population. A sample should be defined and selected so that it remains truly representative of the whole universe of phenomena to be studied, technically called the population. Ideally, of course, the whole population would be studied before reaching a conclusion. However, this is most often practically, logistically and even theoretically impossible. It is theoretically impossible because similar phenomena of the past or the future cannot be accessed for observation at the present time; only the present can be studied. Even then, practical and logistic considerations usually make it necessary to study only a subset of the population. Therefore, the principles and process of sampling, that is, selecting a predefined, limited and representative set of individual units from a population, are of crucial importance for any study.
The concept of sampling is best understood in contrast with that of a census. Sampling refers to the study of a relatively small subset of a population, generally for reasons of convenience and cost, whereas a census attempts to obtain information and data from every member unit of the population.
 
KEY TERMS
Population: It refers to all the persons, bodies, units or objects about which the study intends to learn something. The definition of the population therefore depends on the nature and goals of the planned study. A characteristic of the whole population that the study proposes to measure is known as a parameter.
Sampling Unit: It refers to any one of the individual entities that together constitute the whole population and on which an observation or measurement is made. It could be a person, an animal, or any object or body.
Sampling Frame: It refers to the subset of the population that is fully accessible, can be completely defined, and from which the sample is to be drawn. It is also referred to as the accessible population, as against the theoretical population, which refers to the total population. The total population may often be very difficult or almost impossible to outline or define. Sometimes, however, the sampling frame and the total population may be the same.
Sample: Sample refers to the final subset of the population drawn from the sampling frame, by either a random or a non-random method, from which data are collected by defined methods of observation. We start with the intended sample based on our inclusion and exclusion criteria but end up with the actual sample from which the desired data are drawn. In between, some attrition takes place because of non-response, non-cooperation, dropouts and several other reasons. If these events are not kept to a minimum, they add to the sources of sampling error.
 
MAJOR STEPS
Major steps to be taken in the sampling process are as follows:
  1. Specifying and defining the population to be studied.
  2. Outlining the ‘sampling frame’ from which the sample is to be drawn.
  3. Specifying the sample size.
  4. Deciding the sampling method.
  5. Executing the sampling plan.
  6. Making observations, taking measurements, and collecting data.
Before proceeding with these steps, the research question has to be properly formulated, because it has a bearing on every step of sampling. The research question decides the population to be sampled as well as the sampling frame, the sample size and the sampling method.
 
TYPES AND SUBTYPES
Broadly, sampling methods can be divided into two types: probability sampling and nonprobability sampling.
Probability Sampling refers to methods in which every member of a population has a known, non-zero probability of being selected for inclusion in the sample; in the simplest designs, this probability is equal for all members. The advantage of this method is that the sample is much more likely to be truly representative of the whole population of study, and therefore the conclusions drawn can justifiably and reasonably be applied to the whole population.
One of the disadvantages of probability (random) sampling is that it requires complete information about the population, that is, its size and a list of the units it contains. This information may not be available for many planned studies.
Nonprobability Sampling refers to methods of collecting a sample in which the units of a population do not have a known probability of being selected. In many situations, nonprobability sampling may be the only available choice. The main criticism of such data is that they cannot be generalized to the whole population in a predictable manner, and therefore cannot be used reliably for planning, prediction or extrapolation. On the other hand, they do provide some relevant information on the issue at hand. It is often said that a nonprobability sample is not representative of the whole population, but this need not be true: it may well be representative, even if only by chance. The problem is that we cannot be confident about its degree of representativeness. With a probability sample, in contrast, the degree of confidence (for example, 95%) with which the sample reflects the population can be stated. In many areas of applied social research, a nonprobability sample may be the only practical, feasible and theoretically sound option. Therefore, nonprobability sampling continues to be relevant in special circumstances.
 
Subtypes of Probability Sampling
Simple Random Sampling shows the highest degree of randomization because it is designed in such a way that every individual unit of the sampling frame or population has an equal chance of being selected for inclusion in the sample. For this reason, conclusions based on this method of sampling are likely to be highly generalizable.
Various methods of randomization (random number generation) can be used: 1. blindly drawing numbered balls out of a bag, as in a lottery; 2. using online random number generators such as www.random.org/integers; 3. using the Excel RAND and RANDBETWEEN functions.
Systematic Random Sampling requires that every member unit of the sampling frame is first listed; the first unit to be included in the study is then selected by a random method, and subsequently every kth member is selected for inclusion in the sample, k having been decided beforehand. An illustrative example is given below in a stepwise fashion:
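The same kind of lottery draw can also be done in software. The sketch below uses Python's standard `random` module; the function name `simple_random_sample` and the frame of 500 numbered units are hypothetical, chosen only for illustration. Seeding the generator is an added assumption that lets the draw be reproduced and audited later.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement,
    giving every unit the same chance of selection."""
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the frame size")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame: 500 units numbered 1 to 500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=42)
print(sorted(sample))
```

Running the function again with the same seed gives the same 20 units; leaving `seed` as `None` gives a fresh draw each time.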
  1. Let the units in the sampling frame be numbered from 1 to N.
  2. The sample size, ‘n’, is decided based on the requirements of the study and the resources available.
  3. The sampling interval, ‘k’, is obtained by dividing N by n (k = N/n).
  4. A random integer, ‘r’, is then selected between 1 and k; this is the first unit of the sample.
  5. Thereafter, every kth unit (r, r + k, r + 2k, and so on) is selected; together these units constitute the sample for the study under consideration.
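The five steps can be sketched in Python as follows. The function name and the frame of 1000 units are hypothetical. When N/n is not a whole number, this sketch assumes k is rounded down, which keeps every selected position inside the frame.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic random sampling: random start r in 1..k, then every kth unit."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    k = N // n                              # sampling interval, rounded down if N/n is not whole
    r = random.Random(seed).randint(1, k)   # random start between 1 and k, inclusive
    # Positions r, r + k, r + 2k, ... (1-based), converted to 0-based list indices.
    positions = [r + i * k for i in range(n)]
    return [frame[p - 1] for p in positions]

frame = list(range(1, 1001))                # hypothetical frame of N = 1000 units
sample = systematic_sample(frame, 50, seed=7)
print(sample[:5])
```

Here N = 1000 and n = 50, so k = 20: only the starting unit is chosen at random, and the remaining 49 follow at fixed intervals of 20.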
An advantage of systematic random sampling is that it is much simpler to execute, because only one number has to be selected at random. It may also help in situations where simple random sampling is nearly impossible to employ. However, one should ensure that the units in the sampling frame have not been listed according to any overt or covert order, and that it is reasonable to assume they are in random order. If there is any hidden pre-existing order, for example a pattern that repeats at the same interval as k, it is likely to introduce serious bias into the sample.
Stratified Sampling is used when it is reasonable to assume, based on expert knowledge, that certain sections of the population differ from each other enough to influence the variable to be measured differently. It is also used when a population is thought to be very heterogeneous with regard to the distribution of a variable. To create relatively homogeneous groups, and thus to obtain a better and more realistic picture of the variable under consideration, the whole population is partitioned into groups called strata. The subpopulation of each stratum is then subjected to simple random sampling to obtain the desired share of the sample from that stratum.
Cluster Sampling is also known as Area Sampling. This method is used when the population under consideration is scattered over widely separated areas, which makes it highly inconvenient and resource-intensive to apply simple random sampling to the whole population. In such situations, the whole population is divided into clusters, generally on geographical considerations. Some clusters, which form the sampling frame, are selected, generally at random, depending on the demands of the particular study. From this subpopulation, sample units are selected by simple random sampling, or every member may be studied. The sample is then subjected to the study protocol to obtain the desired data. Ideally, a cluster is a subunit of the population that contains all the representative heterogeneity of that population. Strata, on the other hand, are defined so that they partition the whole population into relatively homogeneous groups. To summarize, the cluster sampling method involves the following steps:
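A minimal sketch of stratified random sampling in Python follows. The chapter does not say how the total sample is shared among strata; this sketch assumes proportional allocation (each stratum's share matches its share of the population), rounded to the nearest whole unit, so with uneven strata the total may differ from n by one or two. The urban/rural frame is hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n, seed=None):
    """Stratified random sampling with (assumed) proportional allocation.

    frame      -- list of units
    stratum_of -- function returning the stratum label of a unit
    n          -- total sample size to spread across strata
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for label, members in sorted(strata.items()):
        # Each stratum's share is proportional to its size (rounded to nearest).
        n_h = round(n * len(members) / len(frame))
        sample[label] = rng.sample(members, n_h)  # simple random sampling within the stratum
    return sample

# Hypothetical frame: 600 urban and 400 rural residents, identified by number.
frame = [(i, "urban") for i in range(600)] + [(i, "rural") for i in range(600, 1000)]
result = stratified_sample(frame, lambda u: u[1], 100, seed=1)
print({label: len(units) for label, units in result.items()})  # {'rural': 40, 'urban': 60}
```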
  1. The total population is divided into clusters, generally along geographical lines, each containing all the characteristic heterogeneity of the population.
  2. Depending on the requirements of the study, a few clusters are selected, generally on a random basis.
  3. Later, principles of simple random sampling are applied to select the actual sample-units for study, or, more often, all units within a sampled cluster are studied.
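The "more often" variant in step 3, where every unit in a chosen cluster is studied (one-stage cluster sampling), can be sketched as below. The village clusters and household labels are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: randomly pick whole clusters and
    include every unit in each chosen cluster.

    clusters -- dict mapping a cluster name (e.g. a village) to its list of units
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random selection of clusters
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 20 villages with 30 households each.
villages = {f"village_{i}": [f"v{i}_hh{j}" for j in range(30)] for i in range(1, 21)}
picked = cluster_sample(villages, 4, seed=3)
print(sorted(picked))
```

Only the selection of villages is random here; once a village is chosen, all 30 of its households enter the sample.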
Multistage Sampling refers to a method of sampling in which more than one of the probability sampling methods described above is applied in successive stages.
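One common combination, assumed here for illustration, is two stages: clusters (for example, districts) are chosen at random, and then a simple random sample of units is drawn inside each chosen cluster. The districts and households below are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage sampling: stage 1 picks clusters at random,
    stage 2 draws a simple random sample of units inside each chosen cluster."""
    rng = random.Random(seed)
    stage1 = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
            for name in stage1}

# Hypothetical design: 5 of 40 districts, then 10 households in each.
districts = {f"district_{i:02d}": [f"d{i}_hh{j}" for j in range(200)] for i in range(40)}
sample = two_stage_sample(districts, 5, 10, seed=11)
print(sum(len(v) for v in sample.values()))  # 50 households in total
```

Further stages (for example, villages within districts, then households within villages) follow the same pattern of one random draw per level.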
 
Subtypes of Nonprobability Sampling
Voluntary Sampling is said to have been used when individuals come forward of their own free will in response to an invitation to participate in a study, allowing observations to be made on them or their opinions to be recorded. In this type, the sample is not chosen by the person conducting the study; it is formed by the people who volunteer to participate.
Convenience Sampling is when a sample is selected based on the ease of availability of, and access to, its individual members for making observations or recording their views. An illustrative example is recording the views of people coming out of a cinema hall on some matter. Here, the defining characteristic is the ease of availability of the sample units.
Purposive Sampling is said to have been done when the researchers start with a set purpose and specified criteria for inclusion. There is no predefined sampling frame. They include anyone and everyone from the population who meets the criteria, on a first found, first taken basis, until they reach the pre-decided number of units for their study. Such a sample is obviously prone to bias. However, it does give some information about the target group within the population. Purposive sampling may be of various subtypes, some of which are discussed below:
Modal Instance Sampling is a kind of sampling used for informal surveys that are preliminary in nature. First, based on experience and common knowledge, we hypothesize what the most typical attributes and characteristics of a sample unit in that population would be. Then, anyone who meets those criteria is included in the sample. Since the ‘mode’ in statistics refers to the most frequently occurring value or characteristic in a distribution, sampling based on the ‘modal’ characteristics of a population is known as ‘Modal Instance Sampling’.
Expert Sampling refers to obtaining the opinions of a panel or group of people with known and acknowledged expertise in specific areas related to the research questions under consideration. Their observations and opinions may accurately reflect the characteristics of interest, but they may also be wrong. Nevertheless, the ease with which this kind of sampling can be executed makes it worth resorting to in certain circumstances. Another advantage is that it provides guidance for modal instance sampling.
Quota Sampling can be considered the nonprobability equivalent of stratified random sampling. Here, the population is divided into groups based on age, gender, education, race, religion, occupation, etc., and then a set number of sample units is selected from every group on the principles of purposive sampling. If the number of units included from each group is designed to match that group's proportion in the general population, it is called ‘proportional quota sampling’. If proportional representation is not followed, it is called ‘non-proportional quota sampling’.
Snowball Sampling begins by identifying a person or unit who meets the criteria for inclusion in the study. This first person is then used as a source of information about other units who may also meet the criteria, and so on; for this reason, it is also known as chain referral sampling. At times, this may be the only feasible method of gaining access to hidden populations that are difficult to reach and difficult to involve but need to be studied. Examples include substance users, people living with HIV and homeless people, who may remain inaccessible because of stigma and fear of social exclusion or legal consequences. A variant of snowball sampling is respondent-driven sampling.
Heterogeneity Sampling is used when the purpose is to represent the full spectrum of heterogeneity present in a population, without regard to the proportions in which it actually exists. It is also described as sampling for diversity, and may be thought of as the opposite of modal instance sampling. To achieve this goal, the sample may have to include people of all diverse shades and varieties of opinion. It is also known as maximum variation sampling or maximum heterogeneity sampling. It can provide information on the range of diversity in a population even when information about the population itself is not available.
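The non-random heart of quota sampling, taking whoever turns up until each group's quota is full, can be sketched as below. The arrival order, the group labels and the 6:4 proportional quota are all hypothetical. Unlike the stratified sketch, no random draw happens at any point.

```python
def fill_quotas(arrivals, group_of, quotas):
    """Accept people in the order they turn up until each group's quota is met.

    arrivals -- iterable of people in the order they are encountered
    group_of -- function returning a person's group (e.g. gender or age band)
    quotas   -- dict mapping group -> number of people required
    """
    taken = {group: [] for group in quotas}
    for person in arrivals:
        group = group_of(person)
        # Non-random step: whoever arrives first is taken while the quota is open.
        if group in taken and len(taken[group]) < quotas[group]:
            taken[group].append(person)
        if all(len(taken[g]) == quotas[g] for g in quotas):
            break  # every quota is full; stop recruiting
    return taken

# Hypothetical proportional quotas: 60% women (F), 40% men (M) in a sample of 10.
arrivals = [(1, "M"), (2, "M"), (3, "F"), (4, "M"), (5, "M"), (6, "F"),
            (7, "F"), (8, "M"), (9, "F"), (10, "F"), (11, "F"), (12, "M")]
result = fill_quotas(arrivals, lambda p: p[1], {"F": 6, "M": 4})
print([p[0] for p in result["F"]])  # [3, 6, 7, 9, 10, 11]
```

Person 8 is turned away because the quota for men is already full, and person 12 is never reached.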
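Chain referral can be pictured as a walk through a network of who-knows-whom. The sketch below is a simplified model, not a field protocol: the referral network is hypothetical, each enrolled person names contacts who are then approached in turn, and (as an assumption) the contacts of people who fail the inclusion criteria are not followed up.

```python
from collections import deque

def snowball_sample(referrals, seeds, eligible, target):
    """Chain-referral sampling: start from seed participants and follow
    the referrals each participant provides until `target` people are enrolled.

    referrals -- dict mapping a person to the people they name
    seeds     -- initial participants identified by the researcher
    eligible  -- function returning True if a person meets the inclusion criteria
    """
    enrolled, queue, seen = [], deque(seeds), set(seeds)
    while queue and len(enrolled) < target:
        person = queue.popleft()
        if not eligible(person):
            continue  # ineligible: not enrolled, and their contacts are not followed
        enrolled.append(person)
        for contact in referrals.get(person, []):  # ask for further contacts
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return enrolled

# Hypothetical referral network; person D does not meet the inclusion criteria.
network = {"S1": ["A", "B"], "A": ["C"], "B": ["C", "D"], "C": ["E"], "D": [], "E": []}
print(snowball_sample(network, ["S1"], lambda p: p != "D", target=5))  # ['S1', 'A', 'B', 'C', 'E']
```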
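As a simplified illustration only, maximum variation sampling can be modelled as keeping one person for each distinct combination of characteristics, however common or rare that combination is. The candidate list and the two characteristics (age band and residence) are hypothetical; in real studies, the relevant dimensions of diversity are chosen by the researcher.

```python
def maximum_variation_sample(candidates, profile_of):
    """Keep the first candidate seen for each distinct profile, so that
    every combination of characteristics is represented once,
    regardless of how common it is in the population."""
    chosen = {}
    for person in candidates:
        key = profile_of(person)
        if key not in chosen:
            chosen[key] = person
    return list(chosen.values())

# Hypothetical candidates: (id, age band, residence)
candidates = [(1, "young", "urban"), (2, "young", "urban"), (3, "old", "rural"),
              (4, "young", "rural"), (5, "old", "rural"), (6, "old", "urban")]
picked = maximum_variation_sample(candidates, lambda p: p[1:])
print([p[0] for p in picked])  # [1, 3, 4, 6]
```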
 
MISCELLANEOUS CONCEPTS
Sample Size: The decision about sample size depends on the nature of the study, the size of the population, and the degree of precision and accuracy demanded by the circumstances. Generally, the larger the sample, the more precise the estimates and the more dependable the outcome. Several statistical formulae are available to calculate the sample size needed to obtain the targeted confidence interval and level of significance, so that the study has sufficient statistical power. Other determinants of sample size are the requirement and availability of resources in terms of personnel, money, material and time, which are also very important practical considerations.
Margin of Error: The margin of error is a measure of how reliable the findings of a particular sample survey are. It indicates the maximum difference likely to be seen if another survey were done on the same population using similar criteria and methods. It is therefore a measure of the reliability (precision) of the data obtained; it says nothing about bias or other sources of error. The margin of error is inversely proportional to the square root of the sample size: the larger the sample, the smaller the margin of error. It is practically independent of the size of the population, as long as the population is much larger than the sample. In other words, with a sample of 1000, the margin of error would be the same whether the population is 50 thousand or 50 lakh. For a sample size (n) of 1000, the margin of error is approximately one divided by the square root of 1000, which is about 0.032, or about 3%. Since the margin of error is inversely proportional to the square root of the sample size, increasing the sample size four times halves the margin of error, that is, it doubles the precision.
The interpretation of the margin of error is that, for a sample of a defined size from a defined population, the difference between the sample value and the true population value of a particular parameter will lie within the margin of error at least 95% of the time.
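The chapter does not name a specific formula. One widely used formula, assumed here, gives the sample size needed to estimate a proportion p within a margin of error E at a confidence level whose standard normal value is z: n = z²·p(1 − p)/E². Using p = 0.5 gives the largest, and therefore safest, n when the true proportion is unknown. The sketch always rounds up, since a fraction of a participant cannot be recruited.

```python
import math

def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Sample size to estimate a proportion within +/- margin (assumed formula).

    margin -- desired margin of error as a fraction (0.05 means 5%)
    p      -- anticipated proportion; 0.5 gives the largest (safest) n
    z      -- standard normal value for the confidence level (1.96 for 95%)
    """
    n = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n)  # always round up to a whole participant

print(sample_size_for_proportion(0.05))  # 385
print(sample_size_for_proportion(0.03))  # 1068
```

Tightening the margin from 5% to 3% almost triples the required sample, which illustrates why precision has to be weighed against the resources available.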
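The 1/√n rule can be traced, as a sketch, to the usual 95% margin of error for a proportion p. This derivation is an addition, assuming the worst case p = 0.5, where p(1 − p) reaches its maximum of 0.25:

```latex
\begin{aligned}
\mathrm{ME} &= 1.96\sqrt{\frac{p(1-p)}{n}} \;\le\; 1.96\sqrt{\frac{0.25}{n}} = \frac{0.98}{\sqrt{n}} \approx \frac{1}{\sqrt{n}} \\
n = 1000:\quad \mathrm{ME} &\approx \frac{1}{\sqrt{1000}} \approx 0.032 \;(\text{about } 3\%) \\
\mathrm{ME}(4n) &\approx \frac{1}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{1}{\sqrt{n}}
\end{aligned}
```

Because the rule uses the worst case, the true coverage for any other value of p is a little above 95%, consistent with the "at least 95%" interpretation.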
Table 1   Sample size and margin of error.
Sample size (n) | Margin of error (M.E.)
200 | 7.1%
400 | 5.0%
700 | 3.8%
1000 | 3.2%
1200 | 2.9%
1500 | 2.6%
2000 | 2.2%
3000 | 1.8%
4000 | 1.6%
5000 | 1.4%
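The values in Table 1 appear to follow the 1/√n approximation described under Margin of Error. The short sketch below recomputes them under that assumption; the function name is hypothetical.

```python
import math

def margin_of_error(n):
    """Approximate 95% margin of error for a proportion, using the 1/sqrt(n) rule."""
    return 1 / math.sqrt(n)

for n in (200, 400, 700, 1000, 1200, 1500, 2000, 3000, 4000, 5000):
    print(f"{n:>5}  {margin_of_error(n):.1%}")
```

Each printed percentage matches the corresponding row of Table 1, and multiplying any n by four halves its margin.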
It is obvious from Table 1 that the maximum reduction in the margin of error occurs between the sample size of 200 and 1500 and thereby, there is a significant increase in reliability. After the sample size of 1500, the rate of reduction of the margin of error is not in proportion to the degree of increase in the size of the sample. This example seems to follow the law of diminishing return. The gains in terms of enhanced reliability do not happen in proportion to the amount of time, effort and resources required for the same.
External Validity: External validity refers to the degree of truthfulness of a study's conclusions when they are generalized, that is, how far they apply to other similar people at different places and at other times. The external validity, or generalizability, of the conclusions can be improved by selecting the sample truly at random, keeping non-participation and dropout rates to a minimum, and replicating the study with different people at different places and times.
Statistics: Once a sample is defined and selected, its members are observed and measured according to the methodology laid down in the research protocol. The observations, measurements and responses are recorded accurately and organized systematically before being subjected to appropriate statistical analysis, with the goal of reaching a valid conclusion that can be generalized with confidence to the whole population. Summary measures such as the mean, median and mode calculated from a sample are called statistics; the corresponding values calculated for the entire population are called parameters of the population.
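The distinction between a parameter and a statistic can be illustrated with Python's standard `statistics` module. This is only a sketch: the population of 10,000 ages is synthetic, and the random seed is fixed so the example is repeatable. The same calculations give a parameter when applied to the whole population and a statistic when applied to the sample.

```python
import random
import statistics

# Hypothetical population of 10,000 ages (synthetic values between 18 and 80)
population = [18 + (i * 11) % 63 for i in range(10_000)]

# Population parameters: computed from every member of the population
parameter_mean = statistics.mean(population)
parameter_median = statistics.median(population)

# Simple random sample of 200 people (seeded so the example is repeatable)
rng = random.Random(42)
sample = rng.sample(population, k=200)

# Sample statistics: the same calculations applied to the sample only
statistic_mean = statistics.mean(sample)
statistic_median = statistics.median(sample)
statistic_mode = statistics.mode(sample)  # Python 3.8+: returns the first mode if there are ties

print(f"population mean (parameter): {parameter_mean:.2f}")
print(f"sample mean (statistic):     {statistic_mean:.2f}")
print(f"population median (parameter): {parameter_median}")
print(f"sample median (statistic):     {statistic_median}")
```

The sample mean will usually differ a little from the population mean; that difference is the sampling error discussed next.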
Relation between Sample and Population: A sample represents the population, but never in a 100% fool-proof manner. There is always some difference, owing to chance and to inaccuracies in the method of sampling. The relationship between a sample and the population is described through several statistical concepts, including the sampling distribution, the standard deviation and the standard error, which in the context of sampling is also known as the sampling error.
The sampling distribution is the distribution of a statistic (for example, the mean) computed from an infinite number of samples of the same size drawn from the same population. The standard deviation measures the spread of individual measurements within a single sample, that is, how much the individual measurements differ from one another. The standard error, by contrast, measures the spread of a large (theoretically infinite) number of means of equal-sized samples drawn from the same population around the mean of all those means. It therefore gives a fair estimate of the precision of the statistic obtained from a single sample. Together, these concepts indicate how confidently the conclusions drawn from any sampling method can be generalized to the population.
Because in real life we cannot study an infinite number of samples from the same population, the standard error (sampling error) is estimated from the standard deviation of a single sample; for the mean, SE = SD/√n, where n is the sample size. The larger the standard deviation, the larger the standard error. This can be countered by increasing the size of the sample: the larger the sample, the smaller the standard error or sampling error.
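The idea that the standard error estimated from one sample approximates the spread of a sampling distribution can be checked by simulation. The sketch below is illustrative only: it assumes a synthetic, normally distributed population with mean 100 and SD 15, a sample size of 100, 2,000 repeated samples and a fixed seed; the function name is hypothetical. Both numbers printed should be close to 15/√100 = 1.5.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the mean from one sample: SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

rng = random.Random(1)
# Synthetic population: 100,000 values with mean about 100 and SD about 15
population = [rng.gauss(100, 15) for _ in range(100_000)]

n = 100
one_sample = rng.sample(population, n)
se_estimate = standard_error(one_sample)

# Empirical sampling distribution: means of 2,000 samples of the same size
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2_000)]
se_empirical = statistics.stdev(sample_means)

print(f"SE estimated from one sample: {se_estimate:.2f}")
print(f"SD of 2,000 sample means:     {se_empirical:.2f}")
```

Re-running with n = 400 instead of 100 should roughly halve both values, in line with the inverse square-root relationship.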
When a variable follows a symmetric, bell-shaped (normal) distribution, as many measurements do approximately, about 68% of all values lie within one standard deviation above or below the population mean. Within two and three standard deviations of the mean lie about 95% and 99.7% of the values, respectively.
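These percentages come from the normal distribution: the share of values within k standard deviations of the mean equals erf(k/√2). The short sketch below uses Python's `math.erf`; the function name is hypothetical, and 1.96 is included because it is the multiplier behind the 95% margin of error used earlier.

```python
import math

def share_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```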
 
FINAL REMARKS
Well begun is half done. Sampling is the beginning of every investigative process, and a good beginning reflects sound preparation, clarity of vision, confidence and conviction. Sound sampling techniques help ensure high-quality, accurate and valid outcomes, which can then be used with confidence to improve the quality of life of the masses, to plan for the future and to implement projects efficiently. After all, knowledge is power, and reliable knowledge depends on applying sound principles and techniques of sampling in the first place.
SUGGESTED READING
  1. Gupta SL, Gupta H. Research Methodology: Text and Cases with SPSS Applications, 2nd ed. New Delhi: International Book House.
  2. Cochran WG. Sampling Techniques, 3rd ed. Wiley (India Edition).