Epidemiology MCQs
1. Which of the following is NOT part of the continuum of the natural history of disease?
a) Stage of susceptibility
b) Stage of preclinical disease
c) Stage of prevention
d) Stage of recovery
2. Which of the following are also known as retrospective studies?
a) Cohort studies
b) Descriptive studies
c) Experimental studies
d) Case-control studies
3. The total number of deaths reported during a given time interval divided by the estimated mid-interval population is called:
a) Death rate
b) Crude death rate
c) Mortality rate
d) Proportional mortality
4. The number of live births reported during a given time interval divided by the estimated mid-interval population is called:
a) Birth rate
b) Growth rate
c) Crude fertility rate
d) Crude birth rate
5. The number of live births reported during a given time interval divided by the estimated mid-interval number of women aged 15 to 44 years is known as:
a) Crude fertility rate
b) Birth rate
c) Growth rate
d) Sex ratio
6. The number of current cases (new and old) of a specified disease identified over a given time interval divided by the estimated mid-interval population is called:
a) Prevalence
b) Period prevalence
c) Point prevalence
d) Disease prevalence
7. The use of statistics to analyze the characteristics of, or changes in, a population is termed:
a) Population pyramid
b) Vital statistics
c) Population statistics
d) Population dynamics
8. Which of the following terms provides a true representation of the whole population?
a) Sampling
b) Random sampling
c) Case reporting
d) Sample
9. A measure of the frequency of occurrence of death in a defined population during a specified interval is called:
a) Crude death rate
b) Mortality rate
c) Death ratio
d) Mortality
10. Public health surveillance does NOT consist of which of the following steps?
a) Systematic collection
b) Analysis
c) Planning
d) Interpretation
11. Surveillance system information cycles include:
a) Family and community
b) Public, health care providers, and health agencies
c) None of the above
d) Public and health care providers only
12. The definition of epidemiology includes all of the following EXCEPT:
a) Distribution of health-related states
b) Community leaders and their family crises
c) Determinants of health-related events
d) Application to the control of health problems
13. A disease state that is communicated ONLY by direct contact is termed:
a) Infectious disease
b) Contamination
c) Epidemic
d) Contagious disease
14. Which of the following is NOT a basic measurement in epidemiology?
a) Rate
b) Numerator
c) Ratio
d) Proportion
15. Which of the following is usually expressed as a percentage?
a) Rate
b) Numerator
c) Ratio
d) Proportion
16. Measuring disease, disability, or death and converting this information into rates and ratios is defined as:
a) Specificity
b) Screening
c) Frequency
d) Sensitivity
17. Measurement of the current status of a disease is termed:
a) Prevalence
b) Incidence
c) Cumulative incidence
d) Mid-interval population
18. A person who harbors the microorganisms of a disease and excretes them without suffering from symptoms is called a:
a) Reservoir
b) Carrier
c) Host
d) Agent
19. The modes of transmission of infectious diseases include all of the following EXCEPT:
a) Direct
b) Indirect
c) Physiological
d) Biological
20. The number of new cases occurring in a defined population during a specified period of time is called:
a) Prevalence
b) Incidence
c) a and b
d) Cumulative incidence
21. Epidemiological methods can be categorized as:
a) Descriptive, cohort, and case-control
b) Descriptive, cross-sectional, and experimental
c) Descriptive, prospective, and experimental
d) Descriptive, analytical, and experimental
22. In descriptive epidemiology, disease is described in terms of:
a) What, why, and how
b) Host, agent, and environment
c) Time, place, and person
d) Agent, place, and person
23. Which of the following is also known as a prospective study?
a) Cohort studies
b) Descriptive studies
c) Experimental studies
d) Case-control studies
24. In the epidemiological triad, environmental factors can be classified as:
a) Physical
b) Chemical
c) Social
d) Biological
25. Which of the following ratios provides an estimate of risk in a case-control study?
a) Odds ratio
b) Sex ratio
c) Disease ratio
d) Dependency ratio
26. The entire group of people or elements that have at least one thing in common is known as a:
a) Sample
b) Parameter
c) Hypothesis
d) Population
27. Sampling that is done on the basis of predetermined ideas, and whose results cannot be generalized, is called:
a) Snowball sampling
b) Purposive sampling
c) Probability sampling
d) Nonprobability sampling
28. Tertiary prevention includes:
a) Disability limitation
b) Prompt treatment
c) Rehabilitation
d) a and c
e) a and b
29. Agents such as vitamins, proteins, and fats are examples of:
a) Physical agents
b) Nutritive agents
c) Chemical agents
d) All of the above
30. Which of the following are the key components of the epidemiological triangle?
a) Host, agent, and physical environment
b) Host, genes, and physical environment
c) Host, agent, and environment
d) None of the above
31. Tertiary prevention does NOT include:
a) Disability limitation
b) Prompt treatment
c) Rehabilitation
d) a and c
32. Agents such as vitamins, proteins, and fats are examples of:
a) Physical agents
b) Nutritive agents
c) Chemical agents
d) All of the above
33. Which of the following are NOT key components of the epidemiological triangle?
a) Host and agent
b) Host and environment
c) Host, agent, and environment
d) Time, place, and person
34. Which of the following is part of the continuum of the natural history of disease?
a) Stage of health promotion
b) Stage of prevention
c) Stage of recovery
d) Stage of sampling
35. Which of the following are also known as retrospective studies?
a) Cohort studies
b) Descriptive studies
c) Experimental studies
d) Case-control studies
36. A person who harbors the microorganisms of a disease and excretes them without suffering from symptoms is called a:
a) Reservoir
b) Carrier
c) Host
d) Agent
37. The modes of transmission of infectious diseases include all of the following EXCEPT:
a) Direct
b) Indirect
c) Physiological
d) Biological
38. The total number of deaths reported during a given time interval divided by the estimated mid-interval population is called:
a) Death rate
b) Crude death rate
c) Mortality rate
d) Proportional mortality
39. The number of live births reported during a given time interval divided by the estimated mid-interval population is called:
a) Birth rate
b) Growth rate
c) Crude fertility rate
d) Crude birth rate
40. The number of live births reported during a given time interval divided by the estimated mid-interval number of women aged 15 to 44 years is known as:
a) Crude fertility rate
b) Birth rate
c) Growth rate
d) Sex ratio
41. The number of current cases (new and old) of a specified disease identified over a given time interval divided by the estimated mid-interval population is called:
a) Prevalence
b) Period prevalence
c) Point prevalence
d) Disease prevalence
42. The use of statistics to analyze the characteristics of, or changes in, a population is termed:
a) Population pyramid
b) Vital statistics
c) Population statistics
d) Population dynamics
43. A measure of the frequency of occurrence of death in a defined population during a specified interval is called:
a) Crude death rate
b) Mortality rate
c) Death ratio
d) Mortality
44. Public health surveillance does NOT consist of which of the following steps?
a) Systematic collection
b) Analysis
c) Planning
d) Interpretation
45. Surveillance system information cycles include:
a) Family and community
b) Public, health care providers, and health agencies
c) None of the above
d) Public and health care providers only
46. A disease state that is communicated ONLY by direct contact is termed:
a) Infectious disease
b) Contamination
c) Epidemic
d) Contagious disease
47. Which of the following is NOT a basic measurement in epidemiology?
a) Rate
b) Numerator
c) Ratio
d) Proportion
48. Measurement of the current status of a disease is termed:
a) Prevalence
b) Incidence
c) Cumulative incidence
d) Mid-interval population
49. The number of new cases occurring in a defined population during a specified period of time is called:
a) Prevalence
b) Incidence
c) a and b
d) Cumulative incidence
50. Which of the following is also known as a prospective study?
a) Cohort studies
b) Descriptive studies
c) Experimental studies
d) Case-control studies
51. Which of the following ratios provides an estimate of risk in a case-control study?
a) Odds ratio
b) Sex ratio
c) Disease ratio
d) Dependency ratio
52. The entire group of people or elements that have at least one thing in common is known as a:
a) Sample
b) Parameter
c) Hypothesis
d) Population
53. Sampling that is done on the basis of predetermined ideas, and whose results cannot be generalized, is called:
a) Snowball sampling
b) Purposive sampling
c) Probability sampling
d) Nonprobability sampling
54. A graphical illustration that shows the distribution of various age groups in a population is known as a:
a) Dependency ratio
b) Age ratio
c) Population pyramid
d) Population dynamics
55. The ratio of the population who are not economically active to those who are economically active is defined as the:
a) Dependency ratio
b) Age ratio
c) Population ratio
d) Risk-benefit ratio
56. In which of the following sampling methods is the chance of bias minimal and does every subject have an equal chance of being selected for the study?
a) Accidental sampling
b) Simple random sampling
c) Purposive sampling
d) Snowball sampling
57. If every seventh subject is selected in a study, which of the following sampling methods is being used?
a) Stratified sampling
b) Quota sampling
c) Systematic sampling
d) Purposive sampling
58. Systematic error produced by the sampling procedure is known as:
a) Sampling bias
b) Sampling error
c) Non-sampling error
d) Random error
59. A detailed report of the profile of a single patient by one or more clinicians is called a:
a) Case-control study
b) Case series
c) Investigation
d) Case report
60. In which of the following studies do we compare one group in whom the problem is present with another group in whom the problem is absent?
a) Case-control study
b) Case series
c) Cohort study
d) Case report
Answer key:
1. C
2. D
3. B
4. D
5. A
6. B
7. C
8. D
9. B
10. C
11. B
12. B
13. D
14. B
15. D
16. C
17. A
18. B
19. C
20. B
21. D
22. C
23. A
24. B
25. A
26. D
27. B
28. D
29. B
30. C
31. B
32. B
33. D
34. C
35. D
36. B
37. C
38. B
39. D
40. A
41. B
42. C
43. B
44. C
45. B
46. D
47. B
48. A
49. B
50. A
51. A
52. D
53. B
54. C
55. A
56. B
57. C
58. A
59. D
60. A
Inferential Statistics
Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population. It consists of two techniques:
 Estimation of parameters
 Hypothesis testing
ESTIMATION OF PARAMETERS
The process of estimation entails calculating, from the data of a sample, some statistic that is offered as an approximation of the corresponding parameter of the population from which the sample was drawn.
Parameter estimation is used to estimate a single parameter, like a mean.
There are two types of estimates
 Point Estimates
 Interval Estimates (Confidence Interval).
POINT ESTIMATES
A point estimate is a single numerical value used to estimate the corresponding population parameter.
For example, the sample mean x̄ is a point estimate of the population mean μ, and the sample variance S^{2} is a point estimate of the population variance σ^{2}. Each is a single-valued guess at the value of the parameter.
A good estimator must satisfy three conditions:
 Unbiased: The expected value of the estimator must be equal to the parameter being estimated
 Consistent: The value of the estimator approaches the value of the parameter as the sample size increases
 Relatively Efficient: The estimator has the smallest variance of all estimators which could be used
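The unbiasedness condition can be illustrated with a small simulation. The sketch below is only an illustration under assumed values (a normal population with standard deviation 2, samples of size 5, and a hypothetical helper named compare_variance_estimators); it compares the variance estimator that divides by n with the one that divides by n – 1.

```python
import random
import statistics


def compare_variance_estimators(true_sd=2.0, n=5, trials=5000, seed=0):
    """Average two variance estimators over many simulated samples.

    Returns (mean of divide-by-n estimates, mean of divide-by-(n-1) estimates).
    All default values are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)
    biased, unbiased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        biased.append(statistics.pvariance(sample))   # divides by n
        unbiased.append(statistics.variance(sample))  # divides by n - 1
    return statistics.fmean(biased), statistics.fmean(unbiased)


biased_mean, unbiased_mean = compare_variance_estimators()
print(f"true variance:           {2.0 ** 2:.2f}")
print(f"average of /n estimates:     {biased_mean:.2f}")
print(f"average of /(n-1) estimates: {unbiased_mean:.2f}")
```

On average the divide-by-(n – 1) estimates should land near the true variance of 4, while the divide-by-n estimates should fall systematically below it, which is what "biased" means here.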
CONFIDENCE INTERVAL (Interval Estimates)
An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, most likely includes the parameter being estimated.
Interval estimation of a parameter is more useful because it indicates a range of values within which the parameter has a specified probability of lying. With interval estimation, researchers construct a confidence interval around the estimate; the upper and lower limits are called confidence limits.
Interval estimates provide a range of values for a parameter, within which we have a stated degree of confidence that the parameter lies. An interval estimate is thus a numeric range, based on a statistic and its sampling distribution, that contains the population parameter of interest with a specified probability.
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
Calculating confidence interval when n ≥ 30 (Single Population Mean)
Example: A random sample of size 64 with mean 25 and standard deviation 4 is taken from a normal population. Construct a 95% confidence interval for the population mean.
We use the following formula to calculate the confidence interval when n ≥ 30:
x̄ ± S/√n x z_{α/2}
Data
x̄ = 25
S = 4
n = 64
z_{α/2} = 1.96
25 ± 4/√64 x 1.96
25 ± 4/8 x 1.96
25 ± 0.5 x 1.96
25 ± 0.98
25 – 0.98 ≤ µ ≤ 25 + 0.98
24.02 ≤ µ ≤ 25.98
We are 95% confident that the population mean (µ) lies between 24.02 and 25.98.
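The large-sample calculation above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name z_confidence_interval is a hypothetical helper, and the critical value is taken from the standard normal distribution rather than rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Confidence interval for a population mean using the normal distribution."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


lower, upper = z_confidence_interval(mean=25, sd=4, n=64)
print(f"{lower:.2f} <= mu <= {upper:.2f}")  # prints: 24.02 <= mu <= 25.98
```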
Calculating confidence interval when n < 30 (Single Population Mean)
Example: A random sample of size 9 with mean 25 and standard deviation 4 is taken from a normal population. Construct a 95% confidence interval for the population mean.
We use the following formula to calculate the confidence interval when n < 30:
x̄ ± S/√n x t_{α/2,df}
(OR)
x̄ – S/√n x t_{α/2,df} ≤ µ ≤ x̄ + S/√n x t_{α/2,df}
Data
x̄ = 25
S = 4
n = 9
α/2 = 0.025
df = n – 1 (9 – 1 = 8)
t_{α/2,df} = 2.306
25 ± 4/√9 x 2.306
25 ± 4/3 x 2.306
25 ± 1.33 x 2.306
25 ± 3.07
25 – 3.07 ≤ µ ≤ 25 + 3.07
21.93 ≤ µ ≤ 28.07
We are 95% confident that the population mean (µ) lies between 21.93 and 28.07.
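The small-sample calculation can be sketched the same way. Python's standard library has no t distribution, so this sketch assumes the critical value 2.306 is read from a t table, exactly as in the example; t_confidence_interval is a hypothetical helper name.

```python
from math import sqrt


def t_confidence_interval(mean, sd, n, t_critical):
    """Confidence interval for a population mean using a tabulated t value.

    t_critical is t_{alpha/2, df} with df = n - 1, looked up by the caller.
    """
    margin = t_critical * sd / sqrt(n)
    return mean - margin, mean + margin


lower, upper = t_confidence_interval(mean=25, sd=4, n=9, t_critical=2.306)
print(f"{lower:.2f} <= mu <= {upper:.2f}")  # prints: 21.93 <= mu <= 28.07
```

With the same mean and standard deviation, the interval is much wider than in the n = 64 example, both because the sample is smaller and because the t critical value exceeds 1.96.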
Hypothesis:
A hypothesis may be defined simply as a statement about one or more populations. It is frequently concerned with the parameters of the populations about which the statement is made.
Types of Hypotheses
Researchers are concerned with two types of hypotheses
 Research hypotheses
The research hypothesis is the conjecture or supposition that motivates the research. It may be the result of years of observation on the part of the researcher.
 Statistical hypotheses
Statistical hypotheses are hypotheses that are stated in such a way that they may be evaluated by appropriate statistical techniques.
Types of statistical Hypothesis
There are two statistical hypotheses involved in hypothesis testing, and these should be stated explicitly.
 Null Hypothesis:
The null hypothesis is the hypothesis to be tested. It is designated by the symbol H_{0}. The null hypothesis is sometimes referred to as a hypothesis of no difference, since it is a statement of agreement with (or no difference from) conditions presumed to be true in the population of interest.
In general, the null hypothesis is set up for the express purpose of being discredited. Consequently, the complement of the conclusion that the researcher is seeking to reach becomes the statement of the null hypothesis. In the testing process the null hypothesis either is rejected or is not rejected. If the null hypothesis is not rejected, we will say that the data on which the test is based do not provide sufficient evidence to cause rejection. If the testing procedure leads to rejection, we will say that the data at hand are not compatible with the null hypothesis, but are supportive of some other hypothesis.
 Alternative Hypothesis
The alternative hypothesis is a statement of what we will believe is true if our sample data cause us to reject the null hypothesis. Usually the alternative hypothesis and the research hypothesis are the same, and in fact the two terms are used interchangeably. We shall designate the alternative hypothesis by the symbol H_{A} or H_{1}.
LEVEL OF SIGNIFICANCE
The level of significance is a probability and, in fact, is the probability of rejecting a true null hypothesis. The level of significance specifies the area under the curve of the distribution of the test statistic that is above the values on the horizontal axis constituting the rejection region. It is denoted by ‘α’.
Types of Error
In the context of testing of hypotheses, there are basically two types of errors:
 TYPE I Error
 TYPE II Error
Type I Error
 A type I error, also known as an error of the first kind, occurs when the null hypothesis (H_{0}) is true, but is rejected.
 A type I error may be compared with a so called false positive.
 The rate of the type I error is called the size of the test and denoted by the Greek letter α (alpha).
 It usually equals the significance level of a test.
 If type I error is fixed at 5 %, it means that there are about 5 chances in 100 that we will reject H_{0} when H_{0} is true.
Type II Error
 Type II error, also known as an error of the second kind, occurs when the null hypothesis is false, but erroneously fails to be rejected.
 Type II error means accepting the hypothesis which should have been rejected.
 A Type II error is committed when we fail to believe a truth.
 A type II error occurs when one rejects the alternative hypothesis (fails to reject the null hypothesis) when the alternative hypothesis is true.
 The rate of the type II error is denoted by the Greek letter β (beta) and is related to the power of a test (which equals 1 – β).
In tabular form, the two errors can be presented as follows:
Decision | Null hypothesis (H_{0}) is true | Null hypothesis (H_{0}) is false
Reject null hypothesis | Type I error (false positive) | Correct outcome (true positive)
Fail to reject null hypothesis | Correct outcome (true negative) | Type II error (false negative)
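The two error rates can also be estimated by simulation. The following is only a sketch under assumed values (a two-sided z-test of H_{0}: µ = 50 with known standard deviation 10, n = 25, α = 0.05, and an alternative mean of 55 chosen purely for illustration); rejection_rate is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_mean, null_mean=50.0, sd=10.0, n=25,
                   alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated samples in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(true_mean, sd) for _ in range(n))
        z = (sample_mean - null_mean) / (sd / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# H0 is true: every rejection is a Type I error, so the rate should be near alpha.
type_1_rate = rejection_rate(true_mean=50.0)
# H0 is false: every failure to reject is a Type II error (rate beta).
type_2_rate = 1 - rejection_rate(true_mean=55.0)
print(f"estimated Type I error rate:  {type_1_rate:.3f}")
print(f"estimated Type II error rate: {type_2_rate:.3f}")
```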
Graphical depiction of the relation between Type I and Type II errors
Reducing Type I Errors
 Prescriptive testing is used to increase the level of confidence (that is, to lower the significance level α), which in turn reduces the chance of making a Type I error.
Reducing Type II Errors
 Descriptive testing is used to better describe the test condition and acceptance criteria, which in turn reduces Type II errors. This increases the number of times we reject the null hypothesis, with a resulting increase in the number of Type I errors (rejecting H_{0} when it was really true and should not have been rejected).
 Therefore, for a given sample size, reducing one type of error comes at the expense of increasing the other; the same means cannot reduce both types of error simultaneously.
Power of Test:
Statistical power is defined as the probability of rejecting the null hypothesis while the alternative hypothesis is true.
Power = P(reject H_{0} | H_{1} is true)
= 1 – P(type II error)
= 1 – β
That is, the power of a hypothesis test is the probability that it will reject the null hypothesis when the null hypothesis is in fact false.
Figure: the distribution under H_{0} and the distribution under H_{1}, with the power marked.
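For a z-test with known standard deviation, the power can be computed directly from the normal distribution. The sketch below assumes a one-sided (upper-tail) test and illustrative numbers; z_test_power is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(null_mean, true_mean, sd, n, alpha=0.05):
    """Power of an upper-tail z-test of H0: mu <= null_mean when mu = true_mean."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true mean lies from the null mean, in standard-error units.
    shift = (true_mean - null_mean) / (sd / sqrt(n))
    beta = std_normal.cdf(z_crit - shift)  # P(fail to reject | H1 is true)
    return 1 - beta


for n in (10, 25, 50, 100):
    print(n, round(z_test_power(null_mean=50, true_mean=55, sd=10, n=n), 3))
```

When the true mean equals the null mean the function returns α, and the power rises toward 1 as the sample size or the distance between the two means increases.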
Factors that affect statistical power include
 The sample size
 The specification of the parameter(s) in the null and alternative hypotheses, i.e. how far they are from each other
 The precision or uncertainty the researcher allows for the study (generally the confidence or significance level)
 The distribution of the parameter to be estimated. For example, if a researcher knows that the statistics in the study follow a Z or standard normal distribution, there are two parameters that he/she needs to estimate: the population mean (μ) and the population variance (σ^{2}). Most of the time, the researcher knows one of the parameters and needs to estimate the other. If that is not the case, some other distribution may be used. For example, if the researcher does not know the population variance, he/she can estimate it using the sample variance, which leads to using a t distribution.
Application:
In research, statistical power is generally calculated for two purposes.
 It can be calculated before data collection based on information from previous research to decide the sample size needed for the study.
 It can also be calculated after data analysis. This usually happens when the result turns out to be nonsignificant. In this case, statistical power is calculated to check whether the nonsignificant result reflects a genuine absence of a relationship or merely a lack of statistical power.
Relation with sample size:
Statistical power is positively correlated with the sample size, which means that, with the other factors held fixed, a larger sample size gives greater power. However, researchers must also distinguish between a statistically significant difference and a scientifically meaningful one. Although a larger sample size enables researchers to find smaller differences statistically significant, such a difference may not be large enough to be scientifically meaningful. It is therefore recommended that researchers decide what they would consider a scientifically meaningful difference before doing a power analysis to determine the sample size actually needed.
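A power analysis for sample size can be sketched as follows. This assumes a one-sided z-test with known standard deviation and uses the standard formula n = ((z_{α} + z_{β}) x σ / δ)^{2}, where δ is the smallest difference considered scientifically meaningful; required_sample_size and the numbers used are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(effect, sd, alpha=0.05, power=0.80):
    """Smallest n giving the requested power for a one-sided z-test.

    effect is the difference between the null and alternative means.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sd / effect) ** 2)


print(required_sample_size(effect=5, sd=10))    # prints: 25
print(required_sample_size(effect=2.5, sd=10))  # a smaller effect needs a larger n
```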
HYPOTHESIS TESTING
Statistical hypothesis testing provides objective criteria for deciding whether hypotheses are supported by empirical evidence.
The purpose of hypothesis testing is to aid the clinician, researcher, or administrator in reaching a conclusion concerning a population by examining a sample from that population.
STEPS IN STATISTICAL HYPOTHESIS TESTING
Step # 01: State the Null hypothesis and Alternative hypothesis.
The alternative hypothesis represents what the researcher is trying to prove. The null hypothesis represents the negation of what the researcher is trying to prove.
Step # 02: State the significance level, α (0.01, 0.05, or 0.1), for the test
The significance level is the probability of making a Type I error. A Type I Error is a decision in favor of the alternative hypothesis when, in fact, the null hypothesis is true.
Type II Error is a decision to fail to reject the null hypothesis when, in fact, the null hypothesis is false.
Step # 03: State the test statistic that will be used to conduct the hypothesis test
The appropriate test statistic for the kind of hypothesis test being used (e.g. t-test, z-test, ANOVA, chi-square) is stated in this step.
Step # 04: Computation/ calculation of test statistic
The chosen test statistic (e.g. for a t-test, z-test, ANOVA, or chi-square test) is computed in this step.
Step # 05: Find Critical Value or Rejection (critical) Region of the test
Use the value of α (0.01, 0.05, or 0.1) from Step # 02 and the distribution of the test statistics from Step # 03.
Step # 06: Conclusion (Making statistical decision and interpretation of results)
If the calculated value of the test statistic falls in the rejection (critical) region, the null hypothesis is rejected; if the calculated value of the test statistic falls in the acceptance (non-critical) region, the null hypothesis is not rejected.
Note: If we conclude on the basis of the p-value, we compare the calculated p-value with the chosen level of significance. If the p-value is less than or equal to α, the null hypothesis is rejected in favor of the alternative. If the p-value is greater than α, the null hypothesis is not rejected.
If the decision is to reject, the statement of the conclusion should read as follows: “we reject H_{o} at the _______ level of significance. There is sufficient evidence to conclude that (statement of alternative hypothesis.)”
If the decision is to fail to reject, the statement of the conclusion should read as follows: “we fail to reject H_{o} at the _______ level of significance. There is not sufficient evidence to conclude that (statement of alternative hypothesis.)”
Rules for Stating Statistical Hypotheses
When hypotheses are stated, an indication of equality (either = ,≤ or ≥ ) must appear in the null hypothesis.
Example:
We want to answer the question: Can we conclude that a certain population mean is not 50? The null hypothesis is
H_{o} : µ = 50
And the alternative is
H_{A} : µ ≠ 50
Suppose we want to know if we can conclude that the population mean is greater than 50. Our hypotheses are
H_{o}: µ ≤ 50
H_{A}: µ > 50
If we want to know if we can conclude that the population mean is less than 50, the hypotheses are
H_{o} : µ ≥ 50
H_{A}: µ < 50
We may state the following rules of thumb for deciding what statement goes in the null hypothesis and what statement goes in the alternative hypothesis:
 What you hope or expect to be able to conclude as a result of the test usually should be placed in the alternative hypothesis.
 The null hypothesis should contain a statement of equality, either = ,≤ or ≥.
 The null hypothesis is the hypothesis that is tested.
 The null and alternative hypotheses are complementary. That is, the two together exhaust all possibilities regarding the value that the hypothesized parameter can assume.
T TEST
The t-test is used to test hypotheses about μ when the population standard deviation is unknown; it can be used when the sample size is small (n < 30).
The t distribution is symmetrical, bell-shaped, and similar to the normal distribution but more spread out.
Calculating one sample t-test
Example: A random sample of size 16 with mean 25 and standard deviation 5 is taken from a normal population. Test at the 5% level of significance (LOS) that H_{o}: µ = 22
H_{A}: µ ≠ 22
SOLUTION
Step # 01: State the Null hypothesis and Alternative hypothesis.
H_{o}: µ = 22
H_{A}: µ ≠ 22
Step # 02: State the significance level
α = 0.05 or 5% Level of Significance
Step # 03: State the test statistic
Since σ is unknown and n < 30, we use the t-test statistic: t = (X̄ – µ) / (S/√n)
Step # 04: Computation/ calculation of test statistic
Data
X̄ = 25
µ = 22
S = 5
n = 16
t _{calculated} = (25 – 22) / (5/√16) = 3/1.25 = 2.4
Step # 05: Find Critical Value or Rejection (critical) Region
The critical value is read from the t-distribution table using α/2 (two-tailed test) and the degrees of freedom v = n – 1.
Critical value = t(α/2, v = 16 – 1)
= t(0.05/2, v = 15)
= t(0.025, 15)
t _{tabulated} = ± 2.131
t _{calculated} = 2.4
Step # 06: Conclusion: Since t _{calculated} = 2.4 is greater than 2.131, it falls in the region of rejection; therefore we reject H_{o} at the 5% level of significance. There is sufficient evidence to conclude that the population mean is not equal to 22.
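The six steps of the t-test example can be sketched in Python as follows. This is only an illustrative sketch: the function name one_sample_t is made up for this example, and the critical value 2.131 is copied from the t table used above rather than computed.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return t = (mean - mu0) / (S / sqrt(n)) for a one-sample t-test."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Values from the worked example: mean = 25, mu0 = 22, S = 5, n = 16.
t_calculated = one_sample_t(25, 22, 5, 16)

# Two-tailed critical value for alpha = 0.05 and 15 degrees of freedom,
# taken from the t table in the text (an input here, not computed).
t_tabulated = 2.131

# Reject H0 when the statistic falls outside +/- the critical value.
reject_h0 = abs(t_calculated) > t_tabulated
print(t_calculated, reject_h0)
```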
Z TEST
 The z-test is applied when the distribution is normal and the population standard deviation σ is known, or when the sample size n is large (n ≥ 30) and σ is unknown (taking S as the estimator of σ).
 The z-test is used to test hypotheses about μ when the population standard deviation is known and the population distribution is normal, or when the sample size is large (n ≥ 30).
Calculating one sample z-test
Example: A random sample of size 49 with mean 32 is taken from a normal population whose standard deviation is 4. Test at the 5% LOS that H_{o}: µ = 25
H_{A}: µ ≠ 25
SOLUTION
Step # 01: H_{o}: µ = 25
H_{A}: µ ≠ 25
Step # 02: α = 0.05
Step # 03: Since σ is known and n ≥ 30, we apply the z-test statistic: Z = (X̄ – µ) / (σ/√n)
Step # 04: Calculation of test statistic
Data
X̄ = 32
µ = 25
σ = 4
n = 49
Z_{calculated} = (32 – 25) / (4/√49) = 7/0.5714 = 12.25
Step # 05: Find Critical Value or Rejection (critical) Region
Critical Value (5%) (2-tail) = ±1.96
Z_{calculated} = 12.25
Step # 06: Conclusion: Since Z_{calculated} = 12.25 falls in the region of rejection, we reject H_{o} at the 5% level of significance. There is sufficient evidence to conclude that the population mean is not equal to 25.
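A minimal Python sketch of the same z-test is given below. The helper name one_sample_z is an assumption for illustration; the critical value is obtained from the standard normal distribution in Python's statistics module instead of a printed table.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sigma, n):
    """Return Z = (mean - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Values from the worked example: mean = 32, mu0 = 25, sigma = 4, n = 49.
z_calculated = one_sample_z(32, 25, 4, 49)

alpha = 0.05
# Two-tailed critical value: the point leaving alpha/2 in the upper tail.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z_calculated) > z_critical
print(round(z_calculated, 2), round(z_critical, 2), reject_h0)
```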
CHI-SQUARE
A statistic which measures the discrepancy (difference) between k observed frequencies f_{o}1, f_{o}2, …, f_{o}k and the corresponding expected frequencies f_{e}1, f_{e}2, …, f_{e}k.
The chi-square is useful in making statistical inferences about categorical data in which there are two or more categories.
Characteristics
 Every χ2 distribution extends indefinitely to the right from 0.
 Every χ2 distribution has only one (right sided) tail.
 As df increases, the χ2 curves get more bell shaped and approach the normal curve in appearance (but remember that a chi-square curve starts at 0, not at – ∞ )
Calculating Chi-Square
Example 1: A U.S. census classified practising doctors into four specialty categories, as follows:
Specialty  %  Probability 
General Practice  18%  0.18 
Medical  33.9 %  0.339 
Surgical  27 %  0.27 
Others  21.1 %  0.211 
Total  100 %  1.000 
A researcher conducts a test after 5 years to check these data for changes. A sample of 500 doctors is selected and each is asked his or her specialty. The results were:
Specialty  frequency 
General Practice  80 
Medical  162 
Surgical  156 
Others  102 
Total  500 
Hypothesis testing:
Step # 01:
Null Hypothesis (H_{o}):
There is no difference in specialty distribution (or) the current specialty distribution of U.S. physicians is the same as declared in the census.
Alternative Hypothesis (H_{A}):
There is a difference in specialty distribution of U.S. doctors (or) the current specialty distribution of U.S. physicians is different from that declared in the census.
Step # 02: Level of Significance
α = 0.05
Step # 03: Chi-square test statistic
Step # 04:
Statistical Calculation
fe (80) = 18 % x 500 = 90
fe (162) = 33.9 % x 500 = 169.5
fe (156) = 27 % x 500 = 135
fe (102) = 21.1 % x 500 = 105.5
S # (n)  Specialty  f_{o}  f_{e}  (f_{o} – f_{e})  (f_{o} – f_{e})^{2}  (f_{o} – f_{e})^{2} / f_{e} 
1  General Practice  80  90  –10  100  1.111 
2  Medical  162  169.5  –7.5  56.25  0.332 
3  Surgical  156  135  21  441  3.267 
4  Others  102  105.5  –3.5  12.25  0.116 
Total  4.826 
χ^{2}_{cal} = ∑ (f_{o} – f_{e})^{2} / f_{e} = 4.826
Step # 05:
Find the critical region using the χ^{2} (chi-square) distribution table
χ^{2}_{tab} = χ^{2}(α, d.f) = χ^{2}(0.05, 3) = 7.815
(d.f = n – 1 = 4 – 1 = 3, where n is the number of categories)
Step # 06:
Conclusion: Since the χ^{2}_{cal} value is less than 7.815, it lies in the region of acceptance; therefore we fail to reject H_{o}. There is not sufficient evidence of a change in the specialty distribution among U.S. doctors.
Example 2: A sample of 150 chronic carriers of a certain antigen and a sample of 500 noncarriers revealed the following blood group distributions. Can one conclude from these data that the two populations from which the samples were drawn differ with respect to blood group distribution? Let α = 0.05.
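The goodness-of-fit calculation above can be checked with a short Python sketch. The variable names are illustrative, and the critical value 7.815 is copied from the chi-square table quoted above; small differences from the hand calculation come only from rounding.

```python
# Observed counts and census proportions from Example 1.
observed = [80, 162, 156, 102]
proportions = [0.18, 0.339, 0.27, 0.211]

n = sum(observed)                           # 500 doctors
expected = [p * n for p in proportions]     # fe = proportion x n

# Chi-square statistic: sum of (fo - fe)^2 / fe over the categories.
chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

degrees_of_freedom = len(observed) - 1      # categories - 1
critical_value = 7.815                      # table value for (0.05, 3)

reject_h0 = chi_square > critical_value
print(round(chi_square, 3), degrees_of_freedom, reject_h0)
```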
Blood Group  Carriers  Noncarriers  Total 
O  72  230  302 
A  54  192  246 
B  16  63  79 
AB  8  15  23 
Total  150  500  650 
Hypothesis Testing
Step # 01: H_{O}: There is no association between antigen carrier status and blood group
H_{A}: There is some association between antigen carrier status and blood group
Step # 02: α = 0.05
Step # 03: Chi-square test statistic
Step # 04:
Calculation
Each expected frequency is (row total × column total) / grand total, rounded here to the nearest whole number:
f_{e} (72) = 302*150/650 ≈ 70
f_{e} (230) = 302*500/650 ≈ 232
f_{e} (54) = 246*150/650 ≈ 57
f_{e} (192) = 246*500/650 ≈ 189
f_{e} (16) = 79*150/650 ≈ 18
f_{e} (63) = 79*500/650 ≈ 61
f_{e} (8) = 23*150/650 ≈ 5
f_{e} (15) = 23*500/650 ≈ 18
f_{o}  f_{e}  (f_{o} – f_{e})  (f_{o} – f_{e})^{2}  (f_{o} – f_{e})^{2} / f_{e} 
72  70  2  4  0.0571 
230  232  –2  4  0.0172 
54  57  –3  9  0.1579 
192  189  3  9  0.0476 
16  18  –2  4  0.2222 
63  61  2  4  0.0656 
8  5  3  9  1.8000 
15  18  –3  9  0.5000 
Total  2.8676 
χ^{2}_{cal} = ∑ (f_{o} – f_{e})^{2} / f_{e} = 2.8676
Step # 05:
Find the critical region using the χ^{2} (chi-square) distribution table
χ^{2}_{tab} = χ^{2}(α, d.f) = χ^{2}(0.05, 3) = 7.815, where d.f = (rows – 1)(columns – 1) = (4 – 1)(2 – 1) = 3
Step # 06:
Conclusion: Since the χ^{2}_{cal} value is less than 7.815, it lies in the region of acceptance; therefore we fail to reject H_{O}. This means there is not sufficient evidence of an association between antigen carrier status and blood group.
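As a cross-check, the test of association can be sketched in Python without rounding the expected frequencies. The hand calculation in the text rounds each expected frequency to a whole number, which gives a statistic of about 2.87; with unrounded expected frequencies the statistic is about 2.41. Both are well below 7.815, so the conclusion is the same. Variable names are illustrative only.

```python
# Rows: blood groups O, A, B, AB; columns: carriers, noncarriers.
table = [[72, 230], [54, 192], [16, 63], [8, 15]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(table):
    for j, fo in enumerate(row):
        # Expected frequency = row total x column total / grand total.
        fe = row_totals[i] * col_totals[j] / grand_total
        chi_square += (fo - fe) ** 2 / fe

degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
critical_value = 7.815          # table value for (0.05, 3)
reject_h0 = chi_square > critical_value
print(round(chi_square, 3), degrees_of_freedom, reject_h0)
```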
WHAT IS TEST OF SIGNIFICANCE? WHY IT IS NECESSARY? MENTION NAMES OF IMPORTANT TESTS.
1. Test of significance
A procedure used to establish the validity of a claim by determining whether or not the test statistic falls in the critical region. If it does, the results are referred to as significant. This test is sometimes called the hypothesis test.
The methods of inference used to support or reject claims based on sample data are known as tests of significance.
Why it is necessary
A significance test is performed;
 To determine if an observed value of a statistic differs enough from a hypothesized value of a parameter
 To draw the inference that the hypothesized value of the parameter is not the true value. The hypothesized value of the parameter is called the “null hypothesis.”
Types of test of significance
 Parametric
 t-test (one sample & two sample)
 z-test (one sample & two sample)
 F-test.
 Nonparametric
 Chi-square test
 Mann-Whitney U test
 Coefficient of concordance (W)
 Median test
 Kruskal-Wallis test
 Friedman test
 Rank difference methods (Spearman rho and Kendall’s tau)
P-Value:
A p-value is the probability that the computed value of a test statistic is at least as extreme as a specified value of the test statistic when the null hypothesis is true. Thus, the p-value is the smallest value of α for which we can reject a null hypothesis.
Put simply, the p-value for a test may also be defined as the smallest value of α for which the null hypothesis can be rejected.
The p value is a number that tells us how unusual our sample results are, given that the null hypothesis is true. A p value indicating that the sample results are not likely to have occurred, if the null hypothesis is true, provides justification for doubting the truth of the null hypothesis.
Test Decisions with p-value
The decision about whether there is enough evidence to reject the null hypothesis can be made by comparing the p-value to the value of α, the level of significance of the test.
A general rule worth remembering is:
 If the p-value is less than or equal to α, we reject the null hypothesis.
 If the p-value is greater than α, we do not reject the null hypothesis.
If p-value ≤ α  reject the null hypothesis 
If p-value > α  fail to reject the null hypothesis 
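The decision rule can be illustrated with a small Python sketch for a two-sided z-test. The function names two_sided_p_value and decide are hypothetical, and the sketch assumes the test statistic follows the standard normal distribution under the null hypothesis.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# A statistic right at the usual 5% critical value gives p close to 0.05.
p = two_sided_p_value(1.96)
print(round(p, 4), decide(p))
```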
Observational Study:
An observational study is a scientific investigation in which neither the subjects under study nor any of the variables of interest are manipulated in any way.
An observational study, in other words, may be defined simply as an investigation that is not an experiment. The simplest form of observational study is one in which there are only two variables of interest. One of the variables is called the risk factor, or independent variable, and the other variable is referred to as the outcome, or dependent variable.
Risk Factor:
The term risk factor is used to designate a variable that is thought to be related to some outcome variable. The risk factor may be a suspected cause of some specific state of the outcome variable.
Types of Observational Studies
There are two basic types of observational studies, prospective studies and retrospective studies.
Prospective Study:
A prospective study is an observational study in which two random samples of subjects are selected. One sample consists of subjects who possess the risk factor, and the other sample consists of subjects who do not possess the risk factor. The subjects are followed into the future (that is, they are followed prospectively), and a record is kept on the number of subjects in each sample who, at some point in time, are classifiable into each of the categories of the outcome variable.
The data resulting from a prospective study involving two dichotomous variables can be displayed in a 2 x 2 contingency table that usually provides information regarding the number of subjects with and without the risk factor and the number who did and did not develop the outcome of interest.
Retrospective Study:
A retrospective study is the reverse of a prospective study. The samples are selected from those falling into the categories of the outcome variable. The investigator then looks back (that is, takes a retrospective look) at the subjects and determines which ones have (or had) and which ones do not have (or did not have) the risk factor.
From the data of a retrospective study we may construct a contingency table
Relative Risk:
Relative risk is the ratio of the risk of developing a disease among subjects with the risk factor to the risk of developing the disease among subjects without the risk factor.
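We represent the relative risk from a prospective study symbolically as
RR = [a/(a + b)] / [c/(c + d)]
where a and b are the numbers of subjects with the risk factor who do and do not develop the disease, and c and d are the corresponding numbers among subjects without the risk factor.
We may construct a confidence interval for RR
100 (1 – α)% CI = RR^{1 ± (z_{α}/√X^{2})}
where z_{α} is the two-sided z value corresponding to the chosen confidence coefficient and X^{2} is the chi-square statistic computed from the contingency table.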
Interpretation of RR
 The value of RR may range anywhere between zero and infinity.
 A value of 1 indicates that there is no association between the status of the risk factor and the status of the dependent variable.
 A value of RR greater than 1 indicates that the risk of acquiring the disease is greater among subjects with the risk factor than among subjects without the risk factor.
 An RR value that is less than 1 indicates less risk of acquiring the disease among subjects with the risk factor than among subjects without the risk factor.
EXAMPLE
In a prospective study of pregnant women, Magann et al. (A16) collected extensive information on exercise level of low-risk pregnant working women. A group of 217 women did no voluntary or mandatory exercise during the pregnancy, while a group of 238 women exercised extensively. One outcome variable of interest was experiencing preterm labor. The results are summarized in Table
Estimate the relative risk of preterm labor when pregnant women exercise extensively.
Solution:
By Equation
These data indicate that the risk of experiencing preterm labor when a woman exercises heavily is 1.1 times as great as it is among women who do not exercise at all.
Confidence Interval for RR
We compute the 95 percent confidence interval for RR as follows.
The lower and upper confidence limits are, respectively
RR^{1 – (z_{α}/√X^{2})} = 0.65 and RR^{1 + (z_{α}/√X^{2})} = 1.86
Conclusion:
Since the interval includes 1, we conclude, at the .05 level of significance, that the population risk may be 1. In other words, we conclude that, in the population, there may not be an increased risk of experiencing preterm labor when a pregnant woman exercises extensively.
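The relative risk and its confidence interval can be sketched in Python as below. Several things here are assumptions rather than part of the text: the table of results is not reproduced above, so the cell counts used are illustrative values chosen only to agree with the stated group sizes (238 and 217) and an RR of about 1.1; the function names are made up; and X² is taken to be the usual uncorrected chi-square for a 2 x 2 table. Because the sketch does not round RR before raising it to a power, the interval it prints need not match the rounded limits quoted above.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for a 2 x 2 table (assumed form of X^2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

def relative_risk_ci(a, b, c, d, z=1.96):
    """Return RR and the limits RR^(1 -/+ z / sqrt(X^2))."""
    rr = (a / (a + b)) / (c / (c + d))
    exponent = z / math.sqrt(chi_square_2x2(a, b, c, d))
    limits = (rr ** (1 - exponent), rr ** (1 + exponent))
    return rr, min(limits), max(limits)

# Illustrative counts (assumption): exercisers 22 preterm, 216 not;
# non-exercisers 18 preterm, 199 not.
rr, lower, upper = relative_risk_ci(22, 216, 18, 199)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

An interval that contains 1, as here, leads to the same conclusion as in the text.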
Odds Ratio
An odds ratio (OR) is a measure of association between an exposure and an outcome. The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.
It is the appropriate measure for comparing cases and controls in a retrospective study.
Odds:
The odds for success are the ratio of the probability of success to the probability of failure.
There are two odds that we can calculate from data displayed in the contingency table of a retrospective study:
 The odds of being a case (having the disease) to being a control (not having the disease) among subjects with the risk factor is [a/ (a + b)] / [b/ (a + b)] = a/b
 The odds of being a case (having the disease) to being a control (not having the disease) among subjects without the risk factor is [c/(c + d)] / [d/(c + d)] = c/d
The estimate of the population odds ratio is
OR = (a/b) / (c/d) = ad/bc
We may construct a confidence interval for OR by the following method:
100 (1 – α)% CI = OR^{1 ± (z_{α}/√X^{2})}
where z_{α} is the two-sided z value corresponding to the chosen confidence coefficient and X^{2} is the chi-square statistic computed from the contingency table.
Interpretation of the Odds Ratio:
In the case of a rare disease, the population odds ratio provides a good approximation to the population relative risk. Consequently, the sample odds ratio, being an estimate of the population odds ratio, provides an indirect estimate of the population relative risk in the case of a rare disease.
 The odds ratio can assume values between zero and ∞.
 A value of 1 indicates no association between the risk factor and disease status.
 A value less than 1 indicates reduced odds of the disease among subjects with the risk factor.
 A value greater than 1 indicates increased odds of having the disease among subjects in whom the risk factor is present.
EXAMPLE
Toschke et al. (A17) collected data on obesity status of children ages 5–6 years and the smoking status of the mother during the pregnancy. Table below shows 3970 subjects classified as cases or noncases of obesity and also classified according to smoking status of the mother during pregnancy (the risk factor).
We wish to compare the odds of obesity at ages 5–6 among those whose mother smoked throughout the pregnancy with the odds of obesity at age 5–6 among those whose mother did not smoke during pregnancy.
Solution
By formula:
We see that obese children (cases) are 9.62 times as likely as nonobese children (noncases) to have had a mother who smoked throughout the pregnancy.
We compute the 95 percent confidence interval for OR as follows.
The lower and upper confidence limits for the population OR, respectively, are
OR^{1 – (z_{α}/√X^{2})} = 7.12 and OR^{1 + (z_{α}/√X^{2})} = 13.00
We conclude with 95 percent confidence that the population OR is somewhere between 7.12 and 13.00. Because the interval does not include 1, we conclude that, in the population, obese children (cases) are more likely than nonobese children (noncases) to have had a mother who smoked throughout the pregnancy.
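A comparable Python sketch for the odds ratio follows. The table is not reproduced in the text, so the cell counts below are an assumption, chosen to agree with the stated total of 3970 subjects and the OR of 9.62; the function names are illustrative, and X² is again assumed to be the uncorrected chi-square for a 2 x 2 table.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for a 2 x 2 table (assumed form of X^2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return OR = ad/bc and the limits OR^(1 -/+ z / sqrt(X^2))."""
    odds_ratio = (a * d) / (b * c)
    exponent = z / math.sqrt(chi_square_2x2(a, b, c, d))
    limits = (odds_ratio ** (1 - exponent), odds_ratio ** (1 + exponent))
    return odds_ratio, min(limits), max(limits)

# Assumed counts: mother smoked -> 64 obese, 342 not obese;
# mother did not smoke -> 68 obese, 3496 not obese (total 3970).
or_hat, lower, upper = odds_ratio_ci(64, 342, 68, 3496)
print(round(or_hat, 2), round(lower, 2), round(upper, 2))
```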
Measures of Dispersion
This term is used commonly to mean scatter, Deviation, Fluctuation, Spread or variability of data.
The degree to which the individual values of the variate scatter away from the average or the central value, is called a dispersion.
Types of Measures of Dispersions:
 Absolute Measures of Dispersion: The measures of dispersion which are expressed in terms of original units of a data are termed as Absolute Measures.
 Relative Measures of Dispersion: Relative measures of dispersion, also known as coefficients of dispersion, are obtained as ratios or percentages. They are pure numbers, independent of the units of measurement, and are used to compare two or more sets of data values.
Absolute Measures
 Range
 Quartile Deviation
 Mean Deviation
 Standard Deviation
Relative Measure
 Coefficient of Range
 Coefficient of Quartile Deviation
 Coefficient of mean Deviation
 Coefficient of Variation.
The Range:
1. The range is the simplest measure of dispersion. It is defined as the difference between the largest value and the smallest value in the data: Range = largest value – smallest value
2. For grouped data, the range is defined as the difference between the upper class boundary (UCB) of the highest class and the lower class boundary (LCB) of the lowest class.
MERITS OF RANGE:
 Easiest to calculate and simplest to understand.
 Gives a quick answer.
DEMERITS OF RANGE:
 It gives a rough answer.
 It is not based on all observations.
 It changes from one sample to the next in a population.
 It can’t be calculated in openend distributions.
 It is affected by sampling fluctuations.
 It gives no indication how the values within the two extremes are distributed
Quartile Deviation (QD):
1. It is also known as the Semi-Interquartile Range. The range is a poor measure of dispersion where extremely large values are present. The quartile deviation is defined as half of the difference between the third and the first quartiles:
QD = (Q_{3} – Q_{1})/2
InterQuartile Range
The difference between third and first quartiles is called the ‘InterQuartile Range’.
IQR = Q_{3} – Q_{1}
Mean Deviation (MD):
1. The MD is defined as the average of the absolute deviations of the values from an average:
MD = ∑|X – X̄| / n
It is also known as Mean Absolute Deviation.
2. MD from median is expressed as follows:
MD = ∑|X – Median| / n
3. For grouped data:
MD = ∑f|X – X̄| / ∑f
 The MD is simple to understand and to interpret.
 It is affected by the value of every observation.
 It is less affected by extreme values than the standard deviation.
 It is not suited to further mathematical treatment. It is, therefore, neither as logical nor as convenient a measure of dispersion as the SD.
The Variance:
 The mean of all squared deviations from the mean is called the variance.
 The sample variance is denoted S^{2} and the population variance σ^{2} (sigma squared, i.e. the standard deviation squared). A high variance means most scores are far away from the mean; a low variance indicates most scores cluster tightly about the mean.
Formula
S^{2} = ∑(X – X̄)^{2} / (n – 1)
OR S^{2} = [∑X^{2} – (∑X)^{2}/n] / (n – 1)
Calculating variance: Heart rate of certain patient is 80, 84, 80, 72, 76, 88, 84, 80, 78, & 78. Calculate variance for this data.
Solution:
Step 1:
Find mean of this data
X̄ = ∑X / n = 800/10, so Mean = 80
Step 2:
Draw two columns, ‘X’ and deviation about the mean (X – X̄). In column ‘X’ put all values of X, and in column (X – X̄) subtract the mean X̄ from each ‘X’ value.
Step 3:
Draw another column, (X – X̄)^{2}, in which put the square of each deviation about the mean.
X  (X – X̄) Deviation about mean  (X – X̄)^{2} Square of deviation about mean 
80  80 – 80 = 0  0 x 0 = 00 
84  84 – 80 = 4  4 x 4 = 16 
80  80 – 80 = 0  0 x 0 = 00 
72  72 – 80 = –8  (–8) x (–8) = 64 
76  76 – 80 = –4  (–4) x (–4) = 16 
88  88 – 80 = 8  8 x 8 = 64 
84  84 – 80 = 4  4 x 4 = 16 
80  80 – 80 = 0  0 x 0 = 00 
78  78 – 80 = –2  (–2) x (–2) = 04 
78  78 – 80 = –2  (–2) x (–2) = 04 
∑X = 800, X̄ = 80  ∑(X – X̄) = 0 (the summation of deviations about the mean is always zero)  ∑(X – X̄)^{2} = 184 (summation of squares of deviations about the mean) 
Step 4
Apply formula and put following values
∑(X – X̄)^{2} = 184
n = 10
Variance = 184/(10 – 1) = 184/9
Variance = 20.44
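The four steps can be reproduced with a short Python sketch. The variable names are illustrative; the last line compares the step-by-step result with the sample variance function from Python's statistics module, which also divides by n – 1.

```python
from statistics import mean, variance

heart_rates = [80, 84, 80, 72, 76, 88, 84, 80, 78, 78]

# Step 1: mean of the data.
x_bar = mean(heart_rates)

# Steps 2 and 3: deviations about the mean and their squares.
deviations = [x - x_bar for x in heart_rates]
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 4: divide by n - 1 to get the sample variance.
sample_variance = sum_of_squares / (len(heart_rates) - 1)

print(x_bar, sum(deviations), sum_of_squares, round(sample_variance, 2))
print(abs(variance(heart_rates) - sample_variance) < 1e-9)
```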
Standard Deviation
 The SD is defined as the positive Square root of the mean of the squared deviations of the values from their mean.
 The square root of the variance.
 It measures the spread of data around the mean. For normally distributed data, about 68% of the values lie within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
 The SD is affected by the value of every observation.
 In general, it is less affected by fluctuations of sampling than the other measures of dispersion.
 It has a definite mathematical meaning and is perfectly adaptable to algebraic treatment.
Formula:
S = √[∑(X – X̄)^{2} / (n – 1)]
OR S = √{[∑X^{2} – (∑X)^{2}/n] / (n – 1)}
Calculating Standard Deviation (we use same example): Heart rate of certain patient is 80, 84, 80, 72, 76, 88, 84, 80, 78, & 78. Calculate standard deviation for this data.
SOLUTION:
Step 1: Find mean of this data
X̄ = ∑X / n = 800/10, so Mean = 80
Step 2 and Step 3:
Build the same columns of deviations about the mean (X – X̄) and squared deviations (X – X̄)^{2} as in the variance example above. This gives ∑(X – X̄) = 0 and ∑(X – X̄)^{2} = 184.
Step 4
Apply formula and put following values
∑(X – X̄)^{2} = 184
n = 10
S = √[184/(10 – 1)] = √20.44
Standard Deviation = 4.52
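A brief Python sketch of the standard deviation for the same heart-rate data is shown below. It simply takes the square root of the sample variance (divisor n – 1) and, as a check, compares the result with stdev from Python's statistics module; the variable names are illustrative.

```python
import math
from statistics import stdev

heart_rates = [80, 84, 80, 72, 76, 88, 84, 80, 78, 78]

n = len(heart_rates)
x_bar = sum(heart_rates) / n
sum_of_squares = sum((x - x_bar) ** 2 for x in heart_rates)

# Standard deviation = square root of the sample variance.
standard_deviation = math.sqrt(sum_of_squares / (n - 1))

print(round(standard_deviation, 2))
print(abs(stdev(heart_rates) - standard_deviation) < 1e-9)
```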
MERITS AND DEMERITS OF STD. DEVIATION
MERITS
 Std. Dev. summarizes the deviation of a large distribution from the mean in one figure, used as a unit of variation.
 It indicates whether the difference of an individual value from the mean is real or due to chance.
 Std. Dev. helps in finding a suitable sample size for valid conclusions.
 It helps in calculating the standard error.
DEMERITS
 It gives more weight to extreme values. The process of squaring deviations and then taking the square root involves lengthy calculations.
Relative measure of dispersion:
(a) Coefficient of Variation,
(b) Coefficient of Dispersion,
(c) Quartile Coefficient of Dispersion, and
(d) Mean Coefficient of Dispersion.
Coefficient of Variation (CV):
1. Coefficient of variation was introduced by Karl Pearson. The CV expresses the SD as a percentage of the arithmetic mean (AM):
CV = (S / X̄) × 100 (for sample data)
CV = (σ / µ) × 100 (for population data)
 It is frequently used in comparing the dispersion of two or more series. It is also used as a criterion of consistent performance: the smaller the CV, the more consistent the performance.
 The disadvantage of CV is that it fails to be useful when the mean is close to zero.
 It is sometimes also referred to as ‘coefficient of standard deviation’.
 It is used to determine the stability or consistency of a data.
 The higher the CV, the higher is instability or variability in data, and vice versa.
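As an illustration, the coefficient of variation of the heart-rate data used in the variance example can be computed in Python. The function name coefficient_of_variation is hypothetical, and the sketch uses the sample formula (S divided by the mean, times 100).

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample CV: standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

heart_rates = [80, 84, 80, 72, 76, 88, 84, 80, 78, 78]
cv = coefficient_of_variation(heart_rates)
print(round(cv, 2))   # SD of about 4.52 on a mean of 80
```

A smaller CV for one series than for another would indicate the more consistent series.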
Coefficient of Dispersion (CD):
If X_{m} and X_{n} are respectively the maximum and the minimum values in a set of data, then the coefficient of dispersion is defined as:
CD = (X_{m} – X_{n}) / (X_{m} + X_{n})
Coefficient of Quartile Deviation (CQD):
1. If Q_{1} and Q_{3} are given for a set of data, then (Q_{1} + Q_{3})/2 is a measure of central tendency or average of the data. The measure of relative dispersion for quartile deviation is then expressed as follows:
CQD = [(Q_{3} – Q_{1})/2] / [(Q_{3} + Q_{1})/2] = (Q_{3} – Q_{1}) / (Q_{3} + Q_{1})
CQD may also be expressed in percentage.
Mean Coefficient of Dispersion (CMD):
The relative measure for mean deviation is ‘mean coefficient of dispersion’ or ‘coefficient of mean deviation’:
CMD = MD / X̄ (for arithmetic mean)
CMD = MD / Median (for median)
Percentiles and Quartiles
The mean and median are special cases of a family of parameters known as location parameters. These descriptive measures are called location parameters because they can be used to designate certain positions on the horizontal axis when the distribution of a variable is graphed.
Percentile:
 Percentiles are numerical values that divide an ordered data set into 100 groups of values, with at most 1% of the data values in each group. There can be a maximum of 99 percentiles in a data set.
 A percentile is a measure that tells us what percent of the total frequency scored at or below that measure.
Percentiles corresponding to a given data value: The percentile in a set corresponding to a specific data value is obtained by using the following formula
Percentile = [(Number of values below X + 0.5) / Number of total values in data set] x 100
Example: Calculate percentile for value 12 from the following data
13 11 10 13 11 10 8 12 9 9 8 9
Solution:
Step # 01: Arrange data values in ascending order from smallest to largest
S. No  1  2  3  4  5  6  7  8  9  10  11  12 
Observations or values  8  8  9  9  9  10  10  11  11  12  13  13 
Step # 02: The number of values below 12 is 9 and total number in the data set is 12
Step # 03: Use percentile formula
Percentile for 12 = [(9 + 0.5) / 12] x 100 = 79.17%
It means the value of 12 corresponds to the 79^{th} percentile
Example 2: Find out the 25^{th} percentile for the following data
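The percentile of a given value can be computed with a few lines of Python. The function name percentile_rank is chosen for this sketch; it applies the formula above, counting only values strictly below X.

```python
def percentile_rank(values, x):
    """(number of values below x + 0.5) / total number of values x 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) / len(values) * 100

data = [13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, 9]
rank = percentile_rank(data, 12)
print(round(rank, 2))
```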
6 12 18 12 13 8 13 11
10 16 13 11 10 10 2 14
SOLUTION
Step # 01: Arrange data values in ascending order from smallest to largest
S. No  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16 
Observations or values  2  6  8  10  10  10  11  11  12  12  13  13  13  14  16  18 
Step # 02: Calculate the position of the percentile (n x k/100). Here n = number of observations = 16 and k (percentile) = 25
Therefore, position = (16 x 25)/100 = (16 x 1)/4 = 4
Therefore, the 25^{th} percentile will be the average of the values located at the 4^{th} and 5^{th} positions in the ordered set. Here the values at the 4^{th} and 5^{th} positions are both 10.
Thus, P_{25} (= P_{k}) = (10 + 10)/2 = 10
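The position method can be sketched in Python as follows. The text only covers the case where n x k/100 is a whole number; rounding the position up to the next whole number otherwise is a common convention and is an assumption of this sketch, as is the function name percentile_value. The sketch is meant for 0 < k < 100.

```python
import math

def percentile_value(values, k):
    """k-th percentile by the position rule n * k / 100 (0 < k < 100)."""
    ordered = sorted(values)
    position = len(ordered) * k / 100
    if position == int(position):
        i = int(position)
        # Whole-number position: average the i-th and (i + 1)-th values.
        return (ordered[i - 1] + ordered[i]) / 2
    # Assumed convention: otherwise round the position up.
    return ordered[math.ceil(position) - 1]

data = [6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14]
p25 = percentile_value(data, 25)
print(p25)
```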
Quartiles
These are measures of position which divide the data into four equal parts when the data is arranged in ascending or descending order. The quartiles are denoted by Q.
Q_{1} = First Quartile, below which the first 25% of the observations are present.
Q_{2} = Second Quartile, below which the first 50% of the observations are present. It can easily be located as the median value.
Q_{3} = Third Quartile, below which the first 75% of the observations are present.
Probability
Probability is used to measure the ‘likelihood’ or ‘chances’ of certain events (prespecified outcomes) of an experiment.
If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a trait E, the probability of the occurrence of E is expressed as:
P(E) = m/N = Number of favourable cases / Total number of outcomes (sample space)
Characteristics of probability:
 It is usually expressed by the symbol ‘P’
 It ranges from 0 to 1
 When P = 0, it means there is no chance of happening or impossible.
 If P = 1, it means the chances of an event happening is 100%.
 The total sum of probabilities of all the possible outcomes in a sample space is always equal to one (1).
 If the probability of occurrence is p(o) = A, then the probability of non-occurrence is 1 – A.
Terminology
Random Experiment:
Any natural phenomenon, yielding some result will be termed as random experiment when it is not possible to predict a particular result to turn out.
An Outcome:
Each possible result of an experiment is called an outcome of that experiment, e.g. when you toss a coin once, you get either a head or a tail.
A trial:
This refers to the activity of carrying out an experiment, like tossing a coin or rolling a die or dice.
Sample Space:
A set of All possible outcomes of a probability experiment.
Example 1: In tossing a coin, the outcomes are either Head (H) or tail (T) i.e. there are only two possible outcomes in tossing a coin. The chances of obtaining a head or a tail are equal. It can be solved as follow;
n(s) = 2 ways
S = {H, T}
Example 2: What is the sample space when a single die is rolled?
n(s) = 6 ways
S = {1, 2, 3, 4, 5, 6}
A Simple Event
In a probability experiment, an event with only one outcome is called a simple event.
Compound Events
When two or more events occur in connection with each other, then their simultaneous occurrence is called a compound event.
Exhaustive events:
Events are said to be exhaustive if together they include every possible outcome of the experiment, so that at least one of them must occur. Example: in tossing a coin, the events head and tail are exhaustive.
Mutually exclusive:
Two events are said to be mutually exclusive if they cannot occur simultaneously.
Example: tossing a coin, the events head and tail are mutually exclusive because if the outcome is head then the possibilities of getting a tail in the same trial is ruled out.
Equally likely events:
Events are said to be equally likely if there is no reason to expect any one in preference to other.
Example: in a single cast of a fair die each of the events 1, 2, 3, 4, 5, 6 is equally likely to occur.
Favourable case:
The cases which ensure the occurrence of an event are said to be favourable to the events.
Independent event:
When the experiments are conducted in such a way that the occurrence of an event in one trial does not have any effect on the occurrence of the other events at a subsequent experiment, then the events are said to be independent.
Example:
If we draw a card from a pack of cards, replace it, and then draw a second card from the pack, the second draw is independent of the first.
Dependent Event:
When the experiments are conducted in such a way that the occurrence of an event in one trial does have some effect on the occurrence of the other events at a subsequent experiment, then the events are said to be dependent events.
Example:
If we draw a card from a pack and again draw a card from the rest of pack of cards (containing 51 cards) then the second draw is dependent on the first.
Conditional Probability:
The probability of an event A happening, when it is known that B has already happened, is called the conditional probability of A and is denoted by P (A/B), i.e.
 P (A/B) = conditional probability of A given that B has already occurred.
 P (B/A) = conditional probability of B given that A has already occurred.
Types of Probability:
The Classical or mathematical:
Probability is the ratio of the number of favorable cases to the total number of equally likely cases.
The probability of nonoccurrence of the same event is given by {1 – p(occurrence)}.
The probability of occurrence plus the probability of nonoccurrence is equal to one.
If the probability of occurrence is p(O) and the probability of nonoccurrence is p(O’), then p(O) + p(O’) = 1.
Statistical or Empirical
Empirical probability arises when frequency distributions are used. For example:
Observation ( X)  0  1  2  3  4 
Frequency ( f)  3  7  10  16  11 
The probability of observing X = 2 is its frequency divided by the total frequency: P(X = 2) = f / Σf = 10 / (3 + 7 + 10 + 16 + 11) = 10/47 ≈ 0.213.
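As a rough illustration, the empirical probability for the frequency table above can be computed as a relative frequency. This is a minimal sketch: the function name `empirical_probability` is an assumption made for the example, and the query is read as P(X = 2).

```python
from fractions import Fraction

# Frequency table from the text: observation X -> frequency f
freq = {0: 3, 1: 7, 2: 10, 3: 16, 4: 11}

def empirical_probability(table, x):
    """Relative frequency of x: f(x) divided by the total frequency."""
    total = sum(table.values())
    return Fraction(table[x], total)

p2 = empirical_probability(freq, 2)
print(p2, round(float(p2), 3))  # 10/47 0.213
```

The relative frequencies of all observations add up to one, which is a quick sanity check on any such table.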
RULES OF PROBABILITY
Addition Rule
 Rule 1: When two events A and B are mutually exclusive, then probability of any one of them is equal to the sum of the probabilities of the happening of the separate events;
Mathematically:
P (A or B) =P (A) +P (B)
Example: When a die is rolled, find the probability of getting a 3 or a 5.
Solution: P (3) =1/6 and P (5) =1/6.
Therefore P (3 or 5) = P (3) + P (5) = 1/6+1/6 =2/6=1/3.
 Rule 2: If A and B are two events that are NOT mutually exclusive, then
P (A or B) = P(A) + P(B) – P(A and B), where P(A and B) is the probability that A and B occur together (the outcomes that the two events have in common).
Given two events A and B, the probability that event A, or event B, or both occur is equal to the probability that event A occurs, plus the probability that event B occurs, minus the probability that the events occur simultaneously.
Example: When a card is drawn from a pack of 52 cards, find the probability that the card is a 10 or a heart.
Solution: P (10) = 4/52 and P (heart) =13/52
P (10 that is Heart) = 1/52
P (A or B) = P (A) + P (B) – P (A and B) = 4/52 + 13/52 – 1/52 = 16/52 = 4/13.
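The card example can be checked by enumerating a 52-card deck and comparing the addition rule with a direct count. This is only a sketch; the helper names (`prob`, `is_ten`, `is_heart`) are chosen for the illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Classical probability: favourable cards divided by all cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_ten(card):
    return card[0] == "10"

def is_heart(card):
    return card[1] == "hearts"

# Addition rule for events that are not mutually exclusive
by_rule = prob(is_ten) + prob(is_heart) - prob(lambda c: is_ten(c) and is_heart(c))
# Direct count of cards that are a 10 or a heart
by_count = prob(lambda c: is_ten(c) or is_heart(c))
print(by_rule, by_count)  # 4/13 4/13
```

Subtracting P(A and B) removes the 10 of hearts, which would otherwise be counted twice.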
Multiplication Rule
 Rule 1: For two independent events A and B, then
P (A and B) = P (A) x P (B).
Example: Determine the probability of obtaining a 5 on a die and a tail on a coin in one throw.
Solution: P (5) =1/6 and P (T) =1/2.
P (5 and T) = P (5) x P (T) = 1/6 x ½= 1/12.
 Rule 2: When two events are dependent, the probability of both events occurring is P (A and B) = P (A) x P (B/A), where P (B/A) is the probability that event B occurs given that event A has already occurred.
Example: Find the probability of obtaining two Aces from a pack of 52 cards without replacement.
Solution: P (Ace) = 4/52 and P (second Ace if NO replacement) = 3/51
Therefore P (Ace and Ace) = P (Ace) x P (Second Ace) = 4/52 x 3/51 = 1/221
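A short sketch can confirm the two-Ace result, first with the multiplication rule and then by brute-force counting of ordered two-card draws. The deck representation used here is an assumption made for the example.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule for dependent events
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_both = p_first_ace * p_second_ace_given_first

# Brute-force check: every ordered draw of two different cards
deck = ["Ace"] * 4 + ["other"] * 48
draws = list(permutations(range(len(deck)), 2))
hits = sum(1 for i, j in draws if deck[i] == "Ace" and deck[j] == "Ace")
p_check = Fraction(hits, len(draws))

print(p_both, p_check)  # 1/221 1/221
```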
Construct sample space, when two dice are rolled
n(s) = n_{1} x n_{2} = 6 x 6 = 36
(1,1)  (2,1)  (3,1)  (4,1)  (5,1)  (6,1) 
(1,2)  (2, 2)  (3, 2)  (4, 2)  (5, 2)  (6, 2) 
(1, 3)  (2, 3)  (3, 3)  (4, 3)  (5, 3)  (6, 3) 
(1, 4)  (2, 4)  (3, 4)  (4, 4)  (5, 4)  (6, 4) 
(1, 5)  (2, 5)  (3, 5)  (4, 5)  (5, 5)  (6, 5) 
(1, 6)  (2, 6)  (3, 6)  (4, 6)  (5, 6)  (6, 6) 
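The same sample space can be generated programmatically. In this minimal sketch the event "the two faces sum to 7" is an illustrative choice and does not come from the text above.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Every ordered pair (first die, second die)
sample_space = list(product(faces, repeat=2))
print(len(sample_space))  # 36

# Illustrative event: the two faces add up to 7
sum_is_7 = [outcome for outcome in sample_space if sum(outcome) == 7]
print(Fraction(len(sum_is_7), len(sample_space)))  # 1/6
```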
EXAMPLE OF FINDING PROBABILITY OF AN EVENT
If 3 coins are tossed together, construct a tree diagram & find the followings;
a) Event showing No head b) Event showing 01 head,
c) Event showing 02 heads d) Event showing 03 heads
n (s) = n_{1} x n_{2} x n_{3}
= 2 x 2 x2 = 8

 Event showing no head = P(X = 0)
Answer: TTT, 1/8 = 0.125

 Event showing 01 head = P(X = 1)
Answer: HTT, THT, TTH 3/8 = 0.375

 Event showing 02 heads = P(X = 2)
Answer: HHT, HTH, THH 3/8 = 0.375

 Event showing 03 heads = P(X = 3)
Answer: HHH 1/8 = 0.125
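The four answers above can be reproduced by listing all eight outcomes and grouping them by the number of heads. This is a sketch only; the variable names are arbitrary.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2 x 2 x 2 = 8 outcomes of tossing three coins
outcomes = ["".join(toss) for toss in product("HT", repeat=3)]

# Count how many outcomes show 0, 1, 2 or 3 heads
heads_count = Counter(outcome.count("H") for outcome in outcomes)
distribution = {k: Fraction(heads_count[k], len(outcomes)) for k in sorted(heads_count)}

for k, p in distribution.items():
    print(k, p, float(p))
```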
Complementary Events
Complementary events happen when there are only two outcomes, like getting a job, or not getting a job. In other words, the complement of an event happening is the exact opposite: the probability of it not happening.
The complement of an event A, written as Ā, is the event that A does not occur.
The probability of the complement is equal to 1 minus the probability of the event:
P (Ā) = 1 – P (A)
CONDITIONAL PROBABILITY & SCREENING TESTS
Sensitivity, Specificity, and Predictive Value Positive and Negative
In the health sciences field a widely used application of probability laws and concepts is found in the evaluation of screening tests and diagnostic criteria. Of interest to clinicians is an enhanced ability to correctly predict the presence or absence of a particular disease from knowledge of test results (positive or negative) and/or the status of presenting symptoms (present or absent). Also of interest is information regarding the likelihood of positive and negative test results and the likelihood of the presence or absence of a particular symptom in patients with and without a particular disease.
In considering screening tests, one must be aware of the fact that they are not infallible. That is, a testing procedure may yield a false positive or a false negative.
False Positive:
A false positive results when a test indicates a positive status when the true status is negative.
False Negative:
A false negative results when a test indicates a negative status when the true status is positive.
Sensitivity:
The sensitivity of a test (or symptom) is the probability of a positive test result (or presence of the symptom) given the presence of the disease.
Specificity:
The specificity of a test (or symptom) is the probability of a negative test result (or absence of the symptom) given the absence of the disease.
Predictive value positive:
The predictive value positive of a screening test (or symptom) is the probability that a subject has the disease given that the subject has a positive screening test result (or has the symptom).
Predictive value negative:
The predictive value negative of a screening test (or symptom) is the probability that a subject does not have the disease, given that the subject has a negative screening test result (or does not have the symptom).
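The four definitions above can be sketched as ratios of counts from a 2×2 table of test result against true disease status. The labels `tp`, `fp`, `fn`, `tn` (true/false positives and negatives) and the counts below are assumptions invented for illustration. Predictive values computed this way are meaningful only when the sample reflects the disease prevalence in the population of interest.

```python
def screening_measures(tp, fp, fn, tn):
    """Conditional probabilities estimated from a 2x2 screening table."""
    return {
        # P(test positive | disease present)
        "sensitivity": tp / (tp + fn),
        # P(test negative | disease absent)
        "specificity": tn / (tn + fp),
        # P(disease present | test positive)
        "predictive_value_positive": tp / (tp + fp),
        # P(disease absent | test negative)
        "predictive_value_negative": tn / (tn + fn),
    }

# Hypothetical counts, not taken from any real study
measures = screening_measures(tp=90, fp=30, fn=10, tn=870)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```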
Summary of formulae:
Symbols
COUNTING RULES
1) FACTORIALS (number of ways)
The result of multiplying a sequence of descending natural numbers down to a number. It is denoted by “!”
Examples:
4! = 4 × 3 × 2 × 1 = 24
7! = 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5040
Remember : 0! = 1
General Method:
n! = n (n – 1) (n – 2) (n – 3) … (2) (1)
2) PERMUTATION RULES
All possible arrangements of a collection of things, where the order is important in a subset.
The same items in a different arrangement count as a different permutation.
Examples
3) COMBINATIONS
The order of the objects in a subset is immaterial.
The same objects in a different arrangement do not count as a different combination.
Examples:
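The counting rules can be tried out with Python's standard library. In this sketch the four items A, B, C, D and the choice of 2 at a time are illustrative values, not examples recovered from the original notes.

```python
import math
from itertools import combinations, permutations

# Factorials, including the special case 0! = 1
print(math.factorial(4))  # 24
print(math.factorial(7))  # 5040
print(math.factorial(0))  # 1

items = ["A", "B", "C", "D"]

# Permutations: order matters, so (A, B) and (B, A) are counted separately
print(math.perm(4, 2))                  # 12
print(len(list(permutations(items, 2))))  # 12

# Combinations: order is immaterial, so {A, B} is counted once
print(math.comb(4, 2))                  # 6
print(len(list(combinations(items, 2))))  # 6
```

`math.perm` and `math.comb` require Python 3.8 or later.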
Binomial distribution:
Binomial distribution is a probability distribution which is obtained when the probability p of the happening of an event is the same in all the trials and there are only two possible outcomes in each trial.
Conditions:
 Each trial results in one of two possible, mutually exclusive, outcomes. One of the possible outcomes is denoted (arbitrarily) as a success, and the other is denoted a failure.
 The probability of a success, denoted by p, remains constant from trial to trial. The probability of a failure (1 – p) is denoted by q.
 The trials are independent; that is, the outcome of any particular trial is not affected by the outcome of any other trial.
 The parameters of the distribution, n and p, should be known.
Formula:
b (X: n, p) = ^{n}C_{x} p^{x} q^{n – x } (OR) f (x) = ^{n}C_{x} p^{x} q^{n – x}
Where
X = Random variable
n = Number of Trials
p = Probability of Success
q = Probability of Failure
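The formula can be written as a small function. As a sketch, it is applied to n = 3 tosses of a fair coin (p = 0.5), with "head" treated as the success; the function name `binomial_pmf` is an assumption made for the example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Number of heads in 3 tosses of a fair coin
pmf = [binomial_pmf(x, n=3, p=0.5) for x in range(4)]
print(pmf)  # [0.125, 0.375, 0.375, 0.125]
```

The probabilities over x = 0, 1, …, n always add up to one.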
Difference between “purpose”, “aim”, “target”, “goal”, “objective”, and “ambition”
These words are pretty similar and have only subtle differences and in spoken language many people might not be careful enough to use each of the words correctly. However I think the explanation from Longman Activator Thesaurus is quite helpful:
Purpose: what you want to achieve when you do something; the reason you do or plan something, and the thing you want to achieve when you do it: The games have an educational purpose.
Aim: something you hope to achieve by doing something: The main aim of the plan was to provide employment for local people.
Goal: something important that you hope to achieve in the future, even though it may take a long time: The country can still achieve its goal of reducing poverty by a third.
Target: the exact result that a person or organization intends to achieve by doing something, often the amount of money they want to get; a particular amount or total that you want to achieve: The company is on track to meet its target of increasing profits by 10%.
Objective: the specific thing that you are trying to achieve – used especially about things that have been officially discussed and agreed upon in business, politics, etc.: Their main objective is to halt the flow of drugs.  We met to set the business objectives for the coming year.
Ambition: something that you very much want to achieve in your future career: Her ambition was to go to law school and become an attorney.  Earlier this year, he achieved his ambition of competing in the Olympic Games.
Biostatistics MCQs Part III
61. The sum of the absolute deviation about mean for the values: 2, 4, 6, 8, and 10 is always:
a. Not equal to zero
b. 2
c. 10
d. Not possible.
62. The mean of a data is defined as:
a. The sum of the values is multiplied by the numbers of the values
b. The sum of the values divided by the numbers of the values
c. Divide every value by a constant number.
d. The square of values is divided by the numbers of the values.
63. The mean, median and mode of the given values: 42, 42, 42, 42, 42, 42, are
a. Mean=42, median=44, mode=46
b. 12
c. The same value
d. 0
64. When we add or subtract any constant values in the original values then, it is known as:
a. Deviation about mean
b. Change of origin.
c. Change of scale.
d. Mean deviation
65. The square root of the mean of the square deviation about mean is known as:
a. The variance
b. Standard deviation
c. Central value.
d. The average value.
66. When the p-value is less than α (level of significance) then we: ____
(a) Reject Ho (b) Accept Ho
(c) None of these (d) Reject HA
67. The probability of any event is defined as the number of the favorable events divided by the number of the sample space. Sample space is defined as:
a. Even number of outcomes.
b. Odd number of outcomes.
c. All possible outcomes of an experiment.
d. None of all these.
68. A portion of the population selected for study is referred to as:
a. a sample
b. parameter .
c. Hypothesis.
d. Random variable.
69. A major purpose of doing research is to infer, or generalize, from a sample to a larger population; this method is known as:
a. Sampling Design
b. Measures of dispersion.
c. Probability.
d. Testing of hypothesis.
70. Some characteristics are not capable of being measured in the sense that height, weight, and age are measured. These characteristics are categorized only, as for example, when an ill person is given a medical diagnosis, or a person is designated as belonging to an ethnic group. These variables are called:
a. Qualitative (categorical) variables
b. Random variable
c. Quantitative variable
d. Not possible.
71. If we have the values x1 = 80, x2 = 90, x3 = 100, x4 = 110, x5 = 120, the mean of the data is:
a. 100
b. 0
c. 90
d. 120
72. The variance for the given values is:
xi  (xi – x̄)^{2} 
84  4 
95  121 
67  289 
92  64 
X = 
(a) 0 (b) 64
(c) 10 (d) 218.5
73. The coefficient of variation is a useful measure of relative spread in data and is used frequently in the biologic sciences. It is defined as the standard deviation divided by the mean times 100%. It produces a measure of relative variation, that is, variation relative to the size of the mean. The formula is:
(a) Median *Mode
(b) S.d *mean
(c) Mean/ Variance
(d) sd/mean*100
74. The sum of the absolute deviation about mean is always:
a. Positive.
b. Negative
c. Zero and negative both at a time
d. Zero
75. If we add or subtract any constant value in the original data, this process is known as change of origin and similarly if we multiply or divide the original data by any constant then it is known as change of scale. The mean of the original observations is 10, if we add a constant 5 in each observation then mean will be:
a. 0
b. Same as 10
c. 15
d. 5
76. Which of the measures of variability is NOT dependent on the exact values of every measurement?
a. Mean deviation
b. Variance
c. Range
d. Standard deviation
77. The standard deviation divided by the mean of the measurements is known as:
a.
b. The coefficient of variation
c. 2
d. zero
78. The Z-test is used to test the population mean, whether the population variance is known or unknown, when the sample size n is: ____
a. less than 30
b. equal or greater than 30
c. no condition
d. none of these
79. Using the given information:
Groups  Mean  S.D  C.V 
A  80  12  15 
B  120  15  12.5 
Which group is more consistent?
(a) A (b) B
(c) A & B both. (d) Both are not consistent.
80. The mean of the absolute deviation about mean is known as:
a. variance
b. Standard deviation.
c. Mean deviation about mean.
d. Mean.
81. The set of all possible outcomes of an experiment is known as the sample space. When a coin is tossed 3 times, the total number of outcomes in the sample space is
a. 0
b. 6
c. 8
d. 10
82. Two events A and B are said to be mutually exclusive events if and only if:
a. Both occur at a time.
b. only one occurs
c. Neither of them occurs
d. none of them
83. The probability of any event is defined as the number of the favorable events divided by the sample space.
a. The sum of the probabilities should be equal to one.
b. The probability of any event lies between –1 and +1.
c. The probability of any event can’t be negative.
d. The probability lies between 0 and 1.
84. l + [(f_{m} – f_{1}) × h] / (2f_{m} – f_{1} – f_{2}) is the formula for ____ for grouped data.
a. Mean
b. Median
c. Range
d. Mode
85. The minimum size of a Contingency table is : —————
a. 1×1
b. 2×2
c. 10×10
d. No minimum size
86. The t-test is used to test the population mean, whether the population variance is known or unknown, when the sample size n is: ____
a. less than 30
b. equal or greater than 30
c. no condition
d. none of these
87. In a contingency table with 4 rows and 6 columns, the degrees of freedom are
a. 15
b. 24
c. 4
d. 6
88. The critical value for the Chi-square test with 2 degrees of freedom at the 5% level of significance is;
a. 2
b. 5.991
c. 0
d. 2.4
89. The ANOVA method is used to test the equality of more than two population means at a time; the test statistic used in this method is known as: ____
a. t-test
b. chi-square test
c. F-test
d. z-test
90. In testing of hypothesis in order to test the equality of more than two population means at a time the ——————– method is used.
a. Analysis of variance
b. Student’s t-test
c. Chi-square test
d. none of these
91. Random Sampling or Probability sampling includes all the following techniques, except:
a. Simple random sampling
b. Stratified random Sampling
c. Cluster sampling
d. Purposive Sampling
92. Gender, age class, religion, type of disease, and blood group are measured on;
a. Nominal Scale
b. Ordinal Scale
c. Interval Scale
d. Ratio Scale
93. Which scale of measurement has an absolute zero?
a. Nominal
b. Ordinal
c. Interval
d. Ratio
94. The variable which is influenced by the intervention of the researcher is called:
a. Independent
b. Dependent
c. Discrete
d. Extraneous
95. The statistical approach which helps the investigator to decide whether the outcome of the study is a result of factors planned within design of the study or determined by chance is called:
a. Descriptive statistics
b. Inferential statistics
c. Normal distribution
d. Standard deviation
96. Which of the following methods is a form of graphical presentation of data?
a. Line Diagram
b. Pie diagram
c. Bar diagram
d. Histogram
97. All the following are measures of central tendency, except:
a. Mean
b. Median
c. Mode
d. Variance
98. A measure of central tendency influenced by extreme scores & skewed distributions is;
a. Mean
b. Median
c. Mode
d. Range
99. A measure of central tendency which is calculated by arranging the numbers in numerical order is:
a. Standard deviation
b. Range
c. Median
d. Mode
100. The proportion of observations that fall above the median is:
a. 68%
b. 50%
c. 75%
d. 95%
101. The indices used to measure variation or dispersion among scores are all, except:
a. Range
b. Variance
c. Standard deviation
d. Mean
102. A measure of dispersion of a set of observations in which it is calculated by the difference between the highest and lowest values produced is called:
a. Standard deviation
b. Variance
c. Range
d. Mode
103. A statistic which describes the interval of scores bounded by the 25th and 75th percentile ranks is:
a. Interquartile range
b. Confidence Interval
c. Standard deviation
d. Variance
104. The Median value is the:
a. 25th percentile
b. 50th percentile
c. 75th percentile
d. 95th percentile
105. Large standard deviations suggest that:
a. Scores are probably widely scattered.
b. There is very little difference among scores.
c. Mean, median and mode are the same.
d. The scores are not normally distributed.
106. The formula given below is computational formula for:
a. Variance
b. Mean
c. Standard deviation
d. t-statistic
107. The square of the standard deviation is the:
a. Variance.
b. Standard error
c. Zscore
d. Variance
108. Which is NOT a characteristic of normal distribution?
a. Symmetric
b. Bell-shaped
c. Mean = median = mode
d. Negative skewness
109. Skewness is a measure:
a. of the asymmetry of the probability distribution
b. which decides whether the distribution may have high or low variance
c. of central tendency
d. None of the above
110. The listed observations 1,2,3,4,100, suggest the distribution:
a. is positively skewed
b. is negatively skewed
c. has zero skewness
d. is left-skewed
111. Which statement about normal distribution is FALSE:
a. 50 percent of the observations fall within one standard deviation sigma of the mean.
b. 68 percent of the observations fall within one standard deviation sigma of the mean.
c. 95 percent of observation falls within 2 standard deviations.
d. 99.7 percent of observations fall within 3 standard deviations of the mean.
112. A measure used to standardize the central tendency away from the mean across different samples is:
a. skewness
b. Range
c. Z-score
d. mode
113. Probability values fall on scale between:
a. –1 to +1
b. 0 and 1.
c. –3 to +3
d. 0.05 to 0.01
114. Standard error is calculated by:
a. Dividing standard deviation by the square root of the sample size.
b. Dividing number of nominated outcome by number of possible outcome.
c. Adding all the numbers and then dividing by the numbers of observations.
d. Arranging the numbers in numerical order, then taking the middle one.
115. 95% confidence interval refers to:
a. Considering 1 out of 20 chances are taken to be wrong.
b. Considering 1 out of 100 chances are taken as wrong.
c. Considering 95 out of 100 chances are taken as wrong.
d. Considering 5 out of 20 chances are taken as wrong.
116. The given formula is used to calculate: (O= Observed frequency, E= Expected frequency)
a. t-test
b. chi-square statistic
c. correlation coefficient
d. Standard deviation
117. A contingency table (2×2) is used to calculate:
a. t-statistic
b. correlation coefficient
c. variance
d. chi-square statistic
118. Correlation coefficient ranges from:
a. 0.01 to 0.05
b. 0 to 1
c. –1 to +1
d. –3 to +3
119. A type of graphical presentation data used to explain correlation between dependent and independent variable is:
a. Histogram
b. Frequency polygon
c. Frequency curve
d. Scatter plot
120. When explaining the direction of the linear association between two numerical paired variables, a positive correlation is stated when:
a. One variable increases and the other variable decreases or vice versa.
b. dependent variable increases and independent variable decreases
c. Both variables increase and decrease at the same time.
d. Correlation coefficient is stated close to 0.
Biostatistics MCQs Part I
M.C.Q’s of Biostatistics
1. The mean of the data a, a, a, a will be
a. Zero
b. a
c. 2
d. none of the above
2. The mean of the square deviation about mean is known as;
a. Mean
b. Median
c. Variance
d. Standard deviation
3. If sum of 20 values is 300 then mean of the data is;
a. 15
b. 20
c. 30
d. 300
4. If we add or subtract any value in the original data then this process is known as;
a. Change of scale
b. Change of origin
c. Both a and b
d. None of the above
5. The mean of the 10 values is 20, if we add a value 10 in each observation then mean for the new value will be ;
a. 20
b. 0
c. 30
d. 10
6. When two coins are tossed together then probability of getting no tail is;
a. 0
b. ½
c. ¼
d. 1
7. The mean value or central value or average value of a data are;
a. All same value
b. All different value
c. None of these
d. Always negative
8. When “n” is an odd number then median is defined as;
a. Middle value
b. Median of two middle values
c. Sum of the values
d. Most repeated value
9. For a group data the class interval having maximum frequency is known as
a. Median class
b. Mode
c. Median
d. Modal class
10. The sum of the deviation about mean for the data 6, 8, 10, 2, and 4 is always;
a. 1
b. 0
c. Negative
d. 30
11. If the calculated value of chi-square lies in the region of acceptance, then we;
a. Accept Ho
b. Reject Ho
c. No conclusion
d. None of the above
12. Chi-square test is always used to test;
a. Population mean
b. Population median
c. Test of association
d. None of these
13. Pulse rate or weight of patient are known as;
a. Nominal data
b. Continuous data
c. Discrete data
d. Random variable
14. Classification of objects or persons into classes or groups in such a way that only one object or person falls in only one group at a time is called as;
a. Mutually exclusive
b. Not mutually exclusive
c. Dependent
d. Independent
15. In testing hypothesis we use different level of significance to test Ho , in most situations level of significance is not given then we have to use;
a. 1 %
b. 2 %
c. 5%
d. 10%
16. If we want to compare two or more groups then we use coefficient of variation (C.V), the group which has maximum C.V is known as the more;
a. Consistent
b. Not consistent
c. None of the above
d. It is not possible
17. When we make a 95% confidence interval for the population mean using t or z test then probability or chance of error will be;
a. 0.05
b. 0.1
c. 1
d. 5
18. A variable which has some chance or probability of its occurrence is known as;
a. Simple variable
b. Qualitative variable
c. Quantitative variable
d. Random variable
19. The sample mean x̄ is known as the point estimator of the population;
a. Median
b. Mode
c. Variance
d. Mean μ
20. In all research analysis it is not possible to study whole population, we always estimate population parameters on the basis of;
a. Population information
b. Sample information
c. We could not estimate parameters
d. Estimation of samples
21. Sampling is the process of drawing samples from the population. When the chance or probability of selection is equal for each member of the population, such a sampling design is known as;
a. Simple random sampling
b. Not random sampling
c. Judgment sampling
d. None of these
22. Estimation is the process of estimating parameters on the basis of;
a. Parameters
b. Statistics
c. A and B
d. None of the above
23. A random sample of size 4 is taken from a population whose variance is 16. When sampling is done with replacement, the variance of the sample mean is;
a. 2
b. 16
c. 4
d. 48
24. When the size of samples is increasing then variance of sample means is also;
a. Increases
b. Decreases
c. Constant
d. None of the above
25. When two dice and a single coin are tossed together then total sample spaces will be;
a. 36
b. 14
c. 24
d. 72 (Rationale: 6 × 6 × 2 = 72)
26. Student’s t-test is used to test the population mean when the population variance is unknown and the sample size is;
a. Less than 30
b. More than 30
c. Any size
d. None of them
27. The minimum d.f for the Chi-square test of independence or association is always;
a. 0
b. 1
c. 2
d. N – 1
28. If the Chi-square test’s calculated value is less than the critical value then Ho is always;
a. Accepted and rejected both
b. Accepted
c. Rejected
d. None of these
29. The p-value is the probability of the calculated value; if the p-value is zero then we reject Ho after comparing with;
a. Level of significance
b. Critical value
c. d.f
d. sample size
30. Square root of the mean of the squared deviations is known as;
a. variance
b. median
c. SD
d. Mean
31. A type of qualitative data where zero is not fixed (arbitrary) termed as;
a. Discrete
b. Continuous
c. Ratio
d. Interval
32. A subset of all the measurement of interest is;
a. Sample
b. Population
c. Sample unit
d. None of these
33. All of the following are an example of qualitative data except;
a. Sex
b. Age
c. Educational level
d. Socioeconomic status
34. All of the following are an example of quantitative data except;
a. Gender
b. Height
c. Weight
d. Temperature
35. Mean is the measure of central tendency can be calculated for all of the following except;
a. Age
b. Weight
c. Systolic BP
d. Marital status
36. Which one is formula for empirical rule
a. μ± 1SD = 60%
b. μ± 1SD = 65%
c. μ± 1SD = 68%
d. μ± 1SD = 70%
37. Following all are true for mean EXCEPT;
a. Applicable for continuous data
b. Not applicable for qualitative data
c. Not affected by extreme values
d. Affected by each value in data set
38. Fourth step of hypothesis testing is;
a. Level of significance
b. Test statistic
c. Rejection region
d. None of these
39. The most frequently occurring observation is
a. Mean
b. Median
c. Mode
d. SD
40. When the distribution of data is skewed, one should ideally use;
a. Mean
b. Median
c. Mode
d. None of these
41. Sample SD is denoted by;
a. S
b. S2
c.
d.
42. Z-score is calculated for;
a. Chi-square distribution
b. Standard normal distribution
c. t-distribution
d. Normal distribution
43. A hospital claims, its ambulance response time is less than 10 minutes, it can be written as;
a. Ho > 10 min, HA ≤ 10 min
b. Ho ≤ 10 min, HA > 10 min
c. Ho ≠ 10 min, HA = 10 min
d. Ho – 10 min, HA / 10 min
44. Chi-square test of significance is used when;
a. Data is continuous
b. Data is categorical
c. Data is discrete
d. None of these
45. In normal distribution curve, mean of the data lie on the
a. Right end
b. Centre
c. Left end
d. None of these
46. Parameters of standard normal distribution are;
a. Mean
b. SD
c. Range
d. Both a and b
47. Which one of the following is true for the standard normal distribution;
a. Mean = 0
b. Mean = 50
c. Mean = 100
d. Mean = 0.5
48. When mean, median, and mode lie in the centre of the curve, the distribution is known as;
a. Right skewed
b. Left skewed
c. Chi-square
d. Normal
49. In 95% confidence interval, the level of significance (α) is;
a. 0.01
b. 0.05
c. 0.1
d. None of these
50. All of the following are true for Student’s t-test except;
a. Sample size < 30
b. σ = unknown
c. Approximate Z when N>30
d. Use for qualitative data
51. Which formula is used for df in the chi-square distribution;
a. (row)(column)
b. (row – column)
c. (row – 1)(column – 1)
d. (row – 1)(column)
52. All of the following are true for measure of dispersion except;
a. Mean
b. Range
c. Interquartile range
d. Variance
53. What is the relationship between SD and variance;
a. Variance = SD
b. Variance = SD/n
c. Variance = (SD)^{2}
d. None of these
54. First step in calculating median is;
a. Calculate range
b. Arrange data
c. Count the data
d. None of these
55. What is true for descriptive statistics;
a. Organization & displaying of data
b. Drawing inferences for population
c. Hypothesis testing
d. Calculation of p-value
56. The area under normal distribution curve is;
a. 1
b. 0.5
c. 0
d. None of these
57. Negative z-score shows that;
a. Observation is below the mean
b. Observation is above the mean
c. Observation is equal to the mean
d. None of these
Biostatistics MCQs Part II
SCENARIO (for 58 to 60)
A survey was conducted by graduate students to investigate the current situation of students in Pakistan. Some of the variables were gender, level of education, ethnicity, place of domicile, age, marital status and employment status. The following questions (58–60) relate to this scenario;
58. Appropriate graph to display marital status (Married, Unmarried, Divorced, widow) is;
a. Frequency polygon
b. Scatter plot
c. Pie chart
d. Histogram
59. Level of education is;
a. Nominal data
b. Ordinal data
c. Discrete data
d. None of these
60. The best way to display Age data is to draw;
a. Histogram
b. Bar chart
c. Both a & b
d. None of these
Research MCQs Part IV
Research Multiple choice questions with Key
151. The use of multiple data sources to help understand a phenomenon is one strategy that is used to promote qualitative research validity. Which of the following terms describes this strategy?
a. Data matching
b. Pattern matching
c. Data triangulation
d. Data feedback
152. What is another term that refers to a confounding extraneous variable?
a. Last variable
b. First variable
c. Third variable
d. Fourth variable
153. Which of the following refers to any systematic change that occurs over time in the way in which the dependent variable is assessed?
a. Instrumentation
b. Maturation
c. Testing
d. Selection
154. Which strategy used to promote qualitative research validity uses multiple research methods to study a phenomenon?
a. Data triangulation
b. Methods triangulation
c. Theory triangulation
d. Member checking
155. In study design threats, the situation where subjects’ behaviour may be affected by characteristics of the researchers is known as:
a. Measurement effect
b. Experimenter effect
c. Novelty effect
d. Expectancy effect
156. Which of the following is not one of the key threats to internal validity?
a. Maturation
b. Instrumentation
c. Temporal change
d. History
157. Which is not a direct threat to the internal validity of a research design?
a. History
b. Testing
c. Sampling error
d. Differential selection
158. Internal validity refers to which of the following?
a. The ability to infer that a causal relationship exists between 2 variables
b. The extent to which study results can be generalized to and across populations of persons, settings, and times
c. The use of effective measurement instruments in the study
d. The ability to generalize the study results to individuals not included in the study
159. The posttest-only design with nonequivalent groups is likely to control for which of the following threats to internal validity:
a. History
b. Differential selection
c. additive and interactive effects
d. differential attrition
160. Which of the following designs permits a comparison of pretest scores to determine the initial equivalence of groups on the pretest before the treatment variable is introduced into the research setting?
a. One-group pretest-posttest design
b. Pretest-posttest control group design
c. Posttest-only design with nonequivalent groups
d. Both b and c
161. Which of the following control techniques available to the researcher controls for both known and unknown variables?
a. Building the extraneous variable into the design
b. Matching
c. Random assignment
d. Analysis of covariance
162. The group that does not receive the experimental treatment condition is the ________.
a. Experimental group
b. Control group
c. Treatment group
d. Independent group
163. There are a number of ways in which confounding extraneous variables can be controlled. Which control technique is considered to be the best?
a. Random assignment
b. Matching
c. Counterbalancing
d. None of the above
164. In an experimental research study, the primary goal is to isolate and identify the effect produced by the ____.
a. Dependent variable
b. Extraneous variable
c. Independent variable
d. Confounding variable
165. Which one of the following research tests hypotheses and theories in order to explain how and why a phenomenon operates as it does?
a. Descriptive
b. Predictive
c. Explanatory
d. Exploratory
166. If a research finding is statistically significant, then ____.
a. The observed result is probably not due to chance
b. The observed result cannot possibly be due to chance
c. The observed result is probably a chance result
d. The null hypothesis of “no relationship” is probably true
167. When a researcher starts with the dependent variable and moves backwards, it is called ____.
a. Predictive research
b. Retrospective research
c. Exploratory research
d. Descriptive research
168. Which approach is the strongest for establishing that a relationship is causal?
a. Causal-comparative
b. Correlational
c. Experimental
d. Historical
169. The following are threats to internal validity EXCEPT one;
a. Novelty Effect
b. History
c. Selection
d. Maturation
170. The degree to which the components of the research reflect the theory, concept, or variable under study is termed;
a. Design Validity
b. Threats to Validity
c. Internal validity
d. External validity
171. Which of the following is NOT a purpose of descriptive studies?
a. To serve as a starting point for hypothesis generation
b. To get rigorous control of the variables
c. To serve as a starting point for theory development
d. To observe, describe, and document aspects of a situation as it naturally occurs
172. Which of the following attempts to understand relationships among phenomena as they naturally occur, without any intervention?
a. Ex post facto research
b. Experimental research
c. Prospective design
d. Retrospective design
173. The nursing community’s interest in qualitative research began in the;
a. Late 1910s
b. Late 1930s
c. Late 1950s
d. Late 1970s
174. Which of the following is a characteristic of qualitative research?
a. Generalization to the population
b. Random sampling
c. Unique case orientation
d. Standardized tests and measures
175. Which of the following is a characteristic of qualitative research?
a. Design flexibility
b. Inductive analysis
c. Context sensitivity
d. All of the above
176. Which of the following is usually not a characteristic of qualitative research?
a. Design flexibility
b. Dynamic systems
c. Naturalistic inquiry
d. Deductive design
177. Which of the following focuses on individuals’ interpretations of their experiences and the ways in which they express them?
a. Historical research
b. Phenomenological research
c. Grounded theory
d. Ethnographic research
178. Which of the following is not a phase of qualitative research?
a. Orientation and overview
b. Focused exploration
c. Confirmation and closure
d. Orientation and closure
179. Research undertaken to answer questions about causes, effects, or trends relating to past events that may shed light on present behaviors or practices is called;
a. Historical research
b. Phenomenological research
c. Grounded theory
d. Ethnographic research
180. The following are major types of triangulation EXCEPT one;
a. Data Triangulation
b. Time Triangulation
c. Method Triangulation
d. Theory Triangulation
181. Which of the following has contributed to the development of many middle-range theories of phenomena relevant to nurses?
a. Historical research
b. Phenomenological research
c. Grounded theory
d. Ethnographic research
182. Which of the following refers to the use of more than one theoretical position in interpreting data?
a. Data Triangulation
b. Time Triangulation
c. Method Triangulation
d. Theory Triangulation
183. Phenomenology has its disciplinary origins in:
a. Philosophy
b. Anthropology
c. Sociology
d. Many disciplines
184. The primary data analysis approach in ethnography is:
a. Open, axial, and selective coding
b. Holistic description and search for cultural themes
c. Cross-case analysis
d. Identifying essences of a phenomenon
185. The term used to describe suspending preconceptions and learned feelings about a phenomenon is called:
a. Axial coding
b. Design flexibility
c. Bracketing
d. Ethnography
186. ________ is a study of human consciousness and individuals’ experience of some phenomenon.
a. Phenomenology
b. Ethnography
c. Grounded theory
d. Case study research
187. ________ is a general methodology for developing theory that is based on data systematically gathered and analysed.
a. Theory confirmation
b. Grounded theory
c. Theory deduction
d. Phenomenology
188. In which qualitative research approach is the primary goal to gain access to individuals’ inner worlds of experience?
a. Phenomenology
b. Ethnography
c. Grounded theory
d. Case study
189. The type of qualitative research that describes the culture of a group of people is called;
a. Phenomenology
b. Grounded theory
c. Ethnography
d. Case study
190. What term refers to the insider’s perspective?
a. Ethnocentrism
b. Emic perspective
c. Etic perspective
d. Holism
191. A researcher studies a Kashmiri group for a six-month period to learn all about them so he can write a book about that particular tribe. What type of research will he likely be conducting?
a. Ethnography
b. Phenomenology
c. Grounded theory
d. Collective case study
192. _________ is used to describe cultural scenes or the cultural characteristics of a group of people.
a. Phenomenology
b. Ethnography
c. Grounded theory
d. Instrumental case study
193. Which of the following is not one of the 4 major approaches to qualitative research?
a. Ethnography
b. Phenomenology
c. Case study
d. Nonexperimental
194. Which of the following is known as a clear statement of the specific aim or goal of the study?
a. Research Question
b. Research objective
c. Research Purpose
d. Research Problem
195. The Tuskegee Syphilis Study began in which of the following years?
a. 1930
b. 1940
c. 1932
d. 1942
196. Medical experiments conducted on prisoners of war and persons deemed “racially valueless” are known as;
a. Jewish Chronic Disease Hospital Study
b. Nazi medical experiments
c. Tuskegee Syphilis Study
d. Willowbrook Study
197. A hypothesis that states the relationship among three or more variables is called a;
a. Simple Hypothesis
b. Complex Hypothesis
c. Research Hypothesis
d. Nondirectional Hypothesis
198. Which of the following is not an element of ethical research?
a. Protecting subjects’ rights
b. Obtaining informed consent
c. Obtaining institutional approval
d. Unbalancing the benefits and risks in the study
199. Misinforming subjects for research purposes is called;
a. Anonymity
b. Confidentiality
c. Scientific misconduct
d. Deception
200. A hypothesis that states the nature (positive or negative) of the interaction between two or more variables is called a;
a. Associated Hypothesis
b. Causal Hypothesis
c. Null Hypothesis
d. Directional Hypothesis