Merck Manual

Basic Principles of Epidemiology


Donald L. Noah, DVM, DACVPM, College of Veterinary Medicine and DeBusk College of Osteopathic Medicine, Lincoln Memorial University;

Stephanie R. Ostrowski, DVM, MPVM, DACVPM, Department of Pathobiology, College of Veterinary Medicine, Auburn University

Medically Reviewed May 2015 | Modified Nov 2022

The definition of epidemiology is “the study of disease in populations and of factors that determine its occurrence over time.” The purpose is to describe and identify opportunities for intervention. Epidemiology is concerned with the distribution and determinants of health and disease, morbidity, injury, disability, and mortality in populations. For veterinary epidemiology, this intervention is to enhance not only health but also productivity. Distribution implies that diseases and other health outcomes do not occur randomly in populations; determinants are any factors that cause a change in a health condition or other defined characteristic; morbidity is illness due to a specific disease or health condition; mortality is death due to a specific disease or health condition; and the population at risk can be people, animals, or plants.

Epidemiology is applied in many areas of public health practice. Among the most salient are to observe historical health trends to make useful projections into the future, discover (diagnose) current health and disease burden in a population, identify specific causes and risk factors of disease, differentiate between natural and intentional events (eg, bioterrorism), describe the natural history of a particular disease, compare various treatment and prevention products/techniques, assess the impact/efficiency/cost/outcome of interventions, prioritize intervention strategies, and provide foundation for public policy.

Epidemiologic Terms and Concepts

The natural history of a disease in a population, sometimes termed the disease’s ecology, refers to the course of the disease from its beginning to its final clinical endpoints. The natural history begins before infection (prepathogenesis period) when the agent simply exists in the environment, includes the factors that affect its incidence and distribution, and concludes with either its disappearance or persistence (endemicity) in that environment. Although knowledge of the complete natural history is not absolutely necessary for treatment and control of disease in a population, it does facilitate the most effective interventions.

An important epidemiologic concept is that neither health nor disease occurs randomly throughout populations. Innumerable factors influence the temporal waxing and waning of disease. A disease is considered endemic when it is constantly present within a given geographic area. For instance, animal rabies is endemic in the USA. An epidemic occurs when a disease occurs in larger numbers than expected in a given population and geographic area. Raccoon rabies was epidemic throughout the eastern USA for much of the 1980s and 1990s. An outbreak is a subset of an epidemic in which the increased disease occurrence is confined to a smaller geographic area and a shorter period of time. Finally, a pandemic occurs when an epidemic becomes global in scope (eg, influenza, HIV/AIDS).

The population at risk is an extremely important concept in epidemiology and includes members of the overall population who are capable of developing the disease or condition being studied. This concept seems simple at first, but misinterpretations can lead to erroneous study results and conclusions. As a simple example, a study of testicular cancer among residents in a population should not include women in the population at risk (frequently expressed as the “denominator” in an epidemiologic ratio).

A ratio is the value obtained from dividing one quantity by another (X/Y). The numerator and denominator may be independent of each other. In fact, in epidemiology, the term ratio is applied when the numerator is not a subset of the denominator. For example, in a class of veterinary students in which 88 are female and 14 are male, the sex ratio of female students to male students is 88/14, or 6.3 to 1.

A proportion is a type of ratio in which the numerator is part of the denominator (A/[A + B]). Therefore, they are not independent. For example, suppose that, among domestic dogs testing positive for internal parasites in Glendale, Arizona, 889 were male and 643 were female. The proportion of female dogs among those found to have parasite infections would be 643/(889 + 643), or 0.42.
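The ratio and proportion examples above can be reproduced in a few lines of Python. This is only a minimal sketch: the function names `ratio` and `proportion` are illustrative and do not come from the text, and the counts are the ones given in the two examples.

```python
def ratio(x: float, y: float) -> float:
    """Return X/Y; the numerator need not be part of the denominator."""
    return x / y


def proportion(a: float, b: float) -> float:
    """Return A/(A + B); the numerator is part of the denominator."""
    return a / (a + b)


# Veterinary class from the text: 88 female and 14 male students.
sex_ratio = ratio(88, 14)

# Parasite-positive dogs from the text: 643 female and 889 male.
female_proportion = proportion(643, 889)

print(f"Female-to-male ratio: {sex_ratio:.1f} to 1")
print(f"Proportion female: {female_proportion:.2f}")
```

A proportion always falls between 0 and 1, whereas a ratio has no such bound.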

A rate is another type of ratio in which the denominator involves the passage of time. This is important in epidemiology, because rates can be used to measure the speed of a disease event or to make epidemiologic comparisons between populations over time. Rates are typically expressed as a measure of the frequency with which an event occurs in a defined population in a defined time (eg, the number of foodborne Salmonella infections per 100,000 people annually in the USA).

Incidence is a measure of the new occurrence of a disease event (eg, illness or death) within a defined time period in a specified population. Two essential components are the number of new cases and the period of time in which those new cases appear. In an example regarding the class of veterinary students, if 13 of them developed influenza over the course of 3 mo (one quarter), the incidence would be 13 cases per quarter.

An incidence rate takes the population at risk into account. In the previous example, the incidence rate would be 13 cases per quarter/102 students, or 0.127 cases per quarter per student. Incidence rates are usually expressed by a multiplier that makes the number easier to conceptualize and compare. In this example, the multiplier would be 100, and the incidence rate would be 12.7 cases per quarter per 100 students (or 12.7%). An attack rate is an incidence rate; however, the period of susceptibility is very short (usually confined to a single outbreak).
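The effect of the multiplier can be sketched as follows. The function name `incidence_rate` and its `multiplier` parameter are assumptions made for illustration; the counts are those of the influenza example.

```python
def incidence_rate(new_cases: int, population_at_risk: int, multiplier: int = 1) -> float:
    """New cases per `multiplier` individuals at risk, for one time period."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases / population_at_risk * multiplier


# 13 new influenza cases in one quarter among 102 students.
per_student = incidence_rate(13, 102)
per_100_students = incidence_rate(13, 102, multiplier=100)

print(f"{per_student:.3f} cases per quarter per student")
print(f"{per_100_students:.1f} cases per quarter per 100 students")
```

The time period is not a parameter here; it must be stated alongside the result, because the same counts describe a very different rate per quarter than per year.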

A similar concept to incidence is prevalence. Prevalence (synonymous with “point prevalence”) is the total number of cases that exist at a particular point in time in a particular population at risk, usually expressed as a proportion of that population. Again using the influenza example from above, if 7 students had influenza at the same time during the academic quarter, the prevalence would be 7/102, or 0.069 (6.9%).

Measures of disease burden typically describe illness and death outcomes as morbidity and mortality, respectively. Morbidity is the measure of illness in a population, and numbers and rates are calculated in a similar fashion as with incidence and prevalence. Mortality is the corresponding measure of death in a population and can be applied to death from general (nonspecific) causes or from a specific disease. A related measure is the case fatality rate (CFR), which is the number of deaths due to a particular disease occurring among individuals afflicted with that disease in a given time period.

As an example, consider a large veterinary practice in the southwest USA that frequently sees dogs with coccidioidomycosis. The practice diagnosed 542 clinical cases in a particular year, 83 of which died from the disease in the course of that year. The month in which the most cases were diagnosed was September, in which 97 cases were diagnosed. Further, at a single point in time (perhaps based on the results of a serosurvey of dogs in the practice area), 237 dogs of 6,821 dogs with active records in the practice had the disease. In this scenario, the prevalence of coccidioidomycosis at the time of the serosurvey would be 237/6,821 or 0.035 (3.5%); the incidence in September would be 97 cases, and the incidence rate would be 97/6,821 or 0.014 (1.4%). Finally, the annual mortality rate due to coccidioidomycosis would be 83/6,821 or 0.012 (1.2%), and the case fatality rate would be 83/542 or 0.153 (15.3%).
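The four measures in the coccidioidomycosis scenario differ only in which count goes in the numerator and which in the denominator. The sketch below recomputes them from the counts given in the scenario; the class name `PracticeYear` and its field names are hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PracticeYear:
    """Counts from the coccidioidomycosis scenario (names are illustrative)."""
    dogs_at_risk: int         # dogs with active records in the practice
    annual_cases: int         # clinical cases diagnosed during the year
    annual_deaths: int        # deaths from the disease during the year
    september_cases: int      # new cases diagnosed in September
    cases_at_serosurvey: int  # existing cases at a single point in time

    def prevalence(self) -> float:
        return self.cases_at_serosurvey / self.dogs_at_risk

    def september_incidence_rate(self) -> float:
        return self.september_cases / self.dogs_at_risk

    def annual_mortality_rate(self) -> float:
        # Denominator is the whole population at risk.
        return self.annual_deaths / self.dogs_at_risk

    def case_fatality_rate(self) -> float:
        # Denominator is only the animals that had the disease.
        return self.annual_deaths / self.annual_cases


practice = PracticeYear(
    dogs_at_risk=6821,
    annual_cases=542,
    annual_deaths=83,
    september_cases=97,
    cases_at_serosurvey=237,
)

print(f"Prevalence:               {practice.prevalence():.1%}")
print(f"September incidence rate: {practice.september_incidence_rate():.1%}")
print(f"Annual mortality rate:    {practice.annual_mortality_rate():.1%}")
print(f"Case fatality rate:       {practice.case_fatality_rate():.1%}")
```

The mortality rate and the case fatality rate share a numerator, so the large gap between them comes entirely from the choice of denominator.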

Public health surveillance is defined as the ongoing systematic collection, analysis, interpretation, and dissemination of outcome-specific data essential to the planning, implementation, and evaluation of public health practice. In epidemiology, health surveillance is accomplished in either passive or active systems. Passive surveillance occurs when individual health care providers or diagnostic laboratories send periodic reports to the public health agency. Because this reporting is voluntary (sometimes referred to as being “pushed” to health agencies), passive surveillance tends to underreport disease, especially in diseases with low morbidity and mortality. Passive surveillance is useful for long-term trend analysis (if reporting criteria remain consistent) and is much less expensive than active surveillance. An example of passive surveillance is the system of officially notifiable diseases routinely reported to CDC by select health departments across the USA. Active surveillance, in contrast, occurs when an epidemiologist or public health agency seeks specific data from individual health care providers or laboratories. In this case, the data are “pulled” by the requestor, usually during emerging diseases or significant changes in disease incidence. Active surveillance is usually much more expensive and labor intensive; it typically is limited to short-term analyses of high-impact events. An example is the 1-yr surveillance conducted by CDC of the rapid increase in incidence of coccidioidomycosis among people in Arizona in 2007–2008.

Descriptive Epidemiology

Given that neither health nor disease is equally distributed throughout a population, epidemiologists use various methods to study and describe their occurrence. In descriptive epidemiology, diseases are classified according to the variables of person, place, and time.

Person: Who is affected by this disease? This is relevant, because certain variables may highlight changes in disease status and can be used to focus additional studies and interventions. Common person variables include age, sex, race, socioeconomic status, marital status, religion, smoker/nonsmoker, etc. In the case of animals, equivalent variables may include species, breed, reproductive status (eg, intact vs neutered, pregnant vs nonpregnant), function (eg, meat/milk/fiber production, race horse vs working horse vs pleasure horse, companion dog vs military working dog), and wild/feral vs domesticated (cats).

Place: Where does this disease occur? Place variables commonly illustrate geographic differences in the occurrence of a particular disease. Focused studies can help assist epidemiologists to determine why those differences have occurred and to identify specific risk factors. Common place variables include comparisons across national, state, and municipal boundaries and between urban and rural communities. For animal populations, “place” may refer to housing (eg, indoors vs outdoors, pen number or stall) or type of herd management (eg, intensive feedlot confinement vs extensive grazing). Place may also relate to risk of exposure to infectious animals at sale barns or during shipment or to external factors such as severe weather and natural disasters.

Time: When and over what time period (hours, days, weeks, day vs night) does this disease occur? Time variables are important to describe when disease occurs in relation to various factors of potential exposure and vulnerability. In animals, time may refer to milking shift, breeding season, lambing/calving season, at weaning, during shipment, on arrival at the feedlot, dry vs wet season, etc. Common time variables include secular trends (changes over long periods of time), seasonal/cyclic periods, and specific points in time (eg, outbreaks, epidemics, clusters, etc).

When a particular disease is observed relative to the variables of person, place, and time, it is often systematically described to facilitate more in-depth study. These systematic descriptions commonly take the form of case reports, case series, or cross-sectional studies.

Case reports are accounts of single or a few noteworthy health-related incidents (eg, an epidemiologic description of a case of human rabies).

Case series are listings of a larger number of cases, usually presented consecutively (eg, a characterization of dog bite incidents in a population of veterinarians and/or technicians over time). Case series articles are useful for comparing variables of person, place, or time as they appear to affect the occurrence of a particular disease.

Cross-sectional studies are one-time assessments of the incidence or prevalence of a disease in a defined population, which is usually selected at random from a larger population at risk (eg, a serosurvey of veterinarians for the presence of antibodies to Bartonella henselae organisms to determine risk factors for cat scratch disease). Cross-sectional studies are especially useful in forming hypotheses to be addressed by follow-on analytic studies.

Two main types of bias in descriptive epidemiology are selection bias and observation bias. Selection bias results from the identification of subjects/cases from a subset that is not representative of the entire population at risk. A nonmedical example of selection bias would occur in a voter survey, intended to predict the outcome of a political election, but drawn from a sample of voters from either high- or low-income status, neither of which would be representative of the overall voting population. Observation bias arises from systematic differences in the method of obtaining information from subjects/cases. Consider a study comparing library usage between students at two universities. Significant differences might result if students from one university were queried over the phone regarding library visits, whereas students at the other university were directly observed for actual usage. In general, bias in descriptive studies is not as prevalent or significant as bias in analytical studies.

In summary, descriptive epidemiology serves to describe the occurrence of disease in a population. Descriptive methods are commonly applied to little-known diseases; they use preexisting data, address the questions of who/where/when, and identify potential associations for more in-depth analytical studies.

Analytical Epidemiology

Analytical studies are applied to study the etiology of disease, to identify a causal relationship between exposures and health outcomes. They are typically used when insights of a particular health issue are available, commonly from previous descriptive studies. In evaluating the causality of disease associations, analytical studies address the question of “why” as opposed to the “person/place/time” of descriptive studies.

Once potential associations have been observed between those who have a particular disease and those who do not, further investigations are undertaken to determine causality and identify effective interventions. The first step in an analytic study is to form some conjecture regarding observed exposures and health outcomes. In analytical studies, this conjecture is termed the null hypothesis, meaning that the default assumption is that there is no association between the exposure in question and the disease outcome. Note that this assumption of no association is made even though the epidemiologist often thinks that some association actually exists. Once the null hypothesis is generated, studies are designed to test it and either reject it (by finding that some association actually does exist between exposure and disease outcome) or accept it (by finding that no association exists).

Analytical epidemiology is accomplished through either observational studies or interventional studies. In the former, the investigator does not control the exposure between the groups under study and typically cannot randomly assign subjects to study groups.

Observational Studies

Ecologic Studies

The unit under study is a group of people or animals rather than an individual. The group has no size limitation but must be able to be defined. For instance, the group could be a kennel of dogs, a class of veterinary students, or the citizens of an entire country. Once defined, the group is analyzed against some exposure to see what outcome(s) ensue. Examples of ecologic studies include Dr. John Snow’s analysis of the association between the incidence of cholera in London and where people obtained their drinking water, an analysis of how tobacco taxes affect tobacco usage, and an analysis of certain occupations for resultant hearing loss.

Ecologic studies have several advantages over other types of observational studies. They are relatively quick, easy, and inexpensive. Individual data are not necessary, only aggregate data for the group(s) under study. Finally, they are useful in generating information about the overall context of health, especially how it is affected by variables such as demographics, geography, and the social environment.

Ecologic studies also have several disadvantages. First, the measurement of many exposures is imprecise, especially of large groups in which the influence(s) of those exposures is difficult to define or not equally exerted. This phenomenon of unequal variable exertion results in another potential drawback to ecologic studies. Known as ecologic fallacy, it is described by “associations observed at the group level do not necessarily hold true at the individual level.” As an example, one could determine that the average IQ of a class of veterinary students is above average (which, by definition, would be 100). If a particular student was randomly selected from that class, could it be inferred that that student’s IQ was above 100? The answer is no, because of the difference between average and median. If the class had only a few people above average, but these students were significantly above average, and the rest of the students were only slightly below average, the distribution would be skewed toward a higher IQ when, in actuality, many members of the class would be below average.
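The IQ example can be made concrete with a small, entirely hypothetical class of 10 students. The scores below are invented for illustration and are not data from any study; they simply follow the pattern described above (a few students far above 100, the rest slightly below).

```python
from statistics import mean, median

# Hypothetical IQ scores: three students far above 100, seven slightly below.
class_iq = [150, 145, 140, 98, 98, 97, 97, 96, 96, 95]

class_mean = mean(class_iq)      # group-level summary
class_median = median(class_iq)  # the "typical" student
below_100 = sum(1 for score in class_iq if score < 100)

print(f"Class mean IQ:   {class_mean:.1f}")
print(f"Class median IQ: {class_median:.1f}")
print(f"Students below 100: {below_100} of {len(class_iq)}")
```

The group mean is above 100 even though most individuals in the group are below it, which is why a group-level finding cannot be assumed to hold for a randomly chosen member.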

Cohort Studies

In this type of study, a group of individuals (termed a cohort) is observed over time for changes in health outcomes.

When the period of the study is from the present into the future, the study is a prospective cohort study. In this case, the cohort is assumed to share a particular exposure and is followed over time to document the occurrence of new instances of a particular disease or outcome. Obviously, each member of the cohort must not have the disease or outcome at the beginning of the study. One of the most famous medical prospective cohort studies is the Framingham Heart Study. Researchers began the study in 1948 by recruiting 5,209 men and women, 30–62 yr old, from the town of Framingham, Massachusetts. Since that time, they have accomplished extensive serial physical examinations and surveys relating to the development of cardiovascular disease.

The major advantage of the prospective cohort study is that many different exposures can be considered and analyzed for influencing the outcome under study. Disadvantages include the high cost in terms of money and time during the period of the study and the inability to study very rare diseases or health outcomes unless the cohort is extremely large.

When the period of the study is from the past to the present, the study is a retrospective cohort study. The methodology is very similar to that of the prospective cohort study, except that all the events (exposures and outcomes) have already occurred; the investigator is merely looking back rather than forward. Retrospective studies are conceived after some individuals have already developed the outcomes of interest. The investigators jump back in time to identify a cohort of individuals at a point in time before they developed the outcomes of interest, and try to establish their exposure status at that point in time. They then determine whether the subject subsequently developed the outcomes of interest. If so, they can analyze the exposure(s) that may have contributed to those outcomes.

Retrospective cohort studies have several advantages over prospective cohort studies. They typically take less time and are less expensive. Additionally, they can address rare outcomes, because the cases are selected after having already developed the disease or outcome. Disadvantages include a potentially high possibility of selection bias, the fact that individuals may have difficulty recalling certain exposures (termed recall bias), and the requirement for the existence of medical and/or exposure records.

Regardless of being retrospective or prospective, the measure of association of all cohort studies is the relative risk (RR). Relative risk is calculated by dividing the incidence rate of the disease or outcome in the exposed individuals by the incidence rate in the unexposed individuals. An RR of 1 means there is no difference in risk between the two groups. An RR <1 means that the outcome is less likely to occur in the exposed group than in the unexposed group. Conversely, an RR >1 means the outcome is more likely to occur in the exposed group than in the unexposed group. Consider an example in which the incidence of prostate cancer among neutered male dogs was found to be 1.37%, and the incidence in intact male dogs was 0.36%. In this case, the relative risk would be 1.37/0.36 or 3.8. This could be stated as “Neutered male dogs would be nearly four times as likely as intact male dogs to develop prostate cancer.”
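The relative risk calculation and its interpretation can be sketched as below. The function names `relative_risk` and `interpret` are illustrative assumptions; the two incidence figures are those of the prostate cancer example.

```python
def relative_risk(incidence_exposed: float, incidence_unexposed: float) -> float:
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    if incidence_unexposed == 0:
        raise ValueError("relative risk is undefined when unexposed incidence is zero")
    return incidence_exposed / incidence_unexposed


def interpret(rr: float) -> str:
    """Describe the direction of the association implied by an RR."""
    if rr > 1:
        return "outcome more likely in the exposed group"
    if rr < 1:
        return "outcome less likely in the exposed group"
    return "no difference in risk between groups"


# Incidence of prostate cancer: 1.37% in neutered dogs, 0.36% in intact dogs.
rr = relative_risk(1.37, 0.36)

print(f"RR = {rr:.1f}: {interpret(rr)}")
```

Both incidences must be expressed in the same units (here, percent) so that the units cancel in the division.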

Case-Control Studies

In this type of study, subjects are selected as either having a particular outcome (cases) or not having the outcome (controls). They are then compared in a retrospective way to identify differences in their exposures that might explain the differences in outcomes. Ideally, cases and controls should be as similar as possible in all characteristics except the outcome in order to make the comparisons simpler and more meaningful. That is why some investigators “match” cases and controls. In one notable example, a very large case-control study in 1950 studied people with lung cancer and demonstrated a strong positive association between smoking and lung cancer. Although it alone did not prove causality, it was instrumental in the U.S. Surgeon General’s now-standard warnings.

Case-control studies have several advantages. They are inherently retrospective, so they are relatively quick and inexpensive. Because the cases have already been identified, they are appropriate for studying rare diseases and examining multiple exposures. Disadvantages include the fact that, like cohort studies, they are prone to selection, recall, and observer bias. Additionally, their application is limited to the study of one outcome.

The most common measurement of association in case-control studies is the odds ratio. The odds ratio (OR) represents the odds that an outcome will occur given a particular exposure, compared with the odds of the outcome occurring in the absence of that exposure. ORs are calculated using a 2 × 2 frequency table (see Table: Calculating an Odds Ratio).


An OR of 1 means the exposure did not affect the odds of the outcome. An OR >1 means the exposure is associated with a higher odds of the outcome, and an OR <1 means the exposure is associated with a lower odds of the outcome. Although a higher OR indicates a stronger association between exposure and outcome, it does not necessarily imply statistical significance and, by itself, is not enough to prove causality.
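As a hedged illustration of the calculation, the sketch below assumes the conventional 2 × 2 layout (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls), for which OR = (a × d)/(b × c). The function name and the counts are hypothetical and are not taken from any study.

```python
def odds_ratio(exposed_cases: int, exposed_controls: int,
               unexposed_cases: int, unexposed_controls: int) -> float:
    """OR = (a/b) / (c/d) = (a*d) / (b*c) for a conventional 2 x 2 table."""
    if exposed_controls == 0 or unexposed_cases == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Hypothetical case-control counts, for illustration only.
#               cases  controls
# exposed         40      20
# unexposed       60      80
example_or = odds_ratio(40, 20, 60, 80)

print(f"OR = {example_or:.2f}")
```

With these invented counts the OR is greater than 1, so the exposure is associated with higher odds of the outcome; whether that association is statistically significant would need a separate test.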

Interventional Studies

The other category of studies that make up analytical epidemiology is interventional studies. In contrast to observational studies, the investigator using an interventional approach can intentionally change some form of exposure between several groups to determine differences in outcome(s). In medical research, these exposures typically include interventions such as vaccines, therapeutic drugs, surgical techniques, or medical devices. The results of interventional studies can be very powerful in proving causality or identifying efficacy of various interventions. Interventional studies typically take one of two forms, either a randomized controlled (clinical) trial or a nonrandomized (community) trial.

Randomized Controlled (Clinical) Trials

In this type of study, participants are selected from a population and randomly assigned to one of two groups, one being the study group and the other being the control group. Study groups receive the intervention, and the controls do not.

Bias can be introduced in such a trial when either the participants or the investigator know which participants are in which group. This bias can be alleviated in one of two ways. First, in a single-blinded design, the participants are unaware whether they are in the study group or the control group. Additionally, in a double-blinded design, neither the investigator nor the participants are aware of the group assignments.

A major advantage of randomized controlled clinical trials is an inherently high validity for identifying differences in therapeutic efficacy of various interventions. Perhaps the major disadvantage is the high potential for ethical implications if an intervention with great potential benefit is intentionally withheld from the control group (eg, the historic Tuskegee Syphilis Study). For this reason, it can be very difficult to legally use human participants in many such trials. Additionally, this type of study is not usually applicable for discovering disease etiologies; observational studies are much better suited for this purpose.

Nonrandomized (Community) Trials

In this type of study, the units are groups (or communities) of participants assigned to treatment or control conditions. Although the communities may be selected at random, the individuals within them obviously are not. These studies are commonly undertaken to assess the quality and effectiveness of educational programs, behavioral changes, or mass interventions such as water fluoridation.


Bias is defined as the systematic deviation of results or inferences from truth. Bias is extremely difficult to completely avoid when undertaking scientific study. Therefore, studies are designed in ways that minimize the sources and effects of bias. Examples of bias are described below.

The Hawthorne Effect: Participants in a study may act or behave differently because they know they are being studied. In 1924–1932, workers at the Hawthorne Works (an electric company near Chicago) were studied to see whether productivity was greater depending on how much light was provided at work. Results showed that productivity increased during the course of the study regardless of the changes in light; the workers just performed better because of the attention. When the study ended, productivity went back to prestudy levels.

Recall bias: Cases and controls may remember an exposure differently (and non-randomly). Usually, cases remember exposures more clearly than controls.

Selection bias: This occurs when selected controls are not representative of the population from which the cases were selected. In other words, there is an important characteristic of the controls that makes them different from the general population. An example is the healthy worker effect, which refers to the phenomenon that employed groups have lower mortality than the general population. Therefore, if the study groups are composed of differing fractions of employed and unemployed people, the results may very well be skewed.

Observer bias: The investigator, having knowledge of the outcome(s), might record exposures differently between cases and controls.

Confounding: In epidemiologic studies, a confounder is a variable that is not considered in the study design but is associated with the exposure and exerts an effect on the outcome. Confounders can either produce a false association between variables or mask a true association between variables. An example of the former was a spurious conclusion drawn from a study of the relationship between alcohol consumption and heart disease. In the study, it was concluded that alcohol consumption was significantly associated with heart disease. Smoking was later identified as a confounder, because smoking was correlated both with alcohol consumption and also with heart disease. When corrected for the effects of this confounder, no association was found between alcohol consumption and heart disease.
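One common way to check for a confounder is to compare the crude association with the association within each level (stratum) of the suspected confounder. The sketch below uses invented counts, chosen only to mimic the alcohol and smoking example; they are not data from the study mentioned above, and all names are illustrative.

```python
def risk(diseased: int, total: int) -> float:
    """Proportion of a group that developed the outcome."""
    return diseased / total


def relative_risk(exposed: tuple, unexposed: tuple) -> float:
    """Each argument is a (diseased, total) pair."""
    return risk(*exposed) / risk(*unexposed)


# Hypothetical (diseased, total) counts, stratified by smoking status.
strata = {
    "smokers":    {"drinkers": (80, 800), "nondrinkers": (20, 200)},
    "nonsmokers": {"drinkers": (4, 200),  "nondrinkers": (16, 800)},
}


def pooled(group: str) -> tuple:
    """Combine the strata, ignoring smoking status."""
    diseased = sum(stratum[group][0] for stratum in strata.values())
    total = sum(stratum[group][1] for stratum in strata.values())
    return diseased, total


crude_rr = relative_risk(pooled("drinkers"), pooled("nondrinkers"))
stratum_rr = {
    name: relative_risk(stratum["drinkers"], stratum["nondrinkers"])
    for name, stratum in strata.items()
}

print(f"Crude RR (smoking ignored): {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"RR among {name}: {rr:.2f}")
```

In these invented data, drinkers appear to have more than twice the risk when smoking is ignored, yet within smokers and within nonsmokers the risk is identical for drinkers and nondrinkers. The apparent association arises only because most drinkers in the example are also smokers.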


When analyzing results of an epidemiologic study, there are two categorical types of error when either accepting or rejecting the null hypothesis.

Type I error, also known as a false positive, is when the null hypothesis is rejected when it actually should have been accepted. In other words, this is the error of accepting an alternative hypothesis (the real hypothesis of interest) when the results can actually be attributed to chance. The probability of type I error (which can never be zero) is controlled by the significance level against which the P value is compared. In scientific studies, the most common threshold is .05, which means that a <5% chance of detecting an association between a variable and an outcome when none actually exists is considered acceptable.

Type II error, also known as a false negative, is when the null hypothesis is accepted when it actually should have been rejected. In other words, the study did not have adequate power to detect an association between a variable and an outcome when the association actually existed.

Variable Associations and Causality

In epidemiology, variables are either associated or they are not. If the variables are not associated, there is no relationship; they are independent. If the variables are associated, that relationship can be either positive or negative. If two variables are positively associated, the values of both variables increase or decrease together. If they are negatively associated, the value of one variable increases when the other decreases. Finally, if an association exists (positive or negative), it is either causal or noncausal related to the outcome.

When two variables are associated, it is sometimes obvious whether the association is causal. Consider the relationship between animal bites and rabies; we know they are causally associated. However, in most epidemiologic studies, the relationship between variables (such as exposure and outcome) is much more difficult to ascertain and requires more extensive analysis. Several sets of systematic criteria for determining causality have been proposed, including the following:

1) Strength: Although a small association does not mean that there is not a causal effect, the larger the association, the more likely that it is causal.

2) Consistency: Repeatedly similar findings observed by different persons in different places with different samples strengthen the likelihood of a causal effect.

3) Specificity: Causation is more likely in a very specific population at a specific site and disease with no other likely explanation. The more specific the association between an exposure and an outcome, the higher the probability of causation.

4) Temporality: The outcome must occur after the exposure.

5) Biological gradient: Greater exposure generally results in greater incidence of the outcome. However, in some cases, the mere presence of the exposure, without regard to its magnitude, can trigger the effect. In yet other cases, an inverse relationship is observed when greater exposure to a protective factor leads to lower incidence of outcomes.

6) Plausibility: A rational, explainable mechanism between cause and effect is helpful (but may be limited by current knowledge).

7) Coherence: Agreement between epidemiologic and laboratory findings increases the likelihood of a causal effect.

8) Analogy: The effect of similar associations between other variables of exposure and outcome may be considered.

Sensitivity and Specificity

Veterinary practitioners use many diagnostic tests to determine what may be wrong with an animal and how it may be treated. The diagnostician must realize that these tests are fallible and that results are usually only close approximations of “truth.” We can assume that an animal either has a medical condition or does not; however, no tests are 100% sensitive and specific. That is, no test can eliminate the potential for false-positive and false-negative results. However, there are methods to interpret test results to reduce their inherent fallibility. Those tests or diagnostic procedures known to produce the absolute best results are termed “gold standard” tests. It is against these gold standards that newer, usually faster and more convenient, tests are measured in terms of sensitivity and specificity.

Sensitivity is the probability of a positive test result when the disease is actually present. A sensitive test is “positive in disease” and minimizes false-negative results, thus minimizing type II error. Specificity, in contrast, is the probability of a negative test result in the absence of disease, thereby correctly classifying an individual as disease-free (regarding that particular condition). A specific test is “negative in health” and minimizes false-positive results, thus minimizing type I error. To calculate the sensitivity and specificity of a test, consider the following 2 × 2 table (see Table: Calculating Test Sensitivity and Specificity).
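As a minimal sketch of the calculation, assume the conventional 2 × 2 layout in which a = true positives, b = false positives, c = false negatives, and d = true negatives, with disease status determined by the gold standard test. Under that assumption, sensitivity is a/(a + c) and specificity is d/(b + d). The function name and the counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def sensitivity_specificity(a, b, c, d):
    """Return (sensitivity, specificity) from 2 x 2 table counts.

    Assumed layout: a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    if a + c == 0 or b + d == 0:
        raise ValueError("need at least one diseased and one disease-free individual")
    sensitivity = a / (a + c)  # proportion of diseased individuals that test positive
    specificity = d / (b + d)  # proportion of disease-free individuals that test negative
    return sensitivity, specificity


# Hypothetical counts: 100 diseased and 200 disease-free animals by gold standard.
sens, spec = sensitivity_specificity(a=90, b=20, c=10, d=180)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these hypothetical counts, 90 of 100 diseased animals test positive and 180 of 200 disease-free animals test negative, so both parameters are 0.90.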


Sensitivity and specificity are inversely related, ie, for a given test, changing the cutoff to increase one generally decreases the other. Therefore, the accuracy of a test is a trade-off between these two parameters. Screening tests tend to have higher sensitivity and lower specificity, because the purpose of such a test is to detect the maximum number of individuals with the particular disease condition. A negative screening test result, therefore, strongly implies that disease is absent, whereas a positive result is often followed up with a confirmatory test that has higher specificity to identify which positive results are true and which are false. Given the high specificity of confirmatory tests, a positive result strongly implies that disease is present.
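The trade-off can be illustrated with a small sketch, assuming a test that reports a numeric value (eg, a titer) and calls a result positive when the value is at or above a chosen cutoff. The values, the cutoffs, and the function name are all hypothetical.

```python
def sens_spec_at_cutoff(diseased_values, healthy_values, cutoff):
    """Classify a value as test-positive when it is >= cutoff."""
    true_pos = sum(v >= cutoff for v in diseased_values)
    true_neg = sum(v < cutoff for v in healthy_values)
    return true_pos / len(diseased_values), true_neg / len(healthy_values)


# Hypothetical test values; the two groups overlap, as they usually do in practice.
diseased = [4, 5, 6, 7, 8, 9]
healthy = [1, 2, 3, 4, 5, 6]

results = []
for cutoff in (3, 5, 7):
    sens, spec = sens_spec_at_cutoff(diseased, healthy, cutoff)
    results.append((cutoff, sens, spec))
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this example, a low cutoff behaves like a screening test (every diseased animal is detected, at the cost of many false positives), whereas a high cutoff behaves like a confirmatory test (no false positives, but diseased animals are missed).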

In clinical medicine, two additional diagnostic test parameters are relevant. The positive predictive value (PPV) of a test is the probability of a patient actually having the disease condition when the test is positive. PPV is calculated as true positives divided by the sum of the true positives and the false positives: a/(a + b). False-negative test results do not affect the PPV. Therefore, if the PPV of a test is 100%, the validity of a negative test result is still unknown.

The negative predictive value (NPV) is the probability of a patient not having the disease condition when the test is negative. NPV is calculated as true negatives divided by the sum of the true negatives and the false negatives: d/(c + d). False-positive test results do not affect the NPV. Therefore, if a test was reported to have an NPV of 100%, the validity of a positive test result is still unknown.
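The two formulas above, a/(a + b) and d/(c + d), can be sketched as follows, again assuming that a, b, c, and d are the true-positive, false-positive, false-negative, and true-negative counts, respectively. The function name and the counts are hypothetical.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) from 2 x 2 table counts.

    Assumed layout: a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("need at least one positive and one negative test result")
    ppv = a / (a + b)  # proportion of positive results that are true positives
    npv = d / (c + d)  # proportion of negative results that are true negatives
    return ppv, npv


# Hypothetical counts: 110 positive and 190 negative test results.
ppv, npv = predictive_values(a=90, b=20, c=10, d=180)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Note that c does not appear in the PPV calculation and b does not appear in the NPV calculation, which is why a PPV of 100% says nothing about negative results, and vice versa.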

PPV and NPV are clinically relevant because, unlike sensitivity and specificity, they depend on the prevalence of disease in the population tested. For a given test, the higher the prevalence, the higher the PPV and the lower the NPV. For rarer diseases, the opposite is true, ie, the PPV is lower and the NPV is higher. For these reasons, the assumed prevalence of a disease must be taken into account when interpreting diagnostic test results.
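The effect of prevalence can be sketched by computing the expected proportions of true and false results for a test with fixed sensitivity and specificity (an application of Bayes' theorem). The sensitivity and specificity of 0.95 and the prevalence values below are hypothetical, as is the function name.

```python
def predictive_values_from_prevalence(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) expected at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: sensitivity and specificity both 0.95.
table = []
for prevalence in (0.01, 0.10, 0.50, 0.90):
    ppv, npv = predictive_values_from_prevalence(0.95, 0.95, prevalence)
    table.append((prevalence, ppv, npv))
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

Even with this fairly accurate hypothetical test, most positive results at 1% prevalence are false positives, which is why a positive screening result for a rare disease usually warrants confirmatory testing.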

Disease Outbreak Investigation

Public health officials investigate disease outbreaks to control them, to prevent additional illnesses, and to learn how to prevent similar outbreaks from happening in the future. Whether an outbreak is foodborne in origin or from another infectious source, the methodology is similar. The following steps in investigating disease outbreaks are used by the CDC: prepare for field work, confirm the existence of an outbreak, verify the diagnosis, establish a working case definition, engage in systematic case finding, apply descriptive epidemiology, develop and test hypotheses (analytical epidemiology), implement control measures, and communicate findings.

Although these steps are accomplished in a systematic fashion, they frequently overlap or occur concurrently. For example, establishing a working case definition typically begins while the diagnosis is in the process of being verified and continues through the initial process of systematic case finding.

Establishing a working case definition is the method by which public health officials define which individuals are included as official cases in the outbreak and delineate the boundaries of the outbreak. An effective case definition is critical because it can be difficult, especially in the absence of definitive diagnostics, to differentiate between actual disease cases and individuals who are ill from other causes. The case definition defines a case in terms of person, place, and time. Person criteria typically include vulnerability factors such as age, sex, signs/symptoms, or attendance at certain meals or public functions. Place criteria usually include a geographic boundary such as a state or local area, a school class, or a particular restaurant. Time criteria may be defined relative to an implicated meal (if foodborne) or another type of exposure. Case definitions are usually based on either clinical signs or diagnostic test results. The former is more subjective than the latter but can be just as effective, especially in field epidemiologic conditions. As the investigation proceeds and more information becomes available, it may be necessary to revise the case definition. However, this is done judiciously, because changes to the case definition result in changes to the epidemic curve.
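One way to picture a working case definition is as a filter that each illness report must pass on all three dimensions. The sketch below assumes a hypothetical foodborne outbreak; the record fields, the qualifying signs, the restaurant name, and the dates are invented for illustration and are not taken from any actual investigation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IllnessReport:
    """One reported illness (all fields are hypothetical)."""
    name: str
    signs: frozenset  # reported clinical signs
    ate_at: str       # where the individual ate on the exposure date
    onset: date       # date of onset


# Hypothetical working case definition.
QUALIFYING_SIGNS = {"vomiting", "diarrhea"}  # person criterion
IMPLICATED_PLACE = "Restaurant A"            # place criterion
WINDOW_START = date(2024, 6, 1)              # time criterion: onset on or after the meal
WINDOW_END = date(2024, 6, 4)                # and no more than 3 days after it


def is_case(report):
    """True when the report satisfies the person, place, and time criteria."""
    person_ok = bool(QUALIFYING_SIGNS & report.signs)
    place_ok = report.ate_at == IMPLICATED_PLACE
    time_ok = WINDOW_START <= report.onset <= WINDOW_END
    return person_ok and place_ok and time_ok


reports = [
    IllnessReport("A", frozenset({"vomiting"}), "Restaurant A", date(2024, 6, 2)),
    IllnessReport("B", frozenset({"cough"}), "Restaurant A", date(2024, 6, 2)),
    IllnessReport("C", frozenset({"diarrhea"}), "Restaurant B", date(2024, 6, 3)),
    IllnessReport("D", frozenset({"diarrhea"}), "Restaurant A", date(2024, 6, 9)),
]
cases = [r.name for r in reports if is_case(r)]
print(cases)
```

Widening or narrowing any one criterion (eg, extending the time window) changes which reports count as cases, which is why revisions to the case definition alter the epidemic curve.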

The epidemic (epi) curve shows progression of an outbreak over time. The horizontal axis represents the date when an individual became ill, also called the date of onset. The vertical axis is the number of individuals who became ill on each date. Epi curves are updated as new data come in and thus are subject to change. Interpretation of the epi curve can be complex and may be limited by information deficiencies and inaccurate case definitions. Despite these potential limitations, detailed information regarding the dates and numbers of reported cases is visually useful. Moreover, in addition to the magnitude and duration of the outbreak, the shape of the curve can show useful information regarding the nature of the outbreak.
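As a minimal sketch of how an epi curve is assembled, the following tallies cases by date of onset and prints a simple text histogram. The onset dates are hypothetical, and the function name is chosen only for illustration; days with no cases are kept so that gaps in the curve remain visible.

```python
from collections import Counter
from datetime import date, timedelta


def epi_curve(onset_dates):
    """Return [(date, case_count), ...] for every day from first to last onset."""
    if not onset_dates:
        return []
    counts = Counter(onset_dates)
    first, last = min(onset_dates), max(onset_dates)
    n_days = (last - first).days + 1
    days = [first + timedelta(days=i) for i in range(n_days)]
    return [(day, counts[day]) for day in days]  # a Counter returns 0 for absent days


# Hypothetical dates of onset for 11 cases.
onsets = (
    [date(2024, 6, 2)] * 2
    + [date(2024, 6, 3)] * 5
    + [date(2024, 6, 4)] * 3
    + [date(2024, 6, 6)]
)
curve = epi_curve(onsets)
for day, n in curve:
    print(day.isoformat(), "#" * n)
```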

The overall shape of the epi curve can give clues to the type of exposure that resulted in the outbreak. Typical epi curves from common sources include 1) a common specific point source in which all cases were exposed at the same time and place (eg, a foodborne illness outbreak); 2) a common source with continuous exposure in which although the source is common, cases gradually rise before either peaking or plateauing and declining; and 3) a common source with intermittent exposure in which the peaks occur at irregular times corresponding to the earlier exposures. In addition to outbreaks having common sources, they can have propagated sources, in which cases can directly infect other cases separate from the initial source. In a propagated source outbreak, person-to-person transmission occurs, often through several cycles before declining.

Another attribute of an outbreak that can be illustrated by the epi curve is the incubation period. For example, in a common point source outbreak, the investigation frequently identifies the event (eg, a meal or social gathering) at which the exposure occurred. In this case, the peak of cases occurs approximately one incubation period after the exposure event. In a propagated source outbreak, successive peaks of cases occur approximately one incubation period apart.
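For a common point source outbreak, this relationship suggests a rough way to estimate the incubation period: subtract the exposure date from the date on which onsets peak. The sketch below assumes a known exposure date and hypothetical onset dates; it is an illustration of the arithmetic, not a validated estimation method.

```python
from collections import Counter
from datetime import date


def peak_onset_date(onset_dates):
    """Return the date with the most onsets (the earliest such date if tied)."""
    counts = Counter(onset_dates)
    highest = max(counts.values())
    return min(day for day, n in counts.items() if n == highest)


# Hypothetical point source outbreak: exposure at a meal on 1 June.
exposure_date = date(2024, 6, 1)
onsets = (
    [date(2024, 6, 2)] * 2
    + [date(2024, 6, 3)] * 5
    + [date(2024, 6, 4)] * 3
    + [date(2024, 6, 6)]
)

peak = peak_onset_date(onsets)
estimated_incubation_days = (peak - exposure_date).days
print(f"peak onset {peak.isoformat()}, "
      f"estimated incubation about {estimated_incubation_days} days")
```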
