Statistics for Epidemiology - Glossary of Terms

95% Confidence Interval

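A confidence interval is a range of values, calculated from sample data, that is likely to contain the true population value. The most commonly used confidence interval in public health is the 95% confidence interval. If the same study were repeated many times and an interval calculated each time, about 95% of those intervals would contain the true population mean. For an approximately normally distributed estimate, the 95% confidence level corresponds to 1.96 standard errors on either side of the sample mean.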
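
The 1.96 rule can be sketched in Python. This is a minimal sketch that assumes a simple random sample and the normal approximation. For small samples, a multiplier from the t-distribution is usually more appropriate. The function name and the blood pressure readings are hypothetical.

```python
import math
import statistics


def ci95_mean(values):
    """Approximate 95% confidence interval for a population mean.

    Normal approximation: sample mean +/- 1.96 standard errors, where
    standard error = sample standard deviation / sqrt(n).
    """
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = 1.96 * standard_error
    return mean - margin, mean + margin


# Hypothetical systolic blood pressure readings (mmHg), for illustration only
readings = [118, 125, 130, 122, 135, 128, 120, 126]
low, high = ci95_mean(readings)
print(f"Sample mean {statistics.mean(readings)}, 95% CI {low:.1f} to {high:.1f}")
```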

Absolute Value

The value of a number without regard to its sign. The absolute value of -1 is 1. The absolute value of a number is indicated by bars on either side of the number: |-1|.

Adjusted Rates

Adjusted rates are summary measures of the rate of morbidity (illness) or mortality (death) in a population. They are calculated with statistical procedures that remove the effect of differences in composition (such as age structure) between the populations being compared. There are two methods for adjusting rates: the direct method and the indirect method. Direct and indirect refer to the source of the age-specific rates used.

- The direct method applies the study population's own age-specific rates to a standard population. It may be used if those rates are known and stable.
- The indirect method applies the age-specific rates of a standard population to the study population. It may be used if the study population's age-specific rates are unknown or unstable, for example because the numbers of events are small.
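
Both methods can be sketched in a few lines. This is a minimal sketch with hypothetical function names and made-up figures. The indirect method is summarized here by the standardized mortality ratio (observed deaths divided by expected deaths), which is its usual output.

```python
def direct_adjusted_rate(study_rates, standard_population, k=100_000):
    """Direct method: apply the study population's age-specific rates
    to a standard population and express the result per k persons."""
    expected_events = sum(
        study_rates[age] * standard_population[age] for age in standard_population
    )
    return expected_events / sum(standard_population.values()) * k


def standardized_mortality_ratio(observed_deaths, standard_rates, study_population):
    """Indirect method: apply a standard population's age-specific rates to the
    study population's age structure, then compare observed with expected deaths."""
    expected_deaths = sum(
        standard_rates[age] * study_population[age] for age in study_population
    )
    return observed_deaths / expected_deaths


# Hypothetical figures, for illustration only (rates are deaths per person per year)
study_rates = {"0-39": 0.001, "40-64": 0.005, "65+": 0.04}
standard_population = {"0-39": 50_000, "40-64": 30_000, "65+": 20_000}
print(direct_adjusted_rate(study_rates, standard_population))  # about 1,000 per 100,000

standard_rates = {"0-39": 0.001, "40-64": 0.004, "65+": 0.03}
small_population = {"0-39": 2_000, "40-64": 1_000, "65+": 500}
print(standardized_mortality_ratio(42, standard_rates, small_population))  # about 2.0
```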

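Age-specific Incidence Rates

Age-specific incidence rates measure the occurrence of new cases of a disease within a specified age group. They are useful because some diseases are concentrated in particular age groups. For example, diseases such as measles or mumps are more likely to occur during childhood.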

Age-specific rates are calculated by defining the age interval, then dividing the number of disease occurrences within that interval by the total number of persons within that age interval for a particular time period:

  Number of cases of measles in 5-to-10-year-olds
  -----------------------------------------------
    Number of 5-to-10-year-olds in study group


Age-specific Death Rates

Also referred to as age-specific mortality rate. This is a rate that evaluates the number of deaths in a specified age population against the total number of persons within that age population for a particular time period:

    Number of deaths among those aged 20-30 years
  -------------------------------------------------
  Number of persons aged 20-30 years in study group
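
The two age-specific formulas share the same structure: events in an age group divided by the persons in that group. A minimal sketch follows, with a hypothetical function name and made-up counts. It multiplies before dividing, which keeps the results exact for these inputs.

```python
def age_specific_rates(events, population, k=1_000):
    """Events per k persons in each age group: events / persons in that group * k."""
    return {group: events[group] * k / population[group] for group in population}


# Hypothetical counts for one calendar year, for illustration only
deaths = {"20-30": 12, "31-40": 18}
persons = {"20-30": 8_000, "31-40": 6_000}
print(age_specific_rates(deaths, persons))  # {'20-30': 1.5, '31-40': 3.0}
```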


At Risk

Risk factors are physical characteristics, dietary habits, or lifestyle habits that increase an individual's probability of developing a particular disease in the future. For example, the population 'at risk' of developing heart disease includes individuals with high blood pressure or elevated cholesterol, and those who smoke cigarettes.

Attack Rate

This is an incidence rate used when a disease or condition occurs over a short period of time, often as a result of a specific exposure. For example, consider a food poisoning outbreak linked to a specific event (a picnic). The attack rate gives the proportion of people exposed to the suspected agent (potato salad) who became ill during a particular time period.

   Total number of sick persons in study group
  ---------------------------------------------
  Sick persons plus well persons in study group
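
In the picnic example, investigators typically compare the attack rate among people who ate the suspected food with the rate among people who did not. A minimal sketch follows. The function name and the counts are hypothetical, and the rate is expressed as a percentage.

```python
def attack_rate(ill, well):
    """Attack rate, as a percentage: ill / (ill + well) * 100."""
    return ill * 100 / (ill + well)


# Hypothetical picnic outbreak counts, for illustration only
ate_potato_salad = attack_rate(ill=30, well=10)
did_not_eat = attack_rate(ill=5, well=45)
print(ate_potato_salad, did_not_eat)  # 75.0 10.0
```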


Average Deviation

The sum of the absolute values of the deviations of the individual measurements from the mean (some texts use the median), divided by the number of observations.
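
A minimal sketch of the average deviation, using a hypothetical function name. The central value is a parameter: it defaults to the mean, and statistics.median can be passed to measure deviations from the median instead.

```python
import statistics


def average_deviation(values, center=statistics.mean):
    """Mean of the absolute deviations of each value from a central value.

    `center` defaults to the mean; pass statistics.median to measure from the median.
    """
    c = center(values)
    return sum(abs(v - c) for v in values) / len(values)


data = [1, 2, 3, 10]
print(average_deviation(data))                     # 3.0 (from the mean, 4)
print(average_deviation(data, statistics.median))  # 2.5 (from the median, 2.5)
```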

Bell Curve

A frequency distribution in statistics that resembles the outline of a bell when plotted on a graph.


A rectangular section of a histogram, representing the number of observations that fall within one interval of the data.

Central Tendency

The tendency for events, when observed a sufficient number of times, and when unbiased, to cluster around a central point.

Chart

A method of presenting statistical information symbolically, using only one mathematical coordinate or data variable. No mathematical relationship is implied by a chart.

Chi Square

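A theoretical probability distribution: the distribution of a sum of squared independent standard normal variables. It is used in the chi-square test to compare observed frequencies with the frequencies expected under a hypothesis.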
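
The chi-square test statistic compares each observed count with the count expected if rows and columns were unrelated. A minimal sketch for a contingency table follows, with a hypothetical function name and made-up counts. It applies no continuity correction.

To get a p-value, compare the statistic with the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom. A library routine such as SciPy's `scipy.stats.chi2_contingency` does this step, but note that it applies Yates' continuity correction to 2 x 2 tables by default.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table (list of rows).

    Expected count for each cell = row total * column total / grand total.
    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    return chi2


# Hypothetical 2 x 2 table: rows = ate / did not eat a food, columns = ill / well
table = [[30, 10],
         [5, 45]]
print(round(chi_square_statistic(table), 2))
```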

Coefficient of Variation

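Measures dispersion or variation in relation to the mean. It is calculated as the standard deviation divided by the mean and is usually expressed as a percentage.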
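
A minimal sketch of the coefficient of variation, using a hypothetical function name and the sample standard deviation. The measure is only meaningful for data with a positive mean.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, expressed as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(coefficient_of_variation(data), 1))  # 42.8
```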

Confidence Limit

The upper and lower boundaries of a confidence interval. Together they express, at a stated level of confidence, the range of plausible values for a population mean that has been estimated by a sample mean.

Convenience Sampling

A type of sample selection that is based on convenience rather than design. It relies on chance encounters or on population lists that were compiled for unrelated purposes. No attempt is made to ensure that the sample is representative of the target population.

Correlation Coefficient

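A measure of the strength and direction of the relationship between two variables. The most common form is Pearson's correlation coefficient (r), which measures linear relationships. It ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship).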
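
A minimal sketch of Pearson's r, using a hypothetical function name and made-up paired data. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / math.sqrt(ss_x * ss_y)


# Hypothetical paired measurements, for illustration only
age = [30, 40, 50, 60, 70]
systolic_bp = [118, 124, 131, 135, 144]
print(round(pearson_r(age, systolic_bp), 3))
```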

Crude Death Rate

The crude death rate approximates the proportion of a population that dies during a particular time period.

   Number of deaths reported within a given period
  ------------------------------------------------- * 1,000 population
   Population size at the middle of that period

Crude Rates

Crude rates are based on the actual number of events in a whole population over a given time period, without adjustment for differences in population composition. For example, the crude birth rate is the number of live births during a particular time period per 1,000 persons in the population.

Chunk Sampling

See Convenience Sampling.

Data

Data represents a fact or a set of facts used to draw a conclusion or make a decision. In epidemiological studies, facts are gathered about particular populations under study. These facts are assembled into data sets representing a collection of facts about a particular population. For example, data sets on mortality (death) rates are collected and available from government agencies.

Denominator

The number below the line indicating division in a fraction; the number that divides the numerator:

    numerator    OR    50
   -----------        ----
   denominator        100


Dependent Variable

The variable thought to depend on another variable. The dependent variable is plotted along the vertical or Y axis of a graph. In epidemiological studies, the frequency of occurrence of a disease or a count of a population is often the dependent variable.

Descriptive Statistics

The methods applied to summarize key characteristics of quantitative information or data.

Deviation

The difference between an individual value in a set of numbers and the mean of that set.

Epidemiology

Epidemiology is the study of diseases in populations of humans or other animals, specifically how, when, and where they occur.

Frequency

Statistical frequency refers to the number of measurements in an interval of a frequency distribution. A frequency distribution consists of a set of intervals into which the range of a statistical distribution is divided, each associated with a frequency indicating the number of measurements in that interval.

Frequency Distribution

A count of the number of times each value occurs in a group of values. For example, the grades for a class can be arranged into a frequency distribution showing how many students received each grade from A through F.
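
A minimal sketch of the grade example, using `collections.Counter` from the Python standard library. The grades are hypothetical.

```python
from collections import Counter

# Hypothetical grades for a class of ten students, for illustration only
grades = ["B", "A", "C", "B", "F", "A", "B", "D", "C", "B"]
frequency = Counter(grades)

for grade in "ABCDF":
    print(grade, frequency[grade])  # A 2, B 4, C 2, D 1, F 1 (one per line)
```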

Frequency Polygon

A graphic representation of a frequency distribution in which points corresponding to the frequencies of each value are connected by straight lines.

Graph

A method of showing quantitative data using a coordinate system. Often there is a mathematical relationship between the data variables depicted on a graph.

Graphic

A generic term for a pictorial representation, as in a graph or chart.

Histogram

A graph of vertical bars representing the frequency distribution of a set of data.

Hypotheses

A set of two or more statements subject to verification or proof, of which only one can actually be true.

Hypothesis

A statement subject to verification or proof.

Incidence

Incidence measures how quickly a disease occurs, that is, the frequency with which new cases of a disease arise in a population. Incidence is always calculated for a given period of time.

Incidence Rate

The incidence rate describes the rate of development of a disease in a group over a specified time period:

      Number of new cases of a disease per unit of time
  ---------------------------------------------------------
  Total number at risk at the beginning of this time period
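
The denominator counts only people at risk at the start of the period. People who already have the disease cannot become new cases, so they are excluded. A minimal sketch follows. The `Person` record layout, the function name, and the cohort are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record layout, for illustration only
    had_disease_at_start: bool
    new_case_during_period: bool


def incidence_rate(people, k=1_000):
    """New cases per k persons at risk at the start of the period."""
    at_risk = [p for p in people if not p.had_disease_at_start]
    new_cases = sum(p.new_case_during_period for p in at_risk)
    return new_cases * k / len(at_risk)


cohort = (
    [Person(True, False)] * 2      # existing cases: not at risk of becoming new cases
    + [Person(False, True)] * 3    # new cases during the period
    + [Person(False, False)] * 5   # remained free of disease
)
print(incidence_rate(cohort))  # 375.0 per 1,000
```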


Independent Variable

The variable thought to influence another variable. The independent variable is plotted on the horizontal or X axis of a graph. Common independent variables are time and people's age.

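Index

An index is the best available approximation of a true rate. It is used when we are unable to count the number at risk (the denominator) directly. Instead, we substitute something we can count to estimate the number at risk. For example, the maternal mortality ratio uses the number of live births as its denominator because the number of pregnant women cannot be counted directly.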

Indicator

Any of various statistical values that provide an indication of the nature of epidemiological phenomena.

Interval Scale

Scale of measurement with equal intervals, but not an absolute zero point.

Inverse

Something that is opposite; reversed in order. Mathematically, the inverse of a number is 1 divided by that number.

Judgment Sampling

See Subjective Sampling.

Logarithmic scale

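A graph scale on which each equal step represents multiplication by a constant factor, usually a power of 10 (1, 10, 100, 1,000), rather than the addition of a constant amount. Used to display data that span a wide range of values.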

Mean

The sum of a set of values divided by the number of observations. Commonly known as the arithmetic average.

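Median

The middle value of a set of values arranged in order of magnitude (ascending order). This value divides the distribution into two equal parts. When there is an even number of values, the median is the mean of the two middle values.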

Midrange

The midpoint of the observations, calculated as the sum of the lowest and highest observations divided by 2.

Mode

The most frequently occurring value in a set of values. There may be more than one mode in a set of values.
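
The mean, median, mode, and midrange can be compared on one small data set. This sketch uses Python's `statistics` module; `multimode` requires Python 3.8 or later and returns every mode. The values are hypothetical.

```python
import statistics

# Hypothetical values, for illustration only
values = [3, 7, 7, 2, 9, 4, 7, 5]

mean = statistics.mean(values)               # 5.5
median = statistics.median(values)           # 6.0 (average of the middle values 5 and 7)
modes = statistics.multimode(values)         # [7]
midrange = (min(values) + max(values)) / 2   # 5.5
print(mean, median, modes, midrange)
```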

Morbidity Rates

Morbidity rates measure the frequency of illness within specific populations. Time and place must always be specified. The most commonly used morbidity rates include point prevalence, period prevalence, incidence, and attack rate.

Mortality Rates

Mortality (or death) rates measure the frequency of deaths within specific populations and are calculated for a given place and time interval.

Natality

Birth rate, or the ratio of births to the population.

Normal Distribution

A bell-shaped probability distribution. Also known as a 'Gaussian Curve'.

Numerator

The expression written above the line in a fraction to indicate the number (part) to evaluate against the total number being evaluated (whole):

    numerator    OR    part
   -----------        -------
   denominator        whole

Percent

A proportion of a whole, expressed in hundredths. Also a fraction or ratio with 100 understood as the denominator; for example, 0.98 equals a percentage of 98, or 98%.

Period Prevalence

Period prevalence measures the frequency of all current cases of disease (old and new) for a particular period of time.

     Number of active cases during a period of time
  ----------------------------------------------------
  Estimated population at risk during same time period


Point Prevalence

Point prevalence measures the frequency of all current cases of a disease (old and new) at a given instant in time.

       Number of cases at a given point in time
  --------------------------------------------------
  Estimated population at risk at same point in time

Population

The complete collection of elements, groups, or individuals to be studied.

Population at Risk

The subset of a population determined, through investigative techniques, to be the group susceptible to a disease.

Prevalence

Prevalence measures the frequency of all current cases of disease (old and new), and is of two types: point prevalence and period prevalence.

      Number of cases during a given time interval
  ---------------------------------------------------
  Population size at the middle of that time interval


Primary Attack Rate

This is an incidence rate that is used when the disease or condition occurs for a short period of time as a result of a specific exposure.

Probability

The likelihood of one event relative to all possible events; the quantification of uncertainty. Probability ranges from 0 (the event cannot occur) to 1 (the event is certain), inclusive.

Probability Sampling

The type of sampling that makes use of formal statistical theory in the design of an empirical investigation. This formal rigor gives the greatest assurance that the sample is free from bias and that conclusions can be drawn from the data with a high degree of confidence.
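
The simplest probability sample is a simple random sample, in which every member of the sampling frame has an equal chance of selection. A minimal sketch using Python's `random` module follows. The sampling frame and the seed are hypothetical.

```python
import random

# Hypothetical sampling frame of 1,000 numbered individuals
sampling_frame = [f"person_{i}" for i in range(1, 1001)]

rng = random.Random(2024)  # fixed seed so the draw can be reproduced
sample = rng.sample(sampling_frame, k=50)  # without replacement; each person equally likely
print(sample[:5])
```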

Proportion

A proportion is a type of ratio in which the numerator is always included in the denominator. A proportion is often expressed as a percent (base of 100), but it may also be written as a fraction or as a decimal between 0 and 1.

Range

The difference between the highest and lowest values in a set of values.

Rate

A rate is a measure of a part with respect to a whole. Epidemiological rates can be broken into three general categories: crude rates, specific rates, and adjusted rates. A rate measures the probability of occurrence of some particular event. A rate is expressed as:

   x
  --- * k
   y

x = Number of times an event has occurred during a specific interval of time.

y = Number of persons exposed to the risk of the event during the same interval.

k = Some round number (100; 1,000; 10,000; 100,000; etc.) or base (also called a "standard population"), depending on the relative magnitude of x and y.
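
The rate formula can be sketched as a one-line function. The function name and the death and population figures are hypothetical. The sketch shows how the choice of k changes only the scale of the result, not its meaning.

```python
def rate(x, y, k):
    """Rate = x / y * k, computed as x * k / y."""
    return x * k / y


# Hypothetical: 45 deaths in a mid-year population of 30,000
print(rate(45, 30_000, 1_000))    # 1.5 per 1,000
print(rate(45, 30_000, 100_000))  # 150.0 per 100,000
```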

Ratio

A ratio is the expression of a relationship between a numerator (above the line in a fraction) and a denominator (below the line in a fraction). It may refer either to an interval of time or to an instant in time. A ratio is expressed as follows:

   x
  --- * k   OR   x : y
   y

x = Number of events or items counted and not necessarily a portion of y.

y = Number of events or items counted and not necessarily a population of persons exposed to the risk.

k = A base, as in the case of a rate, but usually 1 or 100 for the purposes of expressing ratios.
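
A minimal sketch of both ratio forms: x : y reduced to lowest terms, and x / y * k with k = 100. The function names and the birth counts are hypothetical.

```python
import math


def ratio_in_lowest_terms(x, y):
    """Express x : y with both terms divided by their greatest common divisor."""
    divisor = math.gcd(x, y)
    return x // divisor, y // divisor


def ratio_per_k(x, y, k=100):
    """Express the ratio as x / y * k."""
    return x * k / y


# Hypothetical counts: 1,050 male and 1,000 female births
print(ratio_in_lowest_terms(1_050, 1_000))  # (21, 20)
print(ratio_per_k(1_050, 1_000))            # 105.0 males per 100 females
```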

Risk

The exposure to a chance of acquiring a disease.

Sample

A subset of a population, used for statistical analysis of the entire population.

Sample Size

The number of individuals included in a sample. It must be large enough to give an accurate view of the population.

Scatter diagram

A graph made up of individual dots, used to determine correlation among a group of data points. Also called a scatter plot.

Secondary Attack Rate

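The secondary attack rate measures the spread of a disease within a household or similarly limited situation. For example, a case of measles brought into a family may spread from the initial (primary) case to other members of the household. It is calculated as the number of new cases among contacts of the primary case divided by the number of susceptible contacts.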
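
A minimal sketch of the secondary attack rate, using a hypothetical function name and made-up household totals. It assumes every household member other than the primary cases is a susceptible contact. In practice, contacts known to be immune would also be removed from the denominator.

```python
def secondary_attack_rate(secondary_cases, household_members, primary_cases):
    """Secondary attack rate (%) = secondary cases / exposed contacts * 100.

    Assumes every household member other than the primary cases is susceptible.
    """
    exposed_contacts = household_members - primary_cases
    return secondary_cases * 100 / exposed_contacts


# Hypothetical totals across several households, for illustration only
print(secondary_attack_rate(secondary_cases=3, household_members=20, primary_cases=5))  # 20.0
```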

Specific Rates

Specific rates refer to a particular subgroup of the population. For example, a rate can be evaluated for a particular race, age group, or other subgroup. Alternatively, it may refer to the entire population but be specific to a single cause of death or illness. Specific rates can be used to control for factors that may distort crude rates. For example, a specific rate could be used to evaluate the cause-specific mortality rate due to HIV for a particular age group during a particular time period.

  Mortality (or frequency of a given disease)
  -------------------------------------------
  Population size at midpoint of time period


Standard Deviation

The square root of the sum of the squared deviations from the mean, divided by one less than the number of observations. It measures how far, on average, the observations lie from the mean. The standard deviation is the most commonly used and rigidly defined measure of dispersion.
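
The definition translates directly into code. This is a minimal sketch with a hypothetical function name, checked against `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
import math
import statistics


def sample_standard_deviation(values):
    """Square root of the sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(sum_squared_deviations / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_standard_deviation(data))  # about 2.138
print(statistics.stdev(data))           # standard-library equivalent
```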

Standard Deviation of the Mean

Measure of the variation in the means of repeated samples; also called the standard error of the mean. Defined as the standard deviation divided by the square root of the sample size.

Standard Population

The subset of a population determined, through sampling techniques, to have characteristics representative of the entire population. Expressed as a round number used as the basis of comparison for consistency. The CDC uses a standard population of 100,000 for most purposes, so any attack rates, incidence rates, etc. would be expressed in terms of "x per 100,000."

Statistic

A measured characteristic of a sample. Statistics, as a discipline, is the application of probability theory to problems in the analysis of data, and related questions.

Statistical Inference

The process by which one draws conclusions regarding a population from the results observed in a sample.

Stem and Leaf Plot

A type of graphic that uses an arranged array of digits to represent data points. The "stem" numbers represent a group or range of data points. The "leaf" numbers represent individual data points. Stem and leaf plots are easily drawn by hand, and give quick indications of distribution patterns.
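
A stem and leaf plot can also be produced programmatically. This is a minimal sketch that assumes non-negative whole numbers. It uses the tens (and higher) digits as the stem and the units digit as the leaf. The function name and the ages are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build stem-and-leaf rows for non-negative integers.

    Stem = tens (and higher) digits, leaf = units digit; stems with no
    leaves are still listed so gaps in the distribution stay visible.
    """
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        leaves_by_stem[value // 10].append(value % 10)
    rows = []
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem[stem])
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


# Hypothetical ages of cases in an outbreak, for illustration only
ages = [23, 35, 31, 47, 28, 22, 39, 41, 35]
for row in stem_and_leaf(ages):
    print(row)
```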

Subjective Sampling

This type of sampling relies on the knowledge and experience of a subject matter expert. It is normally used only when there are not enough resources available to define a probability sample.

Table

A formatted and logically presented display of data.

Target Population

The subset of a population identified as being the appropriate focus of an activity, e.g. a survey.