Finding and evaluating evidence is the second phase of the Johns Hopkins Evidence-Based Practice (JHEBP) Model. Evidence hierarchies help identify the best evidence for decision-making based on the rigor of the methods used (level) and the execution of the study or reporting (quality). Appraisal begins with identifying the level of evidence and then the quality. Together, level and quality determine the overall strength of the evidence.
The JHEBP Model uses a five-level evidence hierarchy, which includes research and nonresearch evidence. Below you will find a guide to the evidence levels and types. For detailed information about levels and types, we recommend consulting the Johns Hopkins Evidence-Based Practice for Nurses and Healthcare Professionals book above. If you're unfamiliar with the kinds of evidence noted here, browse the Types of Evidence box on this page.
Research Evidence |
Evidence Level | Types of Evidence |
Level I |
|
Level II |
|
Level III |
|
Nonresearch Evidence |
Evidence Level | Types of Evidence |
Level IV |
Opinion of respected authorities and/or nationally recognized expert committees or consensus panels based on scientific evidence. Includes:
|
Level V |
Based on experiential and nonresearch evidence. Includes:
|
Table: Dang, D., Dearholt, S., Bissett, K., Ascenzi, J., & Whalen, M. (2022). Johns Hopkins evidence-based practice for nurses and healthcare professionals: Model and guidelines (4th ed.). Sigma Theta Tau International.
To understand and assess levels of evidence, it helps to know the basic characteristics of the major evidence types, several of which are defined below. For additional evidence type definitions, browse the Centre for Evidence-Based Medicine Glossary below.
Systematic Review
The application of strategies that limit bias in the assembly, critical appraisal, and synthesis of all relevant studies on a specific topic. Systematic reviews focus on peer-reviewed publications about a specific health problem and use rigorous, standardized methods for selecting and assessing articles. A systematic review may or may not include a meta-analysis, which is a quantitative summary of the results.
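As a rough illustration of what a "quantitative summary of the results" can look like, the sketch below pools study effect estimates with inverse-variance (fixed-effect) weighting, one common meta-analytic approach. The function name `pool_fixed_effect` and the example numbers are hypothetical and chosen only for illustration; a real meta-analysis would also assess heterogeneity and might use a random-effects model instead.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Pool study effect estimates using inverse-variance (fixed-effect) weights.

    effects: per-study effect estimates (e.g., log odds ratios).
    std_errors: the matching standard errors, all greater than zero.
    Returns (pooled_effect, pooled_standard_error).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and equal in length")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # More precise studies (smaller standard error) receive more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical example: two equally precise studies.
pooled, pooled_se = pool_fixed_effect([0.2, 0.4], [0.1, 0.1])
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

Because both hypothetical studies have the same standard error, the pooled estimate falls midway between them, and its standard error is smaller than either study's alone.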
Randomized Controlled Trial
An experiment in which subjects in a population are randomly allocated into groups, usually called study and control groups, to receive or not receive an experimental preventive or therapeutic procedure, maneuver, or intervention. The results are assessed by rigorous comparison of rates of disease, death, recovery, or other appropriate outcomes in the study and control groups.
Cohort Studies
Cohort studies identify a group of patients who are already taking a particular treatment or have an exposure, follow them forward over time, and then compare their outcomes with a similar group that has not been affected by the treatment or exposure being studied. Cohort studies are observational and not as reliable as randomized controlled studies since the two groups may differ in ways other than in the variable under study.
Case-Control Studies
Case-control studies compare patients who already have a specific condition with people who do not have the condition. The researcher looks back to identify factors or exposures that might be associated with the illness. These studies often rely on medical records and patient recall for data collection. They are often less reliable than randomized controlled trials and cohort studies because showing a statistical relationship does not mean that one factor necessarily caused the other.
Cross-Sectional Studies
Cross-sectional studies describe the relationship between diseases and other factors at one point in time in a defined population. They lack any information on the timing of exposure and outcome relationships and include only prevalent cases. They are often used for comparing diagnostic tests. Studies that show the efficacy of a diagnostic test are also called prospective, blind comparisons to a gold standard. Such a study is a controlled trial that looks at patients with varying degrees of an illness and administers both diagnostic tests (the test under investigation and the “gold standard” test) to all of the patients in the study group. The sensitivity and specificity of the new test are compared to those of the gold standard to determine potential usefulness.
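The sensitivity and specificity mentioned above come from a 2x2 table that cross-tabulates the new test's results against the gold standard. The sketch below shows the standard calculation; the function name `diagnostic_accuracy` and the counts are hypothetical and used only for illustration.

```python
def diagnostic_accuracy(true_pos, false_pos, false_neg, true_neg):
    """Compute sensitivity and specificity from 2x2 table counts.

    Disease status is defined by the gold standard test:
      sensitivity = proportion of diseased patients the new test flags as positive
      specificity = proportion of non-diseased patients the new test flags as negative
    """
    diseased = true_pos + false_neg
    not_diseased = true_neg + false_pos
    if diseased == 0 or not_diseased == 0:
        raise ValueError("need at least one diseased and one non-diseased patient")

    sensitivity = true_pos / diseased
    specificity = true_neg / not_diseased
    return sensitivity, specificity


# Hypothetical counts: 100 patients with the condition, 100 without.
sens, spec = diagnostic_accuracy(true_pos=90, false_pos=20, false_neg=10, true_neg=80)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these illustrative counts, the new test detects 90 of 100 diseased patients (sensitivity 0.90) and correctly clears 80 of 100 non-diseased patients (specificity 0.80).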
Case Series and Case Reports
Case series and case reports consist of collections of reports on the treatment of individual patients or a report on a single patient. Because they are reports of cases and use no control groups to compare outcomes, they have little statistical validity.
Definitions adapted from: https://www.cebm.ox.ac.uk/resources/ebm-tools/glossary
An evidence pyramid is a visual representation of an evidence hierarchy. As you move up the pyramid, the amount of available evidence on a given topic decreases, but the strength of evidence increases. However, you may not always be able to find the highest level of evidence to answer your question. This illustration is helpful because it also notes the information sources associated with each level.
Image: https://guides.himmelfarb.gwu.edu/ebm/studytypes
Consult these resources to understand the language of evidence-based practice and terms used in clinical research.
Dr. Trisha Greenhalgh's clearly written papers (full-text links to each below) discuss how to critically appraise the medical literature. These articles appeared originally in the British Medical Journal (BMJ) and were later published as a book, How to Read a Paper. How to Read a Paper: The Basics of Evidence-based Medicine is also available as an eBook from OHSU Library below.
When appraising research, keep the following three criteria in mind:
Quality
Trials that are randomised and double blind, to avoid selection and observer bias, and where we know what happened to most of the subjects in the trial.
Validity
Trials that mimic clinical practice, or could be used in clinical practice, and with outcomes that make sense. For instance, in chronic disorders we want long-term, not short-term trials. We are [also] ... interested in outcomes that are large, useful, and statistically very significant (p < 0.01, a 1 in 100 chance of being wrong).
Size
Trials (or collections of trials) that have large numbers of patients, to avoid being wrong because of the random play of chance. For instance, to be sure that a number needed to treat (NNT) of 2.5 is really between 2 and 3, we need results from about 500 patients. If that NNT is above 5, we need data from thousands of patients.
These are the criteria on which we should judge evidence. For it to be strong evidence, it has to fulfill the requirements of all three criteria.
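The Size criterion refers to the number needed to treat (NNT), which is the reciprocal of the absolute risk reduction between the control and treated groups. The sketch below computes an NNT with an approximate 95% confidence interval, to show why a few hundred patients are needed before an NNT of 2.5 is pinned down to roughly 2 to 3. The function name `nnt_with_ci`, the simple normal-approximation (Wald) interval, and the event counts are all assumptions made for illustration; the original source does not state which event rates or interval method it used.

```python
import math


def nnt_with_ci(control_events, control_n, treated_events, treated_n, z=1.96):
    """Return (nnt, ci_lower, ci_upper) for a beneficial treatment.

    Uses a normal-approximation (Wald) interval for the absolute risk
    reduction (ARR) and inverts its limits to get the NNT interval.
    """
    cer = control_events / control_n   # control event rate
    eer = treated_events / treated_n   # experimental (treated) event rate
    arr = cer - eer                    # absolute risk reduction

    se = math.sqrt(cer * (1 - cer) / control_n + eer * (1 - eer) / treated_n)
    arr_low, arr_high = arr - z * se, arr + z * se
    if arr_low <= 0:
        # The ARR interval includes zero, so the NNT interval is not a
        # single finite range and is not reported by this sketch.
        raise ValueError("ARR confidence interval includes zero")

    # A larger risk reduction means a smaller NNT, so the limits swap.
    return 1 / arr, 1 / arr_high, 1 / arr_low


# Hypothetical trial of 500 patients: 250 per arm, event rates of 60% and 20%.
nnt, low, high = nnt_with_ci(control_events=150, control_n=250,
                             treated_events=50, treated_n=250)
print(f"NNT = {nnt:.1f} (95% CI {low:.1f} to {high:.1f})")
```

Under these assumed event rates, 500 patients give an NNT of 2.5 with an interval of roughly 2.1 to 3.1. Smaller risk differences (larger NNTs) produce much wider intervals at the same sample size, which is why trials with higher NNTs need far more patients.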