New laboratory tests usually begin in research laboratories, where scientists find a strong and consistent association between something they can measure and a disease or group of diseases. The next step is to confirm that the test can distinguish between people with the disease and those without it.
Early studies typically compare people who have the disease with a group of healthy controls. Often the test initially seems very promising, appearing to distinguish very well between the two groups, and an article is published in a scientific or medical journal. However, the great majority of potential new tests, despite this early promise, never make it from publication into widespread clinical use. Why should this be the case?
Validation of a test to confirm it is effective in diagnosing disease
Validation is the collection and evaluation of data which establishes scientific evidence that a process is capable of consistently delivering quality products. Tests that are going to make it into common use need to fulfil the criteria set out by the ACCE project of the Centers for Disease Control and Prevention (CDC).
The ACCE framework was originally developed for evaluation of genetic tests. However, it is applicable to all forms of laboratory tests. There are four key factors for evaluation:
- Analytic validity of a test defines its ability to measure the component of interest accurately and reliably, that is, its technical performance.
- Clinical validity of a test defines its ability to detect or predict the presence or absence of an accepted clinical disease or predisposition to such a disease.
- Clinical utility of a test refers to the likelihood that using the test will lead to an improved outcome for the patient.
- Ethical, legal and social implications (ELSI) of a test. Issues include how the test is promoted, how the reasons for testing are explained to the patient, the frequency of false-positive results and of incorrect diagnoses, the potential for unnecessary treatment, and the cost-effectiveness of testing, particularly in the chronically ill.
The most common reason tests fail is that they do not meet the first two criteria, often because on further examination the test proves to be a poor discriminator between people with and without the disease. When the test is used to compare people with advanced disease against healthy people, it may work well. In the clinic, however, the doctor is trying to distinguish people with the disease in question from people with a variety of other diseases, often affecting the same organ.
To make matters more difficult, some of these patients will have early-stage disease (which is ideally when we want to detect it), unlike the advanced-stage patients used in the earlier comparisons. This is where most new tests fail. They are positive in too many people with other conditions (false positives), negative in too many people with early-stage disease (false negatives), or both. Almost no test can identify every person with a particular condition (100% sensitive) while being negative in every person without it (100% specific).
Some genetic tests probably come closest to this ideal, but even then both false-positive and false-negative genetic test results have been described. Tests can also fail to enter routine use for a variety of other reasons, such as being too expensive, too difficult to perform, too unreliable, too subject to interference, or requiring chemical reagents too toxic to be used safely. A new test may even be unnecessary: doctors can diagnose and treat many conditions (e.g. most musculoskeletal disorders) without any laboratory tests at all, so the test may add no value.
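Sensitivity and specificity are simple proportions taken from a 2×2 table that compares test results with true disease status. The sketch below computes them; the class name and the counts are hypothetical, chosen only for illustration and not taken from any real study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing test results with true disease status."""
    true_pos: int   # disease present, test positive
    false_neg: int  # disease present, test negative (missed cases)
    false_pos: int  # disease absent, test positive
    true_neg: int   # disease absent, test negative

    def sensitivity(self) -> float:
        # Fraction of people WITH the disease whom the test calls positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Fraction of people WITHOUT the disease whom the test calls negative.
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical counts: 100 people with the disease, 100 without.
table = TwoByTwo(true_pos=90, false_neg=10, false_pos=5, true_neg=95)
print(f"sensitivity = {table.sensitivity():.0%}")  # sensitivity = 90%
print(f"specificity = {table.specificity():.0%}")  # specificity = 95%
```

A test is 100% sensitive only when `false_neg` is zero and 100% specific only when `false_pos` is zero, which is why so few tests reach both.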
Use of a validated test in the real world
A laboratory test may meet all four of these desirable criteria but still be used inappropriately by a healthcare practitioner. There are many ways this can happen, and here are some examples:
- Right test, wrong time, e.g. a serology test, which diagnoses an infectious disease by detecting the antibodies the patient produces in response to the infection, is done too early. At that stage the antibodies will not yet be present.
- Wrong test, e.g. a practitioner makes a clinical diagnosis and orders a specific test for it, which comes back negative because the patient actually has a different disease.
- Right test but performed too frequently, e.g. HbA1c measured monthly in a patient with stable diabetes, when testing every three months or less often is sufficient.
- Right test but not performed frequently enough, e.g. following up a bowel cancer patient with CEA tests at two-yearly intervals when the patient is at high risk of recurrence and was CEA positive before the initial surgery.
- Wrong validated test in the wrong patient. A healthy middle-aged person without a family history of cancer has multiple tumour marker tests performed as a “precaution” and one of them comes back mildly elevated. In this situation the result is likely to be a false positive due to a benign cause rather than cancer. However, the person is going to be very worried, and the practitioner is obliged to carry out further investigations which may be expensive, inconvenient and invasive.
- Wrong unvalidated test in the wrong patient. A person consults an alternative health practitioner because they are feeling tired and their doctor can’t find anything wrong with them. The practitioner orders a full panel of “functional tests” including a liver detoxification profile, complete digestive stool analysis, intestinal permeability testing and a salivary hormone profile. Because so many results will be delivered and the range of normal variation is so ill-defined, there are sure to be some slight deviations from what the lab or the practitioner considers “normal”. These deviations are almost certainly meaningless. However, because there is little or no scientific evidence that these tests meet the four necessary criteria, the results can be interpreted in any way the practitioner wants, and the person may be diagnosed with a dubious “disease” and prescribed unnecessary and expensive remedies.
Interpretation of a pathology test
The interpretation of test results can be simple, but on occasion it can be very challenging and require in-depth knowledge of how diseases affect haematology, microbiology, biochemistry or immunology results. Histopathology (looking at tissue samples under the microscope) is unique in that there are no simple YES/NO or POSITIVE/NEGATIVE divisions, and interpreting tissue pathology or cytology (cell smear) slides requires many years of training.
All people are subtly different, and so is their physiology. This is why laboratory tests have “normal” or “reference” ranges. These ranges are designed to include only the central 95% of results from the healthy population. As a result, 1 in 20 (5%) of results from healthy people will fall outside the “normal” range. Usually these “abnormal” results in healthy people lie only just outside the range, and if there are no clinical findings suggesting disease, the practitioner can either ignore the result or note it for rechecking in the future. We all know that if we repeat something many times we are likely to get some unusual results: toss a coin a number of times and count the heads and tails, and the split is often not 50:50. Likewise, statistics tells us that if we do a large number of tests, there will almost certainly be some mild deviations from what has been defined as normal. An ignorant or unscrupulous practitioner may seize on these so-called abnormalities and tell the patient they are sick and require treatment.
Other difficulties arise because tests in general are not 100% sensitive or 100% specific, although they are often close to these figures. Test results need to be interpreted in the clinical context in which they are performed. A diagnostic test should be performed only after the practitioner has made a possible diagnosis based on the clinical and family history and a medical examination. The test should then help confirm the diagnosis.
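The effect described above can be put into rough numbers. The sketch below assumes that each test’s reference range excludes 5% of healthy results and, as a loud simplification, that the tests are independent of one another (in reality many tests are correlated, so this is only a rough guide). It also checks the coin-toss remark. Both function names are hypothetical.

```python
import math


def chance_any_out_of_range(n_tests: int, in_range: float = 0.95) -> float:
    """Probability that a healthy person gets at least one result outside
    the reference range when n_tests tests are done.

    Simplifying assumption: each result independently has probability
    `in_range` of falling inside its range. Real tests are often correlated.
    """
    return 1 - in_range ** n_tests


def chance_exact_half_heads(n_tosses: int) -> float:
    """Probability that n_tosses fair coin tosses split exactly 50:50."""
    if n_tosses % 2:
        raise ValueError("an odd number of tosses can never split 50:50")
    return math.comb(n_tosses, n_tosses // 2) / 2 ** n_tosses


for n in (1, 5, 20):
    print(f"{n:>2} tests: {chance_any_out_of_range(n):.0%} chance of at least one flag")
print(f"10 tosses, exactly 5 heads: {chance_exact_half_heads(10):.1%}")
```

Under these assumptions, one test gives a 5% chance of a flagged result, five tests about 23% and twenty tests about 64%, while ten coin tosses land exactly 5:5 only about a quarter of the time (24.6%).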
Let us go back to the earlier example of the healthy middle-aged person having a tumour marker test. Suppose the test is 97% sensitive (if we perform it on 100 people with that particular cancer, it will correctly identify 97 of them) and 99% specific (if we perform it on 100 healthy people, 1 will receive a false-positive result).
We now perform this test on two different people. One is our healthy middle-aged person, who we estimate has a 1 in 2000 (0.05%) chance of having this cancer. The other is a middle-aged person who had this cancer in the past, was treated with surgery and chemotherapy, and has come back one year later feeling unwell and with abdominal pain. We estimate their chance of having a cancer recurrence as 1 in 2 (50%).
Somewhat surprisingly, the same result from the same test means quite different things in these two people. Suppose they both get exactly the same modestly elevated result. In the healthy person, the positive predictive value (PPV) of the test, i.e. the probability that they have cancer given a positive result, is less than 5%. In the previously treated cancer patient, the PPV, or probability that they have recurrent cancer, is nearly 99%.
Conversely, if the result had come back negative or normal, the negative predictive value (NPV), i.e. the probability that the person does not have cancer, is almost 100% in the healthy person and 97% in the previously treated cancer patient. There is nothing mysterious about this. It means that a practitioner ordering a test needs to be aware of how likely it is that the person being tested has the disease being sought (the pre-test probability), and to have some feeling for the sensitivity and specificity of the test being used. They also need to consider how abnormal the result is, since a markedly abnormal result is more likely to be a true positive than a mildly abnormal one; the calculations above do not take the degree of abnormality into account. When doctors or laboratory personnel interpret test results they weigh all these factors, not like a computer, but using knowledge and experience to make an approximate calculation. If you would like to know more about these types of calculations, look at these websites: Wikihow (simple), ophthalmology journal (more detailed).
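The two PPV figures follow from Bayes’ theorem. Writing p for the estimated chance that the person has the cancer before the test (1 in 2000 for the healthy person, 1 in 2 for the treated patient), the PPV is the share of all positive results that are true positives:

$$
\mathrm{PPV} = \frac{\text{sensitivity} \times p}{\text{sensitivity} \times p + (1 - \text{specificity}) \times (1 - p)}
$$

For the healthy person (p = 0.0005):

$$
\mathrm{PPV} = \frac{0.97 \times 0.0005}{0.97 \times 0.0005 + 0.01 \times 0.9995} = \frac{0.000485}{0.000485 + 0.009995} \approx 0.046
$$

For the treated patient (p = 0.5):

$$
\mathrm{PPV} = \frac{0.97 \times 0.5}{0.97 \times 0.5 + 0.01 \times 0.5} = \frac{0.485}{0.490} \approx 0.99
$$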
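The sketch below reproduces the tumour-marker figures for both people by applying Bayes’ theorem to a test with known sensitivity and specificity. It treats each result as simply positive or negative, so, like the calculation in the text, it ignores how abnormal the result is. The function and type names are hypothetical.

```python
from typing import NamedTuple


class PredictiveValues(NamedTuple):
    ppv: float  # P(disease | positive result)
    npv: float  # P(no disease | negative result)


def predictive_values(sensitivity: float, specificity: float,
                      pre_test_probability: float) -> PredictiveValues:
    """Combine test accuracy with the pre-test probability via Bayes' theorem.

    Results are treated as purely positive/negative; the degree of
    abnormality of a positive result is not taken into account.
    """
    p = pre_test_probability
    true_pos = sensitivity * p                 # diseased and test positive
    false_pos = (1 - specificity) * (1 - p)    # healthy but test positive
    true_neg = specificity * (1 - p)           # healthy and test negative
    false_neg = (1 - sensitivity) * p          # diseased but test negative
    return PredictiveValues(
        ppv=true_pos / (true_pos + false_pos),
        npv=true_neg / (true_neg + false_neg),
    )


# Figures from the tumour-marker example: 97% sensitive, 99% specific.
for label, p in [("healthy person", 1 / 2000), ("treated patient", 0.5)]:
    pv = predictive_values(0.97, 0.99, p)
    print(f"{label}: PPV {pv.ppv:.1%}, NPV {pv.npv:.1%}")
```

For the healthy person the PPV comes out just under 5% and the NPV very close to 100%; for the treated patient they are about 99% and 97%, matching the figures quoted above.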
There are a number of other things to consider when interpreting laboratory test results. There is a very small probability that a result is wrong, for a variety of reasons. Mislabelling a sample or taking blood from the wrong patient are vanishingly rare errors, thanks to the strict quality control practices and training that laboratories have in place. If the sample was collected at a distant location, there is a small possibility that it was subjected to adverse conditions during transport, for example becoming too hot despite all precautions. More common, but still rare, are interferences in the test. Examples include antibodies in the patient’s blood that interfere with some types of assay, and interference by drugs. One example is the vitamin C infusions that some fringe practitioners give to patients: if the infusion is given in the surgery and the patient is then sent immediately to have blood taken, the very large quantity of vitamin C can interfere chemically with some tests.
Laboratory scientists conducting the tests are very well trained and can often recognise abnormalities caused by interference and prevent the affected results from being reported. However, if the interference is of a new type or relatively mild, it may not be identified in the lab and an incorrect result may be reported. If a result does not accord with the clinical findings, it is the practitioner’s responsibility to be suspicious, communicate with the lab and perhaps arrange repeat testing, sometimes using a different test method.