Criterion validity is the extent to which scores on a measure correlate with, or predict, an external criterion that reflects the construct the measure is supposed to assess.

There are two different types of criterion validity: concurrent and predictive.

Criterion validity is important because, without it, measures would not be able to accurately assess what they are supposed to be assessing in a way consistent with other measures.

What Is Criterion Validity?

Criterion validity is a way of validating a test by examining the extent to which scores on an inventory or scale correlate with external, non-test criteria (Cohen & Swerdlik, 2005). The aim is to demonstrate that the test's scores are predictive of real-life outcomes.

For example, when measuring depression with a self-report inventory, a researcher can establish criterion validity if scores on the measure correlate with external indicators of depression such as clinician ratings, number of missed work days, or length of hospital stay.

Types of Criterion Validity

Predictive

Predictive validity is a way of demonstrating that a test score can predict future performance on another criterion (Cohen & Swerdlik, 2005).

It is important to have good predictive validity when choosing measures for employment or educational purposes, as it will increase the likelihood of choosing individuals who perform well in those areas.

Predictive criterion validity is established by demonstrating that a measure correlates with an external criterion that is measured at some later point in time.

For example, by correlating the university grades of students in the same program of study at the same university with their earlier A-level scores, a researcher could determine the predictive criterion validity of A-level assessments.

If there is a high correlation, predictive criterion validity would be high; if there is little or no correlation, predictive criterion validity would be low (Barrett et al., 1981).
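The A-level example can be sketched in a few lines of Python. This is a minimal illustration, assuming the validity coefficient is a plain Pearson correlation; the function name `pearson_r` and both data lists are hypothetical and invented purely for demonstration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: A-level points at admission (the predictor) and
# final degree marks for the same students, collected years later (the criterion).
a_level_points = [96, 112, 120, 128, 136, 144, 152, 168]
degree_marks = [52, 58, 55, 61, 64, 62, 68, 71]

validity_coefficient = pearson_r(a_level_points, degree_marks)
print(f"Predictive validity coefficient r = {validity_coefficient:.2f}")
```

A coefficient near 1 would suggest high predictive criterion validity for this sample, while a coefficient near 0 would suggest little or none.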

Concurrent

Concurrent criterion validity is established by demonstrating that a measure correlates with an external criterion that is measured at the same time.

For example, concurrent criterion validity could be measured if scores on a math test correlate highly with scores on another math test administered at the same time (Barrett et al., 1981).

This approach is useful when the measured constructs are similar but may not overlap perfectly. In this case, it is important to demonstrate that the measure under study predicts variance in the criterion over and above that which other measures of the same construct can predict.

This approach is also useful when the focus is on practical outcomes rather than theoretical constructs, and a measure’s concurrent validity, therefore, needs to be demonstrated in relation to some other measure of a similar outcome (Barrett et al., 1981).

Concurrent validity is usually demonstrated through correlational analyses, although other approaches, such as regression, may also be used.
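One way to check whether a new measure predicts variance in the criterion over and above an existing measure is to compare the R² of a regression with and without the new measure. The sketch below assumes exactly two predictors and uses the standard formula for two-predictor R² from pairwise correlations; the function names and all scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def incremental_r2(existing, new, criterion):
    """Return (R2 with existing measure only, R2 with both, the gain in R2).

    Assumes the two predictors are not perfectly correlated with each other.
    """
    r_ye = pearson_r(criterion, existing)
    r_yn = pearson_r(criterion, new)
    r_en = pearson_r(existing, new)
    r2_existing = r_ye ** 2
    # R2 for a regression of the criterion on two predictors
    r2_both = (r_ye ** 2 + r_yn ** 2 - 2 * r_ye * r_yn * r_en) / (1 - r_en ** 2)
    return r2_existing, r2_both, r2_both - r2_existing

# Hypothetical scores, all collected at the same time from the same respondents
established_scale = [10, 14, 9, 20, 17, 12, 22, 15]
new_scale = [11, 13, 12, 19, 15, 10, 24, 18]
clinician_rating = [3, 4, 3, 7, 5, 3, 8, 6]  # the criterion

r2_old, r2_full, gain = incremental_r2(established_scale, new_scale, clinician_rating)
print(f"R2 established only: {r2_old:.2f}, both: {r2_full:.2f}, gain: {gain:.2f}")
```

A gain close to zero would suggest the new measure adds little beyond the established one, even if its simple correlation with the criterion is high.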

For example, to establish the concurrent validity of a new survey, a researcher can either administer both the new measure and an already validated measure to the same group of respondents and compare the responses, or administer the new instrument to the respondents and compare the responses to experts' judgment (Fink, 2010).

It is important to note that concurrent validity does not necessarily imply predictive validity.

How to assess criterion validity

Criterion validity can be tested in various situations and for various reasons. Some common methods for testing criterion validity include (Fink, 2010):

Comparing the results of the test to another similar test that is known to be valid. One potential pitfall of this method is that both tests may contain measurement errors, which would make it difficult to determine the validity of either test.

Asking experts in the field to rate the items on the test according to how well they think each item measures the construct being tested. This method can be time-consuming and expensive, and it may be difficult to find experts.

Using the test results to predict some other outcome that is known to be related to the construct being measured (e.g., job performance).

Conducting a factor analysis of the items on the test to see if they cluster together in a way that makes sense theoretically.

It is important to note that no single method is definitive and that multiple methods should be used whenever possible.
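The measurement-error pitfall noted for the first method can be made concrete. When both the test and the comparison test are imperfectly reliable, the observed correlation understates the relationship between the underlying scores. A rough sketch, assuming Spearman's classical correction for attenuation applies and that reliability estimates for both tests are available (the numbers below are hypothetical):

```python
from math import sqrt

def disattenuated_r(r_xy, reliability_x, reliability_y):
    """Spearman's correction for attenuation.

    r_xy          -- observed correlation between test and criterion
    reliability_x -- reliability of the test (0 < value <= 1)
    reliability_y -- reliability of the criterion measure (0 < value <= 1)
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return r_xy / sqrt(reliability_x * reliability_y)

# Hypothetical values: observed r = 0.42, reliabilities of 0.80 and 0.70
observed = 0.42
corrected = disattenuated_r(observed, 0.80, 0.70)
print(f"Observed r = {observed:.2f}, corrected r = {corrected:.2f}")
```

The corrected value is an estimate under the assumptions of classical test theory, not an observed result, so it is usually reported alongside the uncorrected coefficient rather than in place of it.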

Generally, testing criterion validity requires a "gold standard": a definitive measure of the thing that a researcher is setting out to assess. However, in psychology and psychiatry, such gold standards are not physical or biological.

Instead, psychiatrists commonly use clinical interviews, such as the Schedules for Clinical Assessment in Neuropsychiatry (SCAN), and then apply ICD-10 or DSM-IV diagnoses to a patient.

However, this approach can suffer from both random error and a systematic underreporting of symptoms in subsequent interviews due to repetition of these assessments of mental state (Prince, 2012).
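When a clinical interview serves as the gold standard, a screening questionnaire is commonly compared with it in terms of sensitivity and specificity. The following is a minimal sketch under that assumption; the cutoff, the scores, the interview diagnoses, and the function name `screening_accuracy` are all hypothetical.

```python
def screening_accuracy(test_positive, gold_positive):
    """Sensitivity and specificity of a test against a gold-standard diagnosis."""
    pairs = list(zip(test_positive, gold_positive))
    tp = sum(1 for t, g in pairs if t and g)          # true positives
    fn = sum(1 for t, g in pairs if not t and g)      # false negatives
    tn = sum(1 for t, g in pairs if not t and not g)  # true negatives
    fp = sum(1 for t, g in pairs if t and not g)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("gold standard must contain both cases and non-cases")
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}

CUTOFF = 10  # hypothetical threshold for a positive screen

# Hypothetical questionnaire scores and interview-based diagnoses
questionnaire_scores = [4, 12, 15, 7, 18, 9, 11, 3, 16, 6]
interview_diagnosis = [False, True, True, False, True, True, False, False, True, False]

screen_positive = [score >= CUTOFF for score in questionnaire_scores]
result = screening_accuracy(screen_positive, interview_diagnosis)
print(result)
```

Because the interview itself is subject to the errors described above, these figures describe agreement with an imperfect standard rather than with a true diagnosis.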

Examples of criterion-related validity

Intelligence tests (including emotional intelligence)

Intelligence tests have criterion validity when they are able to correctly identify individuals who will succeed or excel in a particular area. For example, the Stanford-Binet Intelligence Scale is often used to identify students who may need special education services.

On the other hand, emotional intelligence tests can be useful predictors of job performance in customer service or management positions, or in any context where people are expected to work successfully in teams.

The criterion-related validity of an intelligence test is often assessed by comparison with a known standard. For example, a new intelligence test might be validated against the Stanford-Binet Intelligence Scale, or an emotional intelligence test might be validated against a measure of job performance.

If the new test is found to be a good predictor of the criterion, it can be said to have criterion validity. This is an example of concurrent criterion validity (Conte, 2005).

Job applicant tests

Other examples of criterion-related validity include measures of physical fitness being related to on-the-job safety and measures of memory or knowledge being related to academic performance.

As with intelligence tests, the best prediction of job performance is usually found when using a combination of different types of measures.

In general, criterion-related validity is strongest when the criterion (the thing you’re trying to predict) is objective and quantifiable, such as test scores or sales figures (Schmidt, 2012).

Psychiatric diagnosis

Psychiatric diagnosis is the process of classifying individuals with psychological disorders using both clinical assessment and symptomatology.

The classification systems most commonly used to diagnose psychological disorders are the Diagnostic and Statistical Manual of Mental Disorders (DSM) and the International Classification of Diseases (ICD).

Here, tests of criterion validity can span from the diagnostic criteria to the validity of external measures used to confirm a diagnosis. However, the DSM has failed to reach its goal of validity, according to many researchers (Aboraya et al., 2005).

FAQs

Is criterion validity internal or external?

External validity is the extent to which the results of a measure can be generalized. Criterion validity assesses how well a test relates to outcomes and measures outside the test itself, so it is a form of external validation.

Why is criterion validity also known as predictive validity?

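Criterion validity is sometimes called predictive validity because it reflects how well scores on a measure predict scores on other assessments or outcomes. Strictly speaking, predictive validity is one of its two types, alongside concurrent validity.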

In order for a test to have good predictive validity, there must be a strong relationship between the scores on the test and the behavior or performance being predicted. If there is little to no relationship between the two, then the test has little predictive value.

What is the difference between criterion and construct validity?

Construct validity is a measure of how well a test measures what it is supposed to measure.

Criterion validity is a measure of how well a test predicts the outcome for other measures. In order for a test to have good construct validity, the items on the test must be related to the construct being measured.

In order for a test to have good predictive validity, there must be a strong relationship between the scores on the test and the behavior or performance being predicted (Cohen & Swerdlik, 2005).

How do you increase criterion validity?

There are several ways to increase criterion validity, including (Fink, 2010):

– Making sure the content of the test is representative of what will be measured in the future
– Using well-validated measures
– Ensuring good test-taking conditions
– Training raters to be consistent in their scoring

References

Aboraya, A., France, C., Young, J., Curci, K., & LePage, J. (2005). The validity of psychiatric diagnosis revisited: the clinician’s guide to improve the validity of psychiatric diagnosis. Psychiatry (Edgmont), 2(9), 48.

Barrett, G. V., Phillips, J. S., & Alexander, R. A. (1981). Concurrent and predictive validity designs: A critical reanalysis. Journal of Applied Psychology, 66(1), 1.

Conte, J. M. (2005). A review and critique of emotional intelligence measures. Journal of Organizational Behavior, 26(4), 433-440.

Fink, A. (2010). Survey research methods. In G. McCulloch & D. Crook (Eds.), The Routledge international encyclopedia of education. Routledge.

Prince, M. (2012). Epidemiology. In P. Wright, J. Stern, & M. Phelan (Eds.), Core psychiatry. Elsevier Health Sciences.

Schmidt, F. L. (2012). Cognitive tests used in selection can have content validity as well as criterion validity: A broader research review and implications for practice. International Journal of Selection and Assessment, 20(1), 1-13.

Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment: An introduction to tests and measurement.
