- A test that is relatively free of measurement error is considered _______
- Reliable
- Conceptualization of Error
- Measurement Error:
- There will ALWAYS be error in measurement
- Goal: design tests relatively free of error
- The observed score consists of the true score plus measurement error (O = T + E); see the simulation sketch at the end of this section
- "Rubber Yardstick" comparison
- A carpenter will never get the same measurement twice with a rubber yardstick
- Systematic error is biased: it consistently pushes scores in one direction, unlike random error
- What can increase measurement error?
- How the test was created, and situational factors
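As a rough illustration of O = T + E (not from the lecture itself), here is a tiny Python sketch: a made-up true score of 100 plus random error with mean zero. The specific numbers and the normal-error assumption are just for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = 100                       # T: the test-taker's "real" standing (made up)
error = rng.normal(0, 5, size=1000)    # E: random measurement error, mean 0
observed = true_score + error          # O = T + E

print(observed.mean())   # close to 100: random errors cancel out on average
print(observed.std())    # about 5: the "rubber yardstick" wobble
```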
- What is Reliability
- Methods to test for Reliability
- Test-retest reliability
- Test someone now, then test the same person again later
- Will the same person respond to the same test in the same way? (see the sketch below)
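A minimal sketch of what test-retest reliability looks like in practice, using hypothetical scores for six people tested twice: it is simply the correlation between the two administrations.

```python
import numpy as np

# Hypothetical scores for the same six people, tested at two points in time.
time1 = np.array([88, 92, 75, 81, 95, 70])
time2 = np.array([90, 89, 78, 80, 96, 72])

# Test-retest reliability: the correlation between the two administrations.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 2))   # close to 1.0 means people kept their rank order
```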
- Alternate-form/parallel-forms reliability
- Example
- Give a vocabulary test, then give another vocabulary test with the content slightly altered
- Different versions of the same test
- Split-half reliability
- Take the scores on the first half of the test and compare them to the scores on the second half of the test
- Tests often get harder as they go on, so a common fix is to split the test into odd- and even-numbered questions instead of first half vs. second half (see the sketch below)
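A sketch of an odd/even split-half estimate with made-up half-test totals. The Spearman-Brown step-up is the standard correction for the fact that each half is only half as long as the full test.

```python
import numpy as np

# Made-up totals on the odd- and even-numbered items for six test-takers.
odd_half  = np.array([10, 14, 9, 16, 12, 18])
even_half = np.array([11, 13, 9, 15, 13, 17])

# Correlate the two halves...
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# ...then step it up with the Spearman-Brown correction, because each half
# is only half as long as the real test.
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))
```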
- Inter-item consistency: Cronbach's Alpha
- What does it mean if your Cronbach's alpha is .95?
- It means the items are all telling you the same thing, so you probably don't need so many of them; some items are redundant
- The more items you have, the more reliable your test will be...but at what cost?
- So if your Cronbach's alpha is .95, you could say the test is almost too reliable: the extra items aren't adding much new information. But if it is below .70, it is too low (see the sketch below)
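A sketch of Cronbach's alpha computed directly from its definition: k/(k-1) times one minus the ratio of summed item variances to total-score variance. The five-item, six-person data set is invented and deliberately redundant, so alpha comes out very high.

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = people, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented 5-item scale answered by six people (1-5 ratings), nearly redundant items.
scale = [[4, 5, 4, 4, 5],
         [2, 2, 3, 2, 2],
         [5, 5, 5, 4, 5],
         [3, 3, 2, 3, 3],
         [4, 4, 4, 5, 4],
         [1, 2, 1, 1, 2]]
print(round(cronbach_alpha(scale), 2))   # about .97: the items overlap heavily
```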
- Kappa coefficient
- Similar in spirit to Cronbach's alpha, but it also takes chance agreement into account
- Inter-rater reliability
- Multiple raters score the same test-takers so you can check that the ratings agree and the information is accurate (see the kappa sketch below)
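A sketch of Cohen's kappa for two hypothetical raters making yes/no judgments on the same ten cases: raw percent agreement, minus the agreement you would expect by chance, scaled by the maximum possible improvement over chance. The ratings are invented.

```python
import numpy as np

# Hypothetical yes(1)/no(0) ratings from two raters on the same ten cases.
rater_a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
rater_b = np.array([1, 1, 0, 0, 0, 1, 1, 0, 1, 1])

p_observed = np.mean(rater_a == rater_b)   # raw proportion of agreement

# Chance agreement: both happen to say "yes" plus both happen to say "no".
p_chance = (rater_a.mean() * rater_b.mean()
            + (1 - rater_a.mean()) * (1 - rater_b.mean()))

kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(p_observed, 2), round(kappa, 2))   # 0.8 raw agreement, kappa about .58
```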
- Conceptual Definitions of Reliability
- The degree to which test-takers' scores reflect "true" abilities
- Domain Sampling Model
- Domain: extremely large collection of items
- The larger the sample of items, the more accurately it measures the domain (see the Spearman-Brown sketch below)
- Might help to think of a test item as a person in a study
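The Spearman-Brown prophecy formula makes the "more items, more reliability" point concrete: it projects what reliability becomes when a test is lengthened by a factor n. A minimal sketch, starting from an assumed reliability of .70.

```python
# Spearman-Brown prophecy formula: projected reliability when a test is
# lengthened by a factor n (n = 2 means doubling the number of items).
def spearman_brown(r, n):
    return (n * r) / (1 + (n - 1) * r)

r_current = 0.70   # assumed starting reliability
for n in (1, 2, 3, 4):
    print(n, round(spearman_brown(r_current, n), 2))
# Reliability climbs (.70 -> .82 -> .88 -> .90) with diminishing returns,
# which is the "but at what cost?" trade-off from the notes.
```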
- Classical Test Score Theory
- Because we assume error is random we also make the assumption that the distribution of error is the same for everyone
- A wide variance in the error distribution = lots of measurement error
- Less variance = less error
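One standard way to quantify how wide the error distribution is in classical test theory is the standard error of measurement, SEM = SD x sqrt(1 - reliability). A minimal sketch with made-up numbers (an IQ-style SD of 15 and a reliability of .90):

```python
import math

sd_test     = 15     # made-up test standard deviation (IQ-style scale)
reliability = 0.90   # made-up reliability coefficient

# Standard error of measurement: the standard deviation of the error
# distribution around a person's true score.
sem = sd_test * math.sqrt(1 - reliability)
print(round(sem, 2))   # about 4.74: higher reliability -> narrower error distribution
```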
- Test Construction
- Test Administration
- On some tests it's just as interesting to know how the person arrived at their score, and what influenced it, as to know the score itself
- Test Environment
- different environments affect scores
- Test-taker variables
- What if the test-taker didn't eat breakfast?
- Examiner-related variables
- Perhaps the test-taker is being defiant so the examiner gives her an ultimatum to either take the test or he will call the police. Will this affect the test scores? How so?
- What if you are having a bad day or if you are biased in some way? We need to be aware of our biases because we all have them.
- Neuropsychologists give simple tests to check for malingering. For example: asking someone with brain damage to put their fingers together and then pull them apart. Anyone can do this, but someone who is malingering will pretend not to be able to.
Monday, February 6, 2012
304: Test Reliability