

9. Psychometry of rating scales:

© SPMM Course

When developing measurement scales, we are concerned with two important properties: can the scale measure the actual phenomenon we want to measure, and can it provide consistent results when used? A highly valid scale measures what it is supposed to measure – the truth. A highly reliable scale provides consistent results. Reliability refers to the replicability of research studies/tools. Note that high reliability does not guarantee scientific validity, but it does guarantee consistency.

Reliability can be assessed by test-retest correlation, i.e. administering an instrument twice to the same population. The interval between test and retest must be long enough to avoid practice effects, but short enough that the underlying state (e.g. depression) does not change very much; a range of 2 to 14 days is often used in psychiatry.

Cronbach's alpha measures the internal consistency of a test; it reflects how strongly the items correlate with one another, and can be computed from the number of items, the individual item variances and the variance of the total score. It can take values from negative infinity up to a maximum of 1, but only positive values are meaningful. An arbitrary cut-off of 0.70 is commonly used to declare a test internally consistent.

Split-half reliability refers to splitting a scale into two halves and examining the correlation between them.

Inter-rater reliability is measured by having two or more raters rate the same population with the same scale. The intraclass correlation coefficient (ICC) is used for continuous variables; it is simply the proportion of the total variance of the measurement that reflects true between-subject variability. It ranges from 0 (unreliable) to 1 (perfect reliability). The ICC can be measured by either relative or absolute agreement; the relative ICC is always higher than the absolute ICC. Arbitrarily, an ICC of 0.6 is considered fair, 0.8 very good and 0.9 excellent.
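The Cronbach's alpha calculation described above can be sketched with the standard variance-based formula. The item scores below are made up purely for illustration; only the Python standard library is used.

```python
from statistics import pvariance

# Hypothetical item scores: 5 respondents x 4 items (values are illustrative).
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

k = len(scores[0])                     # number of items
items = list(zip(*scores))             # transpose: one tuple per item
item_vars = [pvariance(col) for col in items]
totals = [sum(row) for row in scores]  # each respondent's total score
total_var = pvariance(totals)

# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))                 # 0.94 here, above the 0.70 cut-off
```

Because these toy items all rise and fall together across respondents, the resulting alpha is high; replacing one item with random noise would pull it down.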
ANOVA-based intraclass correlation coefficients are used for quantitative data with more than two raters/groups. For nominal data, Cohen's kappa can be used; for ordinal data with more than two ordered categories, a weighted kappa is preferred. (More details are given below.)

Validity of an instrument is the extent to which it measures what it proposes to measure.

Face validity is a subjective judgement of whether, on its face value, the test measures the construct of interest; e.g., the Hamilton Depression Rating Scale clearly has face validity for measuring depression, but not for measuring obsessions.
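The (unweighted) Cohen's kappa mentioned above corrects observed agreement for agreement expected by chance. A minimal sketch, using made-up nominal ratings from two hypothetical raters:

```python
from collections import Counter

# Hypothetical ratings of 10 subjects by two raters (nominal categories).
rater_a = ["dep", "anx", "dep", "none", "anx", "dep", "none", "dep", "anx", "none"]
rater_b = ["dep", "anx", "anx", "none", "anx", "dep", "none", "dep", "dep", "none"]

n = len(rater_a)
# Observed agreement: proportion of subjects given identical ratings.
po = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: product of the raters' marginal counts, per category.
ca, cb = Counter(rater_a), Counter(rater_b)
pe = sum(ca[c] * cb[c] for c in set(rater_a) | set(rater_b)) / n**2

kappa = (po - pe) / (1 - pe)
print(round(kappa, 2))   # ~0.70: agreement well beyond chance
```

Here 8/10 ratings agree (po = 0.8) but 0.34 agreement would be expected by chance alone, so kappa lands around 0.70 rather than 0.80.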

© SPMM Course  Construct validity measures whether a test really measures the (theoretical) construct of interest or something else. One way of classifying the construct validity is considering unified construct validity. Here construct validity is taken to consist of both content validity and criterion validity (referred as unified construct validity).  Content validity refers to whether the contents i.e. each individual subscales, items or elements of the test are in line with the general objectives or specifications the test was originally designed to measure. It looks for a good coverage of all domains thought to be related to the measured condition. This often cannot be statistically tested, but experts are called for comments on this aspect of validity.  Criterion validity refers to the performance against an external criterion such as another instrument (concurrent) or future diagnostic possibility (predictive).  Concurrent validity refers to the ability of a test to distinguish between subjects who differ concurrently in other measures (using other instruments). e.g., those who score high on a scale of insomnia may score high on a scale of fatigue ratings too.  Predictive validity refers to the ability of a test to predict future group differences according to current group differences in score. e.g., high aggression score in childhood and high criminal incidents in adult life. (On a similar note, Incremental validity refers to the ability of a measure to predict or explain variance over and above other measures)

Another way of considering construct validity is to classify it into convergent, discriminant and experimental/interventional validity.

Convergent validity refers to agreement between instruments that measure the same construct, e.g. between the BDI and the HAMD for depression. This agreement can be tested in contrasted groups, i.e. depressed and non-depressed, with both groups showing a high correlation between the two scales.

Discriminant validity refers to the degree of disagreement between two scales measuring different constructs; e.g., to show that the HAMD measures a construct (depression) different from that measured by the Hamilton Anxiety Scale (anxiety), a poor correlation must be demonstrated between the HAMD and the HAS.

Experimental validity refers to sensitivity to change: an instrument must show a difference in results when an intervention is carried out to modify the measured domain.

Note: Factorial validity is a form of construct validity established via factor analysis of the items in a scale.

Precision and accuracy

Precision is the degree to which a calculated central value (e.g. the mean) varies with repeated sampling: the narrower the variation, the more precise the value. Random errors lead to imprecision. Precision is reduced by 1. wider confidence limits around the estimate and 2. demanding a higher level of confidence (e.g. 99.7% versus 95%). Accuracy refers to the correctness of the mean value – i.e. how close it is to the true population value. Precision is comparable to reliability, while accuracy is comparable to validity. Bias in a study compromises validity/accuracy.

VALIDITY: QUESTION IT ANSWERS
Face: Does this scale appear to be fit for the purpose of measuring the variable of interest?
Content: Does this scale appear to include all the important domains of the measured attribute?
Criterion: Is the scale consistent with what we already know (concurrent) and what we expect (predictive)?
Convergent: Does this new scale associate with a different scale that measures a similar construct?
Discriminant: Does the new scale disagree with scales that measure unrelated constructs?
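The point that demanding a higher confidence level reduces precision can be sketched numerically. The sample figures below are assumed for illustration only.

```python
from math import sqrt

# Illustrative figures: sample standard deviation 10, sample size 25.
sd, n = 10.0, 25
se = sd / sqrt(n)   # standard error of the mean = 2.0

# Approximate z multipliers: 1.96 for 95%, 2.97 for 99.7% confidence.
half_width_95 = 1.96 * se
half_width_997 = 2.97 * se

# Demanding more confidence widens the interval, i.e. lowers precision.
print(half_width_95, half_width_997)   # 3.92 vs 5.94
```

The same mean estimate is reported either way; only the width of the interval – the precision – changes with the chosen confidence level.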