Or in other words, while a particular rater might rate Ratee 1 high and Ratee 2 low, it should all even out across many raters.
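The idea that one rater's quirks should even out across many raters is the logic of the one-way model (Shrout and Fleiss's ICC(1)). That model does not estimate a separate rater effect, so rater differences are pooled with random error. Below is a minimal sketch of that computation, assuming NumPy is available. The function name `icc1` and the toy ratings matrix are hypothetical, for illustration only.

```python
import numpy as np


def icc1(ratings):
    """One-way random-effects ICC (Shrout & Fleiss Case 1).

    `ratings` is an n x k array: one row per ratee, one column per rating.
    Returns (single-rater ICC(1,1), mean-of-k ICC(1,k)).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    row_means = x.mean(axis=1)
    # Between-ratee mean square: how much ratees differ from one another.
    bms = k * np.sum((row_means - x.mean()) ** 2) / (n - 1)
    # Within-ratee mean square: rater differences and random error are pooled,
    # because the one-way model does not estimate a separate rater effect.
    wms = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    single = (bms - wms) / (bms + (k - 1) * wms)
    average = (bms - wms) / bms
    return single, average


# Hypothetical toy data: 6 ratees, each rated 4 times.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc1(ratings)
print(f"ICC(1,1) = {single:.2f}, ICC(1,k) = {average:.2f}")  # ICC(1,1) = 0.17, ICC(1,k) = 0.44
```

The mean-of-raters value is much higher than the single-rater value. This is the "evening out" effect: averaging four ratings cancels much of each rater's idiosyncratic error.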

For example, in our Facebook study, we want to know both.

First, we might ask, “What is the reliability of our ratings?”

This means that the raters in your task are the only raters anyone would be interested in.

This is uncommon in coding, because in theory your research assistants are only a few of an unlimited number of people who could make these ratings. This means ICC(3) will also always be larger than ICC(1) and typically larger than ICC(2). It is represented in SPSS as “Two-Way Mixed” because 1) it models both an effect of rater and an effect of ratee (i.e., two effects), and 2) it assumes a random effect of ratee but a fixed effect of rater (i.e., mixed effects).

After you’ve determined which kind of ICC you need, there is a second decision to make: are you interested in the reliability of a single rater, or of the raters' mean?

Recently, a colleague of mine asked for advice on how to compute interrater reliability for a coding task. I discovered that there aren’t many resources online written in an easy-to-understand format. Most either 1) go in depth about formulas and computation, or 2) go in depth about SPSS without giving specific reasons for several important decisions. The primary resource available is a 1979 paper by Shrout and Fleiss, which is quite dense.

If you have the same raters for each case, this is generally the model to go with.
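The difference between the two-way models, and between single-rater and mean-of-raters reliability, is easiest to see by computing them side by side. Below is a minimal sketch, assuming NumPy and a complete matrix in which every rater rates every ratee. It uses the Shrout and Fleiss mean-square formulas for Cases 2 and 3. The function name `two_way_iccs` and the data are hypothetical. I'm assuming these ICC(3) values correspond to SPSS's "Two-Way Mixed" model with the consistency definition, so check that against your own SPSS output.

```python
import numpy as np


def two_way_iccs(ratings):
    """Two-way ICCs (Shrout & Fleiss Cases 2 and 3) from an n x k matrix.

    Rows are ratees, columns are raters; every rater rates every ratee.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Partition total variation into ratee (row), rater (column), and residual parts.
    ss_total = np.sum((x - grand) ** 2)
    ss_ratees = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_raters = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_error = ss_total - ss_ratees - ss_raters
    bms = ss_ratees / (n - 1)             # between-ratee mean square
    jms = ss_raters / (k - 1)             # between-rater mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return {
        # ICC(2): raters are a random sample, so rater variance counts against reliability.
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        # ICC(3): raters are fixed, so systematic rater differences drop out.
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(3,k)": (bms - ems) / bms,
    }


# Hypothetical toy data: 6 ratees (rows) rated by the same 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
results = two_way_iccs(ratings)
for name, value in results.items():
    print(f"{name} = {value:.2f}")
# ICC(2,1) = 0.29, ICC(2,k) = 0.62, ICC(3,1) = 0.71, ICC(3,k) = 0.91
```

In this toy matrix the raters differ systematically in their average ratings (column means of about 7.67, 2.5, 4.33, and 6.67). ICC(3) ignores those differences, so it comes out well above ICC(2). The "1" versions answer the single-rater question, and the "k" versions answer the mean-of-raters question. For ICC(3), the "k" value is exactly the Spearman-Brown step-up of the single-rater value.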

