Search results
Results From The WOW.Com Content Network
Cohen's kappa coefficient (κ, lowercase Greek kappa) is a statistic that is used to measure inter-rater reliability (and also intra-rater reliability) for qualitative (categorical) items. [1] It is generally thought to be a more robust measure than simple percent agreement calculation, as κ takes into account the possibility of the agreement ...
In statistics, inter-rater reliability (also called by various similar names, such as inter-rater agreement, inter-rater concordance, inter-observer reliability, inter-coder reliability, and so on) is the degree of agreement among independent observers who rate, code, or assess the same phenomenon.
Fleiss' kappa is a generalisation of Scott's pi statistic, [2] a statistical measure of inter-rater reliability. [3] It is also related to Cohen's kappa statistic and Youden's J statistic which may be more appropriate in certain instances. [4]
Krippendorff's alpha coefficient, [1] named after academic Klaus Krippendorff, is a statistical measure of the agreement achieved when coding a set of units of analysis.. Since the 1970s, alpha has been used in content analysis where textual units are categorized by trained readers, in counseling and survey research where experts code open-ended interview data into analyzable terms, in ...
The concordance correlation coefficient is nearly identical to some of the measures called intra-class correlations.Comparisons of the concordance correlation coefficient with an "ordinary" intraclass correlation on different data sets found only small differences between the two correlations, in one case on the third decimal. [2]
Kendall's W (also known as Kendall's coefficient of concordance) is a non-parametric statistic for rank correlation.It is a normalization of the statistic of the Friedman test, and can be used for assessing agreement among raters and in particular inter-rater reliability.
Scott's pi (named after William A Scott) is a statistic for measuring inter-rater reliability for nominal data in communication studies.Textual entities are annotated with categories by different annotators, and various measures are used to assess the extent of agreement between the annotators, one of which is Scott's pi.
Bennett et al. suggested adjusting inter-rater reliability to accommodate the percentage of rater agreement that might be expected by chance was a better measure than simple agreement between raters. [2] They proposed an index which adjusted the proportion of rater agreement based on the number of categories employed.