Agreement Statistics

When comparing two measurement methods, it is of interest to estimate not only the bias and the limits of agreement between the two methods (inter-method agreement), but also these characteristics for each method by itself. It could well be that the agreement between two methods is poor simply because one of the methods has wide limits of agreement while the other's are narrow. In that case, the method with the narrow limits of agreement would be statistically superior, although practical or other considerations might alter this assessment. What counts as narrow or wide limits of agreement, or as small or large bias, is a matter of practical judgment.

Figure: Scatter plot showing the correlation between hemoglobin measurements from two methods, for the data presented in Table 3 and Figure 1. The dotted line is the trend line (least-squares line) through the observed values, and the correlation coefficient is 0.98. The individual points, however, lie far from the line of perfect agreement (solid black line).

The kappa statistic compares the observed agreement with the agreement expected by chance, calculated under the assumption that the two sets of ratings are independent; its formula is given below. The intraclass correlation coefficient (ICC) takes a different view. Think of two ophthalmologists who measure intraocular pressure with a tonometer: each patient receives two measurements, one from each observer. The ICC provides an estimate of the overall concordance between these measurements. It somewhat resembles an analysis of variance in that it expresses the between-pair variance as a proportion of the total variance of the observations (i.e., the total variability of the 2n observations, which is the sum of the within-pair and between-pair variances). The ICC can take values from 0 to 1, with 0 indicating no agreement and 1 indicating perfect agreement.
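As a concrete illustration, the following minimal Python sketch computes the bias (mean difference), Bland–Altman-style 95% limits of agreement, and the one-way ICC for paired readings. The tonometry-style numbers and the function names are hypothetical, chosen only to mirror the two-observer example above; this is a sketch, not a definitive implementation.

```python
import math

def bias_and_limits(pairs):
    """Mean difference between the two observers and the 95% limits
    of agreement (bias +/- 1.96 x SD of the differences)."""
    diffs = [a - b for a, b in pairs]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def icc_oneway(pairs):
    """One-way ICC: between-pair variance as a share of the total
    variance of the 2n observations (between + within), as above."""
    n = len(pairs)
    grand = sum(a + b for a, b in pairs) / (2 * n)
    # Between-pair sum of squares: each pair mean vs. the grand mean
    ss_between = sum(2 * ((a + b) / 2 - grand) ** 2 for a, b in pairs)
    # Within-pair sum of squares: each reading vs. its own pair mean
    ss_within = sum((a - (a + b) / 2) ** 2 + (b - (a + b) / 2) ** 2
                    for a, b in pairs)
    ms_between = ss_between / (n - 1)  # between-pair mean square
    ms_within = ss_within / n          # within-pair mean square
    return (ms_between - ms_within) / (ms_between + ms_within)

# Hypothetical intraocular pressure readings (observer A, observer B)
pairs = [(14, 15), (18, 17), (21, 22), (12, 13), (16, 16), (25, 23)]
bias, (lo, hi) = bias_and_limits(pairs)
print(f"bias = {bias:.2f}, 95% limits of agreement = ({lo:.2f}, {hi:.2f})")
print(f"ICC = {icc_oneway(pairs):.2f}")  # 0 = no agreement, 1 = perfect
```

Because the ICC is the between-pair share of the total variance, a value near 1 means the readings vary far more across patients than between the two observers on the same patient.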

The kappa statistic is defined as

$$\kappa = \frac{P_o - P_e}{1 - P_e},$$

where $P_o$ is the observed agreement and $P_e$ is the agreement expected by chance. In the square $I \times I$ table, the main diagonal $\{i = j\}$ represents rater, or observer, agreement. The term $\pi_{ij}$ denotes the probability that Siskel classifies a movie in category $i$ and Ebert classifies the same film in category $j$. For example, $\pi_{13}$ is the probability that Siskel gave a "thumbs down" (category 1) while Ebert gave a "thumbs up" (category 3). The weighted kappa coefficient is 0.57, with a 95% asymptotic confidence interval of (0.44, 0.70).
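To make the computation concrete, here is a minimal Python sketch of kappa from an $I \times I$ table of counts. The 3×3 table and the linear weight matrix are illustrative assumptions, not data from the source, and the function names are hypothetical.

```python
def cohens_kappa(table, weights=None):
    """kappa = (P_o - P_e) / (1 - P_e). Identity weights give the
    ordinary (unweighted) statistic; graded weights give weighted kappa."""
    n_cat = len(table)
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]          # Siskel's marginals
    col_p = [sum(table[i][j] for i in range(n_cat)) / total
             for j in range(n_cat)]                      # Ebert's marginals
    if weights is None:
        # Identity weights: full credit on the diagonal, none elsewhere
        weights = [[1.0 if i == j else 0.0 for j in range(n_cat)]
                   for i in range(n_cat)]
    p_obs = sum(weights[i][j] * table[i][j] / total
                for i in range(n_cat) for j in range(n_cat))   # observed, P_o
    p_exp = sum(weights[i][j] * row_p[i] * col_p[j]
                for i in range(n_cat) for j in range(n_cat))   # chance, P_e
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical counts: rows = Siskel, columns = Ebert
# (categories 1 = thumbs down, 2 = mixed, 3 = thumbs up)
table = [[24, 8, 13],
         [8, 13, 11],
         [10, 9, 64]]
# Linear agreement weights: w_ij = 1 - |i - j| / (n_cat - 1)
lin_w = [[1 - abs(i - j) / 2 for j in range(3)] for i in range(3)]
print(f"unweighted kappa = {cohens_kappa(table):.2f}")
print(f"weighted kappa   = {cohens_kappa(table, lin_w):.2f}")
```

Weighting matters for ordered categories such as these: a "thumbs down" versus "thumbs up" disagreement is penalized more heavily than "thumbs down" versus "mixed", so weighted kappa typically exceeds the unweighted value when most disagreements are near the diagonal.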