Kappa is similar to a correlation coefficient in that it cannot exceed +1.0 or fall below -1.0. Because it is used as a measure of agreement, only positive values are expected in most situations; negative values would indicate systematic disagreement. Kappa can reach very high values only when both agreement is good and the rate of the target condition is near 50% (because it incorporates the base rate into the calculation of joint probabilities). Several authorities have offered "rules of thumb" for interpreting the level of agreement, many of which agree broadly in the middle of the scale even though the wording differs. Smith PWF, Forster JJ, McDonald JW (1996) Monte Carlo exact tests for square contingency tables. J R Stat Soc Ser A 159(2):309-321. Kappa is a way to measure agreement or reliability while correcting for how often the raters might agree by chance. Cohen's kappa, which works for two raters, and Fleiss' kappa, an adaptation that works for any fixed number of raters, improve on the joint probability of agreement by taking into account the amount of agreement that could be expected by chance. The original versions share a weakness with the joint-probability approach: they treat the data as nominal and assume the ratings have no natural ordering. If the data do have an order (ordinal level of measurement), that information is not fully exploited by these statistics.
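The chance correction described above can be made concrete with a short sketch. The following is a minimal, from-scratch implementation of Cohen's kappa for two raters; the function name and the example ratings are illustrative, not taken from the text.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters rating the same items on a nominal scale."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items on which the raters coincide.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement: for each category, the product of the two
    # raters' marginal probabilities of using it, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Kappa rescales observed agreement so that chance-level agreement is 0.
    return (p_o - p_e) / (1 - p_e)

a = ["yes", "no", "yes", "yes", "no", "yes", "no", "no"]
b = ["yes", "no", "yes", "no", "no", "yes", "yes", "no"]
print(cohens_kappa(a, b))  # observed 0.75, chance 0.50, so kappa = 0.5
```

Here the raters agree on 6 of 8 items (75%), but because each uses "yes" and "no" half the time, 50% agreement is expected by chance alone; kappa credits only the agreement beyond that.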
Pearson's r, Kendall's τ, or Spearman's ρ can be used to measure pairwise correlation among raters using a scale that is ordered. Pearson assumes the rating scale is continuous; Kendall's and Spearman's statistics assume only that it is ordinal. If more than two raters are observed, an average level of agreement for the group can be calculated as the mean of the r (or τ or ρ) values from each possible pair of raters. Rapallo F (2005) Algebraic exact inference for rater agreement models. Statistical Methods and Applications 14:45-66. doi.org/10.1007/BF02511574. Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20:37-46. If the raters tend to agree, the differences between their observations will be near zero. If one rater is usually higher or lower than the other by a consistent amount, the bias will differ from zero. If the raters tend to disagree, but without a consistent pattern of one rating above the other, the mean of the differences will still be near zero.
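The group-level average described above can be sketched as follows, using Pearson's r over every pair of raters. This is a minimal stdlib-only illustration; the function names and the example scores are assumptions for the demo, and in practice a library routine (e.g. from SciPy) would normally be used.

```python
from itertools import combinations
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def average_pairwise_r(ratings):
    """Mean Pearson r over every pair of raters; each row is one rater's scores."""
    return mean(pearson_r(x, y) for x, y in combinations(ratings, 2))

raters = [
    [1, 2, 3, 4, 5],   # rater 1's scores for five subjects
    [2, 2, 3, 5, 5],   # rater 2
    [1, 3, 3, 4, 4],   # rater 3
]
print(round(average_pairwise_r(raters), 3))
```

With three raters there are three pairs, so the group statistic is the mean of three pairwise correlations; swapping in Spearman's ρ would only require ranking the scores first.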
Confidence limits (usually 95%) can be computed for the bias and for each of the limits of agreement. If the number of categories being used is small (e.g. 2 or 3), the likelihood of two raters agreeing by pure chance increases dramatically. This is because both raters must confine themselves to the limited number of options available, which inflates the overall agreement rate without necessarily reflecting their propensity toward "intrinsic" agreement (an agreement is considered "intrinsic" if it is not due to chance). Another approach to agreement (useful when there are only two raters and the scale is continuous) is to calculate the differences between each pair of the two raters' observations. The mean of these differences is termed the bias, and the reference interval (mean ± 1.96 × standard deviation) is termed the limits of agreement. The limits of agreement provide insight into how much random variation may be influencing the ratings. There are a number of statistics that can be used to determine inter-rater reliability.
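The bias and limits of agreement just described can be computed directly from the paired differences. A minimal sketch, with hypothetical measurements for two raters scoring the same six subjects on a continuous scale:

```python
from statistics import mean, stdev

def limits_of_agreement(x, y):
    """Bias and 95% limits of agreement for two raters' paired observations."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)   # systematic difference between the raters
    sd = stdev(diffs)    # spread of the disagreements
    # Reference interval: mean difference ± 1.96 standard deviations.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

rater_a = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]
rater_b = [10.0, 11.9, 9.5, 12.4, 10.6, 11.5]
bias, lower, upper = limits_of_agreement(rater_a, rater_b)
print(round(bias, 3), round(lower, 3), round(upper, 3))
```

A bias near zero with narrow limits suggests the raters are interchangeable; a bias clearly different from zero indicates one rater consistently scores higher than the other.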
Different statistics are appropriate for different types of measurement. Some options are the joint probability of agreement, Cohen's kappa, Scott's pi and the related Fleiss' kappa, inter-rater correlation, the concordance correlation coefficient, intra-class correlation, and Krippendorff's alpha.
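The simplest option in that list, the joint probability of agreement, is just the raw fraction of items on which the raters coincide, with no correction for chance. A minimal sketch (the function name and sample ratings are illustrative):

```python
def percent_agreement(rater_a, rater_b):
    """Joint probability of agreement: fraction of items with identical ratings.
    Unlike kappa, this makes no correction for chance agreement."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

a = ["pass", "fail", "pass", "pass", "fail"]
b = ["pass", "fail", "fail", "pass", "fail"]
print(percent_agreement(a, b))  # 4 of 5 items match -> 0.8
```

Its simplicity is also its weakness: with only two or three categories, a high value can arise from chance alone, which is exactly what the chance-corrected statistics in the list are designed to address.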