Skip to contents

Important Note

Please note that Iota1 is outdated. Please use the new version Iota2. Central definitions of Iota have changed in the new version. Please refer to Get started

This vignette is not updated to future developments.

1 Introduction

Reliability is a central characteristic of any assessment instrument, and describes the extent to which the instrument produces error-free data (Schreier, 2012). In terms of content analysis, Krippendorff (2019) suggests replicability as a fundamental reliability concept, which is also referred to as inter-coder reliability. This describes the degree to which “a process can be reproduced by different analysts, working under varying conditions, at different locations, or using different but functionally equivalent measuring instruments” (Krippendorff, 2019).

The package iotarelr provides an environment for estimating the degree of inter-coder reliability based on the Iota Reliability Concept developed by Berding et al. (2022). The concept provides one of the first measures for characterizing the degree of reliability for a complete scale and for every single category. Most of the older measures are limited to information on the complete scale.

The suggested concept is applicable to any kind of content analysis that uses a coding scheme with nominal or ordinal data regardless the kind of coders (human or artificial intelligence), the number of coders, and the number of categories. The following introduction shows how to use Iota1.

2 Example for using iotarelr in practice

2.1 Estimating the values

At the beginning, data generated by at least two coders is needed. Let us assume that four coders analyzed 20 textual fragments with a coding scheme consisting of three categories A, B, and C.

library(iotarelr)
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Loading required package: ggalluvial
#> Warning: package 'ggalluvial' was built under R version 4.1.3
coder_1<-c("A","A","C","C","B","A","A","B","C","A","B","C","A","B","A","B","C","A","A","C")
coder_2<-c("A","A","C","B","B","A","A","B","C","A","A","C","A","C","A","B","A","A","A","C")
coder_3<-c("A","A","C","C","B","A","A","B","C","A","B","C","A","B","A","B","C","A","A","C")
coder_4<-c("A","C","C","C","B","A","A","B","C","A","B","A","A","B","A","C","B","A","A","C")

coded_data<-cbind(coder_1,coder_2,coder_3,coder_4)

In this example, the characteristics are saved as characters. The package also supports that the categories are stored as integers. The only important aspect is that the rows must contain the coding units (e.g., the textual fragments), and that the columns represent the different coders. In the next step, the estimation of iota and its elements starts via the function compute_iota1().

results<-compute_iota1(data=coded_data)

2.2 The alpha values

results$alpha
#>              A        B         C
#> [1,] 0.5647321 0.140625 0.2020089

The component $alpha saves the values for the chance-corrected alpha reliabilities. These values describe the extend in which the true characteristic of a coding unit is discovered. It ranges from 0 to 1. 0 indicates the absence of reliability. That is, the assignment of the true category equals a random selection between the categories. 1 indicates that the true value is always recovered.

In the current example, the chance-corrected alpha value for category A is relatively high. This means that the coding scheme leads coders to assign characteristic A to a coding unit if the true characteristic of the coding unit is A with a high probability. In contrast, the values for the other two categories are very low. This indicates that a coding unit with the true characteristic B or C is often assigned to the other categories. That is, the coding scheme does not ensure that coders discover the true category if the true category is B or C.

2.3 The beta values

To provide a more detailed insight of the coding scheme, the beta values account for the errors which occur in the case that the true category is not discovered. That is, the beta values describe the extent to which a category is influenced by errors made in other categories. For example, if the true category of a coding unit is A and a coder does not assign A to that coding unit, the data for category B and C are biased. The data representing categories B and C comprises a coding unit that should not be part of the data.

The chance-corrected values for the beta reliabilites are stored in $beta and range between 0 and 1. 0 indicates that the beta reliability equals the beta reliability in the case of complete guessing. 1 indicate the absence of any beta errors.

results$beta
#>             A         B         C
#> [1,] 0.746875 0.6835938 0.6203125

In the current example, the chance-corrected beta reliabilities are relatively high. This means that the different categories are not strongly influenced by errors in the other categories.

2.4 The iota values

Iota values summarize the different types of errors for each category by averaging the chance-corrected reliabilities. They are stored in $iota.

results$iota
#>              A         B         C
#> [1,] 0.6558036 0.4121094 0.4111607

Iota can range between 0 and 1. 0 indicates that the quality of the coding of a category equals random guessing. Codings of this category are not reliable. 1 indicates a perfect reliability of a category. That is, the true value of a coding unit is recovered if the true category is the category under investigation and errors made by coding coding units of other categories do note influence the data of the category under investigation. In the current example category A is quite reliable while the other categories are not.

2.5 Assignment-Error Matrix (AEM)

The assignment-error matrix combines the alpha and beta values and provides the most detailed description of a coding scheme. It is based on the raw estimates without any chance-correction.

results$assignment_error_matrix
#>           A         B         C
#> A 0.4285714 0.4545455 0.5454545
#> B 0.4000000 0.8461538 0.6000000
#> C 0.4444444 0.5555556 0.7857143

The AEM has to be read row by row because the rows represent the true category of a coding unit and the columns represent the assigned categories. The values on the diagonal represent the alpha-error of the categories. That is the probability not to assign the true category to a coding unit. The other cells describe the probability to assign the category to the other categories. That is, they inform about the probability to choose a category under the condition that the true category is not recovered.

In the example, the alpha-error of category A is about .43 meaning that a coding unit of category A is coded as another category in about 43 %. Or in other words: The probability to assign the true category to a coding unit of category A is about 57%. Thus, the second and third cell in the first row mean: If the true category of a coding unit belonging to category A is not recovered, about 45% of the codings are assigned as category B and 55% of the codings are assigned to category C. Thus, category C suffers more from errors made with coding units truly belonging to category A than category B.

The alpha-error of category B is about .85. Thus in about 85% of cases the coding units truly belonging to category B are assigned to another category. In other words: The probability to recover the true category is about 15 % if the true category of the coding unit is B. In the case that this error occurs, 40% of the cases are assigned as category A and 60 % are assigned to category C. Thus, the data of category C suffers more from errors made on coding units belonging truly to category B than category A.

The alpha-error of category C is about 79%. Thus, in about 79% of the cases a coding unit truly belonging to category C is assigned to category A or B. If this error occurs 44 % of the codings units are treated as category A and 56% are treated as category B. In consequence, category B suffers more form errors made with codings units of category C than category A. In other words: The data for category B is more strongly biased by errors of category C than the data of category A.

2.6 Scale level

The measures described above provide detailed insights into the reliability of every single category which is a new feature for content analysis and very helpful for constructing a coding scheme or for evaluating data of empirical studies. In many applications however, the values have to be summarized to values representing the quality of the complete coding scheme. In the Iota Concept this is done by averaging the iota values for each category. The value is accessible by $average_iota.

results$average_iota
#> [1] 0.4930246

In the current example average iota is about .49. At the moment only a rule of thumb for ordinal data exist. According to Berding et al. (2022) an average iota of at least .474 is necessary for an acceptable level of reliability on the scale level. For a “good” reliability average iota should be at least .601.

References

  • Berding, F., Elisabeth r., Simone S., Jahncke, H., Slopinski, A., and Rebmann, K. (2022). Performance and Configuration of Artificial Intelligence in Educational Settings. Introducing a New Reliability Concept Based on Content Analysis. Frontiers in Education. https://doi.org/10.3389/feduc.2022.818365
  • Krippendorff, K. (2019). Content Analysis: An Introduction to Its Methodology
  • Schreier, M. (2012). Qualitative Content Analysis in Practice. SAGE.