
1 Introduction

In the educational and social sciences, it is common practice to share research instruments such as questionnaires or tests. For example, the Open Test Archive provides access to a large number of open access instruments. aifeducation assumes AI-based classifiers should be shareable, similarly to research instruments, to empower educational and social science researchers and to support the application of AI for educational purposes. Thus, aifeducation aims to make the sharing process as convenient as possible.

To this end, every model generated with aifeducation can be prepared for publication in a few basic steps. In this vignette, we show you how to make your AI ready for publication.

An explanation of how to document a model with AI for Education - Studio can be found in the vignette 02 Using the graphical user interface Aifeducation - Studio. In this vignette, we describe how to document a model with R syntax and provide some hints on the content of the documentation.

2 Creating Documentation with R Syntax

The process for documenting a model with R syntax is similar for all models in aifeducation, since all models use the same methods. Here, we will illustrate the process for a TextEmbeddingModel.

First, every model needs a clear description of how it was developed or modified and how it can be used. You can add a description via the method set_model_description.

example_model$set_model_description(
  eng = NULL,
  native = NULL,
  abstract_eng = NULL,
  abstract_native = NULL,
  keywords_eng = NULL,
  keywords_native = NULL
)

This method allows you to provide a description in English and in the native language of your model to make the distribution of your model easier.

With abstract_eng and abstract_native you can provide a summary of your description. This is very important if you would like to share your work on a repository. With keywords_eng and keywords_native you can set a vector of keywords, which helps others to find your work through search engines.
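As a minimal sketch, the arguments could be prepared as follows before they are passed to set_model_description. All texts and keywords are hypothetical placeholders, and German is only assumed as the native language of the model. The call itself is guarded so that the snippet also runs in a session where no example_model exists.

```r
# Hypothetical placeholder texts; replace them with your own documentation.
description_eng <- "This text embedding model was adapted to essays from teacher education."
description_native <- "Dieses Modell wurde an Essays aus der Lehrkraftbildung angepasst."
abstract_eng <- "A text embedding model for essays from teacher education."
abstract_native <- "Ein Text-Embedding-Modell zur Analyse von Essays aus dem Bildungsbereich."

# Keywords are passed as character vectors, one element per keyword.
keywords_eng <- c("education", "essays", "text embedding")
keywords_native <- c("Bildung", "Essays", "Text-Embedding")

# Assumes `example_model` is an existing aifeducation model object.
if (exists("example_model")) {
  example_model$set_model_description(
    eng = description_eng,
    native = description_native,
    abstract_eng = abstract_eng,
    abstract_native = abstract_native,
    keywords_eng = keywords_eng,
    keywords_native = keywords_native
  )
}
```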

You can access a model’s description by using the method get_model_description.

example_model$get_model_description()

Besides a description of your work, it is necessary to provide information on other people who were involved in creating the model. This can be done with the method set_publication_info.

example_model$set_publication_info(
  type,
  authors,
  citation,
  url = NULL
)

First of all, you have to decide the type of information you would like to add. You have two choices: “developer” and “modifier”, which you set with type.

  • type="developer" stores all information about the people involved in the process of developing the model. If you use a transformer model from Hugging Face, the contributors and their description of the model should be entered as developers. In all other cases you can use this type for providing a description of how you developed the model.

  • In some cases you might wish to modify an existing model. This might be the case if you use a transformer model and you adapt the model to a specific domain or task. In this case you rely on the work of other people and modify their work. As such, you can describe your modifications by setting type="modifier".

For every type of contributor you can add the relevant individuals via authors. Please use the R function personList() for this. With citation you can provide a short text on how to cite the work of the different contributors. With url you can provide a link to relevant sites of the model.
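The following sketch shows one way to build the authors argument. It uses person() from the utils package and combines the entries with c(); this is assumed here to be an accepted equivalent to personList(), which may emit a deprecation warning in recent R versions. All names, the e-mail address, and the citation text are hypothetical, and the call to the model is guarded so that the snippet runs without an existing example_model.

```r
# Hypothetical contributors; person() is part of the utils package.
authors <- c(
  utils::person(given = "Jane", family = "Doe", email = "jane.doe@example.org"),
  utils::person(given = "John", family = "Smith")
)

# Hypothetical citation text for these contributors.
citation_text <- "Doe, J. & Smith, J. Example text embedding model."

# Assumes `example_model` is an existing TextEmbeddingModel.
if (exists("example_model")) {
  example_model$set_publication_info(
    type = "developer",
    authors = authors,
    citation = citation_text,
    url = NULL
  )
}
```

For a modified model, the same call would be repeated with type = "modifier" and the people who carried out the modification.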

You can access the information by using get_publication_info.

example_model$get_publication_info()

Finally, you must provide a license for using your model. You can set it with set_model_license and retrieve it with get_model_license.

example_model$set_model_license("GPL-3")

The documentation of your work is not part of the software, so you can choose a different license for it than for your software. You can set the license for your documentation by using the method set_documentation_license.

example_model$set_documentation_license("CC BY-SA")

Now you are able to share your work. Please remember to save your now fully described object as described in the vignette 03 Using R syntax.

The documentation process is the same for all models, with one difference. For TextEmbeddingModels you can differentiate between “developers” and “modifiers”. This is not possible for the other models, so for these models you do not need the argument type. A call to set_publication_info then looks like this:

example_model$set_publication_info(
  authors,
  citation,
  url = NULL
)

4 Content and Style of a Documentation

The necessary structure and content of a documentation depend on the kind of model you would like to document, national laws (such as the European AI Act), and the research standards of a discipline.

From a scientific point of view, we recommend that every model has an abstract, keywords, and a detailed description in English. An additional abstract, keywords, and a description in the native language of the model may be helpful for reaching a broad audience in the corresponding language community.

You can write your abstracts and descriptions in HTML and R Markdown, which allows you to add links to other sources or publications, add tables, or highlight important aspects of your model.
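As an illustration, a description with highlighting and a link could be written in either markup as sketched below. The texts and the URL are hypothetical placeholders.

```r
# Hypothetical description as an HTML fragment; the URL is a placeholder.
description_html <- paste0(
  "<p>The model was adapted to <b>essays</b> from teacher education.</p>",
  "<p>Further details: <a href=\"https://example.org/model\">project page</a>.</p>"
)

# The same content written in R Markdown syntax.
description_rmd <- paste(
  "The model was adapted to **essays** from teacher education.",
  "Further details: [project page](https://example.org/model).",
  sep = "\n\n"
)

# Print both variants to inspect the raw markup.
cat(description_html, "\n\n")
cat(description_rmd, "\n")
```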

For all models we recommend that your description answers at least the following questions:

  • Which kind of data was used to create the model?
  • How much data was used to create the model?
  • Which steps were performed and which method was used?
  • For which kinds of tasks or materials can the model be used?

This kind of information is necessary for others to form an opinion about the model.
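One way to make sure that none of the four questions is left open is to assemble the description from named parts. This is only a sketch: the headings and the answers are hypothetical placeholders, and the resulting text uses R Markdown syntax for the bold headings.

```r
# Hypothetical answers to the four questions; every value is a placeholder.
answers <- c(
  "Data" = "Essays written in teacher education courses.",
  "Amount of data" = "<number of documents and tokens>",
  "Method" = "<steps performed and method used>",
  "Intended use" = "<tasks and materials the model is suited for>"
)

# Stop early if one of the questions was left unanswered.
stopifnot(all(nzchar(answers)))

# One paragraph per question, with the topic in bold.
description_eng <- paste0("**", names(answers), ":** ", answers, collapse = "\n\n")
cat(description_eng, "\n")
```

The resulting string can then be passed as the eng argument of set_model_description.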

In the case of classifiers, we recommend adding some further details:

  • A short reference to the theoretical models that guided the development.
  • A clear and detailed description of every single category/class.
  • A short statement on where the classifier can be used.
  • A description of the kind and quantity of data used for training.
  • Information on potential bias in the data.
  • If possible, information about the inter-coder reliability of the coding process of the training data.
  • If possible, provide a link to the corresponding text embedding model or at least state where potential users can get the text embedding model.

The statement on where to get the text embedding model is important since a classifier can only be used with the corresponding text embedding model.

Please do not report the performance values of your classifier in the description. These are displayed automatically in AI for Education - Studio or can be accessed directly via example_classifier$reliability$test_metric_mean.