
05 Sharing and Using Trained AI/Models
Florian Berding, Julia Pargmann, Andreas Slopinski, Elisabeth Riebenbauer, Karin Rebmann
Source: vignettes/sharing_and_publishing.Rmd
1 Introduction
In the educational and social sciences, it is common practice to share research instruments such as questionnaires or tests. For example, the Open Test Archive provides access to a large number of open access instruments. aifeducation assumes AI-based classifiers should be shareable, similarly to research instruments, to empower educational and social science researchers and to support the application of AI for educational purposes. Thus, aifeducation aims to make the sharing process as convenient as possible.
To this end, every model generated with aifeducation can be prepared for publication in a few basic steps. This vignette shows how to make your AI ready for publication.
An explanation of how to document a model with AI for Education - Studio can be found in the vignette 02 Using the graphical user interface Aifeducation - Studio. In this vignette, we describe how to document a model with R syntax and provide some hints on the content of the documentation.
2 Creating Documentation with R Syntax
The process for documenting a model with R syntax is similar for all models in aifeducation, since all models use the same methods. Here, we will illustrate the process for a TextEmbeddingModel.
First, every model needs a clear description of how it was developed, how it was modified, and how it can be used. You can add a description via the method set_model_description.
example_model$set_model_description(
  eng = NULL,
  native = NULL,
  abstract_eng = NULL,
  abstract_native = NULL,
  keywords_eng = NULL,
  keywords_native = NULL
)
This method allows you to provide a description in English and in the native language of your model to make the distribution of your model easier.
With abstract_eng and abstract_native you can provide a summary of your description. This is very important if you would like to share your work on a repository. With keywords_eng and keywords_native you can set a vector of keywords, which helps others to find your work through search engines.
You can access a model’s description by using the method get_model_description:
example_model$get_model_description()
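As a rough illustration of how these arguments might be filled in, the following sketch collects them in a list and passes them to set_model_description. All text values are placeholders, not metadata of a real model, and the call only runs if an object named example_model exists in the session.

```r
# All text values below are placeholders for illustration only.
description_args <- list(
  eng = "Placeholder: detailed English description of the model.",
  native = NULL,  # fill in if the model's language is not English
  abstract_eng = "Placeholder: short English summary of the model.",
  abstract_native = NULL,
  keywords_eng = c("text embedding", "education", "placeholder keyword"),
  keywords_native = NULL
)

# Apply the description only if a model object is available in this session.
if (exists("example_model")) {
  do.call(example_model$set_model_description, description_args)
  example_model$get_model_description()
}
```

Collecting the arguments in a list first makes it easy to check or reuse the metadata before writing it to the model.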
Besides a description of your work, it is necessary to provide information on other people who were involved in creating the model. This can be done with the method set_publication_info.
example_model$set_publication_info(
  type,
  authors,
  citation,
  url = NULL
)
First, you have to decide which type of information you would like to add. You have two choices, “developer” and “modifier”, which you set with type.
type="developer" stores all information about the people involved in the process of developing the model. If you use a transformer model from Hugging Face, the contributors and their description of the model should be entered as developers. In all other cases, you can use this type to describe how you developed the model.
In some cases, you might wish to modify an existing model, for example when you adapt a transformer model to a specific domain or task. In this case, you rely on and modify the work of other people. You can describe your modifications by setting type="modifier".
For every type of contributor, you can add the relevant individuals via authors. Please use the R function personList() for this. With citation you can provide a short text on how to cite the work of the different contributors. With url you can provide a link to relevant sites of the model.
You can access the information by using get_publication_info.
example_model$get_publication_info()
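The following sketch shows one way the arguments for set_publication_info might be prepared. The names, citation text, and URL are hypothetical. The authors are built with person() and combined with c(); whether the method accepts this object unchanged, or requires it to be wrapped with personList() as recommended above, is an assumption to check against your installed version.

```r
library(utils)  # person() is part of base R's utils package

# Hypothetical contributors; replace with the people actually involved.
authors <- c(
  person(given = "Jane", family = "Doe", email = "jane.doe@example.org"),
  person(given = "John", family = "Smith")
)

# Placeholder citation and URL for illustration only.
publication_args <- list(
  type = "developer",
  authors = authors,
  citation = "Doe, J. & Smith, J. Placeholder model title.",
  url = "https://example.org/placeholder-model"
)

# Store the information only if a model object is available in this session.
if (exists("example_model")) {
  do.call(example_model$set_publication_info, publication_args)
}
```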
Finally, you must provide a license for using your model. You can set it with set_model_license and retrieve it with get_model_license.
example_model$set_model_license("GPL-3")
The documentation of your work is not part of the software, so you can choose a different license for it than for the software. You can set the license for your documentation by using the method set_documentation_license.
example_model$set_documentation_license("CC BY-SA")
Now you are able to share your work. Please remember to save your fully documented object as explained in the vignette 03 Using R syntax.
The documentation process is the same for all models, with one difference. For TextEmbeddingModels you can differentiate between “developers” and “modifiers”. This is not possible for the other models, so for these models you do not need the argument type. A call to set_publication_info would then look like this:
example_model$set_publication_info(
  authors,
  citation,
  url = NULL
)
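If you document several kinds of models in one script, the difference described above could be handled by a small wrapper. The function set_publication_info_for below is a hypothetical helper, not part of aifeducation. It assumes that text embedding models carry the class name "TextEmbeddingModel" and forwards type only for those objects.

```r
# Hypothetical helper (not part of aifeducation): forwards `type` only for
# objects that inherit from the class "TextEmbeddingModel" (assumed class name).
set_publication_info_for <- function(model, authors, citation, url = NULL,
                                     type = "developer") {
  if (inherits(model, "TextEmbeddingModel")) {
    # Only these two values are described for text embedding models.
    type <- match.arg(type, c("developer", "modifier"))
    model$set_publication_info(
      type = type, authors = authors, citation = citation, url = url
    )
  } else {
    # Other models do not take the argument `type`.
    model$set_publication_info(
      authors = authors, citation = citation, url = url
    )
  }
  invisible(model)
}
```

A call such as set_publication_info_for(example_model, authors, citation) then works for both kinds of objects without changing the call site.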
4 Content and Style of a Documentation
The necessary structure and content of a model's documentation depend on the kind of model you would like to document, national laws (such as the European AI Act), and the research standards of a discipline.
From a scientific point of view, we recommend that every model has an abstract, keywords, and a detailed description in English. An additional abstract, keywords, and a description in the native language of the model may be helpful for reaching a broad audience in the corresponding language community.
You can write your abstracts and descriptions in HTML and R Markdown, which allows you to add links to other sources or publications, to add tables, or to highlight important aspects of your model.
For all models we recommend that your description answers at least the following questions:
- Which kind of data was used to create the model?
- How much data was used to create the model?
- Which steps were performed and which method was used?
- For which kinds of tasks or materials can the model be used?
This kind of information is necessary for others to form an opinion about the model.
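The four questions above could be turned into a reusable skeleton for the English description. The sketch below builds such a text with Markdown headings. The section titles and the TODO entries are placeholders chosen for illustration and must be replaced with the facts about your own model.

```r
# Placeholder skeleton; every TODO must be replaced with real information.
description_eng <- paste(
  "## Data",
  "Kind of data: TODO describe the text material.",
  "Amount of data: TODO state the number of documents or tokens.",
  "",
  "## Method",
  "Steps and method: TODO describe the procedure used to create the model.",
  "",
  "## Intended use",
  "Suitable tasks or materials: TODO describe the intended application.",
  sep = "\n"
)

# Print the skeleton to review it before adding it to the model.
cat(description_eng)
```

The resulting string can then be passed as the eng argument of set_model_description.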
In the case of classifiers, we recommend adding the following:
- A short reference to the theoretical models that guided the development.
- A clear and detailed description of every single category/class.
- A short statement on where the classifier can be used.
- A description of the kind and quantity of data used for training.
- Information on potential bias in the data.
- If possible, information about the inter-coder-reliability of the coding process of the training data.
- If possible, provide a link to the corresponding text embedding model or at least state where potential users can get the text embedding model.
The statement on where to get the text embedding model is important, since a classifier can only be used with the corresponding text embedding model.
Please do not report the performance values of your classifier in the description. These are displayed automatically in AI for Education - Studio or can be accessed directly via example_classifier$reliability$test_metric_mean.