05 Sharing and Using Trained AI/Models
Florian Berding, Julia Pargmann, Andreas Slopinski, Elisabeth Riebenbauer, Karin Rebmann
Source:vignettes/sharing_and_publishing.Rmd
1 Introduction
In the educational and social sciences, it is common practice to share research instruments such as questionnaires or tests. For example, the Open Test Archive provides access to a large number of open access instruments. aifeducation assumes AI-based classifiers should be shareable, similarly to research instruments, to empower educational and social science researchers and to support the application of AI for educational purposes. Thus, aifeducation aims to make the sharing process as convenient as possible.
To this end, every model generated with aifeducation can be prepared for publication in a few basic steps. In this vignette, we would like to show you how to make your AI ready for publication and how to use models from others. How to save, load, and apply these models is described in 02 Using the graphical user interface Aifeducation - Studio and 03 Using R syntax.
2 Creating Documentation with AI for Education - Studio
The most convenient way to document your work is to use AI for Education - Studio. As a first step, start the user interface by calling start_aifeducation_studio().
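As a minimal sketch, the call can be guarded so that it only runs in an interactive session in which the package is installed. The guard and the helper variable can_start are our own additions for illustration and are not required by aifeducation.

```r
# Only start the app if R runs interactively and aifeducation is installed.
# `can_start` is a hypothetical helper variable used for this sketch only.
can_start <- interactive() &&
  requireNamespace("aifeducation", quietly = TRUE)

if (can_start) {
  aifeducation::start_aifeducation_studio()
}
```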
The next steps depend on the model you would like to document. If you would like to document a TextEmbeddingModel, first select TextEmbeddingModels at the top of the app. In the next window, select the tab Document. You can now load the model you would like to document by clicking the button “Choose a Model”.
After loading the model you can see different tabs for different parts of the documentation. The first tab is Developer. Here you can add the names and e-mail addresses of the model’s developers. In addition, you can add a suggested citation and a URL as a link to relevant sites of the model.
With the help of the other tabs you can write an abstract in English and an abstract in the native language of the model, such as French or German, and you can add a detailed description in both languages. The tab for the abstracts allows you to set keywords for your work, which help others find your work through search engines.
Your documentation goes into the corresponding text field on the left side within a tab. If you click on the button “Preview”, you will see a preview of the documentation on the right side. When you have finished documenting a part of your model, please click on the “Save” button to save your changes.
The last tab allows you to set the license for your model and for your documentation.
The documentation for all other models, such as TEFeatureExtractors, TEClassifierRegular, and TEClassifierProtoNet, works in exactly the same way. The only difference is that you have to select the corresponding tab at the top of the app.
For TextEmbeddingModels there is an additional tab called “Modifiers”. This tab is relevant if you do not develop your own base model but instead modify a base model created by others. Such a modification can be an adaptation to specific tasks or specific domains. In this case we recommend adding the people who developed the base model via the tab “Developers” and your research group via the tab “Modifiers”.
3 Creating Documentation with R Syntax
The process for documenting a model is similar for all models in aifeducation, since all models use the same methods. Here, we will illustrate the process for a TextEmbeddingModel.
First, every model needs a clear description of how it was developed, how it was modified, and how it can be used. You can add a description via the method set_model_description.
example_model$set_model_description(
  eng = NULL,
  native = NULL,
  abstract_eng = NULL,
  abstract_native = NULL,
  keywords_eng = NULL,
  keywords_native = NULL
)
This method allows you to provide a description in English and in the native language of your model to make the distribution of your model easier.
With abstract_eng and abstract_native you can provide a summary of your description. This is very important if you would like to share your work in a repository. With keywords_eng and keywords_native you can set a vector of keywords, which helps others find your work through search engines.
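The following sketch shows how the arguments could be prepared before they are passed to set_model_description. All texts and keywords are hypothetical placeholders, German is only assumed as the native language, and example_model is assumed to be an existing model object, so the call is skipped if no such object is available.

```r
# Hypothetical placeholder texts; replace them with your own documentation.
description_args <- list(
  eng = "Placeholder: detailed description of the model in English.",
  native = "Platzhalter: ausfuehrliche Beschreibung des Modells.",
  abstract_eng = "Placeholder: short summary in English.",
  abstract_native = "Platzhalter: kurze Zusammenfassung.",
  keywords_eng = c("text embedding", "education", "essays"),
  keywords_native = c("Texteinbettung", "Bildung", "Aufsaetze")
)

# `example_model` is assumed to exist already (for example a loaded
# TextEmbeddingModel). Without it, this sketch only prepares the arguments.
if (exists("example_model")) {
  do.call(example_model$set_model_description, description_args)
}
```

Collecting the arguments in a named list keeps the English and native-language versions side by side, which makes it easier to see whether a counterpart is missing.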
You can access a model’s description by using the method get_model_description:
example_model$get_model_description()
Besides a description of your work, it is necessary to provide information on other people who were involved in creating the model. This can be done with the method set_publication_info.
example_model$set_publication_info(
  type,
  authors,
  citation,
  url = NULL
)
First of all, you have to decide which type of information you would like to add. You have two choices, “developer” and “modifier”, which you set with type.
type = "developer" stores all information about the people involved in the process of developing the model. If you use a transformer model from Hugging Face, the contributors and their description of the model should be entered as developers. In all other cases you can use this type to describe how you developed the model.
In some cases you might wish to modify an existing model. This might be the case if you use a transformer model and adapt it to a specific domain or task. In this case you rely on the work of other people and modify their work. You can describe your modifications by setting type = "modifier".
For every type of contributor you can add the relevant individuals via authors. Please use the R function personList() for this. With citation you can provide a short text on how to cite the work of the different contributors. With url you can provide a link to relevant sites of the model.
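A minimal sketch of how these three arguments could be prepared is shown below. The names, e-mail addresses, citation, and URL are hypothetical placeholders. The sketch combines person objects with c(), which base R supports, instead of personList(); whether the method accepts such a vector is an assumption you should check for your package version. example_model is again assumed to exist, so the call is skipped otherwise.

```r
# Hypothetical contributors; names and e-mail addresses are placeholders.
authors <- c(
  utils::person(given = "Jane", family = "Doe", email = "jane.doe@example.org"),
  utils::person(given = "John", family = "Roe", email = "john.roe@example.org")
)

# Hypothetical citation text and project URL.
citation <- "Doe, J. & Roe, J. (2024). Example text embedding model."
url <- "https://example.org/example-model"

# Show the contributors in a readable form before storing them.
print(format(authors, include = c("given", "family")))

# `example_model` is assumed to be an existing TextEmbeddingModel.
if (exists("example_model")) {
  example_model$set_publication_info(
    type = "developer",
    authors = authors,
    citation = citation,
    url = url
  )
}
```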
You can access the information by using get_publication_info.
example_model$get_publication_info()
Finally, you must provide a license for using your model. You can set the license with set_model_license and retrieve it with get_model_license.
example_model$set_model_license("GPL-3")
The documentation of your work is not part of the software. You can therefore choose a different license for it than for your software. You can set the license for your documentation by using the method set_documentation_license.
example_model$set_documentation_license("CC BY-SA")
Now you are able to share your work. Please remember to save your now fully described object as described in the vignette 03 Using R syntax.
The documentation process is the same for all other models, such as TEFeatureExtractors, TEClassifierRegular, and TEClassifierProtoNet, with one difference. For TextEmbeddingModels you can differentiate between “developers” and “modifiers”. This is not possible for the other models, so you do not need the argument type for them. Calling the method would look like this:
example_model$set_publication_info(
  authors,
  citation,
  url = NULL
)
4 Content and Style of a Documentation
The necessary structure and content of the documentation depend on the kind of model you would like to document, national laws (such as the European AI Act), and the research standards of a discipline.
From a scientific point of view, we recommend that every model has an abstract, keywords, and a detailed description in English. An additional abstract, keywords, and a description in the native language of the model may be helpful for reaching a broad audience in the corresponding language community.
You can write your abstracts and descriptions in HTML and R Markdown, which allows you to add links to other sources or publications, to add tables, or to highlight important aspects of your model.
For all models we recommend that your description answers at least the following questions:
- Which kind of data was used to create the model?
- How much data was used to create the model?
- Which steps were performed and which method was used?
- For which kinds of tasks or materials can the model be used?
This kind of information is necessary for others to form an opinion about the model.
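One way to make sure that a description covers these questions is to assemble it from one answer per question. The sketch below does this in base R and produces a Markdown text. The answers are hypothetical placeholders, and using each question as a heading is only one possible layout, not a requirement of aifeducation.

```r
# Hypothetical answers; replace the placeholder texts with facts about your model.
answers <- c(
  "Which kind of data was used to create the model?" =
    "Placeholder: describe the text corpus.",
  "How much data was used to create the model?" =
    "Placeholder: state the number of documents.",
  "Which steps were performed and which method was used?" =
    "Placeholder: describe the training procedure.",
  "For which kinds of tasks or materials can the model be used?" =
    "Placeholder: name suitable tasks and materials."
)

# Turn each question into a Markdown heading followed by its answer
# and join all parts into a single string.
description_eng <- paste0(
  "## ", names(answers), "\n\n", answers,
  collapse = "\n\n"
)

cat(description_eng)
```

The resulting string could then be passed as the eng argument of set_model_description.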
In the case of classifiers, we recommend adding some further descriptions:
- A short reference to the theoretical models that guided the development.
- A clear and detailed description of every single category/class.
- A short statement on where the classifier can be used.
- A description of the kind and quantity of data used for training.
- Information on potential bias in the data.
- If possible, information about the inter-coder reliability of the coding process of the data.
- If possible, provide a link to the corresponding text embedding model or at least state where potential users can get the text embedding model.
The statement on where to get the text embedding model is important since a classifier can only be used with the corresponding text embedding model.
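Before publishing a classifier, it can help to check a draft against the recommended parts listed above. The following sketch does this with a named list. The short labels such as "theory" or "bias" are hypothetical shorthand introduced for this example and are not names used by aifeducation.

```r
# Hypothetical shorthand labels for the recommended parts of a
# classifier description.
recommended <- c(
  "theory", "categories", "scope", "training_data",
  "bias", "reliability", "embedding_model"
)

# Hypothetical draft that does not yet cover all parts.
draft <- list(
  theory = "Placeholder: theoretical model that guided the development.",
  categories = "Placeholder: description of every category/class.",
  scope = "Placeholder: where the classifier can be used.",
  training_data = "Placeholder: kind and quantity of the training data.",
  embedding_model = "Placeholder: where to get the text embedding model."
)

# Report which recommended parts are still missing from the draft.
missing_parts <- setdiff(recommended, names(draft))

if (length(missing_parts) > 0) {
  message("Missing parts: ", paste(missing_parts, collapse = ", "))
}
```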
Please do not report the performance values of your classifier in the description. These are displayed automatically in AI for Education - Studio or can be accessed directly via example_classifier$reliability$test_metric_mean.
Please consider this native language example for a classifier: