01 Transformers
Yuliia Tykhonova, Florian Berding
Source: vignettes/transformers.Rmd
1 Base Transformer Class
1.1 Overview
See Base Transformer Class Documentation for details.
UML-diagram of Base Transformer Class:
This class has:
- private attributes: title, sustainability_tracker, steps_for_creation and steps_for_training (see 1.2 Private attributes).
- public attributes: the params and temp lists (see 1.3 Public attributes).
- private methods:
  - clear_variables sets the params, temp and sustainability_tracker attributes to NULL.
  - check_required_SFC and check_required_SFT check the required steps (functions) in the SFC and SFT lists for NULL, respectively. Steps are set to NULL by default and must be functions.
  - init_common_model_params contains set_model_param() calls that set the common model parameters (listed as ‘static’ parameters in p. 1.3 Public attributes -> params list).
  - define_required_SFC_functions and define_required_SFT_functions contain the definitions of the required creation and training steps (functions), respectively.
- public methods: initialize, setters for the title attribute, for elements of the params and temp lists, and for the steps for creation (SFC) and training (SFT) functions, plus the main create and train user methods. See the documentation for details.
1.2 Private attributes
Developers should know the purpose of all private attributes.
Attribute title
string
The title of a transformer. This title is displayed in the progress bar. By default, the title is set to "Transformer Model".
It can be set with the set_title() method (for example, when implementing a new child transformer in the initialize method):
.AIFECustomTransformer <- R6::R6Class(
  classname = ".AIFECustomTransformer",
  inherit = .AIFEBaseTransformer,
  public = list(
    initialize = function() {
      super$set_title("Custom Transformer") # (1)
    }
  )
)
Use super to access the public methods of the base class (1).
Attribute sustainability_tracker
A codecarbon.OfflineEmissionsTracker object that is used to track sustainability on demand. It can be created with the private create_sustain_tracker() method.
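For orientation only, a hedged sketch of what such a tracker does on the Python side; the actual object is created internally by create_sustain_tracker(), and the sketch assumes codecarbon is available via reticulate:

codecarbon <- reticulate::import("codecarbon")
tracker <- codecarbon$OfflineEmissionsTracker(
  country_iso_code = "DEU",  # corresponds to sustain_iso_code
  measure_power_secs = 15    # corresponds to sustain_interval
)
tracker$start()
# ... create or train the model ...
tracker$stop()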
Attribute steps_for_creation
A list() that stores the required and optional steps (functions) for creating a new transformer.
To access the (input) parameters of the transformer, use the params list (e.g. params$ml_framework). To access a local variable outside of a function, put it in the temp list.
Use the set_SFC_*() methods to set the required/optional steps for creation, where * is the name of the step. Use the set_required_SFC() method to set all required steps at once.
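A minimal sketch of how a child transformer registers its creation steps (the same calls appear in section 4.2):

# Inside a child transformer's create() method (sketch):
# register all required creation steps at once ...
super$set_required_SFC(private$steps_for_creation)
# ... or register a single (optional) step by its name
super$set_SFC_check_max_pos_emb(private$steps_for_creation$check_max_pos_emb)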
Required
The required steps to be defined in each child transformer are:
- create_tokenizer_draft: function() that creates a tokenizer draft. In this function a new tokenizer must be created and stored as an element of the temp list (e.g. temp$tok_new). This function can include the definition of special tokens and/or trainers (tokenizers.trainers.WordPieceTrainer). See the create_WordPiece_tokenizer() and create_ByteLevelBPE_tokenizer() functions, which create a new tokenizer object (tokenizers.Tokenizer) based on the tokenizers.models.WordPiece and tokenizers.models.ByteLevel models respectively.
- calculate_vocab: function() that calculates a vocabulary. Here the tokenizer created in the create_tokenizer_draft() function is trained. See tokenizers.Tokenizer.train_from_iterator() for details (a sketch of these first two steps follows this list).
- save_tokenizer_draft: function() that saves a tokenizer draft to a model directory (e.g. to a vocab.txt file). See tokenizers.Tokenizer.save_model() for details.
- create_final_tokenizer: function() that creates a new transformer tokenizer object. The tokenizer must be stored in the tokenizer parameter of the temp list. See transformers.PreTrainedTokenizerFast, transformers.LongformerTokenizerFast and transformers.RobertaTokenizerFast for details.
- create_transformer_model: function() that creates a transformer model. The model must be passed to the model parameter of the temp list. See transformers.(TF)BertModel, transformers.(TF)DebertaV2ForMaskedLM, transformers.(TF)FunnelModel, transformers.(TF)LongformerModel, transformers.(TF)RobertaModel, etc. for details.
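For orientation, a minimal, non-authoritative sketch of the first two steps. It assumes the Python tokenizers library is imported via reticulate; the temp elements trainer and raw_texts are hypothetical names used only for this sketch, and the package's own helpers (create_WordPiece_tokenizer(), create_ByteLevelBPE_tokenizer()) are not used here:

tok <- reticulate::import("tokenizers")  # assumes the Python 'tokenizers' package is available

steps_for_creation <- list(
  create_tokenizer_draft = function(self) {
    # draft tokenizer based on a WordPiece model
    self$temp$tok_new <- tok$Tokenizer(tok$models$WordPiece(unk_token = "[UNK]"))
    # trainer that defines vocabulary size and special tokens
    self$temp$trainer <- tok$trainers$WordPieceTrainer(
      vocab_size = as.integer(self$params$vocab_size),
      special_tokens = list("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]")
    )
  },
  calculate_vocab = function(self) {
    # train the draft tokenizer; 'raw_texts' and 'trainer' are hypothetical temp elements
    self$temp$tok_new$train_from_iterator(self$temp$raw_texts, trainer = self$temp$trainer)
  }
)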
Attribute steps_for_training
A list() that stores the required and optional steps (functions) for training the transformer.
To access the (input) parameters of the transformer, use the params list (e.g. params$ml_framework). To access a local variable outside of a function, put it in the temp list.
Use the set_SFT_*() methods to set the required/optional steps for training, where * is the name of the step.
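These setters are also used in section 4.2; a minimal sketch:

# Inside a child transformer's train() method (sketch):
super$set_SFT_load_existing_model(private$steps_for_training$load_existing_model)
# optional steps
super$set_SFT_cuda_empty_cache(private$steps_for_training$cuda_empty_cache)
super$set_SFT_create_data_collator(private$steps_for_training$create_data_collator)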
Required
The required step in each child transformer is:
- load_existing_model: function() that loads the model and its tokenizer. The model and the tokenizer must be stored in the model and tokenizer parameters of the temp list respectively. See transformers.(TF)PreTrainedModel for details. A sketch is given below.
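As a non-authoritative sketch, assuming the Python transformers library is imported via reticulate and that the saved model can be loaded with the Auto* classes, this step could look like:

transformers <- reticulate::import("transformers")  # assumes 'transformers' is available

load_existing_model <- function(self) {
  if (self$params$ml_framework == "pytorch") {
    self$temp$model <- transformers$AutoModelForMaskedLM$from_pretrained(self$params$model_dir_path)
  } else {
    self$temp$model <- transformers$TFAutoModelForMaskedLM$from_pretrained(self$params$model_dir_path)
  }
  # the tokenizer stored together with the model
  self$temp$tokenizer <- transformers$AutoTokenizer$from_pretrained(self$params$model_dir_path)
}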
Optional
The optional step is:
- cuda_empty_cache: function() that empties the cache if torch.cuda is available (see the sketch below).
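A minimal sketch of this step, assuming torch is imported via reticulate:

torch <- reticulate::import("torch")

cuda_empty_cache <- function() {
  # release cached GPU memory if CUDA is available
  if (torch$cuda$is_available()) {
    torch$cuda$empty_cache()
  }
}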
Other
The required and already defined steps are:
- check_chunk_size: function() that checks the transformer's chunk_size parameter and adjusts it. Uses the model parameter of the temp list and modifies the chunk_size parameter of the params list.
- create_chunks_for_training: function() that creates chunks of the sequences for the training. Uses the tokenizer parameter and adds the tokenized_dataset parameter to the temp list.
- prepare_train_tune: function() that prepares the data for the training. For tensorflow: uses the model and tokenizer parameters and adds the tf_train_dataset, tf_test_dataset and callbacks parameters to the temp list. For pytorch: uses the model and tokenizer parameters and adds the trainer parameter to the temp list.
- start_training: function() that starts the training. For tensorflow: uses the model, tf_train_dataset, tf_test_dataset and callbacks parameters of the temp list. For pytorch: uses the trainer parameter of the temp list.
- save_model: function() that saves the model. For tensorflow: uses the model parameter of the temp list. For pytorch: uses the model, pt_safe_save and trainer parameters of the temp list.
A required and already defined step that can be overwritten with a custom version:
- create_data_collator: function() that creates the data collator for the model. From the temp list it uses the tokenizer and return_tensors parameters and adds the data_collator parameter to this list. From the params list it uses whole_word and p_mask. A custom version is sketched below.
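For illustration only, a custom create_data_collator step might be sketched as follows, assuming the Python transformers library is imported via reticulate and using the element names listed above:

transformers <- reticulate::import("transformers")  # assumes 'transformers' is available

create_data_collator <- function(self) {
  if (self$params$whole_word) {
    # whole word masking
    self$temp$data_collator <- transformers$DataCollatorForWholeWordMask(
      tokenizer = self$temp$tokenizer,
      mlm = TRUE,
      mlm_probability = self$params$p_mask,
      return_tensors = self$temp$return_tensors
    )
  } else {
    # token masking
    self$temp$data_collator <- transformers$DataCollatorForLanguageModeling(
      tokenizer = self$temp$tokenizer,
      mlm = TRUE,
      mlm_probability = self$params$p_mask,
      return_tensors = self$temp$return_tensors
    )
  }
}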
1.3 Public attributes
params
list
A list containing all the transformer's parameters (‘static’, ‘dynamic’ and ‘dependent’ parameters). Elements can be set with set_model_param().
‘Static’ parameters
Regardless of the transformer, the following parameters are always included:
ml_framework
text_dataset
sustain_track
sustain_iso_code
sustain_region
sustain_interval
trace
pytorch_safetensors
log_dir
log_write_interval
‘Dynamic’ parameters
In the case of create, it also contains (see the create method for details):
model_dir
vocab_size
max_position_embeddings
hidden_size
hidden_act
hidden_dropout_prob
attention_probs_dropout_prob
intermediate_size
num_attention_heads
In the case of train, it also contains (see the train method for details):
output_dir
model_dir_path
p_mask
whole_word
val_size
n_epoch
batch_size
chunk_size
min_seq_len
full_sequences_only
learning_rate
n_workers
multi_process
keras_trace
pytorch_trace
temp
list
A list() containing all the temporary local variables that need to be accessed between the step functions. Elements can be set with set_model_temp().
For example, it can be a variable tok_new that stores the tokenizer from steps_for_creation$create_tokenizer_draft. To train the tokenizer, access the variable tok_new in steps_for_creation$calculate_vocab through the temp list of this class.
2 Allowed Transformers
2.1 Overview
UML-diagram of Transformer Classes:
The Base Transformer Class has the following child-classes:
.AIFEBertTransformer
.AIFEDebertaTransformer
.AIFEFunnelTransformer
.AIFELongformerTransformer
.AIFERobertaTransformer
.AIFEMpnetTransformer
An object of the Base Transformer Class cannot be created, thus its create and train methods cannot be called directly. However, objects of the child classes can be created with the new method (1). Use the create (2) and train (3) methods to create and train the concrete transformer respectively:
# Example of using .AIFEBertTransformer class
# The other transformer classes are used analogously
bert_transformer <- .AIFEBertTransformer$new() # (1)
# See .AIFEBertTransformer documentation to get input parameters
# for create and train methods instead of ...
bert_transformer$create(...) # (2)
bert_transformer$train(...) # (3)
2.2 Transformer Parameters
‘Static’ parameters
Name | Type | Description |
ml_framework | string | Framework to use for training and inference¹ |
text_dataset | object | Object of the class LargeDataSetForText |
sustain_track | bool | If TRUE, energy consumption is tracked during training² |
sustain_iso_code | string | ISO code (Alpha-3-Code) for the country³ |
sustain_region | string | Region within a country. Only available for USA and Canada⁴ |
sustain_interval | integer | Interval in seconds for measuring power usage |
trace | bool | TRUE if information about the progress should be printed to the console |
pytorch_safetensors | bool | Choose between safetensors and standard pytorch format⁵ |
log_dir | string | Path to the directory where the log files should be saved |
log_write_interval | integer | Time in seconds for updating the log files⁶ |
¹ Available frameworks are “tensorflow” and “pytorch”.
² Via the python library codecarbon.
³ This variable must be set if sustainability should be tracked. A list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes
⁴ See the documentation of codecarbon for more information: https://mlco2.github.io/codecarbon/parameters.html
⁵ Only relevant for pytorch models. TRUE: a pytorch model is saved in safetensors format; FALSE (or if safetensors is not available): the model is saved in the standard pytorch format (.bin).
⁶ Only relevant if log_dir is not NULL.
‘Dynamic’ parameters for creation
Name | Type | Description |
model_dir | string | Path to the directory where the model should be saved |
vocab_size | int | Size of the vocabulary |
max_position_embeddings | int | Number of maximum position embeddings¹ |
hidden_size | int | Number of neurons in each layer² |
hidden_act | string | Name of the activation function |
hidden_dropout_prob | double | Ratio of dropout |
attention_probs_dropout_prob | double | Ratio of dropout for attention probabilities |
intermediate_size | int | Number of neurons in the intermediate layer of the attention mechanism |
num_attention_heads | int | Number of attention heads |
vocab_do_lower_case | bool | TRUE if all words/tokens should be lower case |
num_hidden_layer | int | Number of hidden layers |
target_hidden_size | int | Number of neurons in the final layer² |
block_sizes | vector | Contains ints that determine the number and sizes of each block |
num_decoder_layers | int | Number of decoding layers |
activation_dropout | float | Dropout probability between the layers of the feed-forward blocks |
pooling_type | string | Type of pooling³ |
add_prefix_space | bool | TRUE if an additional space should be added before the leading words |
trim_offsets | bool | TRUE trims the whitespaces from the produced offsets |
attention_window | int | Size of the window around each token for the attention mechanism in every layer |
¹ This parameter also determines the maximum length of a sequence which can be processed with the model.
² This parameter determines the dimensionality of the resulting text embedding.
³ "mean" and "max" for pooling with mean and maximum values respectively.
‘Dynamic’ parameters for training
Name | Type | Description |
output_dir | string | Path to the directory where the final model should be saved¹ |
model_dir_path | string | Path to the directory where the original model is stored |
p_mask | double | Ratio that determines the number of words/tokens used for masking |
whole_word | bool | Chooses the type of masking² |
val_size | double | Ratio that determines the amount of token chunks used for validation |
n_epoch | int | Number of epochs for training |
batch_size | int | Size of batches |
chunk_size | int | Size of every chunk for training |
min_seq_len | int | Determines the minimal sequence length included in the training process³ |
full_sequences_only | bool | TRUE for using only chunks with a sequence length equal to chunk_size |
learning_rate | double | Learning rate for the Adam optimizer |
n_workers | int | Number of workers⁴ |
multi_process | bool | TRUE if multiple processes should be activated⁴ |
keras_trace | int | Controls the information about the training process from keras on the console⁴,⁵ |
pytorch_trace | int | Controls the information about the training process from pytorch on the console⁶ |
¹ If the directory does not exist, it will be created.
² TRUE: whole word masking is applied; FALSE: token masking is used.
³ Only relevant if full_sequences_only = FALSE.
⁴ Only relevant if ml_framework = "tensorflow".
⁵ keras_trace = 0: does not print any information; keras_trace = 1: prints a progress bar; keras_trace = 2: prints one line of information for every epoch.
⁶ pytorch_trace = 0: does not print any information; pytorch_trace = 1: prints a progress bar.
3 Transformer Maker
3.1 Overview
See Transformer Maker Class for details.
The Transformer Maker class has a make method to create a transformer object.
UML-diagram of the Transformer Maker Class:
3.2 Usage
BERT
See BERT Transformer for Developers for details.
transformer <- aife_transformer_maker$make(AIFETrType$bert)
# or
# transformer <- aife_transformer_maker$make("bert")
Create
transformer$create(ml_framework = ml_framework,
model_dir = model_dir,
text_dataset = text_dataset,
vocab_size = 30522,
vocab_do_lower_case = FALSE,
max_position_embeddings = 512,
hidden_size = 768,
num_hidden_layer = 12,
num_attention_heads = 12,
intermediate_size = 3072,
hidden_act = "gelu",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
Train
transformer$train(ml_framework = ml_framework,
output_dir = output_dir,
model_dir_path = model_dir_path,
text_dataset = text_dataset,
p_mask = 0.15,
whole_word = TRUE,
val_size = 0.1,
n_epoch = 1,
batch_size = 12,
chunk_size = 250,
full_sequences_only = FALSE,
min_seq_len = 50,
learning_rate = 3e-3,
n_workers = 1,
multi_process = FALSE,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
keras_trace = 1,
pytorch_trace = 1,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
DeBERTa-v2
See DeBERTa-v2 Transformer for Developers for details.
transformer <- aife_transformer_maker$make(AIFETrType$deberta_v2)
# or
# transformer <- aife_transformer_maker$make("deberta_v2")
Create
transformer$create(ml_framework = ml_framework,
model_dir = model_dir,
text_dataset = text_dataset,
vocab_size = 128100,
vocab_do_lower_case = FALSE,
max_position_embeddings = 512,
hidden_size = 1536,
num_hidden_layer = 24,
num_attention_heads = 24,
intermediate_size = 6144,
hidden_act = "gelu",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
Train
transformer$train(ml_framework = ml_framework,
output_dir = output_dir,
model_dir_path = model_dir_path,
text_dataset = text_dataset,
p_mask = 0.15,
whole_word = TRUE,
val_size = 0.1,
n_epoch = 1,
batch_size = 12,
chunk_size = 250,
full_sequences_only = FALSE,
min_seq_len = 50,
learning_rate = 3e-2,
n_workers = 1,
multi_process = FALSE,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
keras_trace = 1,
pytorch_trace = 1,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
RoBERTa
See RoBERTa Transformer for Developers for details.
transformer <- aife_transformer_maker$make(AIFETrType$roberta)
# or
# transformer <- aife_transformer_maker$make("roberta")
Create
transformer$create(ml_framework = ml_framework,
model_dir = model_dir,
text_dataset = text_dataset,
vocab_size = 30522,
add_prefix_space = FALSE,
trim_offsets = TRUE,
max_position_embeddings = 512,
hidden_size = 768,
num_hidden_layer = 12,
num_attention_heads = 12,
intermediate_size = 3072,
hidden_act = "gelu",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
Train
transformer$train(ml_framework = ml_framework,
output_dir = output_dir,
model_dir_path = model_dir_path,
text_dataset = text_dataset,
p_mask = 0.15,
val_size = 0.1,
n_epoch = 1,
batch_size = 12,
chunk_size = 250,
full_sequences_only = FALSE,
min_seq_len = 50,
learning_rate = 3e-2,
n_workers = 1,
multi_process = FALSE,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
keras_trace = 1,
pytorch_trace = 1,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
Funnel
See Funnel Transformer for Developers for details.
transformer <- aife_transformer_maker$make(AIFETrType$funnel)
# or
# transformer <- aife_transformer_maker$make("funnel")
Create
transformer$create(ml_framework = ml_framework,
model_dir = model_dir,
text_dataset = text_dataset,
vocab_size = 30522,
vocab_do_lower_case = FALSE,
max_position_embeddings = 512,
hidden_size = 768,
target_hidden_size = 64,
block_sizes = c(4, 4, 4),
num_attention_heads = 12,
intermediate_size = 3072,
num_decoder_layers = 2,
pooling_type = "mean",
hidden_act = "gelu",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1,
activation_dropout = 0.0,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
Train
transformer$train(ml_framework = ml_framework,
output_dir = output_dir,
model_dir_path = model_dir_path,
text_dataset = text_dataset,
p_mask = 0.15,
whole_word = TRUE,
val_size = 0.1,
n_epoch = 1,
batch_size = 12,
chunk_size = 250,
full_sequences_only = FALSE,
min_seq_len = 50,
learning_rate = 3e-3,
n_workers = 1,
multi_process = FALSE,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
keras_trace = 1,
pytorch_trace = 1,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
Longformer
See Longformer Transformer for Developers for details.
transformer <- aife_transformer_maker$make(AIFETrType$longformer)
# or
# transformer <- aife_transformer_maker$make("longformer")
Create
transformer$create(ml_framework = ml_framework,
model_dir = model_dir,
text_dataset = text_dataset,
vocab_size = 30522,
add_prefix_space = FALSE,
trim_offsets = TRUE,
max_position_embeddings = 512,
hidden_size = 768,
num_hidden_layer = 12,
num_attention_heads = 12,
intermediate_size = 3072,
hidden_act = "gelu",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1,
attention_window = 512,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
Train
transformer$train(ml_framework = ml_framework,
output_dir = output_dir,
model_dir_path = model_dir_path,
text_dataset = text_dataset,
p_mask = 0.15,
val_size = 0.1,
n_epoch = 1,
batch_size = 12,
chunk_size = 250,
full_sequences_only = FALSE,
min_seq_len = 50,
learning_rate = 3e-2,
n_workers = 1,
multi_process = FALSE,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
keras_trace = 1,
pytorch_trace = 1,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
MPNet
See MPNet Transformer for Developers for details.
transformer <- aife_transformer_maker$make(AIFETrType$mpnet)
# or
# transformer <- aife_transformer_maker$make("mpnet")
Create
transformer$create(ml_framework = ml_framework,
model_dir = model_dir,
text_dataset = text_dataset,
vocab_size = 30522,
vocab_do_lower_case = FALSE,
max_position_embeddings = 512,
hidden_size = 768,
num_hidden_layer = 12,
num_attention_heads = 12,
intermediate_size = 3072,
hidden_act = "gelu",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
Train
transformer$train(ml_framework = ml_framework,
output_dir = output_dir,
model_dir_path = model_dir_path,
text_dataset = text_dataset,
p_mask = 0.15,
p_perm = 0.15,
whole_word = TRUE,
val_size = 0.1,
n_epoch = 1,
batch_size = 12,
chunk_size = 250,
full_sequences_only = FALSE,
min_seq_len = 50,
learning_rate = 3e-3,
n_workers = 1,
multi_process = FALSE,
sustain_track = TRUE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15,
trace = TRUE,
keras_trace = 1,
pytorch_trace = 1,
pytorch_safetensors = TRUE,
log_dir = NULL,
log_write_interval = 2)
4 Implement A Custom Transformer
4.1 Overview
A Custom Transformer template (“dotAIFECustomTransformer.R”) is located in the R folder of the project.
4.2 Implementation Steps
To implement a new transformer, do the following steps:
1. Create a new R file with a name like dotAIFECustomTransformer.
2. Open the file and write the creation of a new R6::R6Class() inside of it (a sketch is shown below). The name of the class must be defined here (1). Remember to inherit from the base transformer class (2). Use the private list for the private attributes (3) like title, steps_for_creation, etc. (which will be explained later) and the public list (4) for the initialize, create and train methods.
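A minimal sketch of such a skeleton (the markers (1)-(4) refer to the commented lines):

.AIFECustomTransformer <- R6::R6Class(
  classname = ".AIFECustomTransformer", # (1)
  inherit = .AIFEBaseTransformer,       # (2)
  private = list(                       # (3)
    # title, steps_for_creation, steps_for_training, ...
  ),
  public = list(                        # (4)
    # initialize, create, train
  )
)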
3. Define the private title attribute (1) and set it in the initialize method (2) using the inherited super$set_title() method (3) of the base class. A sketch is shown below.
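A minimal sketch of this step (markers (1)-(3)):

.AIFECustomTransformer <- R6::R6Class(
  classname = ".AIFECustomTransformer",
  inherit = .AIFEBaseTransformer,
  private = list(
    title = "Custom Model"           # (1)
  ),
  public = list(
    initialize = function() {        # (2)
      super$set_title(private$title) # (3)
    }
  )
)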
4. Define the private steps_for_creation list (1) and implement the required steps (functions) (2)-(6), and (7) if needed. Do not forget to pass self as an input parameter of the functions.
Note that local variables created inside the functions can be used through the inherited temp list. Put the local tok_new variable (8) into the temp list in the create_tokenizer_draft step (2) and use it in the calculate_vocab step (3) as in (9).
Similarly, access the input parameters of the transformer, such as ml_framework, through the inherited params list, as in (10).
Important!
In the create_final_tokenizer step (5) store the tokenizer in the self$temp$tokenizer variable (11).
In the create_transformer_model step (6) store the transformer model in the self$temp$model variable (12).

.AIFECustomTransformer <- R6::R6Class(
  classname = ".AIFECustomTransformer",
  inherit = .AIFEBaseTransformer,
  private = list(
    title = "Custom Model",
    steps_for_creation = list( # (1)
      # required
      create_tokenizer_draft = function(self) { # (2)
        # The implementation must be here
        # self$temp$tok_new <- ... # (8)
      },
      calculate_vocab = function(self) { # (3)
        # The implementation must be here
        # ... self$temp$tok_new ... # (9)
      },
      save_tokenizer_draft = function(self) { # (4)
        # The implementation must be here
      },
      create_final_tokenizer = function(self) { # (5)
        # The implementation must be here
        # self$temp$tokenizer <- ... # (!!!) (11)
      },
      create_transformer_model = function(self) { # (6)
        # The implementation must be here
        # ... self$params$ml_framework ... # (10)
        # self$temp$model <- ... # (!!!) (12)
      },
      # optional: omit this element if it is not needed
      check_max_pos_emb = function(self) { # (7)
        # The implementation must be here
      }
    )
  ),
  public = list(
    initialize = function() {
      super$set_title(private$title)
    }
  )
)
5. Define the create method (1) with all the input parameters (2) of the create method of the base class. Add all the dependent parameters of the custom transformer to the input parameters (3). Dependent parameters are parameters that depend on the transformer and are not present in the base class. Set these dependent parameters in the base class using the super$set_model_param() method (4). Set the required and optional steps in the base class using the super$set_required_SFC() and super$set_SFC_check_max_pos_emb() methods respectively (5). Finally, run the basic create algorithm using super$create() (6) with all the input parameters (2).

.AIFECustomTransformer <- R6::R6Class(
  classname = ".AIFECustomTransformer",
  inherit = .AIFEBaseTransformer,
  private = list(
    title = "Custom Model",
    steps_for_creation = list(
      # required
      create_tokenizer_draft = function(self) { },
      calculate_vocab = function(self) { },
      save_tokenizer_draft = function(self) { },
      create_final_tokenizer = function(self) { },
      create_transformer_model = function(self) { },
      # optional: omit this element if it is not needed
      check_max_pos_emb = function(self) { }
    )
  ),
  public = list(
    initialize = function() { },
    # (1)
    create = function(# (2) --------------------------
                      ml_framework,
                      model_dir,
                      text_dataset,
                      vocab_size,
                      # ...
                      trace,
                      pytorch_safetensors,
                      # ...
                      # (3) --------------------------
                      dep_param1,
                      dep_param2,
                      # ...
                      dep_paramN) {
      # (4) -----------------------------------------
      super$set_model_param("dep_param1", dep_param1)
      super$set_model_param("dep_param2", dep_param2)
      # ...
      super$set_model_param("dep_paramN", dep_paramN)
      # (5) -----------------------------------------
      super$set_required_SFC(private$steps_for_creation)
      # optional, can be omitted if not needed
      super$set_SFC_check_max_pos_emb(private$steps_for_creation$check_max_pos_emb)
      # (6) -----------------------------------------
      super$create(
        ml_framework = ml_framework,
        model_dir = model_dir,
        text_dataset = text_dataset,
        vocab_size = vocab_size,
        # ...
        trace = trace,
        pytorch_safetensors = pytorch_safetensors
        # ...
      )
    }
  )
)
6. Define the train method (1) similarly to step 5. Implement the steps (functions) in the private steps_for_training list (2). Do not forget to pass self as an input parameter of the required function. Set the dependent parameters (4) in the base class using the super$set_model_param() method (5). Set the implemented steps for training in the base class using the super$set_SFT_*() methods (6). Finally, run the basic train algorithm (7) of the base class with all the (input) parameters (3).

.AIFECustomTransformer <- R6::R6Class(
  classname = ".AIFECustomTransformer",
  inherit = .AIFEBaseTransformer,
  private = list(
    title = "Custom Model",
    steps_for_creation = list(
      # required
      create_tokenizer_draft = function(self) { },
      calculate_vocab = function(self) { },
      save_tokenizer_draft = function(self) { },
      create_final_tokenizer = function(self) { },
      create_transformer_model = function(self) { },
      # optional: omit this element if it is not needed
      check_max_pos_emb = function(self) { }
    ),
    # (2)
    steps_for_training = list(
      # required
      load_existing_model = function(self) { },
      # optional
      cuda_empty_cache = function() { },
      create_data_collator = function() { }
    )
  ),
  public = list(
    initialize = function() { },
    create = function() {
      # ---------------------------
      # super$set_model_param(...)
      # ...
      # ---------------------------
      # super$set_required_SFC(...)
      # super$set_SFC_*(...)
      # ...
      # ---------------------------
      # super$create(...)
    },
    # (1)
    train = function(# (3) --------
                     ml_framework,
                     # ...
                     # (4) --------
                     dep_param1,
                     # ...
                     dep_paramN) {
      # (5) -----------------------------------------
      super$set_model_param("dep_param1", dep_param1)
      # ...
      super$set_model_param("dep_paramN", dep_paramN)
      # (6) -----------------------------------------
      super$set_SFT_load_existing_model(private$steps_for_training$load_existing_model)
      # optional
      super$set_SFT_cuda_empty_cache(private$steps_for_training$cuda_empty_cache)
      super$set_SFT_create_data_collator(private$steps_for_training$create_data_collator)
      # (7) -----------------------------------------
      super$train(
        ml_framework = ml_framework,
        # ...
      )
    }
  )
)
Now use the transformer (created in the steps above) like in the code example from p. 2.1.
7. In the file “R/AIFETransformerMaker.R” find the definition of the AIFETrType list and add a new element to the end of it (1):

AIFETrType <- list(
  bert = "bert",
  roberta = "roberta",
  # ...
  mpnet = "mpnet",
  custom = "custom" # (1)
)

At the end of the file “R/dotAIFECustomTransformer.R” add the following line so that this transformer can be used with the Transformer Maker:
.AIFETrObj[[AIFETrType$custom]] <- .AIFECustomTransformer$new
Now use the Transformer Maker class to create a custom transformer like in p. 3.2.
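For example (a hedged sketch; the actual input parameters depend on the custom transformer's create() and train() signatures):

transformer <- aife_transformer_maker$make(AIFETrType$custom)
# or
# transformer <- aife_transformer_maker$make("custom")
transformer$create(...) # input parameters of the custom create() method instead of ...
transformer$train(...)  # input parameters of the custom train() method instead of ...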