Base R6 class for creation and definition of .AIFE*Transformer-like classes
Source: R/dotAIFEBaseTransformer.R (dot-AIFEBaseTransformer.Rd)
This base class is used to create and define .AIFE*Transformer-like classes. It serves as a skeleton for future concrete transformers and cannot be used to create an object of itself (an attempt to call the new() method will produce an error).
See p.1 Base Transformer Class in the vignette Transformers for Developers for details.
Create
The create() method is a basic algorithm that is used to create a new transformer, but it cannot be called directly.
Train
The train() method is a basic algorithm that is used to train and tune the transformer, but it cannot be called directly.
Concrete transformer implementation
Concrete (child) transformers are already implemented (e.g. BERT, DeBERTa-V2, etc.). To implement a new one, see p.4 Implement A Custom Transformer in the vignette Transformers for Developers.
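Purely as a hypothetical sketch (the authoritative recipe is in the vignette; the class name and the need to override initialize() are assumptions), a new child transformer inherits from this base class and registers its creation and training steps through the setters documented below:

.AIFEMyTransformer <- R6::R6Class(
  classname = ".AIFEMyTransformer",
  inherit = .AIFEBaseTransformer,
  public = list(
    initialize = function() {
      # Register creation steps via the set_SFC_*() setters and training
      # steps via the set_SFT_*() setters documented below, e.g.:
      # self$set_SFC_create_tokenizer_draft(function(self) { ... })
    }
  )
)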
See also
Other Transformers for developers: .AIFEBertTransformer, .AIFEDebertaTransformer, .AIFEFunnelTransformer, .AIFELongformerTransformer, .AIFEMpnetTransformer, .AIFERobertaTransformer, .AIFETrObj
Public fields
params
A list() containing the transformer's parameters ('static', 'dynamic' and 'dependent' parameters). Can be set with set_model_param().
'Static' parameters
Regardless of the transformer, the following parameters are always included:
ml_framework
text_dataset
sustain_track
sustain_iso_code
sustain_region
sustain_interval
trace
pytorch_safetensors
log_dir
log_write_interval
'Dynamic' parameters
In the case of create() it also contains (see the create() method for details):
model_dir
vocab_size
max_position_embeddings
hidden_size
hidden_act
hidden_dropout_prob
attention_probs_dropout_prob
intermediate_size
num_attention_heads
In the case of train() it also contains (see the train() method for details):
output_dir
model_dir_path
p_mask
whole_word
val_size
n_epoch
batch_size
chunk_size
min_seq_len
full_sequences_only
learning_rate
n_workers
multi_process
keras_trace
pytorch_trace
temp
A list() containing all the temporary local variables that need to be accessed between the step functions. Can be set with set_model_temp().
For example, it can be a variable tok_new that stores the tokenizer from steps_for_creation$create_tokenizer_draft. To train the tokenizer, access the variable tok_new in steps_for_creation$calculate_vocab through the temp list of this class.
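A minimal sketch of this pattern, assuming that each step function receives the transformer object as its single argument and that the python tokenizers module is reached via reticulate (transformer, tok_new and the WordPiece draft are illustrative, not part of the documented interface):

transformer$set_SFC_create_tokenizer_draft(function(self) {
  tok <- reticulate::import("tokenizers")
  # Store the draft tokenizer so later steps can reach it.
  self$set_model_temp("tok_new", tok$Tokenizer(tok$models$WordPiece()))
})
transformer$set_SFC_calculate_vocab(function(self) {
  tok_new <- self$temp$tok_new  # retrieve the draft from the temp list
  # ... train tok_new on the raw texts to build the vocabulary ...
})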
Methods
Method new()
An object of this class cannot be created; thus, calling this method will produce an error.
Usage
.AIFEBaseTransformer$new()
Method set_model_param()
Setter for the parameters. Adds a new parameter and its value to the params list.
Method set_model_temp()
Setter for the model's temporary parameters. Adds a new temporary parameter and its value to the temp list.
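Both setters are typically called from inside step functions. A brief illustration (the name-then-value argument order is an assumption based on the descriptions above; the values are placeholders):

self$set_model_param("hidden_act", "gelu")
self$set_model_temp("value_top", 0)  # names follow the temp list above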
Method set_SFC_check_max_pos_emb()
Setter for the check_max_pos_emb element of the private steps_for_creation list. Sets a new fun function as the check_max_pos_emb step.
Method set_SFC_create_tokenizer_draft()
Setter for the create_tokenizer_draft element of the private steps_for_creation list. Sets a new fun function as the create_tokenizer_draft step.
Method set_SFC_calculate_vocab()
Setter for the calculate_vocab element of the private steps_for_creation list. Sets a new fun function as the calculate_vocab step.
Method set_SFC_save_tokenizer_draft()
Setter for the save_tokenizer_draft element of the private steps_for_creation list. Sets a new fun function as the save_tokenizer_draft step.
Method set_SFC_create_final_tokenizer()
Setter for the create_final_tokenizer element of the private steps_for_creation list. Sets a new fun function as the create_final_tokenizer step.
Method set_SFC_create_transformer_model()
Setter for the create_transformer_model element of the private steps_for_creation list. Sets a new fun function as the create_transformer_model step.
Method set_required_SFC()
Setter for all required elements of the private steps_for_creation list. Executes setters for all required creation steps.
Arguments
required_SFC
list() A list of all new required steps.
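A hedged sketch of registering all required creation steps at once, assuming the list names match the step names documented above and that the five steps shown are the required ones; the NULL bodies are placeholders:

required_SFC <- list(
  create_tokenizer_draft   = function(self) NULL,
  calculate_vocab          = function(self) NULL,
  save_tokenizer_draft     = function(self) NULL,
  create_final_tokenizer   = function(self) NULL,
  create_transformer_model = function(self) NULL
)
self$set_required_SFC(required_SFC)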
Method set_SFT_load_existing_model()
Setter for the load_existing_model element of the private steps_for_training list. Sets a new fun function as the load_existing_model step.
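As an illustration only, a load_existing_model step for a pytorch masked-language model might look as follows; the temp$model slot and the use of reticulate are assumptions, not part of the documented interface:

self$set_SFT_load_existing_model(function(self) {
  transformers <- reticulate::import("transformers")
  # Load the original model from the directory passed to train() and keep
  # it in the temp list for the later training steps (assumed slot name).
  self$temp$model <- transformers$AutoModelForMaskedLM$from_pretrained(
    self$params$model_dir_path
  )
})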
Method set_SFT_cuda_empty_cache()
Setter for the cuda_empty_cache element of the private steps_for_training list. Sets a new fun function as the cuda_empty_cache step.
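A minimal sketch of such a step, assuming pytorch is reached via reticulate (whether the step function takes arguments is not documented here; the zero-argument form is an assumption):

self$set_SFT_cuda_empty_cache(function() {
  torch <- reticulate::import("torch")
  # Release cached GPU memory; a no-op when CUDA is unavailable.
  if (torch$cuda$is_available()) torch$cuda$empty_cache()
})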
Method set_SFT_create_data_collator()
Setter for the create_data_collator element of the private steps_for_training list. Sets a new fun function as the create_data_collator step. Use this method to make a custom data collator for a transformer.
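For example, a collator based on the python transformers library could be registered as below; the temp$data_collator slot is an assumption, and a truly custom collator would replace the DataCollatorForLanguageModeling call with bespoke logic:

self$set_SFT_create_data_collator(function(self) {
  transformers <- reticulate::import("transformers")
  # A standard masked-language-modeling collator built from the tokenizer
  # in the temp list and the p_mask parameter from the params list.
  self$temp$data_collator <- transformers$DataCollatorForLanguageModeling(
    tokenizer = self$temp$tokenizer,
    mlm = TRUE,
    mlm_probability = self$params$p_mask
  )
})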
Method create()
This method creates a transformer configuration based on the child-transformer architecture and a vocabulary using the python libraries transformers and tokenizers.
This method adds the following parameters to the temp list:
log_file
raw_text_dataset
pt_safe_save
value_top
total_top
message_top
This method uses the following parameters from the temp list:
log_file
raw_text_dataset
tokenizer
Usage
.AIFEBaseTransformer$create(
ml_framework,
model_dir,
text_dataset,
vocab_size,
max_position_embeddings,
hidden_size,
num_attention_heads,
intermediate_size,
hidden_act,
hidden_dropout_prob,
attention_probs_dropout_prob,
sustain_track,
sustain_iso_code,
sustain_region,
sustain_interval,
trace,
pytorch_safetensors,
log_dir,
log_write_interval
)
Arguments
ml_framework
string Framework to use for training and inference. ml_framework = "tensorflow": for 'tensorflow'. ml_framework = "pytorch": for 'pytorch'.
model_dir
string Path to the directory where the model should be saved.
text_dataset
Object of class LargeDataSetForText.
vocab_size
int Size of the vocabulary.
max_position_embeddings
int Number of maximum position embeddings. This parameter also determines the maximum length of a sequence which can be processed with the model.
hidden_size
int Number of neurons in each layer. This parameter determines the dimensionality of the resulting text embedding.
num_attention_heads
int Number of attention heads.
intermediate_size
int Number of neurons in the intermediate layer of the attention mechanism.
hidden_act
string Name of the activation function.
hidden_dropout_prob
double Ratio of dropout.
attention_probs_dropout_prob
double Ratio of dropout for attention probabilities.
sustain_track
bool If TRUE, energy consumption is tracked during training via the python library codecarbon.
sustain_iso_code
string ISO code (Alpha-3-Code) for the country. This variable must be set if sustainability should be tracked. A list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes.
sustain_region
string Region within a country. Only available for USA and Canada. See the documentation of codecarbon for more information: https://mlco2.github.io/codecarbon/parameters.html.
sustain_interval
integer Interval in seconds for measuring power usage.
trace
bool TRUE if information about the progress should be printed to the console.
pytorch_safetensors
bool Only relevant for pytorch models. TRUE: a 'pytorch' model is saved in safetensors format. FALSE (or 'safetensors' is not available): the model is saved in the standard pytorch format (.bin).
log_dir
Path to the directory where the log files should be saved.
log_write_interval
int Time in seconds determining the interval in which the logger should try to update the log files. Only relevant if log_dir is not NULL.
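Since the base class cannot be instantiated, create() is reached through a concrete child transformer. A hypothetical sketch, where transformer stands for an already constructed child transformer and my_dataset for a LargeDataSetForText object; all values are placeholders:

transformer$create(
  ml_framework = "pytorch",
  model_dir = "models/my_transformer",
  text_dataset = my_dataset,
  vocab_size = 30522,
  max_position_embeddings = 512,
  hidden_size = 768,
  num_attention_heads = 12,
  intermediate_size = 3072,
  hidden_act = "gelu",
  hidden_dropout_prob = 0.1,
  attention_probs_dropout_prob = 0.1,
  sustain_track = FALSE,
  sustain_iso_code = "DEU",
  sustain_region = NULL,
  sustain_interval = 15,
  trace = TRUE,
  pytorch_safetensors = TRUE,
  log_dir = NULL,
  log_write_interval = 2
)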
Method train()
This method can be used to train or fine-tune a transformer based on BERT architecture with the help of the python libraries transformers, datasets, and tokenizers.
This method adds the following parameters to the temp list:
log_file
loss_file
from_pt
from_tf
load_safe
raw_text_dataset
pt_safe_save
value_top
total_top
message_top
This method uses the following parameters from the temp list:
log_file
raw_text_dataset
tokenized_dataset
tokenizer
Usage
.AIFEBaseTransformer$train(
ml_framework,
output_dir,
model_dir_path,
text_dataset,
p_mask,
whole_word,
val_size,
n_epoch,
batch_size,
chunk_size,
full_sequences_only,
min_seq_len,
learning_rate,
n_workers,
multi_process,
sustain_track,
sustain_iso_code,
sustain_region,
sustain_interval,
trace,
keras_trace,
pytorch_trace,
pytorch_safetensors,
log_dir,
log_write_interval
)
Arguments
ml_framework
string Framework to use for training and inference. ml_framework = "tensorflow": for 'tensorflow'. ml_framework = "pytorch": for 'pytorch'.
output_dir
string Path to the directory where the final model should be saved. If the directory does not exist, it will be created.
model_dir_path
string Path to the directory where the original model is stored.
text_dataset
Object of class LargeDataSetForText.
p_mask
double Ratio that determines the number of words/tokens used for masking.
whole_word
bool TRUE: whole word masking should be applied. FALSE: token masking is used.
val_size
double Ratio that determines the amount of token chunks used for validation.
n_epoch
int Number of epochs for training.
batch_size
int Size of batches.
chunk_size
int Size of every chunk for training.
full_sequences_only
bool TRUE for using only chunks with a sequence length equal to chunk_size.
min_seq_len
int Only relevant if full_sequences_only = FALSE. Value determines the minimal sequence length included in the training process.
learning_rate
double Learning rate for the adam optimizer.
n_workers
int Number of workers. Only relevant if ml_framework = "tensorflow".
multi_process
bool TRUE if multiple processes should be activated. Only relevant if ml_framework = "tensorflow".
sustain_track
bool If TRUE, energy consumption is tracked during training via the python library codecarbon.
sustain_iso_code
string ISO code (Alpha-3-Code) for the country. This variable must be set if sustainability should be tracked. A list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes.
sustain_region
string Region within a country. Only available for USA and Canada. See the documentation of codecarbon for more information: https://mlco2.github.io/codecarbon/parameters.html.
sustain_interval
integer Interval in seconds for measuring power usage.
trace
bool TRUE if information about the progress should be printed to the console.
keras_trace
int keras_trace = 0: does not print any information about the training process from keras on the console. keras_trace = 1: prints a progress bar. keras_trace = 2: prints one line of information for every epoch. Only relevant if ml_framework = "tensorflow".
pytorch_trace
int pytorch_trace = 0: does not print any information about the training process from pytorch on the console. pytorch_trace = 1: prints a progress bar.
pytorch_safetensors
bool Only relevant for pytorch models. TRUE: a 'pytorch' model is saved in safetensors format. FALSE (or 'safetensors' is not available): the model is saved in the standard pytorch format (.bin).
log_dir
Path to the directory where the log files should be saved.
log_write_interval
int Time in seconds determining the interval in which the logger should try to update the log files. Only relevant if log_dir is not NULL.
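As with create(), train() is reached through a concrete child transformer. A hypothetical sketch with placeholder values (transformer and my_dataset as above):

transformer$train(
  ml_framework = "pytorch",
  output_dir = "models/my_transformer_trained",
  model_dir_path = "models/my_transformer",
  text_dataset = my_dataset,
  p_mask = 0.15,
  whole_word = TRUE,
  val_size = 0.1,
  n_epoch = 3,
  batch_size = 16,
  chunk_size = 512,
  full_sequences_only = FALSE,
  min_seq_len = 50,
  learning_rate = 3e-4,
  n_workers = 1,
  multi_process = FALSE,
  sustain_track = FALSE,
  sustain_iso_code = "DEU",
  sustain_region = NULL,
  sustain_interval = 15,
  trace = TRUE,
  keras_trace = 0,
  pytorch_trace = 1,
  pytorch_safetensors = TRUE,
  log_dir = NULL,
  log_write_interval = 2
)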