Base R6 class for creation and definition of .AIFE*Transformer-like classes

This base class is used to create and define .AIFE*Transformer-like classes. It serves as a skeleton for a future concrete transformer and cannot be used to create an object of itself (an attempt to call new-method will produce an error).

See p.1 Base Transformer Class in Transformers for Developers for details.

Create

The create-method is a basic algorithm that is used to create a new transformer, but cannot be called directly.

Train

The train-method is a basic algorithm that is used to train and tune the transformer but cannot be called directly.

Concrete transformer implementation

There are already implemented concrete (child) transformers (e.g. BERT, DeBERTa-V2, etc.), to implement a new one see p.4 Implement A Custom Transformer in Transformers for Developers

References

Hugging Face transformers documantation:

Public fields

params

A list containing transformer's parameters ('static', 'dynamic' and 'dependent' parameters)

list() containing all the transformer parameters. Can be set with set_model_param().

'Static' parameters

Regardless of the transformer, the following parameters are always included:

ml_framework
text_dataset
sustain_track
sustain_iso_code
sustain_region
sustain_interval
trace
pytorch_safetensors
log_dir
log_write_interval

'Dynamic' parameters

In the case of create it also contains (see create-method for details):

model_dir
vocab_size
max_position_embeddings
hidden_size
hidden_act
hidden_dropout_prob
attention_probs_dropout_prob
intermediate_size
num_attention_heads

In the case of train it also contains (see train-method for details):

output_dir
model_dir_path
p_mask
whole_word
val_size
n_epoch
batch_size
chunk_size
min_seq_len
full_sequences_only
learning_rate
n_workers
multi_process
keras_trace
pytorch_trace

'Dependent' parameters

Depending on the transformer and the method used class may contain different parameters:

vocab_do_lower_case
num_hidden_layer
add_prefix_space
etc.

temp

A list containing temporary transformer's parameters

list() containing all the temporary local variables that need to be accessed between the step functions. Can be set with set_model_temp().

For example, it can be a variable tok_new that stores the tokenizer from steps_for_creation$create_tokenizer_draft. To train the tokenizer, access the variable tok_new in steps_for_creation$calculate_vocab through the temp list of this class.

Methods

Public methods

.AIFEBaseTransformer$new()
.AIFEBaseTransformer$set_title()
.AIFEBaseTransformer$set_model_param()
.AIFEBaseTransformer$set_model_temp()
.AIFEBaseTransformer$set_SFC_check_max_pos_emb()
.AIFEBaseTransformer$set_SFC_create_tokenizer_draft()
.AIFEBaseTransformer$set_SFC_calculate_vocab()
.AIFEBaseTransformer$set_SFC_save_tokenizer_draft()
.AIFEBaseTransformer$set_SFC_create_final_tokenizer()
.AIFEBaseTransformer$set_SFC_create_transformer_model()
.AIFEBaseTransformer$set_required_SFC()
.AIFEBaseTransformer$set_SFT_load_existing_model()
.AIFEBaseTransformer$set_SFT_cuda_empty_cache()
.AIFEBaseTransformer$set_SFT_create_data_collator()
.AIFEBaseTransformer$create()
.AIFEBaseTransformer$train()
.AIFEBaseTransformer$clone()

Method `new()`

An object of this class cannot be created. Thus, method's call will produce an error.

Usage

.AIFEBaseTransformer$new()

Returns

This method returns an error.

Method `set_title()`

Setter for the title. Sets a new value for the title private attribute.

Usage

.AIFEBaseTransformer$set_title(title)

Arguments

title: string A new title.

Returns

This method returns nothing.

Method `set_model_param()`

Setter for the parameters. Adds a new parameter and its value to the params list.

Usage

.AIFEBaseTransformer$set_model_param(param_name, param_value)

Arguments

param_name: string Parameter's name.
param_value: any Parameter's value.

Returns

This method returns nothing.

Method `set_model_temp()`

Setter for the temporary model's parameters. Adds a new temporary parameter and its value to the temp list.

Usage

.AIFEBaseTransformer$set_model_temp(temp_name, temp_value)

Arguments

temp_name: string Parameter's name.
temp_value: any Parameter's value.

Returns

This method returns nothing.

Method `set_SFC_check_max_pos_emb()`

Setter for the check_max_pos_emb element of the private steps_for_creation list. Sets a new fun function as the check_max_pos_emb step.

Usage

.AIFEBaseTransformer$set_SFC_check_max_pos_emb(fun)

Arguments

fun: function() A new function.

Returns

This method returns nothing.

Method `set_SFC_create_tokenizer_draft()`

Setter for the create_tokenizer_draft element of the private steps_for_creation list. Sets a new fun function as the create_tokenizer_draft step.

Usage

.AIFEBaseTransformer$set_SFC_create_tokenizer_draft(fun)

Arguments

fun: function() A new function.

Returns

This method returns nothing.

Method `set_SFC_calculate_vocab()`

Setter for the calculate_vocab element of the private steps_for_creation list. Sets a new fun function as the calculate_vocab step.

Usage

.AIFEBaseTransformer$set_SFC_calculate_vocab(fun)

Arguments

fun: function() A new function.

Returns

This method returns nothing.

Method `set_SFC_save_tokenizer_draft()`

Setter for the save_tokenizer_draft element of the private steps_for_creation list. Sets a new fun function as the save_tokenizer_draft step.

Usage

.AIFEBaseTransformer$set_SFC_save_tokenizer_draft(fun)

Arguments

fun: function() A new function.

Returns

This method returns nothing.

Method `set_SFC_create_final_tokenizer()`

Setter for the create_final_tokenizer element of the private steps_for_creation list. Sets a new fun function as the create_final_tokenizer step.

Usage

.AIFEBaseTransformer$set_SFC_create_final_tokenizer(fun)

Arguments

fun: function() A new function.

Returns

This method returns nothing.

Method `set_SFC_create_transformer_model()`

Setter for the create_transformer_model element of the private steps_for_creation list. Sets a new fun function as the create_transformer_model step.

Usage

.AIFEBaseTransformer$set_SFC_create_transformer_model(fun)

Arguments

fun: function() A new function.

Returns

This method returns nothing.

Method `set_required_SFC()`

Setter for all required elements of the private steps_for_creation list. Executes setters for all required creation steps.

Usage

.AIFEBaseTransformer$set_required_SFC(required_SFC)

Arguments

required_SFC: list() A list of all new required steps.

Returns

This method returns nothing.

Method `set_SFT_load_existing_model()`

Setter for the load_existing_model element of the private steps_for_training list. Sets a new fun function as the load_existing_model step.

Usage

.AIFEBaseTransformer$set_SFT_load_existing_model(fun)

Arguments

fun: function() A new function.

Returns

This method returns nothing.

Method `set_SFT_cuda_empty_cache()`

Setter for the cuda_empty_cache element of the private steps_for_training list. Sets a new fun function as the cuda_empty_cache step.

Usage

.AIFEBaseTransformer$set_SFT_cuda_empty_cache(fun)

Arguments

fun: function() A new function.

Returns

This method returns nothing.

Method `set_SFT_create_data_collator()`

Setter for the create_data_collator element of the private steps_for_training list. Sets a new fun function as the create_data_collator step. Use this method to make a custom data collator for a transformer.

Usage

.AIFEBaseTransformer$set_SFT_create_data_collator(fun)

Arguments

fun: function() A new function.

Returns

This method returns nothing.

Method `create()`

This method creates a transformer configuration based on the child-transformer architecture and a vocabulary using the python libraries transformers and tokenizers.

This method adds the following parameters to the temp list:

log_file
raw_text_dataset
pt_safe_save
value_top
total_top
message_top

This method uses the following parameters from the temp list:

log_file
raw_text_dataset
tokenizer

Usage

.AIFEBaseTransformer$create(
  ml_framework,
  model_dir,
  text_dataset,
  vocab_size,
  max_position_embeddings,
  hidden_size,
  num_attention_heads,
  intermediate_size,
  hidden_act,
  hidden_dropout_prob,
  attention_probs_dropout_prob,
  sustain_track,
  sustain_iso_code,
  sustain_region,
  sustain_interval,
  trace,
  pytorch_safetensors,
  log_dir,
  log_write_interval
)

Arguments

ml_framework

string Framework to use for training and inference.

ml_framework = "tensorflow": for 'tensorflow'.
ml_framework = "pytorch": for 'pytorch'.

model_dir

string Path to the directory where the model should be saved.

text_dataset

Object of class LargeDataSetForText.

vocab_size

int Size of the vocabulary.

max_position_embeddings

int Number of maximum position embeddings. This parameter also determines the maximum length of a sequence which can be processed with the model.

hidden_size

int Number of neurons in each layer. This parameter determines the dimensionality of the resulting text embedding.

num_attention_heads

int Number of attention heads.

intermediate_size

int Number of neurons in the intermediate layer of the attention mechanism.

hidden_act

string Name of the activation function.

hidden_dropout_prob

double Ratio of dropout.

attention_probs_dropout_prob

double Ratio of dropout for attention probabilities.

sustain_track

bool If TRUE energy consumption is tracked during training via the python library codecarbon.

sustain_iso_code

string ISO code (Alpha-3-Code) for the country. This variable must be set if sustainability should be tracked. A list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes.

sustain_region

string Region within a country. Only available for USA and Canada. See the documentation of codecarbon for more information https://mlco2.github.io/codecarbon/parameters.html.

sustain_interval

integer Interval in seconds for measuring power usage.

trace

bool TRUE if information about the progress should be printed to the console.

pytorch_safetensors

bool Only relevant for pytorch models.

TRUE: a 'pytorch' model is saved in safetensors format.
FALSE (or 'safetensors' is not available): model is saved in the standard pytorch format (.bin).

log_dir

Path to the directory where the log files should be saved.

log_write_interval

int Time in seconds determining the interval in which the logger should try to update the log files. Only relevant if log_dir is not NULL.

Returns

This method does not return an object. Instead, it saves the configuration and vocabulary of the new model to disk.

Method `train()`

This method can be used to train or fine-tune a transformer based on BERT architecture with the help of the python libraries transformers, datasets, and tokenizers.

This method adds the following parameters to the temp list:

log_file
loss_file
from_pt
from_tf
load_safe
raw_text_dataset
pt_safe_save
value_top
total_top
message_top

This method uses the following parameters from the temp list:

log_file
raw_text_dataset
tokenized_dataset
tokenizer

Usage

.AIFEBaseTransformer$train(
  ml_framework,
  output_dir,
  model_dir_path,
  text_dataset,
  p_mask,
  whole_word,
  val_size,
  n_epoch,
  batch_size,
  chunk_size,
  full_sequences_only,
  min_seq_len,
  learning_rate,
  n_workers,
  multi_process,
  sustain_track,
  sustain_iso_code,
  sustain_region,
  sustain_interval,
  trace,
  keras_trace,
  pytorch_trace,
  pytorch_safetensors,
  log_dir,
  log_write_interval
)

Arguments

ml_framework

string Framework to use for training and inference.

ml_framework = "tensorflow": for 'tensorflow'.
ml_framework = "pytorch": for 'pytorch'.

output_dir

string Path to the directory where the final model should be saved. If the directory does not exist, it will be created.

model_dir_path

string Path to the directory where the original model is stored.

text_dataset

Object of class LargeDataSetForText.

p_mask

double Ratio that determines the number of words/tokens used for masking.

whole_word

bool

TRUE: whole word masking should be applied.
FALSE: token masking is used.

val_size

double Ratio that determines the amount of token chunks used for validation.

n_epoch

int Number of epochs for training.

batch_size

int Size of batches.

chunk_size

int Size of every chunk for training.

full_sequences_only

bool TRUE for using only chunks with a sequence length equal to chunk_size.

min_seq_len

int Only relevant if full_sequences_only = FALSE. Value determines the minimal sequence length included in training process.

learning_rate

double Learning rate for adam optimizer.

n_workers

int Number of workers. Only relevant if ml_framework = "tensorflow".

multi_process

bool TRUE if multiple processes should be activated. Only relevant if ml_framework = "tensorflow".

sustain_track

bool If TRUE energy consumption is tracked during training via the python library codecarbon.

sustain_iso_code

sustain_region

string Region within a country. Only available for USA and Canada. See the documentation of codecarbon for more information https://mlco2.github.io/codecarbon/parameters.html.

sustain_interval

integer Interval in seconds for measuring power usage.

trace

bool TRUE if information about the progress should be printed to the console.

keras_trace

int

keras_trace = 0: does not print any information about the training process from keras on the console.
keras_trace = 1: prints a progress bar.
keras_trace = 2: prints one line of information for every epoch. Only relevant if ml_framework = "tensorflow".

pytorch_trace

int

pytorch_trace = 0: does not print any information about the training process from pytorch on the console.
pytorch_trace = 1: prints a progress bar.

pytorch_safetensors

bool Only relevant for pytorch models.

TRUE: a 'pytorch' model is saved in safetensors format.
FALSE (or 'safetensors' is not available): model is saved in the standard pytorch format (.bin).

log_dir

Path to the directory where the log files should be saved.

log_write_interval

int Time in seconds determining the interval in which the logger should try to update the log files. Only relevant if log_dir is not NULL.

Returns

This method does not return an object. Instead, it saves the configuration and vocabulary of the new model to disk.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

.AIFEBaseTransformer$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Base R6 class for creation and definition of .AIFE*Transformer-like classes

Create

Train

Concrete transformer implementation

References

See also

Public fields

'Static' parameters

'Dynamic' parameters

'Dependent' parameters

Methods

Public methods

Method new()

Usage

Returns

Method set_title()

Usage

Arguments

Returns

Method set_model_param()

Usage

Arguments

Returns

Method set_model_temp()

Usage

Arguments

Returns

Method set_SFC_check_max_pos_emb()

Usage

Arguments

Returns

Method set_SFC_create_tokenizer_draft()

Usage

Arguments

Returns

Method set_SFC_calculate_vocab()

Usage

Arguments

Returns

Method set_SFC_save_tokenizer_draft()

Usage

Arguments

Returns

Method set_SFC_create_final_tokenizer()

Usage

Arguments

Returns

Method set_SFC_create_transformer_model()

Usage

Arguments

Returns

Method set_required_SFC()

Usage

Arguments

Returns

Method set_SFT_load_existing_model()

Usage

Arguments

Returns

Method set_SFT_cuda_empty_cache()

Usage

Arguments

Returns

Method set_SFT_create_data_collator()

Usage

Arguments

Returns

Method create()

Usage

Arguments

Returns

Method train()

Usage

Arguments

Returns

Method clone()

Usage

Arguments

Base `R6` class for creation and definition of `.AIFE*Transformer-like` classes

Method `new()`

Method `set_title()`

Method `set_model_param()`

Method `set_model_temp()`

Method `set_SFC_check_max_pos_emb()`

Method `set_SFC_create_tokenizer_draft()`

Method `set_SFC_calculate_vocab()`

Method `set_SFC_save_tokenizer_draft()`

Method `set_SFC_create_final_tokenizer()`

Method `set_SFC_create_transformer_model()`

Method `set_required_SFC()`

Method `set_SFT_load_existing_model()`

Method `set_SFT_cuda_empty_cache()`

Method `set_SFT_create_data_collator()`

Method `create()`

Method `train()`

Method `clone()`