Represents models based on MPNet.
References
Song, K., Tan, X., Qin, T., Lu, J., & Liu, T.-Y. (2020). MPNet: Masked and Permuted Pre-training for Language Understanding. doi:10.48550/arXiv.2004.09297
See also
Other Base Models:
BaseModelBert,
BaseModelDebertaV2,
BaseModelFunnel,
BaseModelModernBert,
BaseModelRoberta
Super classes
aifeducation::AIFEMaster -> aifeducation::AIFEBaseModel -> aifeducation::BaseModelCore -> BaseModelMPNet
Methods
Inherited methods
aifeducation::AIFEMaster$get_all_fields()
aifeducation::AIFEMaster$get_documentation_license()
aifeducation::AIFEMaster$get_ml_framework()
aifeducation::AIFEMaster$get_model_config()
aifeducation::AIFEMaster$get_model_description()
aifeducation::AIFEMaster$get_model_info()
aifeducation::AIFEMaster$get_model_license()
aifeducation::AIFEMaster$get_package_versions()
aifeducation::AIFEMaster$get_private()
aifeducation::AIFEMaster$get_publication_info()
aifeducation::AIFEMaster$get_sustainability_data()
aifeducation::AIFEMaster$is_configured()
aifeducation::AIFEMaster$is_trained()
aifeducation::AIFEMaster$set_documentation_license()
aifeducation::AIFEMaster$set_model_description()
aifeducation::AIFEMaster$set_model_license()
aifeducation::BaseModelCore$calc_flops_architecture_based()
aifeducation::BaseModelCore$count_parameter()
aifeducation::BaseModelCore$create_from_hf()
aifeducation::BaseModelCore$estimate_sustainability_inference_fill_mask()
aifeducation::BaseModelCore$fill_mask()
aifeducation::BaseModelCore$get_final_size()
aifeducation::BaseModelCore$get_flops_estimates()
aifeducation::BaseModelCore$get_model()
aifeducation::BaseModelCore$get_model_type()
aifeducation::BaseModelCore$get_special_tokens()
aifeducation::BaseModelCore$get_tokenizer_statistics()
aifeducation::BaseModelCore$load_from_disk()
aifeducation::BaseModelCore$plot_training_history()
aifeducation::BaseModelCore$save()
aifeducation::BaseModelCore$set_publication_info()
Method configure()
Configures a new object of this class.
Usage
BaseModelMPNet$configure(
tokenizer,
max_position_embeddings = 512L,
hidden_size = 768L,
num_hidden_layers = 12L,
num_attention_heads = 12L,
intermediate_size = 3072L,
hidden_act = "GELU",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1
)
Arguments
tokenizer
(TokenizerBase) Tokenizer for the model.

max_position_embeddings
(int) Number of maximum position embeddings. This parameter also determines the maximum length of a sequence that can be processed with the model. Allowed values: 10 <= x <= 4048

hidden_size
(int) Number of neurons in each layer. This parameter determines the dimensionality of the resulting text embedding. Allowed values: 1 <= x <= 2048

num_hidden_layers
(int) Number of hidden layers. Allowed values: 1 <= x

num_attention_heads
(int) Number of attention heads for each self-attention layer. Only relevant if attention_type = 'multihead'. Allowed values: 0 <= x

intermediate_size
(int) Size of the projection layer within each transformer encoder. Allowed values: 1 <= x

hidden_act
(string) Name of the activation function. Allowed values: 'GELU', 'relu', 'silu', 'gelu_new'

hidden_dropout_prob
(double) Ratio of dropout. Allowed values: 0 <= x <= 0.6

attention_probs_dropout_prob
(double) Ratio of dropout for attention probabilities. Allowed values: 0 <= x <= 0.6
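For orientation, a minimal sketch of a configure() call follows. It assumes the object is created with the usual R6 `$new()` constructor (not confirmed on this page) and that `my_tokenizer` is an existing TokenizerBase object; both names are illustrative, and the argument values simply repeat the defaults from the usage block above.

# Hedged sketch: configuring a new MPNet base model.
# Assumes the usual R6 constructor `$new()` and an existing
# TokenizerBase object `my_tokenizer` (both illustrative).
library(aifeducation)
base_model <- BaseModelMPNet$new()
base_model$configure(
  tokenizer = my_tokenizer,
  max_position_embeddings = 512L,  # also caps the processable sequence length
  hidden_size = 768L,              # dimensionality of the resulting text embedding
  num_hidden_layers = 12L,
  num_attention_heads = 12L,
  intermediate_size = 3072L,
  hidden_act = "GELU",
  hidden_dropout_prob = 0.1,
  attention_probs_dropout_prob = 0.1
)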
Method train()
Trains a BaseModel on a text dataset.
Usage
BaseModelMPNet$train(
text_dataset,
p_mask = 0.15,
p_perm = 0.15,
whole_word = TRUE,
val_size = 0.1,
n_epoch = 1L,
batch_size = 12L,
max_sequence_length = 250L,
full_sequences_only = FALSE,
min_seq_len = 50L,
learning_rate = 0.003,
sustain_track = FALSE,
sustain_iso_code = NULL,
sustain_region = NULL,
sustain_interval = 15L,
sustain_log_level = "warning",
trace = TRUE,
pytorch_trace = 1L,
log_dir = NULL,
log_write_interval = 2L
)
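The call below continues the sketch from the configure() section and is likewise illustrative. `my_dataset` stands in for a text dataset object of whatever class train() accepts (the exact class is not specified on this page); arguments not shown keep the defaults listed in the usage block.

# Hedged sketch: training the configured model with masked and
# permuted pre-training. `my_dataset` is a placeholder for a text
# dataset object accepted by train().
base_model$train(
  text_dataset = my_dataset,
  p_mask = 0.15,               # masking ratio, per the usage block
  p_perm = 0.15,               # permutation ratio, per the usage block
  whole_word = TRUE,
  val_size = 0.1,
  n_epoch = 1L,
  batch_size = 12L,
  max_sequence_length = 250L,
  learning_rate = 0.003,
  trace = TRUE
)
# The trained model can then be stored with the inherited save() method.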