Represents models based on RoBERTa (Liu et al. 2019).
References
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. doi:10.48550/arXiv.1907.11692
See also
Other Base Models:
BaseModelBert,
BaseModelDebertaV2,
BaseModelFunnel,
BaseModelMPNet,
BaseModelModernBert
Super classes
aifeducation::AIFEMaster -> aifeducation::AIFEBaseModel -> aifeducation::BaseModelCore -> BaseModelRoberta
Methods
Inherited methods
aifeducation::AIFEMaster$get_all_fields()
aifeducation::AIFEMaster$get_documentation_license()
aifeducation::AIFEMaster$get_ml_framework()
aifeducation::AIFEMaster$get_model_config()
aifeducation::AIFEMaster$get_model_description()
aifeducation::AIFEMaster$get_model_info()
aifeducation::AIFEMaster$get_model_license()
aifeducation::AIFEMaster$get_package_versions()
aifeducation::AIFEMaster$get_private()
aifeducation::AIFEMaster$get_publication_info()
aifeducation::AIFEMaster$get_sustainability_data()
aifeducation::AIFEMaster$is_configured()
aifeducation::AIFEMaster$is_trained()
aifeducation::AIFEMaster$set_documentation_license()
aifeducation::AIFEMaster$set_model_description()
aifeducation::AIFEMaster$set_model_license()
aifeducation::BaseModelCore$calc_flops_architecture_based()
aifeducation::BaseModelCore$count_parameter()
aifeducation::BaseModelCore$create_from_hf()
aifeducation::BaseModelCore$estimate_sustainability_inference_fill_mask()
aifeducation::BaseModelCore$fill_mask()
aifeducation::BaseModelCore$get_final_size()
aifeducation::BaseModelCore$get_flops_estimates()
aifeducation::BaseModelCore$get_model()
aifeducation::BaseModelCore$get_model_type()
aifeducation::BaseModelCore$get_special_tokens()
aifeducation::BaseModelCore$get_tokenizer_statistics()
aifeducation::BaseModelCore$load_from_disk()
aifeducation::BaseModelCore$plot_training_history()
aifeducation::BaseModelCore$save()
aifeducation::BaseModelCore$set_publication_info()
aifeducation::BaseModelCore$train()
Method configure()
Configures a new object of this class.
Usage
BaseModelRoberta$configure(
tokenizer,
max_position_embeddings = 512L,
hidden_size = 768L,
num_hidden_layers = 12L,
num_attention_heads = 12L,
intermediate_size = 3072L,
hidden_act = "GELU",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1
)
Arguments
tokenizer
TokenizerBase. Tokenizer for the model.

max_position_embeddings
int. Maximum number of position embeddings. This parameter also determines the maximum length of a sequence the model can process. Allowed values: 10 <= x <= 4048

hidden_size
int. Number of neurons in each layer. This parameter determines the dimensionality of the resulting text embedding. Allowed values: 1 <= x <= 2048

num_hidden_layers
int. Number of hidden layers. Allowed values: 1 <= x

num_attention_heads
int. Number of attention heads for a self-attention layer. Only relevant if attention_type = 'multihead'. Allowed values: 0 <= x

intermediate_size
int. Size of the projection layer within each transformer encoder. Allowed values: 1 <= x

hidden_act
string. Name of the activation function. Allowed values: 'GELU', 'relu', 'silu', 'gelu_new'

hidden_dropout_prob
double. Ratio of dropout. Allowed values: 0 <= x <= 0.6

attention_probs_dropout_prob
double. Ratio of dropout for attention probabilities. Allowed values: 0 <= x <= 0.6
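Examples
A minimal sketch of configuring a new model. It assumes my_tokenizer is a previously created TokenizerBase object (not shown here) and that the object is created via the usual R6 $new() constructor; the remaining arguments simply mirror the defaults from the Usage section.

# 'my_tokenizer' is a placeholder for an existing TokenizerBase object
# created elsewhere (assumption).
base_model <- BaseModelRoberta$new()
base_model$configure(
  tokenizer = my_tokenizer,
  max_position_embeddings = 512L,
  hidden_size = 768L,
  num_hidden_layers = 12L,
  num_attention_heads = 12L,
  intermediate_size = 3072L,
  hidden_act = "GELU",
  hidden_dropout_prob = 0.1,
  attention_probs_dropout_prob = 0.1
)

# Check the configuration via a method inherited from AIFEMaster
# (see the inherited methods list above).
base_model$is_configured()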