Represents models based on the Funnel-Transformer.
References
Dai, Z., Lai, G., Yang, Y. & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. doi:10.48550/arXiv.2006.03236
See also
Other Base Models: BaseModelBert, BaseModelDebertaV2, BaseModelMPNet, BaseModelModernBert, BaseModelRoberta
Super classes
aifeducation::AIFEMaster
-> aifeducation::AIFEBaseModel
-> aifeducation::BaseModelCore
-> BaseModelFunnel
Methods
Inherited methods
aifeducation::AIFEMaster$get_all_fields()
aifeducation::AIFEMaster$get_documentation_license()
aifeducation::AIFEMaster$get_ml_framework()
aifeducation::AIFEMaster$get_model_config()
aifeducation::AIFEMaster$get_model_description()
aifeducation::AIFEMaster$get_model_info()
aifeducation::AIFEMaster$get_model_license()
aifeducation::AIFEMaster$get_package_versions()
aifeducation::AIFEMaster$get_private()
aifeducation::AIFEMaster$get_publication_info()
aifeducation::AIFEMaster$get_sustainability_data()
aifeducation::AIFEMaster$is_configured()
aifeducation::AIFEMaster$is_trained()
aifeducation::AIFEMaster$set_documentation_license()
aifeducation::AIFEMaster$set_model_description()
aifeducation::AIFEMaster$set_model_license()
aifeducation::BaseModelCore$calc_flops_architecture_based()
aifeducation::BaseModelCore$count_parameter()
aifeducation::BaseModelCore$create_from_hf()
aifeducation::BaseModelCore$estimate_sustainability_inference_fill_mask()
aifeducation::BaseModelCore$fill_mask()
aifeducation::BaseModelCore$get_final_size()
aifeducation::BaseModelCore$get_flops_estimates()
aifeducation::BaseModelCore$get_model()
aifeducation::BaseModelCore$get_model_type()
aifeducation::BaseModelCore$get_special_tokens()
aifeducation::BaseModelCore$get_tokenizer_statistics()
aifeducation::BaseModelCore$load_from_disk()
aifeducation::BaseModelCore$plot_training_history()
aifeducation::BaseModelCore$save()
aifeducation::BaseModelCore$set_publication_info()
aifeducation::BaseModelCore$train()
Method configure()
Configures a new object of this class.
Usage
BaseModelFunnel$configure(
tokenizer,
max_position_embeddings = 512L,
hidden_size = 768L,
block_sizes = c(4L, 4L, 4L),
num_attention_heads = 12L,
intermediate_size = 3072L,
num_decoder_layers = 2L,
d_head = 64L,
funnel_pooling_type = "Mean",
hidden_act = "GELU",
hidden_dropout_prob = 0.1,
attention_probs_dropout_prob = 0.1,
activation_dropout = 0
)
Arguments
tokenizer
TokenizerBase
Tokenizer for the model.
max_position_embeddings
int
Maximum number of position embeddings. This parameter also determines the maximum length of a sequence that can be processed with the model. Allowed values: 10 <= x <= 4048
hidden_size
int
Number of neurons in each layer. This parameter determines the dimensionality of the resulting text embedding. Allowed values: 1 <= x <= 2048
block_sizes
vector of int
Vector determining the number and size of each block.
num_attention_heads
int
Number of attention heads for a self-attention layer. Only relevant if attention_type = 'multihead'. Allowed values: 0 <= x
intermediate_size
int
Size of the projection layer within each transformer encoder. Allowed values: 1 <= x
num_decoder_layers
int
Number of decoding layers. Allowed values: 1 <= x
d_head
int
Number of neurons of the final layer. Allowed values: 1 <= x
funnel_pooling_type
string
Method for pooling over the sequence length. Allowed values: 'Mean', 'Max'
hidden_act
string
Name of the activation function. Allowed values: 'GELU', 'relu', 'silu', 'gelu_new'
hidden_dropout_prob
double
Ratio of dropout. Allowed values: 0 <= x <= 0.6
attention_probs_dropout_prob
double
Ratio of dropout for attention probabilities. Allowed values: 0 <= x <= 0.6
activation_dropout
double
Dropout probability between the layers of the feed-forward blocks. Allowed values: 0 <= x <= 0.6
num_hidden_layers
int
Number of hidden layers. Allowed values: 1 <= x
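Examples
A minimal usage sketch of configure(). This is illustrative only: it assumes a TokenizerBase object (here called my_tokenizer) has already been created with the aifeducation package, and the parameter values simply repeat the defaults from the Usage section rather than being recommendations for any particular task.

```r
library(aifeducation)

# Create a new, unconfigured Funnel-Transformer base model
# (R6 objects are instantiated with $new()).
base_model <- BaseModelFunnel$new()

# Configure the architecture. All values below mirror the defaults
# shown in the Usage section above.
base_model$configure(
  tokenizer = my_tokenizer,  # assumed TokenizerBase object, created beforehand
  max_position_embeddings = 512L,
  hidden_size = 768L,
  block_sizes = c(4L, 4L, 4L),  # three blocks of four layers each
  num_attention_heads = 12L,
  intermediate_size = 3072L,
  num_decoder_layers = 2L,
  d_head = 64L,
  funnel_pooling_type = "Mean",
  hidden_act = "GELU",
  hidden_dropout_prob = 0.1,
  attention_probs_dropout_prob = 0.1,
  activation_dropout = 0
)

# Inherited helpers can then be used, for example:
# base_model$is_configured()
# base_model$count_parameter()
```

Note that block_sizes controls the funnel structure itself: each block pools over the sequence length (using funnel_pooling_type) before the next block, which is what distinguishes this architecture from a standard transformer encoder.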