
Represents models based on the Funnel-Transformer.

Value

Returns a new object of this class.

References

Dai, Z., Lai, G., Yang, Y., & Le, Q. V. (2020). Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. doi:10.48550/arXiv.2006.03236

Methods

Method configure()

Configures a new object of this class. Please ensure that your chosen configuration complies with the following guidelines (a configuration sketch is given at the end of this method's documentation):

  • hidden_size is a multiple of num_attention_heads.

Usage

BaseModelFunnel$configure(
  tokenizer,
  max_position_embeddings = 512L,
  hidden_size = 768L,
  block_sizes = c(4L, 4L, 4L),
  num_attention_heads = 12L,
  intermediate_size = 3072L,
  num_decoder_layers = 2L,
  d_head = 64L,
  funnel_pooling_type = "Mean",
  hidden_act = "GELU",
  hidden_dropout_prob = 0.1,
  attention_probs_dropout_prob = 0.1,
  activation_dropout = 0
)

Arguments

tokenizer

TokenizerBase Tokenizer for the model.

max_position_embeddings

int Maximum number of position embeddings. This parameter also determines the maximum length of a sequence that can be processed with the model. Allowed values: \(10 <= x <= 4048\)

hidden_size

int Number of neurons in each layer. This parameter determines the dimensionality of the resulting text embedding. Allowed values: \(1 <= x <= 2048\)

block_sizes

vector of int determining the number of blocks and the number of layers in each block.

num_attention_heads

int determining the number of attention heads for a self-attention layer. Only relevant if attention_type='multihead'. Allowed values: \(0 <= x\)

intermediate_size

int determining the size of the projection layer within each transformer encoder. Allowed values: \(1 <= x\)

num_decoder_layers

int Number of decoding layers. Allowed values: \(1 <= x \)

d_head

int Number of neurons of the final layer. Allowed values: \(1 <= x \)

funnel_pooling_type

string Method for pooling over the sequence length. Allowed values: 'Mean', 'Max'

hidden_act

string Name of the activation function. Allowed values: 'GELU', 'relu', 'silu', 'gelu_new'

hidden_dropout_prob

double Ratio of dropout. Allowed values: \(0 <= x <= 0.6\)

attention_probs_dropout_prob

double Ratio of dropout for attention probabilities. Allowed values: \(0 <= x <= 0.6\)

activation_dropout

double Dropout probability between the layers of the feed-forward blocks. Allowed values: \(0 <= x <= 0.6\)

num_hidden_layers

int Number of hidden layers. Allowed values: \(1 <= x \)

Returns

Returns nothing. This method is called for its side effect of configuring the object.
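
A minimal configuration sketch. The tokenizer object tok and the use of $new() to create the instance are assumptions for illustration, not part of this page; adapt them to your setup. Arguments not listed keep the defaults shown in the Usage block above.

# tok: a TokenizerBase object created beforehand (assumption)
model <- BaseModelFunnel$new()
model$configure(
  tokenizer = tok,
  hidden_size = 768L,           # must be a multiple of num_attention_heads
  num_attention_heads = 12L,    # 768 / 12 = 64 neurons per head
  block_sizes = c(4L, 4L, 4L),  # three encoder blocks with four layers each
  funnel_pooling_type = "Mean",
  hidden_act = "GELU"
)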


Method get_n_layers()

Number of layers.

Usage

BaseModelFunnel$get_n_layers()

Returns

Returns an int describing the number of layers available for embedding.
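
A hypothetical follow-up to the configuration sketch above; model is assumed to be an already configured instance.

# Query how many layers can be used for computing embeddings
n_layers <- model$get_n_layers()
print(n_layers)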


Method clone()

The objects of this class are cloneable with this method.

Usage

BaseModelFunnel$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.