
Classification Type

This object is a metric-based classifier and represents an implementation of a prototypical network for few-shot learning as described by Snell, Swersky, and Zemel (2017). The network uses a multi-way contrastive loss described by Zhang et al. (2019). The network learns to scale the metric as described by Oreshkin, Rodriguez, and Lacoste (2018).
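The core idea can be illustrated with a small, self-contained sketch (this is not part of the package API; all names here are illustrative): each class is represented by a prototype, i.e., the mean embedding of its support examples, and a query is classified by its scaled distance to every prototype.

```r
# Embedded support examples: rows = cases, columns = features.
support <- rbind(
  c(0.9, 0.1), c(1.1, -0.1),   # class "A"
  c(-1.0, 0.2), c(-0.8, 0.0)   # class "B"
)
labels <- factor(c("A", "A", "B", "B"))

# A prototype is the mean embedding of each class (Snell et al. 2017).
prototypes <- t(sapply(levels(labels), function(lv) {
  colMeans(support[labels == lv, , drop = FALSE])
}))

# Scaled squared Euclidean distance of a query to every prototype.
# In the actual network the scaling factor is learned (Oreshkin et al. 2018);
# here it is fixed for illustration.
alpha <- 1.5
query <- c(0.8, 0.0)
diffs <- sweep(prototypes, 2, query)
dists <- alpha * rowSums(diffs^2)

# Class probabilities via a softmax over negative distances.
probs <- exp(-dists) / sum(exp(-dists))
```

The query above lies close to the prototype of class "A", so the corresponding probability dominates.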

Parallel Core Architecture

This model is based on a parallel architecture: an input is passed to different types of layers separately, and at the end their outputs are combined to create the final output of the whole model.
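The idea can be sketched as follows (a conceptual illustration only; the stand-in functions are hypothetical and do not reflect the package's internals):

```r
# Each stream sees the same input; the real streams are the layer types
# described below (transformer encoder, dense, n-gram, recurrent).
streams <- list(
  dense     = function(x) x * 2,  # stands in for the dense stream
  n_gram    = function(x) x + 1,  # stands in for the n-gram stream
  recurrent = function(x) rev(x)  # stands in for the recurrent stream
)

input   <- c(1, 2, 3)
outputs <- lapply(streams, function(f) f(input))

# In the real model the merge layer combines the stream outputs;
# here we simply concatenate them.
combined <- do.call(c, outputs)
```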

Transformer Encoder Layers

Description

The transformer encoder layers follow the structure of the encoder layers used in transformer models. A single layer is designed as described by Chollet, Kalinowski, and Allaire (2022, p. 373) with the exception that individual components of the layers (such as the activation function, the kind of residual connection, the kind of normalization, or the kind of attention) can be customized. All parameters with the prefix tf_ can be used to configure this layer.

Feature Layer

Description

The feature layer is a dense layer that can be used to increase or decrease the number of features of the input data before passing the data into your model. The aim of this layer is to increase or reduce the complexity of the data for your model. The output size of this layer determines the number of features for all following layers. In the special case that the requested number of features equals the number of features of the text embeddings this layer is reduced to a dropout layer with masking capabilities. All parameters with the prefix feat_ can be used to configure this layer.

Dense Layers

Description

A fully connected layer. The layer is applied to every step of a sequence. All parameters with the prefix dense_ can be used to configure this layer.

Multiple N-Gram Layers

Description

This type of layer focuses on sub-sequences and performs a 1D convolutional operation. On the word and token level these sub-sequences can be interpreted as n-grams (Jacovi, Shalom & Goldberg 2018). The convolution is done across all features. The number of filters equals the number of features of the input tensor. Thus, the shape of the tensor is retained (Pham, Kruszewski & Boleda 2016).

The layer is able to consider multiple n-grams at the same time. In this case the convolution of the n-grams is done separately and the resulting tensors are concatenated along the feature dimension. The number of filters for every n-gram is set to num_features/num_n-grams. Thus, the resulting tensor has the same shape as the input tensor.

Sub-sequences that are masked in the input are also masked in the output.

The output of this layer can be understood as the results of the n-gram filters. Stacking this layer allows the model to perform n-gram detection on n-grams (meta perspective). All parameters with the prefix ng_conv_ can be used to configure this layer.
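How the filters might be split across window sizes can be sketched as follows (illustrative only, assuming integer division as is common in such implementations):

```r
# With ng_conv_ks_min = 2 and ng_conv_ks_max = 4 the layer considers
# bigrams, trigrams, and 4-grams at the same time.
num_features <- 50
n_grams <- seq(2, 4)  # window sizes 2, 3, 4

# Each n-gram gets num_features/num_n-grams filters (integer division),
# so concatenating the per-n-gram outputs along the feature dimension
# roughly restores the input's feature dimension.
filters_per_ngram <- num_features %/% length(n_grams)
filters_per_ngram
#> [1] 16
```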

Recurrent Layers

Description

A regular recurrent layer, either as a Gated Recurrent Unit (GRU) or a Long Short-Term Memory (LSTM) layer. Uses PyTorch's implementation. All parameters with the prefix rec_ can be used to configure this layer.

Merge Layer

Description

Layer for combining the output of different layers. All inputs must be sequential data of shape (Batch, Times, Features). First, pooling over time is applied, extracting the minimal and/or maximal features. Second, the pooled tensors are combined by calculating their weighted sum. Different attention mechanisms can be used to dynamically calculate the corresponding weights. This allows the model to decide which part of the data is most useful. Finally, pooling over features is applied, extracting a specific number of maximal and/or minimal features. A normalization of all inputs at the beginning of the layer is possible. All parameters with the prefix merge_ can be used to configure this layer.
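The two pooling steps can be sketched for a single sequence (a conceptual illustration, not the package's implementation; the weighted combination of streams is omitted):

```r
# One input stream without the batch dimension: 6 time steps, 4 features.
set.seed(1)
x <- matrix(rnorm(6 * 4), nrow = 6, ncol = 4)

# 1) Pooling over time: extract the minimal and maximal value per feature
#    (corresponds to merge_pooling_type = "MinMax").
pooled_time <- rbind(
  max = apply(x, 2, max),
  min = apply(x, 2, min)
)

# 2) Pooling over features: keep a fixed number of the most extreme
#    values (corresponds to merge_pooling_features).
merge_pooling_features <- 2
top_features <- sort(pooled_time["max", ], decreasing = TRUE)[
  seq_len(merge_pooling_features)
]
```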

Training and Prediction

For the creation and training of a classifier, an object of class EmbeddedText or LargeDataSetForTextEmbeddings and a factor are necessary.

The object of class EmbeddedText or LargeDataSetForTextEmbeddings contains the numerical text representations (text embeddings) of the raw texts generated by an object of class TextEmbeddingModel. For supporting large data sets it is recommended to use LargeDataSetForTextEmbeddings instead of EmbeddedText.

The factor contains the classes/categories for every text. Missing values (unlabeled cases) are supported and can be used for pseudo labeling.

For predictions, an object of class EmbeddedText or LargeDataSetForTextEmbeddings has to be used that was created with the same TextEmbeddingModel as used for training.
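The target factor with unlabeled cases can be built with base R; missing values simply stay NA (the class names below are illustrative):

```r
# Targets for five texts; the second and fifth case are unlabeled and
# can later be filled in via pseudo labeling.
targets <- factor(
  c("positive", NA, "negative", "positive", NA),
  levels = c("negative", "positive")  # order matters for ordinal data
)

sum(is.na(targets))  # 2 unlabeled cases
```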

Value

Returns a new object of this class ready for configuration or for loading a saved classifier.

References

Oreshkin, B. N., Rodriguez, P. & Lacoste, A. (2018). TADAM: Task dependent adaptive metric for improved few-shot learning. https://doi.org/10.48550/arXiv.1805.10123

Snell, J., Swersky, K. & Zemel, R. S. (2017). Prototypical Networks for Few-shot Learning. https://doi.org/10.48550/arXiv.1703.05175

Zhang, X., Nie, J., Zong, L., Yu, H. & Liang, W. (2019). One Shot Learning with Margin. In Q. Yang, Z.-H. Zhou, Z. Gong, M.-L. Zhang & S.-J. Huang (Eds.), Lecture Notes in Computer Science. Advances in Knowledge Discovery and Data Mining (Vol. 11440, pp. 305–317). Springer International Publishing. https://doi.org/10.1007/978-3-030-16145-3_24

Methods


Method configure()

Creating a new instance of this class.

Usage

TEClassifierParallelPrototype$configure(
  name = NULL,
  label = NULL,
  text_embeddings = NULL,
  feature_extractor = NULL,
  target_levels = NULL,
  metric_type = "Euclidean",
  shared_feat_layer = TRUE,
  feat_act_fct = "ELU",
  feat_size = 50,
  feat_bias = TRUE,
  feat_dropout = 0,
  feat_parametrizations = "None",
  feat_normalization_type = "LayerNorm",
  ng_conv_act_fct = "ELU",
  ng_conv_n_layers = 1,
  ng_conv_ks_min = 2,
  ng_conv_ks_max = 4,
  ng_conv_bias = FALSE,
  ng_conv_dropout = 0.1,
  ng_conv_parametrizations = "None",
  ng_conv_normalization_type = "LayerNorm",
  ng_conv_residual_type = "ResidualGate",
  dense_act_fct = "ELU",
  dense_n_layers = 1,
  dense_dropout = 0.5,
  dense_bias = FALSE,
  dense_parametrizations = "None",
  dense_normalization_type = "LayerNorm",
  dense_residual_type = "ResidualGate",
  rec_act_fct = "Tanh",
  rec_n_layers = 1,
  rec_type = "GRU",
  rec_bidirectional = FALSE,
  rec_dropout = 0.2,
  rec_bias = FALSE,
  rec_parametrizations = "None",
  rec_normalization_type = "LayerNorm",
  rec_residual_type = "ResidualGate",
  tf_act_fct = "ELU",
  tf_dense_dim = 50,
  tf_n_layers = 1,
  tf_dropout_rate_1 = 0.1,
  tf_dropout_rate_2 = 0.5,
  tf_attention_type = "MultiHead",
  tf_positional_type = "absolute",
  tf_num_heads = 1,
  tf_bias = FALSE,
  tf_parametrizations = "None",
  tf_normalization_type = "LayerNorm",
  tf_residual_type = "ResidualGate",
  merge_attention_type = "multi_head",
  merge_num_heads = 1,
  merge_normalization_type = "LayerNorm",
  merge_pooling_features = 50,
  merge_pooling_type = "MinMax",
  embedding_dim = 2
)

Arguments

name

string Name of the new model. Please refer to common name conventions. Free text can be used with parameter label. If set to NULL a unique ID is generated automatically. Allowed values: any

label

string Label for the new model. Here you can use free text. Allowed values: any

text_embeddings

EmbeddedText, LargeDataSetForTextEmbeddings Object of class EmbeddedText or LargeDataSetForTextEmbeddings.

feature_extractor

TEFeatureExtractor Object of class TEFeatureExtractor which should be used to reduce the number of dimensions of the text embeddings. If no feature extractor should be applied, set this to NULL.

target_levels

vector containing the levels (categories or classes) within the target data. Please note that order matters. For ordinal data please ensure that the levels are sorted correctly with later levels indicating a higher category/class. For nominal data the order does not matter.

metric_type

string Type of metric used for calculating the distance. Allowed values: 'Euclidean'

shared_feat_layer

bool If TRUE all streams use the same feature layer. If FALSE all streams use their own feature layer.

feat_act_fct

string Activation function for all layers. Allowed values: 'ELU', 'LeakyReLU', 'ReLU', 'GELU', 'Sigmoid', 'Tanh', 'PReLU'

feat_size

int Number of neurons for each dense layer. Allowed values: 2 <= x

feat_bias

bool If TRUE a bias term is added to all layers. If FALSE no bias term is added to the layers.

feat_dropout

double determining the dropout for the dense projection of the feature layer. Allowed values: 0 <= x <= 0.6

feat_parametrizations

string Re-Parametrizations of the weights of layers. Allowed values: 'None', 'OrthogonalWeights', 'WeightNorm', 'SpectralNorm'

feat_normalization_type

string Type of normalization applied to all layers and stack layers. Allowed values: 'LayerNorm', 'None'

ng_conv_act_fct

string Activation function for all layers. Allowed values: 'ELU', 'LeakyReLU', 'ReLU', 'GELU', 'Sigmoid', 'Tanh', 'PReLU'

ng_conv_n_layers

int determining how many times the n-gram layers should be added to the network. Allowed values: 0 <= x

ng_conv_ks_min

int determining the minimal window size for n-grams. Allowed values: 2 <= x

ng_conv_ks_max

int determining the maximal window size for n-grams. Allowed values: 2 <= x

ng_conv_bias

bool If TRUE a bias term is added to all layers. If FALSE no bias term is added to the layers.

ng_conv_dropout

double determining the dropout for n-gram convolution layers. Allowed values: 0 <= x <= 0.6

ng_conv_parametrizations

string Re-Parametrizations of the weights of layers. Allowed values: 'None', 'OrthogonalWeights', 'WeightNorm', 'SpectralNorm'

ng_conv_normalization_type

string Type of normalization applied to all layers and stack layers. Allowed values: 'LayerNorm', 'None'

ng_conv_residual_type

string Type of residual connection for all layers and stacks of layers. Allowed values: 'ResidualGate', 'Addition', 'None'

dense_act_fct

string Activation function for all layers. Allowed values: 'ELU', 'LeakyReLU', 'ReLU', 'GELU', 'Sigmoid', 'Tanh', 'PReLU'

dense_n_layers

int Number of dense layers. Allowed values: 0 <= x

dense_dropout

double determining the dropout between dense layers. Allowed values: 0 <= x <= 0.6

dense_bias

bool If TRUE a bias term is added to all layers. If FALSE no bias term is added to the layers.

dense_parametrizations

string Re-Parametrizations of the weights of layers. Allowed values: 'None', 'OrthogonalWeights', 'WeightNorm', 'SpectralNorm'

dense_normalization_type

string Type of normalization applied to all layers and stack layers. Allowed values: 'LayerNorm', 'None'

dense_residual_type

string Type of residual connection for all layers and stacks of layers. Allowed values: 'ResidualGate', 'Addition', 'None'

rec_act_fct

string Activation function for all layers. Allowed values: 'Tanh'

rec_n_layers

int Number of recurrent layers. Allowed values: 0 <= x

rec_type

string Type of the recurrent layers. rec_type='GRU' for Gated Recurrent Unit and rec_type='LSTM' for Long Short-Term Memory. Allowed values: 'GRU', 'LSTM'

rec_bidirectional

bool If TRUE a bidirectional version of the recurrent layers is used.

rec_dropout

double determining the dropout between recurrent layers. Allowed values: 0 <= x <= 0.6

rec_bias

bool If TRUE a bias term is added to all layers. If FALSE no bias term is added to the layers.

rec_parametrizations

string Re-Parametrizations of the weights of layers. Allowed values: 'None'

rec_normalization_type

string Type of normalization applied to all layers and stack layers. Allowed values: 'LayerNorm', 'None'

rec_residual_type

string Type of residual connection for all layers and stacks of layers. Allowed values: 'ResidualGate', 'Addition', 'None'

tf_act_fct

string Activation function for all layers. Allowed values: 'ELU', 'LeakyReLU', 'ReLU', 'GELU', 'Sigmoid', 'Tanh', 'PReLU'

tf_dense_dim

int determining the size of the projection layer within each transformer encoder. Allowed values: 1 <= x

tf_n_layers

int determining how many times the encoder should be added to the network. Allowed values: 0 <= x

tf_dropout_rate_1

double determining the dropout after the attention mechanism within the transformer encoder layers. Allowed values: 0 <= x <= 0.6

tf_dropout_rate_2

double determining the dropout for the dense projection within the transformer encoder layers. Allowed values: 0 <= x <= 0.6

tf_attention_type

string Choose the attention type. Allowed values: 'Fourier', 'MultiHead'

tf_positional_type

string Type of processing positional information. Allowed values: 'absolute'

tf_num_heads

int determining the number of attention heads for a self-attention layer. Only relevant if tf_attention_type='MultiHead'. Allowed values: 0 <= x

tf_bias

bool If TRUE a bias term is added to all layers. If FALSE no bias term is added to the layers.

tf_parametrizations

string Re-Parametrizations of the weights of layers. Allowed values: 'None', 'OrthogonalWeights', 'WeightNorm', 'SpectralNorm'

tf_normalization_type

string Type of normalization applied to all layers and stack layers. Allowed values: 'LayerNorm', 'None'

tf_residual_type

string Type of residual connection for all layers and stacks of layers. Allowed values: 'ResidualGate', 'Addition', 'None'

merge_attention_type

string Choose the attention type. Allowed values: 'Fourier', 'MultiHead'

merge_num_heads

int determining the number of attention heads for a self-attention layer. Only relevant if merge_attention_type='MultiHead'. Allowed values: 0 <= x

merge_normalization_type

string Type of normalization applied to all layers and stack layers. Allowed values: 'LayerNorm', 'None'

merge_pooling_features

int Number of features to be extracted at the end of the model. Allowed values: 1 <= x

merge_pooling_type

string Type of extracting intermediate features. Allowed values: 'Max', 'Min', 'MinMax'

embedding_dim

int determining the number of dimensions for the embedding. Allowed values: 2 <= x

Returns

This function does not return anything. It modifies the current object.
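A minimal call might look as follows; only a few of the parameters shown above are set, the rest keep their defaults. The objects `classifier`, `embeddings`, and the target levels are illustrative and have to be created beforehand:

```r
# Assuming `classifier` is a new, unconfigured object of this class and
# `embeddings` is an object of class EmbeddedText or
# LargeDataSetForTextEmbeddings created with a TextEmbeddingModel.
classifier$configure(
  name = NULL,                 # a unique ID is generated automatically
  label = "Sentiment classifier",
  text_embeddings = embeddings,
  feature_extractor = NULL,    # no dimension reduction
  target_levels = c("negative", "neutral", "positive"),
  metric_type = "Euclidean",
  feat_size = 50,
  dense_n_layers = 1,
  rec_type = "GRU"
)
```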


Method clone()

The objects of this class are cloneable with this method.

Usage

TEClassifierParallelPrototype$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.