Text embedding classifier with a neural net
Source: R/obj_TEClassifierSequential.R
TEClassifierSequential.Rd
Classification Type
This is a probability classifier that predicts a probability distribution over the different classes/categories. This is the standard case most common in the literature.
Sequential Core Architecture
This model is based on a sequential architecture: the input is passed through a specific number of layers step by step. All layers are grouped by their kind into stacks.
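The depth of each stack is controlled by its *_n_layers parameter of configure(). Below is a minimal sketch, assuming the usual R6 pattern of creating an instance with $new() before calling configure(); the embeddings object and the target levels are hypothetical placeholders:

library(aifeducation)

# Hypothetical input: 'embeddings' would be an EmbeddedText or
# LargeDataSetForTextEmbeddings object created beforehand.
classifier <- TEClassifierSequential$new()
classifier$configure(
  name = NULL,                 # NULL generates a unique ID automatically
  label = "Example sequential classifier",
  text_embeddings = embeddings,
  target_levels = c("low", "medium", "high"),
  ng_conv_n_layers = 0L,       # presumably omits the n-gram stack
  rec_n_layers = 1L,           # one recurrent layer
  tf_n_layers = 2L,            # two transformer encoder layers
  dense_n_layers = 1L          # one dense layer
)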
Transformer Encoder Layers
Description
The transformer encoder layers follow the structure of the encoder layers used in transformer models. A single layer is designed as described by Chollet, Kalinowski, and Allaire (2022, p. 373), with the exception that individual components of the layers (such as the activation function, the kind of residual connection, the kind of normalization, or the kind of attention) can be customized. All parameters with the prefix tf_ can be used to configure this layer.
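As an illustration, the tf_ arguments could be collected in a list and merged into the configure() call (for example via do.call()); the values below are assumed examples, while the names and allowed values are taken from the configure() section:

# Assumed example values for the transformer encoder stack (prefix tf_).
tf_args <- list(
  tf_n_layers = 2L,                 # two encoder layers
  tf_attention_type = "MultiHead",  # or "Fourier"
  tf_num_heads = 4L,                # only relevant for multi-head attention
  tf_positional_type = "absolute",  # or "None"
  tf_dense_dim = 50L,               # size of the dense projection
  tf_dropout_rate_1 = 0.1,          # dropout after the attention mechanism
  tf_dropout_rate_2 = 0.5           # dropout for the dense projection
)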
Feature Layer
Description
The feature layer is a dense layer that can be used to increase or decrease the number of features of the input data before passing the data to the rest of the model. The aim of this layer is to increase or reduce the complexity of the data for your model. The output size of this layer determines the number of features for all following layers. In the special case that the requested number of features equals the number of features of the text embeddings, this layer is reduced to a dropout layer with masking capabilities. All parameters with the prefix feat_ can be used to configure this layer.
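For illustration, a possible set of feat_ arguments; the values are assumptions, and the comment restates the special case described above:

# Assumed example values for the feature layer (prefix feat_).
feat_args <- list(
  feat_size = 64L,                  # number of features for all following
                                    # layers; if equal to the number of
                                    # embedding features, the layer reduces
                                    # to a dropout layer with masking
  feat_act_fct = "ELU",
  feat_bias = TRUE,
  feat_dropout = 0.1,
  feat_normalization_type = "LayerNorm"
)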
Dense Layers
Description
A fully connected layer. The layer is applied to every step of a sequence. All parameters with the prefix dense_ can be used to configure this layer.
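A minimal, assumed configuration of this stack via the dense_ arguments:

# Assumed example values for the dense stack (prefix dense_).
dense_args <- list(
  dense_n_layers = 2L,              # depth of the dense stack (0 <= x)
  dense_act_fct = "GELU",
  dense_dropout = 0.3,
  dense_residual_type = "ResidualGate"
)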
Multiple N-Gram Layers
Description
This type of layer focuses on sub-sequences and performs a 1D convolutional operation. On the word and token level, these sub-sequences can be interpreted as n-grams (Jacovi, Shalom & Goldberg 2018). The convolution is done across all features. The number of filters equals the number of features of the input tensor; thus, the shape of the tensor is retained (Pham, Kruszewski & Boleda 2016).
The layer is able to consider multiple n-grams at the same time. In this case, the convolution is done separately for every n-gram size and the resulting tensors are concatenated along the feature dimension. The number of filters for every n-gram is set to num_features/num_n-grams. Thus, the resulting tensor has the same shape as the input tensor.
Sub-sequences that are masked in the input are also masked in the output.
The output of this layer can be understood as the result of the n-gram filters. Stacking this layer allows the model to detect n-grams of n-grams (a meta perspective). All parameters with the prefix ng_conv_ can be used to configure this layer.
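A short worked example of the filter split described above, with hypothetical sizes:

# Worked example of the filter split across multiple n-grams.
num_features <- 48L                          # features of the input tensor
kernel_sizes <- seq(2L, 4L)                  # n-grams of size 2, 3 and 4
num_ngrams <- length(kernel_sizes)           # three parallel convolutions
filters_per_ngram <- num_features / num_ngrams   # 48 / 3 = 16 filters each
# Concatenating the three outputs along the feature dimension restores
# 3 * 16 = 48 features, so the output tensor keeps the input shape.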
Recurrent Layers
Description
A regular recurrent layer, either a Gated Recurrent Unit (GRU) or a Long Short-Term Memory (LSTM) layer. Uses PyTorch's implementation. All parameters with the prefix rec_ can be used to configure this layer.
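An assumed example of the rec_ arguments, here a bidirectional LSTM stack:

# Assumed example values for the recurrent stack (prefix rec_).
rec_args <- list(
  rec_n_layers = 2L,
  rec_type = "LSTM",                # or "GRU"
  rec_bidirectional = TRUE,         # use the bidirectional variant
  rec_dropout = 0.2,
  rec_act_fct = "Tanh"              # the only allowed value
)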
Classification Pooling Layer
Description
This layer transforms sequences into a lower dimensional space that can be passed to dense layers. It performs two types of pooling. First, it extracts features across the time dimension, selecting the maximal and/or minimal features. Second, it performs pooling over the remaining features, selecting a specific number of the highest and/or lowest features.
In the case of selecting the minimal and maximal features at the same time, the minimal features are concatenated to the tensor of the maximal features, resulting in the shape $(Batch, Times, 2 \times Features)$ at the end of the first step. In the second step, the number of requested features is halved: the first half is used for the maximal features and the second half for the minimal features. All parameters with the prefix cls_pooling_ can be used to configure this layer.
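A worked example of the two pooling steps for cls_pooling_type = 'MinMax', with hypothetical sizes:

# Worked example of 'MinMax' pooling with hypothetical sizes.
features <- 48L                    # features entering the layer
cls_pooling_features <- 10L        # requested output features
# Step 1: the minimal features are concatenated to the maximal features,
# giving the shape (Batch, Times, 2 * 48) = (Batch, Times, 96).
# Step 2: the requested number of features is halved between both kinds:
n_max <- cls_pooling_features / 2  # 5 highest features
n_min <- cls_pooling_features / 2  # 5 lowest features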
Training and Prediction
For creating and training a classifier, an object of class EmbeddedText or LargeDataSetForTextEmbeddings and a factor are necessary.
The object of class EmbeddedText or LargeDataSetForTextEmbeddings contains the numerical text representations (text embeddings) of the raw texts generated by an object of class TextEmbeddingModel. To support large data sets, it is recommended to use LargeDataSetForTextEmbeddings instead of EmbeddedText.
The factor contains the classes/categories for every text. Missing values (unlabeled cases) are supported and can be used for pseudo labeling.
For predictions, an object of class EmbeddedText or LargeDataSetForTextEmbeddings has to be used that was created with the same TextEmbeddingModel as for training.
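A sketch of the complete workflow; the $new() pattern and the argument names of train() and predict() are assumptions and are not documented on this page:

library(aifeducation)

# Hypothetical objects: 'embeddings' and 'new_embeddings' would be created
# with the same TextEmbeddingModel; 'targets' is a factor with NA marking
# unlabeled cases (usable for pseudo labeling).
classifier <- TEClassifierSequential$new()
classifier$configure(
  name = "example_classifier",
  text_embeddings = embeddings,
  target_levels = levels(targets)
)
classifier$train(
  data_embeddings = embeddings,    # argument name assumed
  data_targets = targets           # argument name assumed
)
predictions <- classifier$predict(newdata = new_embeddings)  # name assumed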
See also
Other Classification:
TEClassifierParallel,
TEClassifierParallelPrototype,
TEClassifierProtoNet,
TEClassifierRegular,
TEClassifierSequentialPrototype
Super classes
aifeducation::AIFEMaster -> aifeducation::AIFEBaseModel -> aifeducation::ModelsBasedOnTextEmbeddings -> aifeducation::ClassifiersBasedOnTextEmbeddings -> aifeducation::TEClassifiersBasedOnRegular -> TEClassifierSequential
Methods
Inherited methods
aifeducation::AIFEMaster$get_all_fields()
aifeducation::AIFEMaster$get_documentation_license()
aifeducation::AIFEMaster$get_ml_framework()
aifeducation::AIFEMaster$get_model_config()
aifeducation::AIFEMaster$get_model_description()
aifeducation::AIFEMaster$get_model_info()
aifeducation::AIFEMaster$get_model_license()
aifeducation::AIFEMaster$get_package_versions()
aifeducation::AIFEMaster$get_private()
aifeducation::AIFEMaster$get_publication_info()
aifeducation::AIFEMaster$get_sustainability_data()
aifeducation::AIFEMaster$is_configured()
aifeducation::AIFEMaster$is_trained()
aifeducation::AIFEMaster$set_documentation_license()
aifeducation::AIFEMaster$set_model_description()
aifeducation::AIFEMaster$set_model_license()
aifeducation::AIFEMaster$set_publication_info()
aifeducation::AIFEBaseModel$count_parameter()
aifeducation::ModelsBasedOnTextEmbeddings$get_text_embedding_model()
aifeducation::ModelsBasedOnTextEmbeddings$get_text_embedding_model_name()
aifeducation::ClassifiersBasedOnTextEmbeddings$adjust_target_levels()
aifeducation::ClassifiersBasedOnTextEmbeddings$check_embedding_model()
aifeducation::ClassifiersBasedOnTextEmbeddings$check_feature_extractor_object_type()
aifeducation::ClassifiersBasedOnTextEmbeddings$load_from_disk()
aifeducation::ClassifiersBasedOnTextEmbeddings$plot_coding_stream()
aifeducation::ClassifiersBasedOnTextEmbeddings$plot_training_history()
aifeducation::ClassifiersBasedOnTextEmbeddings$predict()
aifeducation::ClassifiersBasedOnTextEmbeddings$requires_compression()
aifeducation::ClassifiersBasedOnTextEmbeddings$save()
aifeducation::TEClassifiersBasedOnRegular$train()
Method configure()
Configures a new instance of this class.
Usage
TEClassifierSequential$configure(
name = NULL,
label = NULL,
text_embeddings = NULL,
feature_extractor = NULL,
target_levels = NULL,
skip_connection_type = "ResidualGate",
cls_pooling_features = NULL,
cls_pooling_type = "MinMax",
feat_act_fct = "ELU",
feat_size = 50L,
feat_bias = TRUE,
feat_dropout = 0,
feat_parametrizations = "None",
feat_normalization_type = "LayerNorm",
ng_conv_act_fct = "ELU",
ng_conv_n_layers = 1L,
ng_conv_ks_min = 2L,
ng_conv_ks_max = 4L,
ng_conv_bias = FALSE,
ng_conv_dropout = 0.1,
ng_conv_parametrizations = "None",
ng_conv_normalization_type = "LayerNorm",
ng_conv_residual_type = "ResidualGate",
dense_act_fct = "ELU",
dense_n_layers = 1,
dense_dropout = 0.5,
dense_bias = FALSE,
dense_parametrizations = "None",
dense_normalization_type = "LayerNorm",
dense_residual_type = "ResidualGate",
rec_act_fct = "Tanh",
rec_n_layers = 1L,
rec_type = "GRU",
rec_bidirectional = FALSE,
rec_dropout = 0.2,
rec_bias = FALSE,
rec_parametrizations = "None",
rec_normalization_type = "LayerNorm",
rec_residual_type = "ResidualGate",
tf_act_fct = "ELU",
tf_dense_dim = 50L,
tf_n_layers = 1L,
tf_dropout_rate_1 = 0.1,
tf_dropout_rate_2 = 0.5,
tf_attention_type = "MultiHead",
tf_positional_type = "absolute",
tf_num_heads = 1,
tf_bias = FALSE,
tf_parametrizations = "None",
tf_normalization_type = "LayerNorm",
tf_residual_type = "ResidualGate"
)
Arguments
name (string): Name of the new model. Please refer to common name conventions. Free text can be used with the parameter label. If set to NULL, a unique ID is generated automatically. Allowed values: any
label (string): Label for the new model. Here you can use free text. Allowed values: any
text_embeddings (EmbeddedText, LargeDataSetForTextEmbeddings): Object of class EmbeddedText or LargeDataSetForTextEmbeddings.
feature_extractor (TEFeatureExtractor): Object of class TEFeatureExtractor used to reduce the number of dimensions of the text embeddings. Set to NULL if no feature extractor should be applied.
target_levels (vector): Vector containing the levels (categories or classes) within the target data. Please note that order matters. For ordinal data, ensure that the levels are sorted correctly, with later levels indicating a higher category/class. For nominal data the order does not matter.
skip_connection_type (string): Type of residual connection for all layers and stacks of layers. Allowed values: 'ResidualGate', 'Addition', 'None'
cls_pooling_features (int): Number of features to be extracted at the end of the model. Allowed values: 1 <= x
cls_pooling_type (string): Type of pooling used to extract intermediate features. Allowed values: 'Max', 'Min', 'MinMax'
feat_act_fct (string): Activation function for all layers. Allowed values: 'ELU', 'LeakyReLU', 'ReLU', 'GELU', 'Sigmoid', 'Tanh', 'PReLU'
feat_size (int): Number of neurons for each dense layer. Allowed values: 2 <= x
feat_bias (bool): If TRUE, a bias term is added to all layers. If FALSE, no bias term is added.
feat_dropout (double): Dropout for the dense projection of the feature layer. Allowed values: 0 <= x <= 0.6
feat_parametrizations (string): Re-parametrization of the weights of layers. Allowed values: 'None', 'OrthogonalWeights', 'WeightNorm', 'SpectralNorm'
feat_normalization_type (string): Type of normalization applied to all layers and stacks of layers. Allowed values: 'LayerNorm', 'None'
ng_conv_act_fct (string): Activation function for all layers. Allowed values: 'ELU', 'LeakyReLU', 'ReLU', 'GELU', 'Sigmoid', 'Tanh', 'PReLU'
ng_conv_n_layers (int): Number of times the n-gram layers are added to the network. Allowed values: 0 <= x
ng_conv_ks_min (int): Minimal window size for n-grams. Allowed values: 2 <= x
ng_conv_ks_max (int): Maximal window size for n-grams. Allowed values: 2 <= x
ng_conv_bias (bool): If TRUE, a bias term is added to all layers. If FALSE, no bias term is added.
ng_conv_dropout (double): Dropout for the n-gram convolution layers. Allowed values: 0 <= x <= 0.6
ng_conv_parametrizations (string): Re-parametrization of the weights of layers. Allowed values: 'None', 'OrthogonalWeights', 'WeightNorm', 'SpectralNorm'
ng_conv_normalization_type (string): Type of normalization applied to all layers and stacks of layers. Allowed values: 'LayerNorm', 'None'
ng_conv_residual_type (string): Type of residual connection for all layers and stacks of layers. Allowed values: 'ResidualGate', 'Addition', 'None'
dense_act_fct (string): Activation function for all layers. Allowed values: 'ELU', 'LeakyReLU', 'ReLU', 'GELU', 'Sigmoid', 'Tanh', 'PReLU'
dense_n_layers (int): Number of dense layers. Allowed values: 0 <= x
dense_dropout (double): Dropout between dense layers. Allowed values: 0 <= x <= 0.6
dense_bias (bool): If TRUE, a bias term is added to all layers. If FALSE, no bias term is added.
dense_parametrizations (string): Re-parametrization of the weights of layers. Allowed values: 'None', 'OrthogonalWeights', 'WeightNorm', 'SpectralNorm'
dense_normalization_type (string): Type of normalization applied to all layers and stacks of layers. Allowed values: 'LayerNorm', 'None'
dense_residual_type (string): Type of residual connection for all layers and stacks of layers. Allowed values: 'ResidualGate', 'Addition', 'None'
rec_act_fct (string): Activation function for all layers. Allowed values: 'Tanh'
rec_n_layers (int): Number of recurrent layers. Allowed values: 0 <= x
rec_type (string): Type of the recurrent layers. rec_type = 'GRU' for Gated Recurrent Unit and rec_type = 'LSTM' for Long Short-Term Memory. Allowed values: 'GRU', 'LSTM'
rec_bidirectional (bool): If TRUE, a bidirectional version of the recurrent layers is used.
rec_dropout (double): Dropout between recurrent layers. Allowed values: 0 <= x <= 0.6
rec_bias (bool): If TRUE, a bias term is added to all layers. If FALSE, no bias term is added.
rec_parametrizations (string): Re-parametrization of the weights of layers. Allowed values: 'None'
rec_normalization_type (string): Type of normalization applied to all layers and stacks of layers. Allowed values: 'LayerNorm', 'None'
rec_residual_type (string): Type of residual connection for all layers and stacks of layers. Allowed values: 'ResidualGate', 'Addition', 'None'
tf_act_fct (string): Activation function for all layers. Allowed values: 'ELU', 'LeakyReLU', 'ReLU', 'GELU', 'Sigmoid', 'Tanh', 'PReLU'
tf_dense_dim (int): Size of the dense projection layer within each transformer encoder. Allowed values: 1 <= x
tf_n_layers (int): Number of times the encoder is added to the network. Allowed values: 0 <= x
tf_dropout_rate_1 (double): Dropout after the attention mechanism within the transformer encoder layers. Allowed values: 0 <= x <= 0.6
tf_dropout_rate_2 (double): Dropout for the dense projection within the transformer encoder layers. Allowed values: 0 <= x <= 0.6
tf_attention_type (string): Type of attention. Allowed values: 'Fourier', 'MultiHead'
tf_positional_type (string): Type of processing positional information. Allowed values: 'None', 'absolute'
tf_num_heads (int): Number of attention heads for a self-attention layer. Only relevant if tf_attention_type = 'MultiHead'. Allowed values: 0 <= x
tf_bias (bool): If TRUE, a bias term is added to all layers. If FALSE, no bias term is added.
tf_parametrizations (string): Re-parametrization of the weights of layers. Allowed values: 'None', 'OrthogonalWeights', 'WeightNorm', 'SpectralNorm'
tf_normalization_type (string): Type of normalization applied to all layers and stacks of layers. Allowed values: 'LayerNorm', 'None'
tf_residual_type (string): Type of residual connection for all layers and stacks of layers. Allowed values: 'ResidualGate', 'Addition', 'None'