Object of class R6 which stores the text embeddings generated by an object of class TextEmbeddingModel via the method embed().

Value

Returns an object of class EmbeddedText. These objects store and manage the text embeddings created with objects of class TextEmbeddingModel and serve as input for classifiers of class TextEmbeddingClassifierNeuralNet. The main aim of this class is to provide a structured link between embedding models and classifiers. Because each object saves information on the text embedding model that created its embeddings, this ensures that only embeddings generated with the same embedding model are combined. The stored information also allows classifiers to check that embeddings from the correct text embedding model are used for training and predicting.
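The compatibility check described above can be pictured with a small base R sketch. The helper `same_embedding_model()` and the field names in the two lists are hypothetical; they mirror the constructor arguments but are not confirmed to be the names or the comparison that the package itself uses.

```r
# Hypothetical helper, not part of aifeducation: compares selected
# entries of two model-information lists.
same_embedding_model <- function(info_a, info_b,
                                 fields = c("model_name",
                                            "model_method",
                                            "model_version")) {
  all(vapply(
    fields,
    function(f) identical(info_a[[f]], info_b[[f]]),
    logical(1)
  ))
}

# Invented model information for two different embedding models.
info_train <- list(model_name = "bert_a", model_method = "bert",
                   model_version = "0.1")
info_new <- list(model_name = "bert_b", model_method = "bert",
                 model_version = "0.1")

same_embedding_model(info_train, info_train) # TRUE
same_embedding_model(info_train, info_new)   # FALSE
```

A classifier could run a check of this kind before training or predicting and stop with an error when the result is FALSE.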

See also

Other Text Embedding: TextEmbeddingModel, combine_embeddings()

Public fields

embeddings

('data.frame()')
data.frame containing the text embeddings for all chunks. Documents are in the rows. Embedding dimensions are in the columns.
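As a rough illustration of this layout, the following sketch builds a toy data.frame with one row per document and one column per embedding dimension. All values, row names, and column names are invented; how the package names rows and columns, or arranges several chunks per document, is not specified here.

```r
# Toy embeddings: 3 documents (rows) x 2 embedding dimensions (columns).
toy_embeddings <- data.frame(
  dim_1 = c(0.12, -0.40, 0.33),
  dim_2 = c(0.05, 0.27, -0.19),
  row.names = c("doc_1", "doc_2", "doc_3")
)

nrow(toy_embeddings) # number of documents: 3
ncol(toy_embeddings) # number of embedding dimensions: 2

# Embedding of a single document, selected by its row name.
toy_embeddings["doc_2", ]
```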

Methods


Method new()

Creates a new object representing text embeddings.

Usage

EmbeddedText$new(
  model_name = NA,
  model_label = NA,
  model_date = NA,
  model_method = NA,
  model_version = NA,
  model_language = NA,
  param_seq_length = NA,
  param_chunks = NULL,
  param_overlap = NULL,
  param_emb_layer_min = NULL,
  param_emb_layer_max = NULL,
  param_emb_pool_type = NULL,
  param_aggregation = NULL,
  embeddings
)

Arguments

model_name

string Name of the model that generated this embedding.

model_label

string Label of the model that generated this embedding.

model_date

string Date when the model that generated this embedding was created.

model_method

string Method of the underlying embedding model.

model_version

string Version of the model that generated this embedding.

model_language

string Language of the model that generated this embedding.

param_seq_length

int Maximum number of tokens that the generating model processes per chunk.

param_chunks

int Maximum number of chunks which are supported by the generating model.

param_overlap

int Number of overlapping tokens that this model added at the beginning of the sequence for each following chunk.

param_emb_layer_min

int or string determining the first layer to be included in the creation of embeddings.

param_emb_layer_max

int or string determining the last layer to be included in the creation of embeddings.

param_emb_pool_type

string determining the method for pooling the token embeddings within each layer.

param_aggregation

string Aggregation method of the hidden states. Deprecated. Only included for backward compatibility.

embeddings

data.frame containing the text embeddings.
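To make the interplay of param_seq_length, param_chunks, and param_overlap more concrete, here is a minimal base R sketch of one plausible reading: each following chunk starts with the last `overlap` tokens of the previous chunk. The helper `split_into_chunks()` is hypothetical, and the actual chunking of the embedding model may differ.

```r
# Hypothetical helper, not part of aifeducation.
# Assumption: consecutive chunks share `overlap` tokens.
split_into_chunks <- function(tokens, seq_length, overlap, max_chunks) {
  stopifnot(
    length(tokens) >= 1,
    overlap >= 0,
    overlap < seq_length,
    max_chunks >= 1
  )
  # Each new chunk starts `seq_length - overlap` tokens after the last one.
  step <- seq_length - overlap
  starts <- seq(from = 1, to = max(1, length(tokens) - overlap), by = step)
  # Chunks beyond the supported maximum are dropped.
  starts <- head(starts, max_chunks)
  lapply(starts, function(s) {
    tokens[s:min(s + seq_length - 1, length(tokens))]
  })
}

chunks <- split_into_chunks(
  tokens = 1:10, seq_length = 4, overlap = 1, max_chunks = 5
)
chunks
# Three chunks: 1 2 3 4 / 4 5 6 7 / 7 8 9 10
```

With `max_chunks = 2` the same call would keep only the first two chunks, so tokens 8 to 10 would not be represented.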

Returns

Returns an object of class EmbeddedText which stores the text embeddings produced by an object of class TextEmbeddingModel. The object serves as input for objects of class TextEmbeddingClassifierNeuralNet.
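A call following the Usage signature above might look like the sketch below. It assumes that the installed aifeducation version has a constructor matching that signature; other versions may differ. All argument values are invented placeholders. In normal use such objects come from the embed() method of a TextEmbeddingModel rather than from a manual call.

```r
# Invented toy embeddings: 2 documents x 2 embedding dimensions.
toy_embeddings <- data.frame(
  dim_1 = c(0.12, -0.40),
  dim_2 = c(0.05, 0.27),
  row.names = c("doc_1", "doc_2")
)

# Only runs if aifeducation is installed. Assumes the constructor
# signature shown in the Usage section of this page.
if (requireNamespace("aifeducation", quietly = TRUE)) {
  embedded <- aifeducation::EmbeddedText$new(
    model_name = "toy_model",                   # placeholder
    model_label = "Toy model for illustration", # placeholder
    model_date = "2024-01-01",                  # placeholder
    model_method = "bert",                      # placeholder
    model_version = "0.0.1",                    # placeholder
    model_language = "english",                 # placeholder
    param_seq_length = 4,
    param_chunks = 3,
    param_overlap = 1,
    embeddings = toy_embeddings
  )

  # Inspect what the object stored about the generating model.
  print(embedded$get_model_label())
  str(embedded$get_model_info())
}
```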


Method get_model_info()

Method for retrieving information about the model that generated this embedding.

Usage

EmbeddedText$get_model_info()

Returns

list containing all saved information about the underlying text embedding model.


Method get_model_label()

Method for retrieving the label of the model that generated this embedding.

Usage

EmbeddedText$get_model_label()

Returns

string Label of the corresponding text embedding model.


Method clone()

The objects of this class are cloneable with this method.

Usage

EmbeddedText$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
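The effect of the deep argument follows general R6 behaviour, which the sketch below illustrates with a hypothetical stand-in class `EmbeddingHolder` (not the real EmbeddedText). Because a data.frame is an ordinary R value with copy-on-modify semantics, even a shallow clone gets an independent copy once it is changed. `deep = TRUE` only makes a difference for fields that are themselves R6 objects; whether EmbeddedText has such fields is not stated here.

```r
library(R6)

# Hypothetical stand-in class used only to demonstrate clone().
EmbeddingHolder <- R6Class(
  "EmbeddingHolder",
  public = list(
    embeddings = NULL,
    initialize = function(embeddings) {
      self$embeddings <- embeddings
    }
  )
)

original <- EmbeddingHolder$new(data.frame(dim_1 = c(0.1, 0.2)))

# Shallow clone: a new object with its own binding for `embeddings`.
copy <- original$clone(deep = FALSE)
copy$embeddings$dim_1[1] <- 9

original$embeddings$dim_1[1] # still 0.1
copy$embeddings$dim_1[1]     # 9
```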