
This function creates synthetic cases for balancing the training data of an object of class TextEmbeddingClassifierNeuralNet.
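Balancing usually means generating synthetic cases for the smaller categories until the category sizes are closer to each other. As a rough illustration of the arithmetic involved, the following sketch counts how many synthetic cases each category in a target factor would need to reach the size of the largest category. The helper n_needed_to_balance is hypothetical, and matching the largest category is an assumption; this page does not say how many cases get_synthetic_cases produces per category.

```r
# Hypothetical helper (not part of aifeducation): number of synthetic cases
# each category would need to reach the size of the largest category.
# Matching the largest category is only one possible balancing target.
n_needed_to_balance <- function(target) {
  counts <- table(target)
  deficit <- setNames(as.integer(max(counts) - counts), names(counts))
  deficit[deficit > 0]  # categories already at the maximum need nothing
}

target <- factor(c("a", "a", "a", "a", "b", "b", "c"))
needed <- n_needed_to_balance(target)
needed
```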

Usage

get_synthetic_cases(
  embedding,
  times,
  features,
  target,
  method = c("smote"),
  max_k = 6
)
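The arguments in this call are expected to fit together: embedding has times × features columns, target labels every row of embedding by name, and method only contains supported method names. The sketch below encodes those relationships as a hypothetical pre-flight check, check_synthetic_inputs, which is not part of the package. It assumes that each row of embedding stores the times × features values of one case as separate columns.

```r
# Hypothetical input check (not part of aifeducation). It ASSUMES each row of
# `embedding` holds the times * features values of one case as columns.
check_synthetic_inputs <- function(embedding, times, features, target,
                                   method = c("smote")) {
  # Only the three documented methods are accepted (abbreviations allowed)
  method <- match.arg(method, choices = c("smote", "dbsmote", "adas"),
                      several.ok = TRUE)
  if (ncol(embedding) != times * features) {
    stop("embedding must have times * features columns")
  }
  if (!is.factor(target) || is.null(names(target))) {
    stop("target must be a named factor")
  }
  unlabeled <- setdiff(rownames(embedding), names(target))
  if (length(unlabeled) > 0) {
    stop("no label for: ", paste(unlabeled, collapse = ", "))
  }
  # Return the labels reordered to follow the rows of `embedding`
  list(method = method, target = target[rownames(embedding)])
}

# Toy data: 3 cases, times = 2, features = 2 (values are made up)
embedding <- data.frame(V1 = c(0.1, 0.4, 0.2), V2 = c(0.3, 0.9, 0.5),
                        V3 = c(0.7, 0.1, 0.6), V4 = c(0.2, 0.8, 0.4),
                        row.names = c("doc_1", "doc_2", "doc_3"))
target <- factor(c(doc_3 = "b", doc_1 = "a", doc_2 = "a"))
checked <- check_synthetic_inputs(embedding, times = 2, features = 2,
                                  target = target, method = c("smote", "adas"))
checked$target
```

Because match.arg performs partial matching, an unambiguous abbreviation such as "ad" would also be accepted as "adas".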

Arguments

embedding

Named data.frame containing the text embeddings. In most cases, this object is taken from EmbeddedText$embeddings.

times

int Number of sequences (times).

features

int Number of features within each sequence.

target

Named factor containing the labels of the corresponding embeddings.

method

vector of strings naming the methods used to generate new cases. Currently, "smote", "dbsmote", and "adas" from the package smotefamily are available.

max_k

int Maximum number of nearest neighbors used during the sampling process.
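The default method, "smote", creates a synthetic case by interpolating between an existing case and one of its nearest neighbors from the same category, and max_k bounds how many neighbors come into play. The base-R sketch below illustrates that idea on the rows of a single category. It is not the smotefamily implementation that the function relies on, and because this page does not state how values of k up to max_k are chosen, the sketch simply takes k as a parameter. The function name smote_sketch is hypothetical.

```r
# Minimal SMOTE-style sketch in base R (NOT the smotefamily implementation):
# each synthetic case lies on the segment between a randomly chosen case and
# one of its k nearest neighbors (Euclidean distance) within the same category.
smote_sketch <- function(x, n_new, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(n >= 2, k >= 1)
  k <- min(k, n - 1)      # cannot use more neighbors than other cases
  d <- as.matrix(dist(x))
  diag(d) <- Inf          # a case is not its own neighbor
  out <- matrix(NA_real_, nrow = n_new, ncol = ncol(x))
  for (j in seq_len(n_new)) {
    i <- sample.int(n, 1)                  # case to start from
    nn <- order(d[i, ])[seq_len(k)]        # its k nearest neighbors
    m <- nn[sample.int(length(nn), 1)]     # one neighbor at random
    gap <- runif(1)                        # position along the segment
    out[j, ] <- x[i, ] + gap * (x[m, ] - x[i, ])
  }
  colnames(out) <- colnames(x)
  out
}

set.seed(1)
minority <- matrix(c(0, 0,
                     1, 0,
                     0, 1,
                     1, 1), ncol = 2, byrow = TRUE,
                   dimnames = list(NULL, c("V1", "V2")))
synthetic <- smote_sketch(minority, n_new = 3, k = 2)
synthetic
```

Since every synthetic row is a convex combination of two existing rows, it stays inside the region spanned by the original cases of that category.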

Value

list with the following components.

  • syntetic_embeddings: Named data.frame containing the text embeddings of the synthetic cases.

  • syntetic_targets: Named factor containing the labels of the corresponding synthetic cases.

  • n_syntetic_units: table showing the number of synthetic cases for every label/category.
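A common next step is to check the category sizes once the synthetic cases are added to the original ones. The sketch below builds a list with the three documented components from made-up placeholder values (it does not call get_synthetic_cases) and adds n_syntetic_units to the original counts. Adding two tables this way assumes that both factors share the same levels in the same order.

```r
# Placeholder values shaped like the documented return value; they are made
# up for illustration and are not real output of get_synthetic_cases().
lvls <- c("a", "b", "c")
target <- factor(c(doc_1 = "a", doc_2 = "a", doc_3 = "a",
                   doc_4 = "b", doc_5 = "c"), levels = lvls)
result <- list(
  syntetic_embeddings = data.frame(V1 = c(0.2, 0.7, 0.5, 0.4),
                                   V2 = c(0.1, 0.3, 0.9, 0.6),
                                   row.names = paste0("syn_", 1:4)),
  syntetic_targets = factor(c(syn_1 = "b", syn_2 = "b",
                              syn_3 = "c", syn_4 = "c"), levels = lvls)
)
result$n_syntetic_units <- table(result$syntetic_targets)

# Category sizes after adding the synthetic cases to the original ones
counts_after <- table(target) + result$n_syntetic_units
counts_after
```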