
This function creates synthetic cases to balance the training data of an object of class TEClassifierRegular or TEClassifierProtoNet.
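To see how much balancing a training set needs, count the cases in each category. The sketch below uses hypothetical labels and assumes one common rule: top up every category to the size of the largest one. This page does not state which rule the function itself uses to decide how many cases to generate.

```r
# Hypothetical labels for 10 training cases; "a" is the majority category.
target <- factor(c(rep("a", 6), rep("b", 3), "c"))

# Number of cases per category.
counts <- table(target)

# Assumed balancing rule: raise every category to the size of the largest one.
n_needed <- setNames(as.vector(max(counts) - counts), names(counts))
print(n_needed)
```

Under this rule the majority category needs no new cases, and every smaller category needs enough to reach the majority count.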

Usage

get_synthetic_cases_from_matrix(
  matrix_form,
  times,
  features,
  target,
  sequence_length,
  method = c("smote"),
  min_k = 1,
  max_k = 6
)
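The arguments `times` and `features` describe how each row of `matrix_form` is organized. The sketch below builds a toy input under a stated assumption: each row is one case, and its `times * features` columns hold all features of time step 1, then all features of time step 2, and so on. The column names and the column order are hypothetical and may differ from the package's own conversion.

```r
set.seed(42)

# Hypothetical dimensions: 4 cases, 3 time steps (sequences), 2 features per step.
n_cases <- 4
times <- 3
features <- 2

# Toy embeddings as a cases x times x features array.
emb_array <- array(
  rnorm(n_cases * times * features),
  dim = c(n_cases, times, features),
  dimnames = list(paste0("case_", seq_len(n_cases)), NULL, NULL)
)

# Flatten to one row per case with times * features columns.
# Assumed order: features of time 1, then features of time 2, ...
matrix_form <- t(apply(emb_array, 1, function(x) as.vector(t(x))))
colnames(matrix_form) <- paste0(
  "t", rep(seq_len(times), each = features),
  "_f", rep(seq_len(features), times = times)
)

# Named factor of labels whose names match the row names of the matrix.
target <- factor(c("a", "a", "a", "b"))
names(target) <- rownames(matrix_form)
```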

Arguments

matrix_form

Named matrix containing the text embeddings in matrix form.

times

int Number of sequences/times.

features

int Number of features within each sequence.

target

Named factor containing the labels of the corresponding embeddings.

sequence_length

int Length of the text embedding sequences.

method

vector of strings naming the methods used to generate new cases. Currently "smote", "dbsmote", and "adas" from the package smotefamily are available.

min_k

int Minimum number of nearest neighbors used during the sampling process.

max_k

int Maximum number of nearest neighbors used during the sampling process.
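The "smote" option is based on SMOTE. Its core idea is to place a new case at a random point on the line segment between a minority-class case and one of its k nearest neighbors. The base-R sketch below illustrates only that idea. It is not the smotefamily or aifeducation implementation. `smote_sketch()` is a hypothetical helper, and producing one batch for each k from `min_k` to `max_k` is an assumption about how the two bounds might be used.

```r
# Hypothetical SMOTE-style helper (illustration only, not smotefamily's code).
smote_sketch <- function(x, n_new, k) {
  # x: numeric matrix of minority-class cases (one per row); requires k < nrow(x)
  d <- as.matrix(dist(x))  # pairwise Euclidean distances
  diag(d) <- Inf           # a case is never its own neighbor
  out <- matrix(NA_real_, nrow = n_new, ncol = ncol(x))
  for (i in seq_len(n_new)) {
    base <- sample.int(nrow(x), 1)         # pick a random minority case
    nn <- order(d[base, ])[seq_len(k)]     # its k nearest neighbors
    nb <- nn[sample.int(length(nn), 1)]    # choose one neighbor at random
    gap <- runif(1)                        # random position on the segment
    out[i, ] <- x[base, ] + gap * (x[nb, ] - x[base, ])
  }
  colnames(out) <- colnames(x)
  out
}

set.seed(1)
minority <- matrix(rnorm(5 * 6), nrow = 5)  # 5 hypothetical minority cases

# Assumption: one batch of 2 synthetic cases for each k in min_k:max_k.
min_k <- 1
max_k <- 3
synthetic <- do.call(
  rbind,
  lapply(min_k:max_k, function(k) smote_sketch(minority, n_new = 2, k = k))
)
```

Each synthetic case is a convex combination of two existing cases, so it always lies within the observed range of every column.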

Value

list with the following components:

  • syntetic_embeddings: Named data.frame containing the text embeddings of the synthetic cases.

  • syntetic_targets: Named factor containing the labels of the corresponding synthetic cases.

  • n_syntetic_units: table showing the number of synthetic cases for every label/category.
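A typical next step is to append the synthetic cases to the original training data. The sketch below mocks the returned list with the component names documented above, keeping their package spelling. All values are made up, and the data are hypothetical.

```r
set.seed(2)

# Hypothetical original data: 4 cases with 6 embedding columns.
matrix_form <- matrix(
  rnorm(24), nrow = 4,
  dimnames = list(paste0("case_", 1:4), paste0("v", 1:6))
)
target <- factor(c("a", "a", "a", "b"), levels = c("a", "b"))

# Mocked return value using the documented component names (made-up values).
res <- list(
  syntetic_embeddings = as.data.frame(matrix(
    rnorm(12), nrow = 2,
    dimnames = list(c("syn_1", "syn_2"), paste0("v", 1:6))
  )),
  syntetic_targets = factor(c("b", "b"), levels = c("a", "b")),
  n_syntetic_units = table(factor(c("b", "b"), levels = c("a", "b")))
)

# Append the synthetic cases and their labels to the original training data.
x_train <- rbind(matrix_form, as.matrix(res$syntetic_embeddings))
y_train <- factor(
  c(as.character(target), as.character(res$syntetic_targets)),
  levels = levels(target)
)
table(y_train)
```

Converting both factors to character before combining keeps the labels intact regardless of how the factor levels are coded.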

See also

Other data_management_utils: create_synthetic_units_from_matrix(), get_n_chunks()