Function for creating synthetic cases in order to balance the data for training with TEClassifierRegular or TEClassifierProtoNet. This is an auxiliary function for use with get_synthetic_cases_from_matrix to allow parallel computations.

Usage

create_synthetic_units_from_matrix(
  matrix_form,
  target,
  required_cases,
  k,
  method,
  cat,
  k_s,
  max_k
)

Arguments

matrix_form

Named matrix containing the text embeddings in matrix form. In most cases this object is taken from EmbeddedText$embeddings.

target

Named factor containing the labels/categories of the corresponding cases.

required_cases

int Number of cases needed to fill the gap between the frequency of the class under investigation and the frequency of the major class.

k

int The number of nearest neighbors used during the sampling process.

method

vector containing strings of the requested methods for generating new cases. Currently "smote", "dbsmote", and "adas" from the package smotefamily are available.

cat

string The category for which new cases should be created.

k_s

int Number of different values of k used in the complete generation process.

max_k

int The maximum number of nearest neighbors used during the sampling process.
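
As a rough illustration of what required_cases and cat refer to, the gap between each class and the most frequent class can be computed from a named factor with table(). The labels and case names below are made up, and this is only a sketch of the idea in base R, not the package's internal code.

```r
# Hypothetical named factor of labels; the names are case ids.
target <- factor(c("a", "a", "a", "a", "b", "b", "c"))
names(target) <- paste0("case_", seq_along(target))

# Frequency of every category.
freq <- table(target)

# Number of cases each category is missing compared to the most frequent one.
required_cases <- setNames(as.integer(max(freq) - freq), names(freq))

# Categories that actually need synthetic cases (candidates for `cat`).
cats_to_fill <- names(required_cases)[required_cases > 0]

print(required_cases)
print(cats_to_fill)
```

Each entry of cats_to_fill, together with its count in required_cases, corresponds to one value of cat and required_cases as described above.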

Value

Returns a list which contains the text embeddings of the new synthetic cases as a named data.frame and their labels as a named factor.
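
The following base R sketch shows the general SMOTE idea behind the "smote" method and the kind of result described above: each synthetic case is a random point on the line between a case of the category and one of its k nearest neighbors. It does not call smotefamily and is not the implementation used by this function. The embeddings, the case names, and the list element names (syn_cases, syn_target) are assumptions made for illustration.

```r
set.seed(1)

# Hypothetical embeddings of the category to fill: 5 cases, 3 dimensions.
minority <- matrix(
  rnorm(15),
  nrow = 5,
  dimnames = list(paste0("case_", 1:5), paste0("dim_", 1:3))
)

cat_label <- "b"     # category for which new cases are created
k <- 2               # number of nearest neighbors to consider
required_cases <- 4  # number of synthetic cases to create

# Pairwise Euclidean distances; a case must not be its own neighbor.
d <- as.matrix(dist(minority))
diag(d) <- Inf

# Create one synthetic case per iteration by interpolating between a
# randomly chosen case and one of its k nearest neighbors.
synthetic <- t(vapply(seq_len(required_cases), function(i) {
  base_idx <- sample.int(nrow(minority), 1)
  neighbors <- order(d[base_idx, ])[seq_len(k)]
  nb_idx <- neighbors[sample.int(k, 1)]
  gap <- runif(1)
  minority[base_idx, ] + gap * (minority[nb_idx, ] - minority[base_idx, ])
}, numeric(ncol(minority))))

# Pack the result as a named data.frame plus a named factor.
# The element names here are hypothetical.
syn_names <- paste0("synthetic_", cat_label, "_", seq_len(required_cases))
rownames(synthetic) <- syn_names
result <- list(
  syn_cases = as.data.frame(synthetic),
  syn_target = setNames(factor(rep(cat_label, required_cases)), syn_names)
)

str(result)
```

Because every synthetic case is a convex combination of two existing cases, it always lies within the range of the original embeddings in each dimension.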

See also

Other data_management_utils: get_n_chunks(), get_synthetic_cases_from_matrix()