This function creates synthetic cases for balancing the training data of classifier models.

Usage

get_synthetic_cases_from_matrix(
  matrix_form,
  times,
  features,
  target,
  sequence_length,
  method = c("knnor"),
  min_k = 1,
  max_k = 6
)

Arguments

matrix_form

Named matrix containing the text embeddings in a matrix form.

times

int for the number of sequences/times.

features

int for the number of features within each sequence.

target

Named factor containing the labels of the corresponding embeddings.

sequence_length

int Length of the text embedding sequences.

method

vector containing strings of the requested methods for generating new cases. Currently, only "knnor" (provided by this package) is available.

min_k

int The minimum number of nearest neighbors during the sampling process.

max_k

int The maximum number of nearest neighbors during the sampling process.
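
The following is a minimal sketch of how inputs matching the argument descriptions above might be prepared. It assumes that each row of matrix_form holds one case flattened to times * features columns, and that sequence_length may equal times; neither detail is stated on this page. All sizes and case names are illustrative, and the call itself is left commented out because it needs the package that provides the function.

```r
# Toy inputs; all sizes and names below are illustrative assumptions.
set.seed(42)
times <- 3       # number of sequence positions per case
features <- 4    # number of embedding dimensions per position
n_cases <- 10

# Assumption: one row per case, flattened to times * features columns.
matrix_form <- matrix(
  rnorm(n_cases * times * features),
  nrow = n_cases,
  ncol = times * features,
  dimnames = list(paste0("case_", seq_len(n_cases)), NULL)
)

# Named factor with an imbalanced label distribution (7 vs. 3 cases).
target <- factor(c(rep("majority", 7), rep("minority", 3)))
names(target) <- rownames(matrix_form)

# Not run: requires the package that provides the function.
# sequence_length = times is an assumption made for this sketch.
# result <- get_synthetic_cases_from_matrix(
#   matrix_form = matrix_form,
#   times = times,
#   features = features,
#   target = target,
#   sequence_length = times,
#   method = c("knnor"),
#   min_k = 1,
#   max_k = 2
# )
```

The row names of the matrix and the names of the factor are kept identical so that every embedding can be matched to its label.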

Value

list with the following components:

  • syntetic_embeddings: Named data.frame containing the text embeddings of the synthetic cases.

  • syntetic_targets: Named factor containing the labels of the corresponding synthetic cases.

  • n_syntetic_units: table showing the number of synthetic cases for every label/category.
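
To give an intuition for nearest-neighbor oversampling and for the shape of the returned list, here is a simplified base R sketch. It is not the package's "knnor" implementation: oversample_by_interpolation is a hypothetical helper that uses a single k instead of a min_k to max_k range and simply interpolates between a case and one of its k nearest neighbors with the same label. Only the names of the list components are taken from the description above.

```r
# Hypothetical helper: a simplified stand-in, NOT the package's "knnor" method.
oversample_by_interpolation <- function(x, labels, k = 2) {
  labels <- factor(labels)
  # Number of cases per label as a named integer vector.
  counts <- vapply(split(seq_along(labels), labels), length, integer(1))
  n_max <- max(counts)
  new_rows <- list()
  new_labels <- character(0)

  for (lev in names(counts)) {
    idx <- which(labels == lev)
    n_missing <- n_max - counts[[lev]]
    # Skip labels that are already balanced or have no possible neighbor.
    if (n_missing == 0 || length(idx) < 2) next

    sub <- x[idx, , drop = FALSE]
    d <- as.matrix(dist(sub))  # pairwise Euclidean distances within the label
    diag(d) <- Inf             # a case must not be its own neighbor
    k_eff <- min(k, length(idx) - 1)

    for (i in seq_len(n_missing)) {
      base <- sample.int(nrow(sub), 1)
      neighbors <- order(d[base, ])[seq_len(k_eff)]
      partner <- neighbors[sample.int(k_eff, 1)]
      alpha <- runif(1)
      # New case lies on the line segment between base and partner.
      new_rows[[length(new_rows) + 1]] <-
        sub[base, ] + alpha * (sub[partner, ] - sub[base, ])
      new_labels <- c(new_labels, lev)
    }
  }

  embeddings <- as.data.frame(do.call(rbind, new_rows))
  rownames(embeddings) <- paste0("synthetic_", seq_len(nrow(embeddings)))
  targets <- factor(new_labels, levels = levels(labels))
  names(targets) <- rownames(embeddings)

  list(
    syntetic_embeddings = embeddings,
    syntetic_targets = targets,
    n_syntetic_units = table(targets)
  )
}

# Usage with illustrative data: 7 majority and 3 minority cases.
set.seed(1)
x <- matrix(rnorm(10 * 12), nrow = 10)
labels <- factor(c(rep("majority", 7), rep("minority", 3)))
res <- oversample_by_interpolation(x, labels, k = 2)
```

In this sketch only the minority label receives synthetic cases, enough to reach the size of the largest label; whether the package balances labels in the same way is not stated here.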