Creates synthetic cases to balance the data used for training with TextEmbeddingClassifierNeuralNet. This is an auxiliary function for use with get_synthetic_cases that allows the computations to run in parallel.

Usage

create_synthetic_units(embedding, target, k, max_k, method, cat, cat_freq)

Arguments

embedding

Named data.frame containing the text embeddings. In most cases this object is taken from EmbeddedText$embeddings.

target

Named factor containing the labels/categories of the corresponding cases.

k

int The number of nearest neighbors used during the sampling process.

max_k

int The maximum number of nearest neighbors used during the sampling process.

method

vector containing strings of the requested methods for generating new cases. Currently "smote", "dbsmote", and "adas" from the package smotefamily are available.

cat

string The category for which new cases should be created.

cat_freq

Object of class "table" containing the absolute frequencies of every category/label.
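
The argument shapes described above can be illustrated with toy data. The following is a minimal sketch in base R. The object names (toy_embedding, toy_target, toy_cat_freq), the dimensions, and the values are made up for illustration, and it assumes that "named" refers to row names and element names that identify the cases. The commented call at the end only mirrors the usage signature with untested argument values.

```r
# Toy inputs; all names and values here are illustrative assumptions.
set.seed(42)
n_cases <- 12
case_ids <- paste0("case_", seq_len(n_cases))

# Named data.frame of embeddings: one row per case, one column per dimension.
# Assumption: "named" means the row names identify the cases.
toy_embedding <- as.data.frame(matrix(rnorm(n_cases * 4), nrow = n_cases))
rownames(toy_embedding) <- case_ids

# Named factor of labels; "rare" is the under-represented category.
toy_target <- factor(c(rep("common", 9), rep("rare", 3)))
names(toy_target) <- case_ids

# Absolute frequencies of every category, as an object of class "table".
toy_cat_freq <- table(toy_target)
print(toy_cat_freq)

# Hypothetical call mirroring the usage signature. It requires the package
# and the argument values are untested, so it is left commented out.
# create_synthetic_units(
#   embedding = toy_embedding, target = toy_target,
#   k = 2, max_k = 2, method = "smote",
#   cat = "rare", cat_freq = toy_cat_freq
# )
```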

Value

Returns a list containing the text embeddings of the new synthetic cases as a named data.frame and their labels as a named factor.
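
To make the idea of synthetic cases and the shape of the return value more concrete, the sketch below implements a simplified SMOTE-style interpolation in base R. It is not the package's implementation, which according to the method argument relies on smotefamily. The function sketch_smote, its n_new parameter, and the list element names embeddings and labels are hypothetical and chosen only for this illustration.

```r
# Simplified SMOTE-style sketch in base R; NOT the package's implementation.
# sketch_smote(), n_new, and the list element names are hypothetical.
sketch_smote <- function(embedding, target, cat, k, n_new) {
  # Keep only the cases of the category that should be oversampled.
  minority <- as.matrix(embedding[target == cat, , drop = FALSE])
  stopifnot(nrow(minority) > k)

  # Pairwise Euclidean distances between the minority cases.
  distances <- as.matrix(dist(minority))

  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(minority))
  for (i in seq_len(n_new)) {
    # Pick a random minority case and one of its k nearest neighbors.
    base_idx <- sample(nrow(minority), 1)
    neighbor_order <- order(distances[base_idx, ])
    nearest <- neighbor_order[neighbor_order != base_idx][seq_len(k)]
    neighbor_idx <- nearest[sample(length(nearest), 1)]

    # Interpolate at a random position between the case and its neighbor.
    gap <- runif(1)
    synthetic[i, ] <- minority[base_idx, ] +
      gap * (minority[neighbor_idx, ] - minority[base_idx, ])
  }

  # Return a named data.frame and a named factor sharing the same ids.
  new_ids <- paste0("synthetic_", cat, "_", seq_len(n_new))
  synthetic <- as.data.frame(synthetic)
  colnames(synthetic) <- colnames(embedding)
  rownames(synthetic) <- new_ids

  new_labels <- factor(rep(cat, n_new), levels = levels(target))
  names(new_labels) <- new_ids

  list(embeddings = synthetic, labels = new_labels)
}

# Toy data: 9 "common" cases and 3 "rare" cases with 4 embedding dimensions.
set.seed(1)
toy_embedding <- as.data.frame(matrix(rnorm(12 * 4), nrow = 12))
rownames(toy_embedding) <- paste0("case_", 1:12)
toy_target <- factor(c(rep("common", 9), rep("rare", 3)))
names(toy_target) <- rownames(toy_embedding)

result <- sketch_smote(toy_embedding, toy_target, cat = "rare", k = 2, n_new = 6)
str(result$labels)
```

In this sketch, adding the six synthetic "rare" cases to the three original ones would bring that category to the same count as "common". How many cases the actual function generates per category is not specified here.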