Create synthetic units

Creates synthetic cases to balance the data for training with TextEmbeddingClassifierNeuralNet. This is an auxiliary function for use with get_synthetic_cases that allows the computations to run in parallel.

Usage

create_synthetic_units(embedding, target, k, max_k, method, cat, cat_freq)

Arguments

embedding: Named data.frame containing the text embeddings. In most cases this object is taken from EmbeddedText$embeddings.
target: Named factor containing the labels/categories of the corresponding cases.
k: int. The number of nearest neighbors used during the sampling process.
max_k: int. The maximum number of nearest neighbors used during the sampling process.
method: vector of strings naming the requested methods for generating new cases. Currently "smote", "dbsmote", and "adas" from the package smotefamily are available.
cat: string. The category for which new cases should be created.
cat_freq: Object of class "table" containing the absolute frequencies of every category/label.
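
The following is a minimal sketch of how inputs of the shapes described above might be prepared in base R. The toy data, the object names, and the choice of the least frequent category are assumptions made for illustration. The final call is left commented out because it requires the package to be installed, and the values passed for k, max_k, and method are only examples.

```r
# Toy inputs; all values and names below are made up for illustration.
set.seed(42)
n_cases <- 12

# Named data.frame of embeddings: one row per case, one column per dimension.
embedding <- as.data.frame(matrix(rnorm(n_cases * 4), nrow = n_cases))
rownames(embedding) <- paste0("case_", seq_len(n_cases))

# Named factor whose names match the row names of the embeddings.
target <- factor(c(rep("a", 9), rep("b", 3)))
names(target) <- rownames(embedding)

# Absolute frequencies of every category as an object of class "table".
cat_freq <- table(target)

# Assumption: oversample the least frequent category.
minority_cat <- names(cat_freq)[which.min(cat_freq)]

# Hypothetical call, assuming the package is installed and loaded:
# new_units <- create_synthetic_units(
#   embedding = embedding, target = target,
#   k = 1, max_k = 2, method = "smote",
#   cat = minority_cat, cat_freq = cat_freq
# )
```

Building cat_freq with table() on the same factor that is passed as target keeps the frequencies and the labels consistent with each other.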

Value

Returns a list which contains the text embeddings of the new synthetic cases as a named data.frame and their labels as a named factor.
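
To illustrate the kind of result described above, here is a simplified SMOTE-style interpolation written in base R. It is a sketch only, not the smotefamily implementation and not the package's own code: the helper smote_sketch, its n_new argument, and the list element names embeddings and labels are hypothetical.

```r
# Simplified SMOTE-style oversampling in base R (illustrative sketch only).
smote_sketch <- function(embedding, target, k, cat, n_new) {
  # Keep only the cases that belong to the requested category.
  minority <- as.matrix(embedding[as.character(target) == cat, , drop = FALSE])
  stopifnot(k >= 1, k < nrow(minority))

  # Pairwise Euclidean distances; a case is never its own neighbor.
  distances <- as.matrix(dist(minority))
  diag(distances) <- Inf

  new_cases <- matrix(NA_real_, nrow = n_new, ncol = ncol(minority))
  for (i in seq_len(n_new)) {
    seed_idx <- sample.int(nrow(minority), 1)
    # Indices of the k nearest neighbors of the seed case.
    neighbors <- order(distances[seed_idx, ])[seq_len(k)]
    neighbor_idx <- neighbors[sample.int(k, 1)]
    # New case: a random point on the line between seed and neighbor.
    gap <- runif(1)
    new_cases[i, ] <- minority[seed_idx, ] +
      gap * (minority[neighbor_idx, ] - minority[seed_idx, ])
  }

  new_cases <- as.data.frame(new_cases)
  colnames(new_cases) <- colnames(embedding)
  rownames(new_cases) <- paste0("synthetic_", cat, "_", seq_len(n_new))

  new_labels <- factor(rep(cat, n_new), levels = levels(target))
  names(new_labels) <- rownames(new_cases)

  # Element names are hypothetical; the package may name them differently.
  list(embeddings = new_cases, labels = new_labels)
}

# Toy data, made up for illustration.
set.seed(1)
embedding <- as.data.frame(matrix(rnorm(40), nrow = 10))
rownames(embedding) <- paste0("case_", 1:10)
target <- factor(c(rep("a", 6), rep("b", 4)))
names(target) <- rownames(embedding)

result <- smote_sketch(embedding, target, k = 2, cat = "b", n_new = 2)
```

Each synthetic case lies on the line segment between two existing cases of the same category, so its coordinates stay within the range spanned by that category.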
