Creates synthetic cases to balance the data for training with TextEmbeddingClassifierNeuralNet. This is an auxiliary function for use with get_synthetic_cases that allows the computations to run in parallel.
Arguments
- embedding: Named data.frame containing the text embeddings. In most cases this object is taken from EmbeddedText$embeddings.
- target: Named factor containing the labels/categories of the corresponding cases.
- k: int The number of nearest neighbors used during the sampling process.
- max_k: int The maximum number of nearest neighbors used during the sampling process.
- method: vector containing strings of the requested methods for generating new cases. Currently "smote", "dbsmote", and "adas" from the package smotefamily are available.
- cat: string The category for which new cases should be created.
- cat_freq: Object of class "table" containing the absolute frequencies of every category/label.
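The argument shapes described above can be illustrated with toy data. This is a minimal sketch in base R, not taken from the package: all values are made up, the names ids, category, and n_missing are hypothetical, and it assumes that "named" means the row names of the data.frame and the names of the factor carry the same case IDs.

```r
# Toy inputs in the shapes described in the Arguments section.
# All values are invented for illustration.
set.seed(42)
ids <- paste0("case_", 1:10)

# Named data.frame: one row per case, one column per embedding dimension.
# Assumption: "named" means the row names hold the case IDs.
embedding <- as.data.frame(matrix(rnorm(10 * 4), nrow = 10))
rownames(embedding) <- ids

# Named factor: one label per case, names matching the row names above.
target <- factor(c(rep("a", 7), rep("b", 3)))
names(target) <- ids

# Absolute frequencies of every category as an object of class "table".
cat_freq <- table(target)

# The least frequent category is the one that needs synthetic cases.
category <- names(cat_freq)[which.min(cat_freq)]

# Number of cases needed to reach the size of the largest category.
n_missing <- max(cat_freq) - as.integer(cat_freq[category])
```

Here category would be passed as cat, and cat_freq tells how far the chosen category is from the largest one.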
Value
Returns a list which contains the text embeddings of the new synthetic cases as a named data.frame and their labels as a named factor.
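To make the shape of the return value concrete, the following sketch builds a list of that form with a simple SMOTE-style interpolation in base R. It is not the package's implementation, which according to the method argument relies on smotefamily. The function sketch_synthetic_cases, its n_new argument, and the list element names embeddings and labels are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: SMOTE-style interpolation in base R.
# A sketch under stated assumptions, not the package's code.
sketch_synthetic_cases <- function(embedding, target, k, cat, n_new) {
  stopifnot(n_new >= 1)
  # Keep only the cases of the requested category.
  x <- as.matrix(embedding[target == cat, , drop = FALSE])
  # k cannot exceed the number of other cases in the category.
  k <- min(k, nrow(x) - 1)
  stopifnot(k >= 1)
  d <- as.matrix(dist(x))
  new_rows <- matrix(NA_real_, nrow = n_new, ncol = ncol(x))
  for (i in seq_len(n_new)) {
    base <- sample.int(nrow(x), 1)
    # The k nearest neighbours of the base case, excluding the case itself.
    neighbours <- setdiff(order(d[base, ]), base)[seq_len(k)]
    partner <- neighbours[sample.int(k, 1)]
    # New case: a random point on the segment between base and partner.
    gap <- runif(1)
    new_rows[i, ] <- x[base, ] + gap * (x[partner, ] - x[base, ])
  }
  new_ids <- paste0("synthetic_", cat, "_", seq_len(n_new))
  syn <- as.data.frame(new_rows)
  colnames(syn) <- colnames(embedding)
  rownames(syn) <- new_ids
  labels <- factor(rep(cat, n_new), levels = levels(target))
  names(labels) <- new_ids
  # Named data.frame of embeddings plus named factor of labels.
  list(embeddings = syn, labels = labels)
}

# Toy data (invented): 7 cases of category "a", 3 of category "b".
set.seed(1)
ids <- paste0("case_", 1:10)
embedding <- as.data.frame(matrix(rnorm(10 * 4), nrow = 10))
rownames(embedding) <- ids
target <- factor(c(rep("a", 7), rep("b", 3)))
names(target) <- ids

res <- sketch_synthetic_cases(embedding, target, k = 5, cat = "b", n_new = 4)
str(res)
```

Because every synthetic row is interpolated between two existing cases of the same category, each value stays within the range spanned by that category's original cases.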
See also
Other Auxiliary Functions: array_to_matrix(), calc_standard_classification_measures(), check_embedding_models(), clean_pytorch_log_transformers(), create_iota2_mean_object(), generate_id(), get_coder_metrics(), get_folds(), get_n_chunks(), get_stratified_train_test_split(), get_synthetic_cases(), get_train_test_split(), is.null_or_na(), matrix_to_array_c(), split_labeled_unlabeled(), summarize_tracked_sustainability(), to_categorical_c()