Function for creating synthetic cases in order to balance the data for training with TEClassifierRegular or TEClassifierProtoNet. This is an auxiliary function used by get_synthetic_cases_from_matrix to allow parallel computations.
Usage
create_synthetic_units_from_matrix(
matrix_form,
target,
required_cases,
k,
method,
cat,
k_s,
max_k
)
Arguments
- matrix_form
Named matrix containing the text embeddings in matrix form. In most cases this object is taken from EmbeddedText$embeddings.
- target
Named factor containing the labels/categories of the corresponding cases.
- required_cases
int Number of cases necessary to fill the gap between the frequency of the class under investigation and the frequency of the major class.
- k
int The number of nearest neighbors used during the sampling process.
- method
vector containing strings of the requested methods for generating new cases. Currently "smote", "dbsmote", and "adas" from the package smotefamily are available.
- cat
string The category for which new cases should be created.
- k_s
int Number of values of k used in the complete generation process.
- max_k
int The maximum number of nearest neighbors used during the sampling process.
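The shape of the inputs and the meaning of required_cases can be illustrated with a small base R sketch. It only follows the argument descriptions above and is not taken from the package's implementation. The toy data, the case names, and the way the gap is computed are assumptions made for illustration.

```r
# Hypothetical toy data: 6 cases with 3-dimensional embeddings.
set.seed(1)
case_ids <- paste0("case_", 1:6)
matrix_form <- matrix(
  rnorm(18),
  nrow = 6,
  dimnames = list(case_ids, NULL)  # row names identify the cases
)

# Named factor with one label per case, in the same order as the rows.
target <- factor(c("a", "a", "a", "a", "b", "c"))
names(target) <- case_ids

# Frequency of every category and the size of the major class.
freq <- table(target)
major <- max(freq)

# Assumed reading of required_cases: the number of cases a category
# lacks relative to the major class.
required_cases <- setNames(as.integer(major - freq), names(freq))
print(required_cases)
```

In this toy example the major class "a" needs no additional cases, while "b" and "c" each lack three. A value computed this way could be passed as required_cases together with the matching cat.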
Value
Returns a list which contains the text embeddings of the new synthetic cases as a named data.frame and their labels as a named factor.
See also
Other data_management_utils: get_n_chunks(), get_synthetic_cases_from_matrix()