Create synthetic cases for balancing training data
Source: R/utils_DataManager.R
get_synthetic_cases_from_matrix.Rd
This function creates synthetic cases to balance the training data for an object of class TEClassifierRegular or TEClassifierProtoNet.
Usage
get_synthetic_cases_from_matrix(
  matrix_form,
  times,
  features,
  target,
  sequence_length,
  method = c("smote"),
  min_k = 1,
  max_k = 6
)
Arguments
- matrix_form
  Named matrix containing the text embeddings in matrix form.
- times
  int for the number of sequences/times.
- features
  int for the number of features within each sequence.
- target
  Named factor containing the labels of the corresponding embeddings.
- sequence_length
  int Length of the text embedding sequences.
- method
  vector containing strings of the requested methods for generating new cases. Currently "smote", "dbsmote", and "adas" from the package smotefamily are available.
- min_k
  int The minimum number of nearest neighbors used during the sampling process.
- max_k
  int The maximum number of nearest neighbors used during the sampling process.
Value
list with the following components:
- syntetic_embeddings: Named data.frame containing the text embeddings of the synthetic cases.
- syntetic_targets: Named factor containing the labels of the corresponding synthetic cases.
- n_syntetic_units: table showing the number of synthetic cases for every label/category.
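As a rough sketch of how the returned components might be combined with existing training data, the following base-R example uses a hand-built stand-in for the result. All values, row names, and the objects train_x and train_y are hypothetical; only the three component names are taken from this page.

```r
# Hand-built stand-in for the returned list (hypothetical values; component
# names are spelled as documented above).
result <- list(
  syntetic_embeddings = data.frame(
    V1 = c(0.1, 0.3),
    V2 = c(0.5, 0.2),
    row.names = c("synthetic_1", "synthetic_2")
  ),
  syntetic_targets = factor(
    c(synthetic_1 = "b", synthetic_2 = "b"),
    levels = c("a", "b")
  ),
  n_syntetic_units = table(c("b", "b"))
)

# Hypothetical original training data in matrix form with named labels.
train_x <- matrix(
  c(0.0, 0.9, 0.4, 0.7, 0.6, 0.8, 0.1, 0.3),
  nrow = 4,
  dimnames = list(paste0("id_", 1:4), c("V1", "V2"))
)
train_y <- factor(
  c(id_1 = "a", id_2 = "a", id_3 = "a", id_4 = "b"),
  levels = c("a", "b")
)

# Append the synthetic cases to the original cases.
balanced_x <- rbind(train_x, as.matrix(result$syntetic_embeddings))
balanced_y <- factor(
  c(as.character(train_y), as.character(result$syntetic_targets)),
  levels = levels(train_y)
)
names(balanced_y) <- c(names(train_y), names(result$syntetic_targets))

table(balanced_y) # both labels now have 3 cases
```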
See also
Other data_management_utils: create_synthetic_units_from_matrix(), get_n_chunks()