This function creates synthetic cases to balance the training data of an object of class TextEmbeddingClassifierNeuralNet.
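This page does not state the target size of the balancing. A common choice, and an assumption in the sketch below rather than documented behaviour, is to fill every category up to the size of the largest one. The number of synthetic cases needed per category then follows directly from the label counts (the labels are hypothetical):

```r
# Hypothetical labels for seven cases; names mimic case IDs
target <- factor(c("a", "a", "a", "a", "b", "b", "c"))
names(target) <- paste0("case_", seq_along(target))

# Assumption: each category is filled up to the size of the largest category
freq <- table(target)
required <- max(freq) - freq
print(required)
```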
Usage
get_synthetic_cases(
  embedding,
  times,
  features,
  target,
  method = c("smote"),
  max_k = 6
)
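The arguments describe a named data.frame with one row per case and times × features columns holding the flattened embedding, plus a named factor of labels. The base-R sketch below builds toy inputs of that shape. The column order (all features of the first sequence, then those of the second, and so on), the column names, and the requirement that label names match the row names are assumptions for illustration, not the package's documented layout.

```r
set.seed(42)

times    <- 2L  # number of sequences/times per text
features <- 3L  # number of features within each sequence
n_cases  <- 8L

# Toy embedding: one row per case, times * features columns.
# Assumed column order: all features of sequence 1, then sequence 2.
embedding <- as.data.frame(matrix(rnorm(n_cases * times * features),
                                  nrow = n_cases))
colnames(embedding) <- paste0("t", rep(seq_len(times), each = features),
                              "_f", rep(seq_len(features), times = times))
rownames(embedding) <- paste0("case_", seq_len(n_cases))

# Named factor of labels; names are assumed to match the embedding's row names
target <- factor(c(rep("a", 5), rep("b", 3)))
names(target) <- rownames(embedding)

# Consistency checks implied by the argument descriptions
stopifnot(ncol(embedding) == times * features,
          identical(names(target), rownames(embedding)))
```

With the package installed, these objects would be passed following the Usage signature, e.g. get_synthetic_cases(embedding = embedding, times = times, features = features, target = target, method = c("smote"), max_k = 6).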
Arguments
- embedding: Named data.frame containing the text embeddings. In most cases, this object is taken from EmbeddedText$embeddings.
- times: int. The number of sequences/times.
- features: int. The number of features within each sequence.
- target: Named factor containing the labels of the corresponding embeddings.
- method: Character vector naming the requested methods for generating new cases. Currently, "smote", "dbsmote", and "adas" from the package smotefamily are available.
- max_k: int. The maximum number of nearest neighbors used during the sampling process.
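The default method, SMOTE, creates a synthetic case by taking a case of a category, picking one of its k nearest neighbors within the same category, and placing a new point at a random position on the line segment between the two. The base-R sketch below implements that textbook procedure for a single category and caps k at the number of other cases in the category, which is one plausible reading of max_k. It is an illustration of the technique, not the code of smotefamily or of this function; the helper name smote_sketch is hypothetical, and "dbsmote" and "adas" (which change how seed cases and neighbors are chosen) are not covered.

```r
# Hypothetical helper: textbook SMOTE for the cases of a single category.
# x: numeric matrix (rows = cases of one category); n_new: cases to create.
smote_sketch <- function(x, n_new, max_k = 6) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(n >= 2)
  # k cannot exceed the number of other cases in the category
  k <- min(max_k, n - 1)

  # Pairwise Euclidean distances between the cases of this category
  d <- as.matrix(dist(x))
  diag(d) <- Inf  # a case is never its own neighbor

  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(x),
                      dimnames = list(NULL, colnames(x)))
  for (i in seq_len(n_new)) {
    base <- sample.int(n, 1)                    # random seed case
    neighbors <- order(d[base, ])[seq_len(k)]   # its k nearest neighbors
    nb <- neighbors[sample.int(k, 1)]           # one neighbor at random
    gap <- runif(1)                             # position on the segment
    synthetic[i, ] <- x[base, ] + gap * (x[nb, ] - x[base, ])
  }
  synthetic
}

# Usage: three cases of one category with six columns; k is capped at 2
set.seed(1)
minority <- matrix(rnorm(3 * 6), nrow = 3)
new_cases <- smote_sketch(minority, n_new = 2, max_k = 6)
```

Because every synthetic case is a convex combination of two real cases, it always lies within the per-column range of the category it was generated from.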
Value
A list with the following components:
- syntetic_embeddings: Named data.frame containing the text embeddings of the synthetic cases.
- syntetic_targets: Named factor containing the labels of the corresponding synthetic cases.
- n_syntetic_units: table showing the number of synthetic cases for every label/category.
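The returned components are usually appended to the original training data. The sketch below uses a hand-made stand-in for the return value (all values are hypothetical) with the component names listed in the Value section, and stacks it onto the original cases in base R. It assumes the synthetic embeddings share the column names of the originals.

```r
# Hypothetical original training data: three cases of "a", one case of "b"
embedding <- data.frame(f1 = c(0.1, 0.4, 0.2, 0.9),
                        f2 = c(1.0, 0.8, 0.7, 0.3),
                        row.names = paste0("case_", 1:4))
target <- factor(c("a", "a", "a", "b"), levels = c("a", "b"))
names(target) <- rownames(embedding)

# Hand-made stand-in for the return value (values are made up; the
# component names follow the Value section, including the "syntetic" spelling)
syn_labels <- factor(c("b", "b"), levels = c("a", "b"))
names(syn_labels) <- c("syn_1", "syn_2")
result <- list(
  syntetic_embeddings = data.frame(f1 = c(0.8, 0.85), f2 = c(0.35, 0.4),
                                   row.names = names(syn_labels)),
  syntetic_targets    = syn_labels,
  n_syntetic_units    = table(syn_labels)
)

# Stack original and synthetic cases. rbind() matches columns by name;
# unlist() on a list of factors returns one factor and keeps the names.
balanced_embedding <- rbind(embedding, result$syntetic_embeddings)
balanced_target    <- unlist(list(target, result$syntetic_targets))
print(table(balanced_target))
```

unlist() is used instead of c() because combining factors with c() only returns a factor from R 4.1.0 onward.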
See also
Other Auxiliary Functions:
array_to_matrix(), calc_standard_classification_measures(), check_embedding_models(), clean_pytorch_log_transformers(), create_iota2_mean_object(), create_synthetic_units(), generate_id(), get_coder_metrics(), get_folds(), get_n_chunks(), get_stratified_train_test_split(), get_train_test_split(), is.null_or_na(), matrix_to_array_c(), split_labeled_unlabeled(), summarize_tracked_sustainability(), to_categorical_c()