Creates cross-validation samples and ensures that the relative frequency of every category/label within a fold equals its relative frequency in the initial data.
Arguments
- target
  Named factor containing the relevant labels/categories. Missing cases should be declared with NA.
- k_folds
  int Number of folds.
Value
Returns a list with the following components:
- val_sample: vector of strings containing the names of cases of the validation sample.
- train_sample: vector of strings containing the names of cases of the train sample.
- n_folds: int Number of realized folds.
- unlabeled_cases: vector of strings containing the names of the unlabeled cases.
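The following base R sketch illustrates how a stratified split with a return value of this shape could be built. It is not the package's implementation: the helper name sketch_stratified_folds is hypothetical, the round-robin assignment is only one possible technique, and storing one vector of names per fold in val_sample and train_sample is an assumption made for the illustration.

```r
# Hypothetical helper for illustration only; not the package's code.
sketch_stratified_folds <- function(target, k_folds) {
  # Cases declared with NA are set aside and never enter a fold.
  unlabeled_cases <- names(target)[is.na(target)]
  labeled <- target[!is.na(target)]

  # Deal the cases of each category round-robin across the folds so that
  # every fold receives about the same share of every category.
  fold_id <- integer(length(labeled))
  for (category in unique(as.character(labeled))) {
    idx <- which(labeled == category)
    # sample() on a single number would draw from 1:idx, so guard that case.
    if (length(idx) > 1) idx <- sample(idx)
    fold_id[idx] <- rep_len(seq_len(k_folds), length(idx))
  }

  # Assumption: one character vector of case names per fold.
  val_sample <- lapply(seq_len(k_folds), function(i) names(labeled)[fold_id == i])
  train_sample <- lapply(seq_len(k_folds), function(i) names(labeled)[fold_id != i])

  list(
    val_sample = val_sample,
    train_sample = train_sample,
    n_folds = k_folds,
    unlabeled_cases = unlabeled_cases
  )
}

set.seed(1)
target <- factor(c(rep("a", 12), rep("b", 6), NA, NA))
names(target) <- paste0("case_", seq_along(target))
folds <- sketch_stratified_folds(target, k_folds = 3)

# Category counts in the first validation fold
table(target[folds$val_sample[[1]]])
```

In this toy data the category counts divide evenly by the number of folds, so every validation fold holds four "a" cases and two "b" cases, matching the 2:1 ratio of the labeled data. When the counts do not divide evenly, the per-fold frequencies can only approximate the overall ones.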
Note
The parameter target allows cases with missing categories/labels. These should be declared with NA. All these cases are ignored when creating the folds. Their names are saved in the component unlabeled_cases. These cases can be used for pseudo-labeling.

The function checks the absolute frequency of every category/label. If a frequency is not sufficient to ensure at least four cases in every fold, the number of folds is adjusted and a warning is printed to the console. At least four cases per fold are necessary to ensure that the training of TextEmbeddingClassifierNeuralNet works well with all options turned on.
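The adjustment of the number of folds can be pictured with the sketch below. It shows one plausible reading of the rule, assuming that "at least four cases in every fold" applies to the rarest category; the helper name sketch_adjust_k_folds and the parameter min_cases_per_fold are hypothetical, and the package's exact computation and warning text may differ.

```r
# Hypothetical helper for illustration only; not the package's code.
sketch_adjust_k_folds <- function(target, k_folds, min_cases_per_fold = 4) {
  # Absolute frequency per category; NA cases are not counted.
  freq <- table(target, useNA = "no")
  freq <- freq[freq > 0]  # ignore unused factor levels

  # Largest number of folds that still leaves min_cases_per_fold cases
  # of the rarest category in every fold. A category with fewer than
  # min_cases_per_fold cases yields 0 here; that edge case is not handled.
  max_folds <- floor(min(freq) / min_cases_per_fold)

  if (max_folds < k_folds) {
    warning("Reducing the number of folds from ", k_folds, " to ", max_folds, ".")
    return(as.integer(max_folds))
  }
  as.integer(k_folds)
}

target <- factor(c(rep("a", 40), rep("b", 12), NA))
n_folds <- sketch_adjust_k_folds(target, k_folds = 5)  # emits a warning
n_folds
# 3
```

Here the rarest category has 12 labeled cases, so at most three folds can each receive four of them, and the requested five folds are reduced to three.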
See also
Other Auxiliary Functions: array_to_matrix(), calc_standard_classification_measures(), check_embedding_models(), clean_pytorch_log_transformers(), create_iota2_mean_object(), create_synthetic_units(), generate_id(), get_coder_metrics(), get_n_chunks(), get_stratified_train_test_split(), get_synthetic_cases(), get_train_test_split(), is.null_or_na(), matrix_to_array_c(), split_labeled_unlabeled(), summarize_tracked_sustainability(), to_categorical_c()