Create cross-validation samples

Creates cross-validation samples and ensures that the relative frequency of every category/label within a fold matches its relative frequency in the initial data.

Usage

get_folds(target, k_folds)

Arguments

target: Named factor containing the relevant labels/categories. Missing cases should be declared with NA.
k_folds: int Number of folds.
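
A minimal sketch of how the target argument could be prepared. The data, the case names, and the choice of three folds are made up for illustration. The call itself is guarded so that the snippet also runs in a session where get_folds is not available.

```r
# Illustrative data: 24 labeled cases and 4 unlabeled cases (NA).
labels <- c(rep("pos", 12), rep("neg", 12), rep(NA, 4))
target <- factor(labels)

# The names identify the cases. The returned samples refer to these names.
names(target) <- paste0("case_", seq_along(target))

# Call the function only if it is available in the current session,
# for example after attaching the package that provides it.
if (exists("get_folds", mode = "function")) {
  folds <- get_folds(target = target, k_folds = 3)
  str(folds)
}
```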

Value

Returns a list with the following components:

val_sample: vector of strings containing the names of cases of the validation sample.
train_sample: vector of strings containing the names of cases of the training sample.
n_folds: int Number of realized folds.
unlabeled_cases: vector of strings containing the names of the unlabeled cases.

Note

The parameter target allows cases with missing categories/labels. These should be declared with NA. All these cases are ignored when creating the folds. Their names are saved in the component unlabeled_cases, so these cases can be used for pseudo-labeling.
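
The separation of labeled and unlabeled cases can be illustrated in base R. This is only a sketch of the idea described above, not the function's actual code. The variable names and data are chosen for illustration.

```r
# Illustrative named factor with two unlabeled cases.
target <- factor(c("a", "b", NA, "a", NA, "b"))
names(target) <- paste0("case_", seq_along(target))

# Names of the cases without a label.
unlabeled_cases <- names(target)[is.na(target)]

# Only the labeled cases would take part in building the folds.
labeled_target <- target[!is.na(target)]

print(unlabeled_cases)
print(names(labeled_target))
```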

The function checks the absolute frequency of every category/label. If the absolute frequency is not sufficient to ensure at least four cases in every fold, the number of folds is adjusted. In these cases, a warning is printed to the console. At least four cases per fold are necessary to ensure that the training of TextEmbeddingClassifierNeuralNet works well with all options turned on.
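
The following base R sketch shows one way such an adjustment and a stratified split could work. Both helpers, sketch_adjust_folds and sketch_stratified_folds, are hypothetical and are not the package's implementation. The sketch assumes that "four cases in every fold" refers to the rarest category, so the number of folds is capped at floor(min_freq / 4).

```r
# Hypothetical helper: reduce k_folds if the rarest category is too small.
sketch_adjust_folds <- function(target, k_folds, min_cases = 4) {
  labeled <- droplevels(target[!is.na(target)])
  min_freq <- min(table(labeled))
  # Largest number of folds that leaves min_cases cases of the rarest
  # category in every fold (assumed interpretation of the rule).
  max_folds <- floor(min_freq / min_cases)
  if (max_folds < 1) {
    stop("Too few cases in the rarest category to build a single fold.")
  }
  if (max_folds < k_folds) {
    warning("Reducing the number of folds from ", k_folds, " to ", max_folds, ".")
    k_folds <- max_folds
  }
  k_folds
}

# Hypothetical helper: assign labeled cases to folds category by category,
# so every fold receives a near-equal share of each category.
sketch_stratified_folds <- function(target, k_folds) {
  labeled <- droplevels(target[!is.na(target)])
  fold_id <- integer(length(labeled))
  for (lvl in levels(labeled)) {
    idx <- which(labeled == lvl)
    idx <- idx[sample(length(idx))]  # shuffle positions within the category
    fold_id[idx] <- rep_len(seq_len(k_folds), length(idx))
  }
  # One element per fold, holding the names of its validation cases.
  split(names(labeled), fold_id)
}

set.seed(1)
target <- factor(c(rep("a", 20), rep("b", 9), NA, NA))
names(target) <- paste0("case_", seq_along(target))

# Category "b" has 9 cases, so 5 requested folds are reduced to 2
# and a warning is emitted.
k <- sketch_adjust_folds(target, k_folds = 5)
val_sample <- sketch_stratified_folds(target, k)
print(lengths(val_sample))
```

In this sketch the training sample of a fold would simply be the labeled cases that are not in that fold's validation sample.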
