
This class provides public and private methods that are shared by all large data sets. Objects of this class are not intended to be used directly. Use LargeDataSetForTextEmbeddings or LargeDataSetForText instead.

Value

Returns a new object of this class.

Methods


Method n_cols()

Number of columns in the data set.

Usage

LargeDataSetBase$n_cols()

Returns

int describing the number of columns in the data set.


Method n_rows()

Number of rows in the data set.

Usage

LargeDataSetBase$n_rows()

Returns

int describing the number of rows in the data set.


Method get_colnames()

Get names of the columns in the data set.

Usage

LargeDataSetBase$get_colnames()

Returns

vector containing the names of the columns as strings.


Method get_dataset()

Get data set.

Usage

LargeDataSetBase$get_dataset()

Returns

Returns the data set of this object as an object of class datasets.arrow_dataset.Dataset.


Method reduce_to_unique_ids()

Reduces the data set so that every id occurs only once. If an id occurs multiple times, only the first case with that id is kept; all later cases with the same id are dropped.

Attention: Calling this method changes the data set in place.

Usage

LargeDataSetBase$reduce_to_unique_ids()

Returns

Method does not return anything. It changes the data set of this object in place.
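
The rule that only the first case per id is kept can be illustrated on a plain data frame. This is a minimal base R sketch of the same idea, not the package's implementation; the data frame `cases` and its columns are hypothetical.

```r
# Hypothetical data: the id "b" occurs twice.
cases <- data.frame(
  id = c("a", "b", "b", "c"),
  text = c("first a", "first b", "second b", "first c"),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat after the first occurrence of a value,
# so negating it keeps the first case per id and drops the later ones.
unique_cases <- cases[!duplicated(cases$id), ]
```

In this sketch the row with the text "second b" is the only one removed.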


Method select()

Returns a data set that contains only the cases at the specified indices.

Usage

LargeDataSetBase$select(indicies)

Arguments

indicies

vector of int for selecting rows in the data set. Attention: The indices are zero-based, so the first row has index 0.

Returns

Returns a data set of class datasets.arrow_dataset.Dataset with the selected rows.
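
Because R vectors are one-based while this method expects zero-based indices, row positions usually need to be shifted by one before the call. The following sketch shows only that conversion; `r_positions` and `data_set` are hypothetical names, and the call to select() is left as a comment because it requires an existing object of a derived class.

```r
# Hypothetical one-based row positions, as they are normally used in R.
r_positions <- c(1L, 3L, 4L)

# Subtract 1 to obtain the zero-based indices that select() expects.
zero_based <- r_positions - 1L

# With an existing object `data_set` (hypothetical), the call would be:
# subset <- data_set$select(zero_based)
```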


Method get_ids()

Get the ids of all rows in the data set.

Usage

LargeDataSetBase$get_ids()

Returns

Returns a vector containing the ids of every row as strings.


Method save()

Saves a data set to disk.

Usage

LargeDataSetBase$save(dir_path, folder_name, create_dir = TRUE)

Arguments

dir_path

Path where to store the data set.

folder_name

string Name of the folder for storing the data set.

create_dir

bool If TRUE, the directory is created if it does not exist.

Returns

Method does not return anything. It writes the data set to disk.
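
How the three arguments interact can be sketched in base R. The sketch assumes that the data set ends up in the folder `folder_name` inside `dir_path`; the page does not confirm this, and the code is not the package's implementation. All paths and names are hypothetical.

```r
# Hypothetical arguments, mirroring the signature of save().
dir_path <- file.path(tempdir(), "my_data_sets")
folder_name <- "corpus_v1"
create_dir <- TRUE

# Assumption: the target location is <dir_path>/<folder_name>.
target <- file.path(dir_path, folder_name)

if (!dir.exists(target)) {
  if (create_dir) {
    # Create the folder, including missing parent directories.
    dir.create(target, recursive = TRUE)
  } else {
    stop("Directory does not exist: ", target)
  }
}
```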


Method load_from_disk()

Loads an object of class LargeDataSetBase from disk and updates the object to the current version of the package.

Usage

LargeDataSetBase$load_from_disk(dir_path)

Arguments

dir_path

Path where the data set is stored.

Returns

Method does not return anything. It loads an object from disk.


Method load()

Loads a data set from disk.

Usage

LargeDataSetBase$load(dir_path)

Arguments

dir_path

Path where the data set is stored.

Returns

Method does not return anything. It loads a data set from disk.


Method get_all_fields()

Return all fields.

Usage

LargeDataSetBase$get_all_fields()

Returns

Method returns a list containing all public and private fields of the object.


Method clone()

The objects of this class are cloneable with this method.

Usage

LargeDataSetBase$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
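
The effect of `deep` follows the general behavior of clone() in the R6 package. The sketch below illustrates that behavior with two hypothetical classes, `Counter` and `Holder`; it makes no claim about how the data set held by a LargeDataSetBase object is treated when it is cloned.

```r
library(R6)

# Hypothetical inner class with one field.
Counter <- R6Class("Counter", public = list(n = 0L))

# Hypothetical outer class that holds another R6 object in a field.
Holder <- R6Class("Holder",
  public = list(
    counter = NULL,
    initialize = function() {
      self$counter <- Counter$new()
    }
  )
)

original <- Holder$new()
shallow <- original$clone()           # deep = FALSE: inner object is shared
deep <- original$clone(deep = TRUE)   # inner R6 object is copied as well

# Changing the inner object of the original affects the shallow clone only.
original$counter$n <- 10L
shallow_n <- shallow$counter$n
deep_n <- deep$counter$n
```

Here `shallow_n` reflects the change made through `original`, while `deep_n` keeps the value it had when the clone was made.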