Function for creating a first draft of a vocabulary

This function creates a list of tokens that belong to the specified universal part-of-speech (UPOS) tags and provides the corresponding lemmas.

Usage

bow_pp_create_vocab_draft(
  path_language_model,
  data,
  upos = c("NOUN", "ADJ", "VERB"),
  label_language_model = NULL,
  language = NULL,
  chunk_size = 100,
  trace = TRUE
)

Arguments

path_language_model: string. Path to a udpipe language model that should be used for tagging and lemmatization.
data: vector containing the raw texts.
upos: vector containing the universal part-of-speech tags which should be used to build the vocabulary.
label_language_model: string. Label for the udpipe language model used.
language: string. Name of the language (e.g., English, German).
chunk_size: int. Number of raw texts which should be processed at once.
trace: bool. TRUE if information about the progress should be printed to the console.
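
The arguments above can be combined as in the following minimal sketch. It assumes that the function is exported by the aifeducation package and that a udpipe model file is already stored locally. The file name, the example texts, and the label are hypothetical placeholders, and the call is skipped if the package or the model file is not available.

```r
# Hypothetical inputs: two short texts and a local path to a udpipe model file.
raw_texts <- c(
  "The committee approved the new curriculum.",
  "Good teachers adapt their lessons to the students."
)
model_path <- "english-ewt-ud-2.5-191206.udpipe" # assumed location of the model

# Assumption: the function is exported by the 'aifeducation' package.
# The call is skipped when the model file, the package, or the function is missing.
can_run <- file.exists(model_path) &&
  requireNamespace("aifeducation", quietly = TRUE) &&
  "bow_pp_create_vocab_draft" %in% getNamespaceExports("aifeducation")

if (can_run) {
  vocab_draft <- aifeducation::bow_pp_create_vocab_draft(
    path_language_model = model_path,
    data = raw_texts,
    upos = c("NOUN", "ADJ"),              # keep only nouns and adjectives
    label_language_model = "english-ewt", # free-text label, hypothetical
    language = "English",
    chunk_size = 50,                      # process at most 50 raw texts at once
    trace = FALSE                         # suppress progress messages
  )
  # Show the top-level components of the returned list.
  str(vocab_draft, max.level = 1)
}
```

Passing the argument names explicitly keeps the call readable and makes it robust against changes in argument order.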

Value

list with the following components.

vocab: data.frame containing the tokens, lemmas, tokens in lower case, and lemmas in lower case.
ud_language_model: udpipe language model that is used for tagging.
label_language_model: Label of the udpipe language model.
language: Language of the raw texts.
upos: Universal part-of-speech tags used.
n_sentence: int. Estimated number of sentences in the raw texts.
n_token: int. Estimated number of tokens in the raw texts.
n_document_segments: int. Estimated number of document segments/raw texts.
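
As an illustration of how the returned list might be inspected, the sketch below prints a short overview using only the component names documented above. The helper summarise_vocab_draft() is hypothetical and not part of any package. The mock object only imitates the documented structure, and the column names of its vocab data.frame are assumptions, since the actual names are not stated here.

```r
# Hypothetical helper: prints an overview of a vocabulary draft and
# invisibly returns the number of rows in the vocabulary.
summarise_vocab_draft <- function(draft) {
  cat("Language:          ", draft$language, "\n")
  cat("Model label:       ", draft$label_language_model, "\n")
  cat("UPOS tags:         ", paste(draft$upos, collapse = ", "), "\n")
  cat("Sentences (est.):  ", draft$n_sentence, "\n")
  cat("Tokens (est.):     ", draft$n_token, "\n")
  cat("Document segments: ", draft$n_document_segments, "\n")
  cat("Vocabulary entries:", nrow(draft$vocab), "\n")
  invisible(nrow(draft$vocab))
}

# Mock object with the documented components.
# The column names of 'vocab' are assumed for illustration only.
mock_draft <- list(
  vocab = data.frame(
    token = c("Teachers", "lessons"),
    lemma = c("teacher", "lesson"),
    token_lower = c("teachers", "lessons"),
    lemma_lower = c("teacher", "lesson")
  ),
  label_language_model = "english-ewt",
  language = "English",
  upos = c("NOUN", "ADJ"),
  n_sentence = 2L,
  n_token = 15L,
  n_document_segments = 2L
)

n_rows <- summarise_vocab_draft(mock_draft)
```

With a real result, calling names() or str() on the vocab component shows the actual column names before any further processing.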

Note

A list of possible tags can be found here: https://universaldependencies.org/u/pos/index.html.

A large number of udpipe language models can be found here: https://ufal.mff.cuni.cz/udpipe/2/models.
