Read texts and create a keyATM_docs object, which is a list of texts.
keyATM_read(
  texts,
  encoding = "UTF-8",
  check = TRUE,
  keep_docnames = FALSE,
  split = 0
)
texts: input. keyATM takes a quanteda dfm (dgCMatrix), a data.frame, a tibble (tbl_df), or a vector of file paths.
encoding: character. Only used when texts is a vector of file paths. Default is "UTF-8".
check: logical. If TRUE, check whether there is anything wrong with the structure of texts. Default is TRUE.
keep_docnames: logical. If TRUE, it keeps the document names in a quanteda dfm. Default is FALSE.
split: numeric. This option works only with a quanteda dfm. It creates two subsets of the dfm by randomly splitting each document (i.e., the total number of documents is the same in both subsets). This option specifies the split proportion. Default is 0.
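The per-document random split described for the split argument can be pictured with a small base R sketch. This is only a conceptual illustration, not keyATM's implementation: split_tokens is a hypothetical helper, the documents are plain token vectors rather than a quanteda dfm, and sending the given proportion to the first subset is an assumption.

```r
# Conceptual sketch only: split_tokens() is a hypothetical helper, not a keyATM function.
# Assumption: `prop` is the share of each document's tokens sent to the first subset.
split_tokens <- function(tokens, prop) {
  n_first <- round(length(tokens) * prop)
  # A logical mask avoids the pitfall of tokens[-integer(0)] when n_first is 0.
  in_first <- seq_along(tokens) %in% sample.int(length(tokens), size = n_first)
  list(first = tokens[in_first], second = tokens[!in_first])
}

set.seed(1)
docs <- list(
  c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  c("x", "y", "z", "w")
)

# Split every document, then collect the two subsets.
halves   <- lapply(docs, split_tokens, prop = 0.3)
subset_a <- lapply(halves, `[[`, "first")
subset_b <- lapply(halves, `[[`, "second")

# Both subsets keep one entry per original document.
length(subset_a) == length(docs)
length(subset_b) == length(docs)
```

Because every document is split individually, both subsets contain the same number of documents, and only the number of tokens per document differs.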
A keyATM_docs object. The first element is a list whose elements are the split texts. The length of this list is equal to the number of documents.
if (FALSE) {
  # Use quanteda dfm
  keyATM_docs <- keyATM_read(texts = quanteda_dfm)
  # Use data.frame or tibble (texts should be stored in a column named `text`)
  keyATM_docs <- keyATM_read(texts = data_frame_object)
  keyATM_docs <- keyATM_read(texts = tibble_object)
  # Use a vector that stores full paths to the text files
  # (list.files() expects a regular expression, not a glob, in `pattern`)
  files <- list.files(doc_folder, pattern = "\\.txt$", full.names = TRUE)
  keyATM_docs <- keyATM_read(texts = files)
}
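The inputs used in the examples above are placeholders. A minimal sketch of how two of them might be prepared is given here, assuming a toy corpus of three short sentences and a temporary folder; the object names, the texts, and the folder name are all illustrative. The call to keyATM_read only runs if the keyATM package is installed.

```r
# Toy corpus (illustrative only); texts go in a column named `text`.
docs_df <- data.frame(
  text = c("the cat sat on the mat",
           "dogs chase cats in the yard",
           "topic models need many words"),
  stringsAsFactors = FALSE
)

# Write the same texts to temporary .txt files, one document per file.
doc_folder <- file.path(tempdir(), "keyATM_read_demo")
dir.create(doc_folder, showWarnings = FALSE)
for (i in seq_len(nrow(docs_df))) {
  writeLines(docs_df$text[i], file.path(doc_folder, sprintf("doc%02d.txt", i)))
}

# Collect full paths; `pattern` is a regular expression.
files <- list.files(doc_folder, pattern = "\\.txt$", full.names = TRUE)

# Only call keyATM_read() when keyATM is available.
if (requireNamespace("keyATM", quietly = TRUE)) {
  docs_from_df    <- keyATM::keyATM_read(texts = docs_df)
  docs_from_files <- keyATM::keyATM_read(texts = files)
}
```

Passing file paths lets encoding take effect, since that argument is only used when texts is a vector of file paths.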