keyATM main function

Fit keyATM models.

keyATM(
  docs,
  model,
  no_keyword_topics,
  keywords = list(),
  model_settings = list(),
  priors = list(),
  options = list(),
  keep = c()
)

Arguments

docs

texts read via keyATM_read().

model

keyATM model: "base", "covariates", or "dynamic".

no_keyword_topics

the number of regular (no-keyword) topics.

keywords

a list of keywords.
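As an illustrative sketch, keywords are typically supplied as a named list with one character vector per keyword topic. The topic names and words below are made up for illustration and are not taken from the package data.

```r
# Hypothetical keyword list: one character vector per keyword topic.
my_keywords <- list(
  Education = c("school", "student", "teacher"),
  Health    = c("hospital", "medicare", "insurance"),
  Energy    = c("oil", "gas", "electricity")
)

# The number of keyword topics equals the length of the list.
length(my_keywords)
```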

model_settings

a list of model specific settings (details are in the online documentation).
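The sketch below shows what model_settings might look like for the covariates and dynamic models. It assumes the element names covariates_data, covariates_formula, time_index, and num_states described in the online documentation; the data frame doc_meta and its columns are hypothetical.

```r
# Hypothetical document-level metadata: one row per document.
doc_meta <- data.frame(
  party  = factor(c("D", "R", "D", "R")),
  period = c(1L, 1L, 2L, 3L)
)

# Covariates model: document-level covariates plus a formula over them.
settings_cov <- list(
  covariates_data    = doc_meta,
  covariates_formula = ~ party
)

# Dynamic model: an integer time index per document and the number of
# hidden states.
settings_dyn <- list(
  time_index = doc_meta$period,
  num_states = 2
)

# These lists would then be passed to keyATM(), for example (not run here):
# keyATM(docs = keyATM_docs, model = "covariates",
#        no_keyword_topics = 5, keywords = bills_keywords,
#        model_settings = settings_cov)
```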

priors

a list of priors of parameters.

options

a list of options:

seed: A numeric value for random seed. If it is not provided, the package randomly selects a seed.
iterations: An integer. Number of iterations. Default is 1500.
verbose: If TRUE, it prints the log-likelihood and perplexity. Default is FALSE.
llk_per: An integer. If the value is \(j\), keyATM stores the log-likelihood and perplexity every \(j\) iterations. Default is 10.
use_weights: If TRUE, use weights. Default is TRUE.
weights_type: There are four types of weights: weights based on information theory (information-theory), weights based on inverse frequency (inv-freq), and normalized versions of each (information-theory-normalized and inv-freq-normalized). Default is information-theory.
prune: If TRUE, remove keywords that do not appear in the corpus. Default is TRUE.
store_theta: If TRUE or 1, it stores \(\theta\) (document-topic distribution) at the iterations specified by thinning. Default is FALSE (same as 0).
store_pi: If TRUE or 1, it stores \(\pi\) (the probability of using keyword topic word distribution) at the iterations specified by thinning. Default is FALSE (same as 0).
thinning: An integer. If the value is \(j\), keyATM stores the following parameters every \(j\) iterations. Default is 5.
- theta: For all models. If store_theta is TRUE, document-level topic assignments are stored (sufficient statistics for calculating the document-topic distribution theta).
- alpha: For the base and dynamic models. In the base model, alpha is shared across all documents, whereas each state has its own alpha in the dynamic model.
- lambda: For the covariates model. The coefficients.
- R: For the dynamic model. The state each document belongs to.
- P: For the dynamic model. The state transition probability.
parallel_init: Parallelize processes to speed up initialization. Default is FALSE. Call plan() (from the future package) before using this feature.
resume: The resume argument is used to save and load the intermediate results of the keyATM fitting process, allowing you to resume the fitting from a previous state. The default value is NULL (do not resume).
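A minimal sketch of an options list built from the entries above. The specific values (for example the seed of 250) are arbitrary choices for illustration, and the count of stored draws at the end assumes that exactly one draw is kept every thinning iterations.

```r
# Options not listed here keep their defaults.
my_options <- list(
  seed        = 250,    # fix the seed so the fit can be reproduced
  iterations  = 1500,
  verbose     = FALSE,
  llk_per     = 10,     # store log-likelihood and perplexity every 10 iterations
  store_theta = TRUE,   # store theta at the thinned iterations
  thinning    = 5
)

# The list would be passed to keyATM() like this (not run here):
# out <- keyATM(docs = keyATM_docs, model = "base",
#               no_keyword_topics = 5, keywords = bills_keywords,
#               options = my_options)

# Number of stored draws, assuming one draw is kept every `thinning`
# iterations.
my_options$iterations / my_options$thinning
```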

keep

a vector of the names of elements you want to keep in output.

Value

A keyATM_output object containing:

keyword_k: number of keyword topics
no_keyword_topics: number of no-keyword topics
V: number of terms (number of unique words)
N: number of documents
model: the name of the model
theta: topic proportions for each document (document-topic distribution)
phi: topic specific word generation probabilities (topic-word distribution)
topic_counts: number of tokens assigned to each topic
word_counts: number of times each word type appears
doc_lens: length of each document in tokens
vocab: words in the vocabulary (a vector of unique words)
priors: priors
options: options
keywords_raw: specified keywords
model_fit: perplexity and log-likelihood
pi: estimated \(\pi\) (the probability of using keyword topic word distribution) for the last iteration
values_iter: values stored during iterations
kept_values: outputs you specified to store in keep option
information: information about the fitting
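To make the relationships among these elements concrete, the sketch below checks the expected dimensions on a stand-in object. It assumes theta is a documents-by-topics matrix and phi is a topics-by-terms matrix; fake_out is a plain list with made-up numbers, not a real keyATM fit, and check_dims() is a hypothetical helper, not part of the package.

```r
# Stand-in object for illustration only: a plain list with made-up numbers,
# not a real keyATM fit.
fake_out <- list(
  keyword_k = 2, no_keyword_topics = 3, V = 4, N = 5,
  theta = matrix(1 / 5, nrow = 5, ncol = 5),  # documents x topics
  phi   = matrix(1 / 4, nrow = 5, ncol = 4)   # topics x terms
)

# Hypothetical helper: checks that dimensions agree with the counts.
check_dims <- function(out) {
  K <- out$keyword_k + out$no_keyword_topics  # total number of topics
  c(
    theta_rows_are_docs   = nrow(out$theta) == out$N,
    theta_cols_are_topics = ncol(out$theta) == K,
    phi_rows_are_topics   = nrow(out$phi) == K,
    phi_cols_are_terms    = ncol(out$phi) == out$V,
    theta_rows_sum_to_one = all(abs(rowSums(out$theta) - 1) < 1e-8),
    phi_rows_sum_to_one   = all(abs(rowSums(out$phi) - 1) < 1e-8)
  )
}

check_dims(fake_out)
```

The same helper could be applied to a fitted object returned by keyATM(), provided the assumptions about matrix orientation hold.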

Examples

if (FALSE) { # \dontrun{
  library(keyATM)
  library(quanteda)
  data(keyATM_data_bills)
  bills_keywords <- keyATM_data_bills$keywords
  bills_dfm <- keyATM_data_bills$doc_dfm  # quanteda dfm object
  keyATM_docs <- keyATM_read(bills_dfm)

  # keyATM Base
  out <- keyATM(docs = keyATM_docs, model = "base",
                no_keyword_topics = 5, keywords = bills_keywords)

  # Visit our website for full examples: https://keyatm.github.io/keyATM/
} # }
