Options

keyATM takes various options. You can set options through a list.

my_options <- list(
  seed          = NULL, # automatically generate random seed
  iterations    = 1500,
  verbose       = FALSE,
  llk_per       = 10,
  use_weights   = TRUE,
  weights_type  = "information-theory",
  prune         = TRUE,
  thinning      = 5,
  store_theta   = FALSE,
  store_pi      = FALSE,
  parallel_init = FALSE,
  resume        = NULL
)

out <- keyATM(
  docs      = keyATM_docs,    # text input
  regular_k = 3,              # number of regular topics
  keywords  = bills_keywords, # keywords
  model     = "basic",        # select the model
  options   = my_options,     # use your own option list
  keep      = c("Z")          # keep a specific object in the output
)

seed

This is a seed used to generate random numbers. The same seed is used for initialization and fitting the model (set.seed() is executed before both initialization and fitting). If you do not provide seed, keyATM randomly selects a seed for you.

iterations

The default value is 1500.

verbose

Default is FALSE. If it is TRUE, it shows values of log-likelihood and perplexity.

llk_per

keyATM calculates and stores the log-likelihood and perplexity. The default value is 10.

use_weights

The default value is TRUE (use weights). We follow the weighting Scheme in Wilson & Chew (2010). If you do not want to use weights, please set it to FALSE. Please check our paper for details.

weights_type

You can select one of four weights implemented in keyATM. The default is information-theory. keyATM can construct weights from the inverse frequency of the words, inv-frequency. There are normalized version of two: information-theory-normalized and inv-freq-normalized.

prune

Prune keywords that do not appear in the documents.

thinning

The default value is 5 and keyATM keeps every \(5\)th draw from the sampling.

store_theta

The default value is FALSE. Storing the value of thetas allows the calculation of credible intervals.

store_pi

The default value is FALSE. Storing the value of \(\pi_k\) for all \(k\) (the probability that the topic \(k\) use keyword topic-word distribution).

parallel_init

Parallelize processes to speed up initialization. Default is FALSE. Note that even if you use the same seed, the initialization will become different between with and without parallelization.

resume

The resume argument is used to save and load the intermediate results of the keyATM fitting process, allowing you to resume the fitting from a previous state. The default value is NULL (do not resume).

Priors

You can manually set priors, but we do not recommend doing it unless you understandd the consequences.

alpha

Prior for the document-topic distribution. This option only works for base model.

beta

Prior for the topic-word distribution.

beta_s

Prior for the keyword topic-word distribution.

gamma

Prior for the probability of using keywords in a topic.

eta_1, eta_2, eta_1_regular, and eta_2_regular

Hyperprior for the alpha (used in the base and the dynamic models).

Keep

You can specify which output to keep (cf. Calculating heterogeneity).