Options / Priors / Keep

Options

keyATM accepts various options, which you set through a list passed to the `options` argument.

my_options <- list(
  seed          = NULL, # automatically generate random seed
  iterations    = 1500,
  verbose       = FALSE,
  llk_per       = 10,
  use_weights   = TRUE,
  weights_type  = "information-theory",
  prune         = TRUE,
  thinning      = 5,
  store_theta   = FALSE,
  store_pi      = FALSE,
  parallel_init = FALSE,
  resume        = NULL
)

out <- keyATM(
  docs      = keyATM_docs, # text input
  regular_k = 3, # number of regular topics
  keywords  = bills_keywords, # keywords
  model     = "basic", # select the model
  options   = my_options, # use your own option list
  keep      = c("Z") # keep a specific object in the output
)

`seed`

This is the seed used to generate random numbers. The same seed is used for both initialization and model fitting (set.seed() is executed before each). If you do not provide a seed, keyATM randomly selects one for you.
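The effect of a fixed seed can be illustrated in base R, without keyATM. This is only a sketch of how `set.seed()` makes random draws repeatable; the seed value 250 and the variable names are arbitrary choices for illustration.

```r
# Drawing with the same seed twice yields identical random numbers.
set.seed(250)
first_draw <- runif(3)

set.seed(250)
second_draw <- runif(3)

# TRUE: both draws started from the same random number generator state.
identical(first_draw, second_draw)
```

In an option list, the analogous step is to supply a fixed value such as `seed = 250` instead of `NULL`.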

`iterations`

The number of iterations of the sampler. The default value is 1500.

`verbose`

The default value is FALSE. If set to TRUE, keyATM shows the values of the log-likelihood and perplexity.

`llk_per`

keyATM calculates and stores the log-likelihood and perplexity every `llk_per` iterations. The default value is 10.

`use_weights`

The default value is TRUE (use weights). We follow the weighting scheme in Wilson & Chew (2010). If you do not want to use weights, set it to FALSE. Please check our paper for details.

`weights_type`

You can select one of the four weighting schemes implemented in keyATM. The default is information-theory. keyATM can also construct weights from the inverse frequency of the words (inv-freq). Normalized versions of both are available: information-theory-normalized and inv-freq-normalized.
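To give a feel for the two families of weights, here is a small base R sketch on a toy corpus. It assumes that the information-theoretic weight of a word is $-\log_2$ of its relative frequency and that the inverse-frequency weight is the reciprocal of its relative frequency. These formulas, the toy corpus, and the variable names are illustrative assumptions rather than keyATM's internal code, and the normalized variants are not shown.

```r
# Toy corpus: each element is one tokenized document.
toy_docs <- list(
  c("tax", "budget", "tax", "health"),
  c("health", "care", "tax"),
  c("budget", "tax", "care", "care")
)

# Relative frequency of each word across the whole corpus.
word_counts <- table(unlist(toy_docs))
rel_freq <- setNames(as.numeric(word_counts), names(word_counts)) /
  sum(word_counts)

# Assumed weight definitions (see the paragraph above).
w_information <- -log2(rel_freq)
w_inverse     <- 1 / rel_freq

# Rare words receive larger weights than frequent ones under both schemes.
round(cbind(rel_freq, w_information, w_inverse), 3)
```

Under these assumptions, the most frequent word ("tax") gets the smallest weight in both columns.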

`prune`

Prune keywords that do not appear in the documents. The default value is TRUE.

`thinning`

The default value is 5, meaning keyATM keeps every 5th draw from the sampler.
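As a rough sketch of what thinning implies for the number of stored draws, the following base R snippet assumes that draws are kept at iterations that are multiples of `thinning`. The exact iterations kept by keyATM are not stated here, so treat the indexing as an assumption.

```r
iterations <- 1500
thinning   <- 5

# Assumed indexing: keep iterations 5, 10, 15, ...
kept_iterations <- seq(from = thinning, to = iterations, by = thinning)

# Number of stored draws under this assumption: 1500 / 5 = 300.
length(kept_iterations)
head(kept_iterations)
```

A larger `thinning` value reduces memory use at the cost of fewer stored draws.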

`store_theta`

The default value is FALSE. Storing the value of thetas allows the calculation of credible intervals.
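To see why stored draws matter, the sketch below computes a 90% credible interval from posterior draws of a single document-topic proportion. The draws are simulated from a Beta distribution purely for illustration; with a fitted model they would come from the stored theta values. All object names are hypothetical.

```r
set.seed(1)

# Stand-in for 300 stored draws of one document's proportion for one topic.
theta_draws <- rbeta(300, shape1 = 20, shape2 = 60)

# Posterior mean and an equal-tailed 90% credible interval.
point_estimate <- mean(theta_draws)
interval_90    <- quantile(theta_draws, probs = c(0.05, 0.95))

point_estimate
interval_90
```

Without the stored draws only a point estimate is available, which is why interval estimates require `store_theta = TRUE`.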

`store_pi`

The default value is FALSE. If TRUE, keyATM stores the value of $\pi_k$ for all $k$ (the probability that topic $k$ uses the keyword topic-word distribution).

`parallel_init`

Parallelize processes to speed up initialization. The default value is FALSE. Note that, even with the same seed, the initialization differs depending on whether parallelization is used.

`resume`

The resume argument is used to save and load the intermediate results of the keyATM fitting process, allowing you to resume the fitting from a previous state. The default value is NULL (do not resume).
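A possible way to set this up is sketched below. It assumes that `resume` accepts the path of a file in which keyATM stores the intermediate state. That interpretation, the file name, and the iteration count are assumptions for illustration, so check the package documentation before relying on them.

```r
# Hypothetical location for the intermediate results.
resume_path <- file.path(tempdir(), "keyATM_resume.rds")

resume_options <- list(
  seed       = 250,
  iterations = 100,          # iterations to run in this call
  resume     = resume_path   # assumed: file used to save and reload the state
)

str(resume_options)
```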

Priors

You can set priors manually, but we do not recommend doing so unless you understand the consequences.

`alpha`

Prior for the document-topic distribution. This option only works for the base model.

`beta`

Prior for the topic-word distribution.

`beta_s`

Prior for the keyword topic-word distribution.

`gamma`

Prior for the probability of using keywords in a topic.

`eta_1`, `eta_2`, `eta_1_regular`, and `eta_2_regular`

Hyperpriors for alpha (used in the base and dynamic models).
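For orientation, a prior list might be assembled as in the sketch below. It assumes the list is passed to `keyATM()` through an argument named `priors`, which is not shown in this section. The numeric values are placeholders, not recommended settings or documented defaults.

```r
# Placeholder values only; they are not recommendations.
my_priors <- list(
  beta   = 0.01,  # prior for the topic-word distribution
  beta_s = 0.1    # prior for the keyword topic-word distribution
)

# Guard against misspelled names before passing the list on.
known_priors <- c("alpha", "beta", "beta_s", "gamma",
                  "eta_1", "eta_2", "eta_1_regular", "eta_2_regular")
stopifnot(all(names(my_priors) %in% known_priors))

# Assumed usage: keyATM(..., priors = my_priors)
```

The name check is a small safeguard of our own: a misspelled entry in a list could otherwise go unnoticed.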

Keep

You can specify which output to keep (cf. Calculating heterogeneity).
