keyATM takes various options. You set them by passing a list to the options argument of keyATM().

```r
my_options <- list(seed          = NULL, # automatically generate a random seed
                   iterations    = 1500,
                   verbose       = FALSE,
                   llk_per       = 10,
                   use_weights   = TRUE,
                   weights_type  = "information-theory",
                   prune         = TRUE,
                   thinning      = 5,
                   store_theta   = FALSE,
                   store_pi      = FALSE,
                   parallel_init = FALSE)

out <- keyATM(docs      = keyATM_docs,    # text input
              regular_k = 3,              # number of regular topics
              keywords  = bills_keywords, # keywords
              model     = "basic",        # select the model
              options   = my_options,     # use your own option list
              keep      = c("Z"))         # keep a specific object in the output
```


seed: The seed used to generate random numbers. keyATM uses the same seed for both initializing and fitting the model (set.seed() is executed before each of the two steps). If you do not provide a seed, keyATM selects one at random.
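To make the reproducibility point concrete, here is a minimal base-R sketch of the idea, not keyATM's internal code. The functions init_step() and fit_step() are hypothetical stand-ins for initialization and fitting; resetting the same seed before each of them reproduces the same random draws.

```r
# Hypothetical stand-ins for keyATM's initialization and fitting steps.
init_step <- function() sample(1:10, 5)
fit_step  <- function() runif(3)

run_once <- function(seed) {
  set.seed(seed)          # reset the RNG before initialization
  init <- init_step()
  set.seed(seed)          # reset it again before fitting
  fit <- fit_step()
  list(init = init, fit = fit)
}

run_a <- run_once(225)
run_b <- run_once(225)
identical(run_a, run_b)   # TRUE: same seed, same draws in both steps
```

Changing the seed changes both the initial state and the subsequent draws, which is why results can differ across runs when seed = NULL.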


iterations: The number of iterations of the sampler. The default value is 1500.


verbose: The default value is FALSE. If it is TRUE, keyATM prints the values of the log-likelihood and perplexity while it runs.


llk_per: keyATM calculates and stores the log-likelihood and perplexity every llk_per iterations. The default value is 10.
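This section does not define perplexity. Under the standard definition, which we assume here, perplexity is computed from the log-likelihood \(\ell\) of the \(N\) tokens in the corpus as:

\[
\text{perplexity} = \exp\!\left(-\frac{\ell}{N}\right)
\]

A higher log-likelihood therefore means a lower perplexity, so values that level off over iterations are a rough sign that the sampler has stabilized.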


use_weights: The default value is TRUE (use weights). We follow the weighting scheme in Wilson & Chew (2010). If you do not want to use weights, set it to FALSE. Please check our paper for details.


weights_type: You can select one of four weighting schemes implemented in keyATM. The default is "information-theory". keyATM can also construct weights from the inverse frequency of the words ("inv-frequency"). Each of these two has a normalized version: "information-theory-normalized" and "inv-freq-normalized".
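The following base-R sketch illustrates how frequency-based weights down-weight common words. The exact formulas are assumptions for illustration, not keyATM's source code: information-theory weights are taken as \(-\log_2(n_v / N)\), inverse-frequency weights as \(N / n_v\), and the "-normalized" variants are assumed to rescale the weights so that the weighted token count equals the raw token count \(N\).

```r
# Toy corpus of 8 tokens (hypothetical data).
tokens <- c("tax", "tax", "tax", "tax", "budget", "budget", "health", "school")

counts <- table(tokens)          # n_v: frequency of each word type
N      <- sum(counts)            # total number of tokens

# ASSUMED formulas, for illustration only.
info_theory <- -log2(counts / N) # rarer words get larger weights
inv_freq    <- N / counts

# ASSUMED normalization: weighted token count equals N.
normalize <- function(w) w * N / sum(counts * w)

info_theory_norm <- normalize(info_theory)
inv_freq_norm    <- normalize(inv_freq)

info_theory  # budget 2, health 3, school 3, tax 1
```

Under either scheme, the frequent word "tax" counts for less than the rare words "health" and "school", which keeps very common words from dominating the topics.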


prune: If TRUE, keyATM prunes keywords that do not appear in the documents.


thinning: The default value is 5, meaning keyATM keeps every \(5\)th draw from the sampler.
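A generic base-R illustration of thinning, not keyATM's internal code: from a chain of 1500 draws, keeping every 5th draw leaves 300 stored draws. Whether keyATM also stores the very first iteration is not stated here, so this sketch assumes it keeps iterations 5, 10, 15, and so on.

```r
iterations <- 1500
thinning   <- 5

chain <- seq_len(iterations)   # stand-in for 1500 sampler draws
kept  <- chain[seq(thinning, iterations, by = thinning)]

length(kept)  # 300
head(kept)    # 5 10 15 20 25 30
```

Thinning trades a little information for lower memory use and less autocorrelation between stored draws.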


store_theta: The default value is FALSE. Storing the draws of \(\theta\) (the document-topic distributions) allows you to calculate credible intervals.
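As a sketch of what stored draws make possible, the base-R code below computes 95% credible intervals for one document's topic proportions. The matrix theta_draws is simulated here (hypothetical data, one row per stored draw, one column per topic); it does not reflect how keyATM structures its output.

```r
set.seed(1)
n_draws <- 300
K       <- 3

# Simulated draws of one document's topic proportions:
# normalized gamma variables give Dirichlet draws, so each row sums to 1.
g <- matrix(rgamma(n_draws * K, shape = c(5, 3, 2)), ncol = K, byrow = TRUE)
theta_draws <- g / rowSums(g)
colnames(theta_draws) <- paste0("Topic_", seq_len(K))

# Posterior median and 95% credible interval for each topic.
ci <- apply(theta_draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
round(ci, 3)
```

Without stored draws, only point estimates are available, so intervals like these cannot be computed.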


store_pi: The default value is FALSE. If TRUE, keyATM stores the values of \(\pi_k\) for all \(k\), the probability that topic \(k\) uses the keyword topic-word distribution.


parallel_init: If TRUE, keyATM parallelizes processes to speed up initialization. The default is FALSE. Note that even with the same seed, initialization with parallelization differs from initialization without it.


You can manually set priors through the priors argument of keyATM(), but we do not recommend doing so unless you understand the consequences.


alpha: Prior for the document-topic distribution. This option only works for the basic model.


beta: Prior for the topic-word distribution.


beta_s: Prior for the keyword topic-word distribution.


gamma: Prior for the probability of using keywords in a topic.
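For orientation, here is how \(\pi_k\) and its prior fit together, as we understand the model from the keyATM paper; treat this as a sketch and consult the paper for the exact specification. For token \(i\) in document \(d\) assigned to keyword topic \(z_{di} = k\), a binary switch \(s_{di}\) decides whether the word \(w_{di}\) comes from the keyword topic-word distribution \(\tilde{\phi}_k\) (prior \(\beta_s\)) or the regular topic-word distribution \(\phi_k\) (prior \(\beta\)):

\[
\pi_k \sim \text{Beta}(\gamma_1, \gamma_2), \qquad s_{di} \mid z_{di} = k \sim \text{Bernoulli}(\pi_k),
\]
\[
w_{di} \mid z_{di} = k, s_{di} \sim
\begin{cases}
\text{Categorical}(\tilde{\phi}_k) & \text{if } s_{di} = 1,\\
\text{Categorical}(\phi_k) & \text{if } s_{di} = 0.
\end{cases}
\]

The prior mean of \(\pi_k\) is \(\gamma_1 / (\gamma_1 + \gamma_2)\), so raising \(\gamma_1\) relative to \(\gamma_2\) pushes topics toward their keywords, while \(\gamma_1 = \gamma_2 = 1\) gives a uniform prior.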


keep: You can specify which objects to keep in the output (cf. Calculating heterogeneity).