Fit keyATM models.
texts read via keyATM_read().
keyATM model: base, covariates, and dynamic.
the number of regular topics.
a list of keywords.
a list of model-specific settings (details are in the online documentation).
a list of priors of parameters.
a list of options
seed: A numeric value for random seed. If it is not provided, the package randomly selects a seed.
iterations: An integer. Number of iterations. Default is 1500.
verbose: If TRUE, it prints the log-likelihood and perplexity. Default is FALSE.
llk_per: An integer. If the value is \(j\), keyATM stores the log-likelihood and perplexity every \(j\) iterations. Default is 10.
use_weights: If TRUE, use weights. Default is TRUE.
weights_type: There are four types of weights: weights based on information theory (information-theory), weights based on inverse frequency (inv-freq), and normalized versions of each (information-theory-normalized and inv-freq-normalized). Default is information-theory.
prune: If TRUE, prune keywords that do not appear in the corpus. Default is TRUE.
store_theta: If TRUE or 1, it stores \(\theta\) (document-topic distribution) for the iterations specified by thinning. Default is FALSE (same as 0).
store_pi: If TRUE or 1, it stores \(\pi\) (the probability of using the keyword topic-word distribution) for the iterations specified by thinning. Default is FALSE (same as 0).
thinning: An integer. If the value is \(j\), keyATM stores the following parameters every \(j\) iterations. Default is 5.
theta: For all models. If store_theta is TRUE, the document-level topic assignment is stored (sufficient statistics to calculate the document-topic distributions theta).
alpha: For the base and dynamic models. In the base model, alpha is shared across all documents, whereas in the dynamic model each state has its own alpha.
lambda: For the covariate model. The coefficients.
R: For the dynamic model. The state each document belongs to.
P: For the dynamic model. The state transition probability.
parallel_init: Parallelize processes to speed up initialization. Default is FALSE. Please call plan() before using this feature.
resume: The resume argument is used to save and load the intermediate results of the keyATM fitting process, allowing you to resume the fitting from a previous state. The default value is NULL (do not resume).
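The options described above can be collected in a plain R list. The following is a minimal sketch, not the package's canonical usage: the seed value 250 and the choice to turn on store_theta are illustrative assumptions, and every other entry simply repeats the documented default. The fitting call is wrapped in if (FALSE), following the convention of the example at the end of this page, so that only the list itself is built when the snippet is run.

```r
# Sketch: an options list built from the names documented above.
# seed = 250 and store_theta = TRUE are illustrative choices (assumptions);
# all other values repeat the documented defaults.
my_options <- list(
  seed          = 250,
  iterations    = 1500,
  verbose       = FALSE,
  llk_per       = 10,
  use_weights   = TRUE,
  weights_type  = "information-theory",
  prune         = TRUE,
  store_theta   = TRUE,
  store_pi      = FALSE,
  thinning      = 5,
  parallel_init = FALSE
)

if (FALSE) {
  # Not run: pass the list through the options argument.
  library(keyATM)
  data(keyATM_data_bills)
  keyATM_docs <- keyATM_read(keyATM_data_bills$doc_dfm)
  out <- keyATM(docs = keyATM_docs, model = "base",
                no_keyword_topics = 5,
                keywords = keyATM_data_bills$keywords,
                options = my_options)
}
```

Fixing seed makes repeated runs comparable, which is usually worth doing before changing any other option.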
a vector of the names of elements you want to keep in output.
A keyATM_output object containing:
number of keyword topics
number of no-keyword topics
number of terms (number of unique words)
number of documents
the name of the model
topic proportions for each document (document-topic distribution)
topic specific word generation probabilities (topic-word distribution)
number of tokens assigned to each topic
number of times each word type appears
length of each document in tokens
words in the vocabulary (a vector of unique words)
priors
options
specified keywords
perplexity and log-likelihood
estimated \(\pi\) (the probability of using the keyword topic-word distribution) for the last iteration
values stored during iterations
outputs you specified to store in the keep option
information about the fitting
if (FALSE) {
  library(keyATM)
  library(quanteda)
  data(keyATM_data_bills)
  bills_keywords <- keyATM_data_bills$keywords
  bills_dfm <- keyATM_data_bills$doc_dfm  # quanteda dfm object
  keyATM_docs <- keyATM_read(bills_dfm)

  # keyATM Base
  out <- keyATM(docs = keyATM_docs, model = "base",
                no_keyword_topics = 5, keywords = bills_keywords)

  # Visit our website for full examples: https://keyatm.github.io/keyATM/
}
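As a hedged follow-up to the example above, the sketch below fits the same base model with a short run and then looks at the returned object. It assumes that both keyATM and quanteda are installed (the code is skipped otherwise), that 100 iterations with seed 250 are adequate for a quick inspection only, and that top_words() is available from keyATM for summarizing topics. The element names are listed with names() rather than assumed.

```r
# Sketch: fit a short base model and inspect the returned object.
# Runs only when keyATM and quanteda are installed (assumption).
if (requireNamespace("keyATM", quietly = TRUE) &&
    requireNamespace("quanteda", quietly = TRUE)) {
  library(keyATM)
  data(keyATM_data_bills)
  keyATM_docs <- keyATM_read(keyATM_data_bills$doc_dfm)

  # iterations = 100 and seed = 250 are illustrative, not recommended values.
  out <- keyATM(docs = keyATM_docs, model = "base",
                no_keyword_topics = 5,
                keywords = keyATM_data_bills$keywords,
                options = list(seed = 250, iterations = 100))

  print(class(out))        # should include "keyATM_output"
  print(names(out))        # elements described in the value list above
  print(top_words(out, n = 5))  # five highest-probability words per topic
}
```

For real analyses the default number of iterations, or more, is the safer choice; the short run here only keeps the inspection quick.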