Resume Fitting • keyATM

You may want to split a long fitting run into several shorter ones, for example when working on a cluster computer or when a large dataset takes a long time to process. The resume feature in keyATM lets you split the fitting process, so you can reach the desired number of iterations without running them all at once.

To use the resume feature, split the desired number of iterations into smaller chunks and specify the resume argument in each call. For example, to run a total of 1000 iterations, you can run 500 iterations twice and obtain the same result as running 1000 iterations at once. You can resume as many times as you want.

The following code demonstrates how to run a total of 50 iterations using the resume feature.

The resume argument, passed inside options, is used to save and load the results of the keyATM fitting process, so that a later call can continue from a previous state. It specifies the path of the RDS file in which the results are saved. Here the path is ./keyATM_resume.rds, so the file keyATM_resume.rds is saved in your current working directory.

# 50 iterations all at once
out <- keyATM(
  docs              = keyATM_docs, # text input
  no_keyword_topics = 5, # number of topics without keywords
  keywords          = keywords, # keywords
  model             = "base",
  options           = list(seed = 250, iterations = 50)
)

# Conducting 10 iterations five times to achieve the same result as above
for (i in 1:5) {
  resumed <- keyATM(
    docs              = keyATM_docs, # text input
    no_keyword_topics = 5, # number of topics without keywords
    keywords          = keywords, # keywords
    model             = "base",
    options           = list(seed = 250, iterations = 10, resume = "./keyATM_resume.rds")
  )
}

When you run the keyATM() function with the resume argument, it checks whether the ./keyATM_resume.rds file exists. If it does, the function loads the saved object and resumes the fitting process from the last saved state. If the file does not exist, keyATM() performs the fitting process from the beginning and saves the intermediate result as keyATM_resume.rds in the specified location.
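The "load the file if it exists, otherwise start fresh" logic described above can be illustrated with a small toy example in base R. This is only a sketch of the general checkpointing pattern, not keyATM's actual implementation: toy_fit() and the contents of the saved object are hypothetical, and the example assumes that restoring the random number generator state is what makes the chunked run match the single run.

```r
# Toy illustration of a "load if present, otherwise start fresh" checkpoint.
# toy_fit() is a hypothetical stand-in and is NOT part of keyATM.
toy_fit <- function(iterations, resume, seed = 250) {
  if (file.exists(resume)) {
    # Resume: restore the saved values and the random number generator state
    state <- readRDS(resume)
    assign(".Random.seed", state$rng, envir = globalenv())
  } else {
    # Fresh start: seed the generator and begin with an empty trace
    set.seed(seed)
    state <- list(trace = numeric(0), current = 0)
  }

  for (i in seq_len(iterations)) {
    state$current <- state$current + rnorm(1) # one toy "sampling" step
    state$trace <- c(state$trace, state$current)
  }

  # Save only once, after all requested iterations have finished
  state$rng <- get(".Random.seed", envir = globalenv())
  saveRDS(state, resume)
  state
}

# Temporary paths, so the example does not touch the working directory
path_once    <- tempfile(fileext = ".rds")
path_chunked <- tempfile(fileext = ".rds")

# 50 iterations all at once
once <- toy_fit(50, resume = path_once)

# 10 iterations five times, resuming from the saved file each time
for (i in 1:5) {
  chunked <- toy_fit(10, resume = path_chunked)
}

# Compare the two traces
identical(once$trace, chunked$trace)
```

One practical consequence of this pattern is that a leftover file at the resume path is picked up by the next call. If you intend to start a new fit from scratch, remove or rename the old file first, or use a different path.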

We can confirm that the last perplexity is the same for both cases.

tail(out$model_fit$Perplexity, 1) # 50 iterations all at once

## [1] 1864.717

tail(resumed$model_fit$Perplexity, 1) # using the `resume` argument

## [1] 1864.717

Please note that the intermediate results are saved only after the completion of the specified number of iterations. For instance, if you set iterations = 1000 and specify the resume argument, keyATM will only save the output after all 1000 iterations have been completed. It does not save the output intermittently during the iteration process.
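Because the output is saved only at the end of each call, the chunk size determines how much work could be lost if a run is interrupted. The sketch below shows one way to split a total number of iterations into chunks of a chosen size. split_iterations() is a hypothetical helper written for this illustration and is not part of keyATM.

```r
# split_iterations() is a hypothetical helper and is NOT part of keyATM.
# It splits `total` iterations into chunks of at most `chunk` iterations.
split_iterations <- function(total, chunk) {
  stopifnot(total >= 1, chunk >= 1)
  n_full    <- total %/% chunk       # number of full-size chunks
  remainder <- total %% chunk        # iterations left over, if any
  sizes <- rep(chunk, n_full)
  if (remainder > 0) sizes <- c(sizes, remainder)
  sizes
}

chunks <- split_iterations(total = 1000, chunk = 300)
chunks       # 300 300 300 100
sum(chunks)  # 1000
```

Each element of chunks could then be passed as the iterations value in successive keyATM() calls that share the same resume path, in the same way as the loop shown earlier.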