Preparing documents and labels

Please read Preparation for instructions on reading documents and creating a list of keywords. We use the bills data we prepared (documents and keywords).

library(keyATM)
library(quanteda)

data(keyATM_data_bills)
bills_keywords <- keyATM_data_bills$keywords
bills_dfm <- keyATM_data_bills$doc_dfm  # quanteda dfm object
keyATM_docs <- keyATM_read(bills_dfm)

The Label model needs a vector of labels, with one element per document. The values of the vector represent topics. If a document does not have a label, the corresponding element of the label vector should be NA.

labels_use <- keyATM_data_bills$labels
labels_use
##   [1] NA NA NA NA NA NA  3 NA NA  3 NA  3 NA NA NA NA  2 NA NA NA NA NA NA NA NA
##  [26] NA NA  2 NA NA NA  3 NA NA NA NA NA NA  3 NA NA  3 NA NA  1 NA NA NA NA NA
##  [51] NA NA NA NA  3 NA NA NA  3  1 NA NA NA NA  3 NA NA NA NA NA  1 NA  3 NA NA
##  [76] NA NA NA  3 NA  2 NA  3 NA NA  2  3 NA NA NA NA NA NA NA NA  1 NA NA NA NA
## [101] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  2 NA NA NA  2 NA NA NA NA  1
## [126]  3 NA  1 NA NA  1 NA NA NA NA  3 NA NA NA NA
length(labels_use)  # should be the same as the number of documents
## [1] 140

Please make sure that the labels are in the same order as the documents.
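When labels are stored separately from the documents, for example in a hand-coded table covering only some of them, one way to get a correctly ordered vector is to look up each document by its identifier. The following is a minimal base R sketch of that idea, not part of keyATM itself. The names doc_ids, known_labels, and labels_aligned, and the document identifiers, are hypothetical and used only for illustration.

```r
# Hypothetical document IDs, in the same order as the documents passed to the model.
# (Assumption: with a quanteda dfm, these could be taken from quanteda::docnames().)
doc_ids <- c("doc_a", "doc_b", "doc_c", "doc_d", "doc_e")

# Hypothetical hand-coded labels, available for only some documents
# and stored in arbitrary order.
known_labels <- data.frame(
  doc_id = c("doc_d", "doc_a"),
  topic  = c(2L, 1L)
)

# match() gives, for each document, its row in known_labels (NA if absent).
# Indexing by that result returns labels in document order,
# with NA for documents that have no label.
labels_aligned <- known_labels$topic[match(doc_ids, known_labels$doc_id)]

labels_aligned
## [1]  1 NA NA  2 NA

# The label vector must have one element per document.
stopifnot(length(labels_aligned) == length(doc_ids))

# Quick look at how many documents carry each label.
table(labels_aligned, useNA = "ifany")
```

Because the lookup is keyed on the document identifier rather than on position, reordering the label table does not change the result.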

Fitting the model

out <- keyATM(
              docs              = keyATM_docs,                # text input
              no_keyword_topics = 3,                          # number of topics without keywords
              keywords          = bills_keywords,             # keywords
              model             = "label",                    # select the model
              model_settings    = list(labels = labels_use),  # set labels
              options           = list(seed = 250)
             )
## Label model is an experimental function.

Saving the model

Once you fit the model, you can save it with save() for replication. This works the same way as in the Base model.

Checking top words

top_words(out)
##      1_Education       2_Law     3_Health        4_Drug       Other_1
## 1  education [✓]     law [✓]   health [✓]      drug [✓]    commission
## 2         school   court [✓]         care       project    electronic
## 3          grant        code         date       control      facility
## 4    educational      action   public [✓] administrator   information
## 5    student [✓]      person subparagraph         grant communication
## 6          local enforcement       report           air         party
## 7     public [3]    district     congress    assistance         level
## 8       describe     chapter       follow         water         waste
## 9        library       crime     describe     substance       compact
## 10      eligible        term   individual        report           low
##        Other_2      Other_3
## 1     research   management
## 2     national     wildlife
## 3       center         land
## 4   technology      coastal
## 5     director conservation
## 6     activity     national
## 7    establish         fish
## 8     standard       system
## 9  development       refuge
## 10   committee     resource