Preprocess the external control datasets for propensity score modeling. This involves combining (i.e., grouping) categorical variables, deriving continuous variables (i.e., time since initial diagnosis), and identifying each of the distinct analyses for the study (defined by a unique clinical trial and pairwise comparison between an experimental and comparator arm).
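As a rough illustration of two of these steps, the base R sketch below derives a continuous "time since initial diagnosis" variable and enumerates the distinct analyses as every experimental-versus-comparator arm pairing within each trial. It does not reproduce what `preprocess()` does internally. The column names (`trial`, `arm`, `arm_type`, `initial_dx_date`, `index_date`), the helper `identify_analyses()`, and the use of years as the time unit are all hypothetical assumptions, not the actual `ecdata::nsclc` schema.

```r
# Hypothetical toy data: column names are illustrative assumptions,
# not the real ecdata::nsclc schema.
patients <- data.frame(
  trial = c("T1", "T1", "T1", "T2", "T2"),
  arm = c("drug_a", "drug_b", "control", "drug_c", "control"),
  arm_type = c("experimental", "experimental", "comparator",
               "experimental", "comparator"),
  initial_dx_date = as.Date(c("2015-01-10", "2016-03-01", "2014-07-15",
                              "2017-05-20", "2016-11-30")),
  index_date = as.Date(c("2016-01-10", "2016-09-01", "2015-07-15",
                         "2018-05-20", "2017-02-28")),
  stringsAsFactors = FALSE
)

# Derive a continuous variable: years from initial diagnosis to an
# (assumed) index date.
patients$years_since_dx <- as.numeric(
  difftime(patients$index_date, patients$initial_dx_date, units = "days")
) / 365.25

# Hypothetical helper: within each trial, pair every experimental arm
# with every comparator arm; each pair is one distinct analysis.
identify_analyses <- function(arms) {
  arms <- unique(arms[, c("trial", "arm", "arm_type")])
  exper <- arms[arms$arm_type == "experimental", c("trial", "arm")]
  comp <- arms[arms$arm_type == "comparator", c("trial", "arm")]
  names(exper)[2] <- "experimental"
  names(comp)[2] <- "comparator"
  out <- merge(exper, comp, by = "trial")
  out$analysis_id <- seq_len(nrow(out))
  out
}

analyses <- identify_analyses(patients)
print(analyses)
```

Using `merge()` on the trial key forms the within-trial cross product, so a trial with two experimental arms and one comparator yields two analyses.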

preprocess(data, combine_levels = TRUE, drop_nos = TRUE)

Arguments

data

The unprocessed ecdata::nsclc dataset.

combine_levels

If TRUE, then levels of factor variables with low counts (typically fewer than 10) are combined into a single category. This currently applies only to the race variable.
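A minimal base R sketch of this kind of level combining, assuming a count threshold of 10 and a combined level named "Other". The helper name `combine_low_count_levels()`, the label "Other", and the toy race values are hypothetical, not taken from the package. (The forcats package offers `fct_lump_min()` for the same purpose.)

```r
# Hypothetical helper: fold factor levels with fewer than `min_count`
# observations into a single level (named "Other" here by assumption).
combine_low_count_levels <- function(x, min_count = 10, other = "Other") {
  x <- as.factor(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_count]
  if (length(rare) == 0) return(x)
  # Assigning the same label to several levels merges them into one.
  levels(x)[levels(x) %in% rare] <- other
  x
}

# Toy race values (illustrative only).
race <- c(rep("White", 30), rep("Black", 12), rep("Asian", 4), rep("AIAN", 2))
race_combined <- combine_low_count_levels(race)
print(table(race_combined))
```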

drop_nos

In some instances, patients with a not otherwise specified (NOS) histology are part of the external control but not part of the trial. If TRUE, then these patients are removed from the external control cohort; if FALSE, they are retained.
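The filtering described above can be sketched in base R as follows. This is an illustration under assumed column names (`source`, with values "trial" and "external", and `histology`, with the value "NOS"); the helper `drop_nos_histology()` is hypothetical and only removes NOS patients from the external control rows, leaving trial patients untouched.

```r
# Hypothetical helper: drop NOS-histology patients from the external
# control only; trial rows are kept regardless of histology.
drop_nos_histology <- function(data, drop_nos = TRUE) {
  if (!drop_nos) return(data)
  # %in% treats missing values as non-matches, so NA rows are kept.
  is_nos_external <- data$source %in% "external" & data$histology %in% "NOS"
  data[!is_nos_external, , drop = FALSE]
}

# Toy cohort (illustrative only).
cohort <- data.frame(
  source = c("trial", "trial", "external", "external", "external"),
  histology = c("Squamous", "Non-squamous", "NOS", "Squamous", "NOS"),
  stringsAsFactors = FALSE
)

filtered <- drop_nos_histology(cohort)
print(filtered)
```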

Value

A dplyr::tibble.