The function stratify() takes as input any data frame with observations (rows) that you wish to stratify into clusters. Typically, the goal of such stratification is developing a sampling design for maximizing generalizability. This function, and the others in this package, are designed to mimic the website https://www.thegeneralizer.org/.

Usage

stratify(
  data = NULL,
  guided = TRUE,
  n_strata = NULL,
  variables = NULL,
  idvar = NULL,
  verbose = TRUE
)

Arguments

data

data.frame object containing the population data to be stratified (observations as rows); must include a unique id variable for each observation, as well as covariates.

guided

logical, defaults to TRUE. Whether the function should be guided (ask questions and behave interactively throughout) or not. If set to FALSE, the user must provide values for the other arguments below.

n_strata

integer, defaults to NULL. If guided is set to FALSE, must provide the number of strata into which to divide the population.

variables

character, defaults to NULL. If guided is set to FALSE, must provide a character vector of the names of the stratifying variables (from the population data frame).

idvar

character, defaults to NULL. If guided is set to FALSE, must provide the name of the ID variable (from the population data frame) as a character string.

verbose

logical, defaults to TRUE.

Value

The function returns a list of class "generalizeR_stratify" that can be provided as input to recruit(). More information on the components of this list can be found below under "Details".

Details

The list contains 14 components: idvar, variables, dataset, n_strata, solution, pop_data_by_stratum, summary_stats, data_omitted, cont_data_stats, cat_data_levels, heat_data, heat_data_simple, heat_data_kable, and heat_plot.

pop_data_by_stratum:

a tibble with one row per observation in the inference population (data). Its columns are the stratifying variables (dummy-coded if applicable), the ID column (idvar), and a column named Stratum that records stratum membership.
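
To make the shape of pop_data_by_stratum concrete, here is a minimal sketch that builds a table with the same layout from a toy population. It is illustrative only: the toy data, the column names school_id, total, and pct_female, and the use of base R kmeans() on standardized covariates are all assumptions for this sketch, not the clustering procedure stratify() itself runs.

```r
# Toy population: 12 hypothetical schools with two covariates (made-up data).
set.seed(42)
toy_pop <- data.frame(
  school_id  = sprintf("S%02d", 1:12),
  total      = c(rnorm(6, mean = 300, sd = 20), rnorm(6, mean = 900, sd = 20)),
  pct_female = c(rnorm(6, mean = 0.45, sd = 0.01), rnorm(6, mean = 0.55, sd = 0.01))
)

covariates <- c("total", "pct_female")

# Standardize so that no single covariate dominates the distances.
scaled <- scale(toy_pop[, covariates])

# Assumption: base R k-means stands in for the package's clustering step.
fit <- kmeans(scaled, centers = 2, nstart = 10)

# Same layout as described above: ID column, stratifying variables, Stratum.
toy_by_stratum <- data.frame(
  toy_pop[, c("school_id", covariates)],
  Stratum = fit$cluster
)

# Number of population units assigned to each stratum.
table(toy_by_stratum$Stratum)
```

With a real result from stratify(), the analogous check would be to tabulate the Stratum column of the pop_data_by_stratum component.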

References

Tipton, E. (2014). Stratified sampling using cluster analysis: A sample selection strategy for improved generalizations from experiments. Evaluation Review, 37(2), 109-139.

Tipton, E. (2014). How generalizable is your experiment? An index for comparing experimental samples and populations. Journal of Educational and Behavioral Statistics, 39(6), 478-501.

Examples

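library(generalizeR)
library(tidyverse)

# Covariates from the inference population used to form the strata
selection_covariates <- c("total", "pct_black_or_african_american",
                          "pct_white", "pct_female", "pct_free_and_reduced_lunch")

# Non-guided call: n_strata, variables, and idvar must all be supplied
stratify(generalizeR:::inference_pop, guided = FALSE, n_strata = 4,
         variables = selection_covariates, idvar = "ncessch")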
#> 
#> This might take a little while. Please bear with us.
#> 
#> Calculated distance matrix.
#>  
#> iteration: 1 --> total WCSS: 338.562  -->  squared norm: 1.40551
#> iteration: 2 --> total WCSS: 205.024  -->  squared norm: 0.138412
#> iteration: 3 --> total WCSS: 203.903  -->  squared norm: 0.0419178
#> iteration: 4 --> total WCSS: 203.775  -->  squared norm: 0.0313237
#> iteration: 5 --> total WCSS: 203.729  -->  squared norm: 0
#>  
#> ===================== end of initialization 1 =====================
#>