Estimate Weights for Generalizing the Average Treatment Effect

Usage

weighting(
  data,
  sample_indicator,
  treatment_indicator = NULL,
  outcome = NULL,
  covariates,
  estimation_method = "lr",
  disjoint_data = TRUE
)

Arguments

data: a data frame consisting of the "stacked" sample and target population data.
sample_indicator: name of the variable denoting binary sample membership (1 = in sample, 0 = not in sample).
treatment_indicator: name of the variable denoting binary treatment assignment. It may be available only for sample units and missing for the population.
outcome: name of the outcome variable.
covariates: character vector of the names of the covariates in data that predict sample membership.
estimation_method: method used to estimate the probability of sample membership (propensity scores). The default is logistic regression ("lr"); random forests ("rf") and lasso ("lasso") are also supported.
disjoint_data: logical. If TRUE, the sample and population data are treated as disjoint, i.e., sample units do not also appear among the population rows. This affects how the weights are calculated; see Details for more information.
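The arguments above describe a two-step procedure: estimate each unit's probability of sample membership from the covariates, then convert those propensity scores into weights. The base-R sketch below illustrates the default logistic-regression route on simulated data. The weight formulas, `(1 - ps) / ps` when sample and population are disjoint and `1 / ps` when the sample is contained in the population data, are common choices in the generalization literature and are assumptions here, not a description of generalizeR's internals; `compute_weights()` and the simulated variables `x1` and `x2` are hypothetical.

```r
# Hypothetical sketch, not generalizeR code: propensity scores via logistic
# regression, then two common (assumed) weighting formulas.
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.4))

# Simulated sample membership that depends on the covariates
dat$sample <- rbinom(n, 1, plogis(-1 + 0.8 * dat$x1 + 0.5 * dat$x2))

# Propensity score: estimated P(sample = 1 | covariates)
fit <- glm(sample ~ x1 + x2, data = dat, family = binomial())
dat$ps <- predict(fit, type = "response")

# Assumed weight formulas: odds-type weights for disjoint data,
# inverse-probability weights when the sample is part of the population data
compute_weights <- function(ps, disjoint) {
  if (disjoint) (1 - ps) / ps else 1 / ps
}

# Weights are applied to sample units only. Both formulas are computed on the
# same scores purely for comparison; in practice the data structure decides.
in_sample <- dat$sample == 1
w_disjoint <- compute_weights(dat$ps[in_sample], disjoint = TRUE)
w_nested <- compute_weights(dat$ps[in_sample], disjoint = FALSE)

summary(w_disjoint)
summary(w_nested)
```

Logistic regression keeps the scores strictly between 0 and 1, so both formulas give finite weights; very small scores still produce large weights, which is one reason to inspect the summary of the weights that `weighting()` reports.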

Value

A summary of the propensity scores, the covariates, and the absolute standardized mean differences (ASMD) for both the weighted and unweighted data, as well as a summary of the weights. If outcome and treatment_indicator are supplied, the weighted and unweighted TATE estimates are also returned.
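To make the reported quantities concrete, the sketch below computes an ASMD for one covariate and a difference in treated and control outcome means, each with and without weights. The helpers `asmd()` and `diff_in_means()` are hypothetical, not generalizeR functions; using the population standard deviation as the ASMD denominator is an assumption, since definitions vary. The weights are a known density ratio, available only because the data are simulated.

```r
# Hypothetical helpers illustrating the kinds of quantities in the output.

# ASMD of one covariate: |(weighted) sample mean - population mean| / population SD
# (population SD as denominator is an assumed convention)
asmd <- function(x_sample, x_pop, w = rep(1, length(x_sample))) {
  abs(weighted.mean(x_sample, w) - mean(x_pop)) / sd(x_pop)
}

# Difference in (weighted) outcome means between treated (z = 1) and
# control (z = 0) sample units; unit weights give the unweighted estimate
diff_in_means <- function(y, z, w = rep(1, length(y))) {
  weighted.mean(y[z == 1], w[z == 1]) - weighted.mean(y[z == 0], w[z == 0])
}

# Toy data: the sample over-represents large values of x
set.seed(2)
x_pop <- rnorm(1000, mean = 0)
x_sample <- rnorm(100, mean = 0.5)

# Illustrative weights: ratio of population to sample density of x
w <- dnorm(x_sample, 0, 1) / dnorm(x_sample, 0.5, 1)

c(unweighted = asmd(x_sample, x_pop), weighted = asmd(x_sample, x_pop, w))

# Simulated treatment effect of 1 + x, so it depends on the covariate
z <- rbinom(100, 1, 0.5)
y <- 1 + z * (1 + x_sample) + rnorm(100, sd = 0.5)
c(unweighted = diff_in_means(y, z), weighted = diff_in_means(y, z, w))
```

Because x averages about 0.5 in the sample but 0 in the population, the unweighted difference in means targets the sample's average effect, while the weighted one targets the (smaller) population average effect.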

Examples

library(generalizeR)
library(tidyverse)

# Stratify the inference population and generate recruitment lists
selection_covariates <- c("total", "pct_black_or_african_american", "pct_white",
                          "pct_female", "pct_free_and_reduced_lunch")

strat_output <- stratify(generalizeR:::inference_pop, guided = FALSE, n_strata = 4,
                         variables = selection_covariates, idvar = "ncessch")
#> 
#> This might take a little while. Please bear with us.
#> 
#> Calculated distance matrix.
#>  
#> iteration: 1 --> total WCSS: 338.562  -->  squared norm: 1.40551
#> iteration: 2 --> total WCSS: 205.024  -->  squared norm: 0.138412
#> iteration: 3 --> total WCSS: 203.903  -->  squared norm: 0.0419178
#> iteration: 4 --> total WCSS: 203.775  -->  squared norm: 0.0313237
#> iteration: 5 --> total WCSS: 203.729  -->  squared norm: 0
#>  
#> ===================== end of initialization 1 =====================
#>  
rec_output <- recruit(strat_output, guided = FALSE, sample_size = 40)
#> 
#> The 'generalizeR_stratify' object you've supplied consists of 324 population units 
#> divided into 4 strata along these variables:
#> 
#> total, pct_black_or_african_american, pct_white, pct_female, pct_free_and_reduced_lunch.
#> 
#> 4 recruitment lists have been generated, one per stratum. Each list contains the ID 
#> information for the units, which have been ranked in order of desirability.
#> 
#> The following table (also shown in the Viewer pane to the right) displays the stratum 
#> sizes, their proportion relative to the total population size, and consequent 
#> recruitment number for each stratum. Ideally, units should be recruited across strata 
#> according to these numbers. Doing so will lead to the least amount of bias and no 
#> increase in standard errors. Note that the recruitment numbers have been rounded to 
#> integers in such a way as to ensure their sum equals the desired total sample size.
#> 
#> Recruitment Table
#>                      Stratum 1 Stratum 2 Stratum 3 Stratum 4
#>     Population Units   153.000    75.000     39.00    57.000
#>  Sampling Proportion     0.472     0.231      0.12     0.176
#>   Recruitment Number    19.000     9.000      5.00     7.000
#> Attempt to recruit units starting from the top of each recruitment list. If you are 
#> unsuccessful in recruiting a particular unit, move on to the next one in the list and 
#> continue until you have reached the ideal recruitment number in each stratum.
#> 
#> If you have stored the output of 'recruit()' in an object, you can use it to access 
#> these lists by typing the name of the object followed by '$recruitment_lists'.

# Create the sample from the recruitment lists. These counts (5, 20, 11, 4)
# depart from the recommended recruitment numbers, so the sample is unrepresentative.
sample_list <- c(rec_output$recruitment_lists[[1]]$ncessch[1:5],
                 rec_output$recruitment_lists[[2]]$ncessch[1:20],
                 rec_output$recruitment_lists[[3]]$ncessch[1:11],
                 rec_output$recruitment_lists[[4]]$ncessch[1:4])
inference_pop_sample <- mutate(generalizeR:::inference_pop,
                               sample = if_else(ncessch %in% sample_list, 1, 0))

# Estimate weights from the selection covariates; the sample units are rows of
# the population data, so the data are not disjoint
weighting_output <- weighting(inference_pop_sample, sample_indicator = "sample",
                              covariates = selection_covariates, disjoint_data = FALSE)