
Estimate Weights for Generalizing Average Treatment Effect

Usage

weighting(
  data,
  sample_indicator,
  treatment_indicator = NULL,
  outcome = NULL,
  covariates,
  estimation_method = "lr",
  disjoint_data = TRUE
)

Arguments

data

data frame containing the "stacked" sample and target population data

sample_indicator

variable name denoting binary sample membership (1 = in sample, 0 = out of sample)

treatment_indicator

variable name denoting binary treatment assignment (may be available only for the sample, not the population)

outcome

variable name denoting outcome

covariates

vector of covariate names in data set that predict sample membership

estimation_method

method to estimate the probability of sample membership (propensity scores). Default is logistic regression ("lr"). Other supported methods are Random Forests ("rf") and Lasso ("lasso").

disjoint_data

logical. If TRUE, then sample and population data are considered disjoint. This affects calculation of the weights - see details for more information.
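
The following is a minimal sketch, on simulated data, of how a probability of sample membership can be estimated with logistic regression and turned into weights. It is not the generalizeR source code. The column names (x1, x2, sample) are hypothetical, and the two weighting conventions shown (inverse odds for disjoint data, inverse probability otherwise) are a common choice that is assumed here rather than confirmed by this page.

```r
# Illustrative sketch only; not the generalizeR implementation.
set.seed(1)

# Simulated "stacked" data: hypothetical covariates x1, x2 and a 0/1 sample flag.
n <- 500
stacked <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
stacked$sample <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * stacked$x1))

# Logistic regression for the probability of sample membership.
ps_model <- glm(sample ~ x1 + x2, data = stacked, family = binomial())
stacked$ps <- predict(ps_model, type = "response")

# Two common weighting conventions (assumed, not taken from this page):
#   disjoint data     -> inverse odds,        (1 - ps) / ps
#   non-disjoint data -> inverse probability, 1 / ps
# Units outside the sample get weight 0 in both cases.
in_sample  <- stacked$sample == 1
w_disjoint <- ifelse(in_sample, (1 - stacked$ps) / stacked$ps, 0)
w_overlap  <- ifelse(in_sample, 1 / stacked$ps, 0)

summary(w_disjoint[in_sample])
summary(w_overlap[in_sample])
```

Under these assumed conventions the two sets of weights differ by exactly 1 for every sampled unit, so the choice matters most when the estimated probabilities are large.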

Value

A summary of the propensity scores, covariates, and absolute standardized mean differences (ASMD) for both weighted and unweighted data, along with a summary of the weights. If outcome and treatment_indicator are supplied, the weighted and unweighted TATE estimates are also included.
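
To make the ASMD concrete, here is a small sketch that computes an absolute standardized mean difference for one covariate, with and without weights. The helper name asmd, the simulated vectors, and the use of the population standard deviation as the denominator are all assumptions of this sketch; the package's exact definition may differ.

```r
# Illustrative sketch only; the package's exact ASMD definition may differ.
set.seed(2)

# Hypothetical covariate values for sample units and for the target population.
x_sample <- rnorm(60, mean = 0.5)
x_pop    <- rnorm(400, mean = 0)
w        <- runif(60, min = 0.5, max = 2)  # placeholder sample weights

# ASMD: |mean(sample) - mean(population)| / sd(population).
# Using the population SD as the denominator is an assumption of this sketch.
asmd <- function(x_s, x_p, weights = rep(1, length(x_s))) {
  abs(weighted.mean(x_s, weights) - mean(x_p)) / sd(x_p)
}

asmd(x_sample, x_pop)     # unweighted
asmd(x_sample, x_pop, w)  # weighted
```

The weights here are random placeholders, so they are not expected to improve balance; with weights estimated from a model of sample membership, a smaller weighted ASMD would indicate that the weighted sample resembles the population more closely on that covariate.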

Examples

library(tidyverse)

# creating a stratified sample and recruiting from it to prepare for weighting
selection_covariates <- c("total", "pct_black_or_african_american", "pct_white",
                          "pct_female", "pct_free_and_reduced_lunch")

strat_output <- stratify(generalizeR:::inference_pop, guided = FALSE, n_strata = 4,
                         variables = selection_covariates, idvar = "ncessch")
#> 
#> This might take a little while. Please bear with us.
#> 
#> Calculated distance matrix.
#>  
#> iteration: 1 --> total WCSS: 338.562  -->  squared norm: 1.40551
#> iteration: 2 --> total WCSS: 205.024  -->  squared norm: 0.138412
#> iteration: 3 --> total WCSS: 203.903  -->  squared norm: 0.0419178
#> iteration: 4 --> total WCSS: 203.775  -->  squared norm: 0.0313237
#> iteration: 5 --> total WCSS: 203.729  -->  squared norm: 0
#>  
#> ===================== end of initialization 1 =====================
#>  
rec_output <- recruit(strat_output, guided = FALSE, sample_size = 40)
#> 
#> The 'generalizeR_stratify' object you've supplied consists of 324 population units 
#> divided into 4 strata along these variables:
#> 
#> total, pct_black_or_african_american, pct_white, pct_female, pct_free_and_reduced_lunch.
#> 
#> 4 recruitment lists have been generated, one per stratum. Each list contains the ID 
#> information for the units, which have been ranked in order of desirability.
#> 
#> The following table (also shown in the Viewer pane to the right) displays the stratum 
#> sizes, their proportion relative to the total population size, and consequent 
#> recruitment number for each stratum. Ideally, units should be recruited across strata 
#> according to these numbers. Doing so will lead to the least amount of bias and no 
#> increase in standard errors. Note that the recruitment numbers have been rounded to 
#> integers in such a way as to ensure their sum equals the desired total sample size.
#> 
#> Recruitment Table
#>                      Stratum 1 Stratum 2 Stratum 3 Stratum 4
#>     Population Units   153.000    75.000     39.00    57.000
#>  Sampling Proportion     0.472     0.231      0.12     0.176
#>   Recruitment Number    19.000     9.000      5.00     7.000
#> Attempt to recruit units starting from the top of each recruitment list. If you are 
#> unsuccessful in recruiting a particular unit, move on to the next one in the list and 
#> continue until you have reached the ideal recruitment number in each stratum.
#> 
#> If you have stored the output of 'recruit()' in an object, you can use it to access 
#> these lists by typing the name of the object followed by '$recruitment_lists'.

# creating the sample dataset from the output of recruit
sample_list <- c(rec_output$recruitment_lists[[1]]$ncessch[1:5],
                 rec_output$recruitment_lists[[2]]$ncessch[1:20],
                 rec_output$recruitment_lists[[3]]$ncessch[1:11],
                 rec_output$recruitment_lists[[4]]$ncessch[1:4])
inference_pop_sample <- mutate(generalizeR:::inference_pop,
                               sample = if_else(ncessch %in% sample_list, 1, 0))

# weighting the sample with the given covariates
weighting_output <- weighting(inference_pop_sample, sample_indicator = "sample",
                              covariates = selection_covariates, disjoint_data = FALSE)