Estimate Weights for Generalizing Average Treatment Effect
Usage
weighting(
  data,
  sample_indicator,
  treatment_indicator = NULL,
  outcome = NULL,
  covariates,
  estimation_method = "lr",
  disjoint_data = TRUE
)
Arguments
- data
data frame consisting of "stacked" sample and target population data
- sample_indicator
variable name denoting binary sample membership (1 = in sample, 0 = out of sample)
- treatment_indicator
variable name denoting binary treatment assignment (may be available only in the sample, not in the population)
- outcome
variable name denoting outcome
- covariates
vector of covariate names in data set that predict sample membership
- estimation_method
method used to estimate the probability of sample membership (propensity scores). Default is logistic regression ("lr"). Other supported methods are Random Forests ("rf") and Lasso ("lasso").
- disjoint_data
logical. If TRUE, then sample and population data are considered disjoint. This affects calculation of the weights - see details for more information.
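To make the role of disjoint_data concrete, the following is a minimal sketch of propensity-score weighting in base R. It is not the package's implementation: the toy data frame, its column names, and the two weight formulas (inverse odds when sample and population are disjoint, inverse probability when the sample is part of the population) are assumptions based on common practice in the generalization literature.

```r
# Hypothetical stacked data: in_sample = 1 for sample units, 0 otherwise
set.seed(1)
n <- 200
stacked <- data.frame(
  in_sample = rbinom(n, 1, 0.3),
  x1 = rnorm(n),
  x2 = rnorm(n)
)

# Estimate the probability of sample membership with logistic regression
ps_model <- glm(in_sample ~ x1 + x2, data = stacked, family = binomial())
ps <- predict(ps_model, type = "response")

# Assumed weight definitions (not confirmed against the package source):
#   disjoint data:     inverse odds of membership, (1 - ps) / ps
#   non-disjoint data: inverse probability of membership, 1 / ps
# Units outside the sample receive a weight of 0 in both cases.
w_disjoint <- ifelse(stacked$in_sample == 1, (1 - ps) / ps, 0)
w_overlap  <- ifelse(stacked$in_sample == 1, 1 / ps, 0)

summary(w_disjoint[stacked$in_sample == 1])
summary(w_overlap[stacked$in_sample == 1])
```

Under these assumed formulas the two weights differ by exactly 1 for every sample unit, so the choice matters most when propensity scores are large and the weights are small.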
Value
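A summary of propensity scores, covariates, and ASMD (absolute standardized mean difference) for both weighted and unweighted data, as well as a summary of the weights. If outcome and treatment_indicator are supplied, the weighted and unweighted TATE (target average treatment effect) are also returned.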
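As a rough illustration of what an ASMD comparison measures, the sketch below computes the absolute standardized mean difference between a (possibly weighted) sample mean and the population mean for a single covariate. The helper asmd() is hypothetical, and its choices (population mean taken over all stacked rows, standardization by the population standard deviation) are assumptions rather than the package's definition.

```r
# Hypothetical helper: absolute standardized mean difference for one covariate.
# x         covariate values for all stacked rows
# in_sample 1 = in sample, 0 = out of sample
# w         optional weights (only the sample rows are used)
asmd <- function(x, in_sample, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))
  sample_mean <- weighted.mean(x[in_sample == 1], w[in_sample == 1])
  pop_mean <- mean(x)                 # assumed: population = all stacked rows
  abs(sample_mean - pop_mean) / sd(x) # assumed: standardize by population SD
}

# Toy data: the sample over-represents small values of x
x <- c(1, 2, 3, 4, 5, 6)
in_sample <- c(1, 1, 1, 0, 0, 0)

asmd_unweighted <- asmd(x, in_sample)

# Up-weighting the sample unit closest to the population mean reduces imbalance
w <- c(1, 1, 4, 0, 0, 0)
asmd_weighted <- asmd(x, in_sample, w)

c(unweighted = asmd_unweighted, weighted = asmd_weighted)
```

A weighted ASMD that is smaller than the unweighted one suggests the weights move the sample closer to the population on that covariate.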
Examples
library(tidyverse)
# stratifying the population and generating recruitment lists in preparation for weighting
selection_covariates <- c("total", "pct_black_or_african_american", "pct_white",
                          "pct_female", "pct_free_and_reduced_lunch")
strat_output <- stratify(generalizeR:::inference_pop, guided = FALSE, n_strata = 4,
                         variables = selection_covariates, idvar = "ncessch")
#>
#> This might take a little while. Please bear with us.
#>
#> Calculated distance matrix.
#>
#> iteration: 1 --> total WCSS: 338.562 --> squared norm: 1.40551
#> iteration: 2 --> total WCSS: 205.024 --> squared norm: 0.138412
#> iteration: 3 --> total WCSS: 203.903 --> squared norm: 0.0419178
#> iteration: 4 --> total WCSS: 203.775 --> squared norm: 0.0313237
#> iteration: 5 --> total WCSS: 203.729 --> squared norm: 0
#>
#> ===================== end of initialization 1 =====================
#>
rec_output <- recruit(strat_output, guided = FALSE, sample_size = 40)
#>
#> The 'generalizeR_stratify' object you've supplied consists of 324 population units
#> divided into 4 strata along these variables:
#>
#> total, pct_black_or_african_american, pct_white, pct_female, pct_free_and_reduced_lunch.
#>
#> 4 recruitment lists have been generated, one per stratum. Each list contains the ID
#> information for the units, which have been ranked in order of desirability.
#>
#> The following table (also shown in the Viewer pane to the right) displays the stratum
#> sizes, their proportion relative to the total population size, and consequent
#> recruitment number for each stratum. Ideally, units should be recruited across strata
#> according to these numbers. Doing so will lead to the least amount of bias and no
#> increase in standard errors. Note that the recruitment numbers have been rounded to
#> integers in such a way as to ensure their sum equals the desired total sample size.
#>
#> Recruitment Table
#>                     Stratum 1 Stratum 2 Stratum 3 Stratum 4
#> Population Units      153.000    75.000     39.00    57.000
#> Sampling Proportion     0.472     0.231      0.12     0.176
#> Recruitment Number     19.000     9.000      5.00     7.000
#> Attempt to recruit units starting from the top of each recruitment list. If you are
#> unsuccessful in recruiting a particular unit, move on to the next one in the list and
#> continue until you have reached the ideal recruitment number in each stratum.
#>
#> If you have stored the output of 'recruit()' in an object, you can use it to access
#> these lists by typing the name of the object followed by '$recruitment_lists'.
# creating the sample dataset from the output of recruit
sample_list <- c(rec_output$recruitment_lists[[1]]$ncessch[1:5],
                 rec_output$recruitment_lists[[2]]$ncessch[1:20],
                 rec_output$recruitment_lists[[3]]$ncessch[1:11],
                 rec_output$recruitment_lists[[4]]$ncessch[1:4])
inference_pop_sample <- mutate(generalizeR:::inference_pop,
                               sample = if_else(ncessch %in% sample_list, 1, 0))
# weighting the sample with the given covariates
weighting_output <- weighting(inference_pop_sample, sample_indicator = "sample",
                              covariates = selection_covariates, disjoint_data = FALSE)