Add sample selection models #235
…and create new clean class
…utils_selection_manual
@mychaelka I have updated the data class with a selection indicator (see doubleml-for-py/doubleml/irm/ssm.py, line 138 in 5b09f66).
Do you think the stratification by treatment and selection variable is fine for both scores? I would like to update the documentation over the next few weeks. Can you send me your simulation notebook (via mail if possible)?
@SvenKlaassen thank you, and yes, the stratification should be fine for both. I could send the notebook(s) tomorrow, but right now they contain only some simulations with a few comments. Over the weekend I can adjust them to look like the example notebooks you already have available, and send the final version next week.
Thank you.
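The stratification by treatment and selection variable discussed above can be sketched roughly as follows. This is only an illustration using the standard library: `stratified_folds` is a hypothetical helper, not the code in ssm.py, and the simulated `d` and `s` are made-up data.

```python
import random
from collections import defaultdict


def stratified_folds(d, s, n_folds=5, seed=42):
    """Assign a fold to each observation, stratifying on the (d, s) pair.

    Hypothetical helper for illustration only; not part of DoubleML.
    """
    rng = random.Random(seed)
    # Group observation indices by their joint (treatment, selection) value.
    strata = defaultdict(list)
    for i, key in enumerate(zip(d, s)):
        strata[key].append(i)

    folds = [None] * len(d)
    for key in sorted(strata):
        idx = strata[key]
        rng.shuffle(idx)
        # Deal the shuffled indices round-robin, so every fold receives
        # nearly the same number of observations from this stratum.
        for pos, i in enumerate(idx):
            folds[i] = pos % n_folds
    return folds


# Made-up data: binary treatment d and selection indicator s
# (s = 1 if the outcome is observed, 0 otherwise).
rng = random.Random(0)
n = 1000
d = [int(rng.random() < 0.5) for _ in range(n)]
s = [int(rng.random() < 0.7) for _ in range(n)]

folds = stratified_folds(d, s)

# Count observations per (fold, d, s) cell to check the balance.
counts = defaultdict(int)
for di, si, f in zip(d, s, folds):
    counts[(f, di, si)] += 1
```

Each of the four (d, s) cells is spread evenly over the folds, so every training sample contains both observed and unobserved outcomes in both treatment arms.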
Description
This PR contains an implementation of two estimators of sample selection models from Michela Bia, Martin Huber & Lukáš Lafférs (2023), Double Machine Learning for Sample Selection Models, Journal of Business & Economic Statistics, DOI: 10.1080/07350015.2023.2271071. The two estimators cover identification under missingness at random and under nonignorable nonresponse. The PR also adds basic tests on simulated data. For testing, the file conftest.py was also modified to include the DGP for these models. The original implementation of these estimators is available in the R causalweight package (https://cran.r-project.org/web/packages/causalweight/index.html).
Reference to Issues or PRs
Implemented by @mychaelka. Original PR: #231
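For readers unfamiliar with the missingness-at-random case named in the Description, here is a minimal sketch of a doubly robust score in which the usual propensity weight is multiplied by the selection probability. It is an illustration under stated assumptions, not the code in this PR: `mar_score_ate` is a hypothetical name, the nuisance functions are taken as known (oracle) rather than estimated by machine learning with cross-fitting, and the DGP is made up.

```python
import random


def mar_score_ate(y, d, s, p, pi1, pi0, mu1, mu0):
    """Average a doubly robust score for the ATE when outcomes are missing at random.

    p   : P(D=1 | X)          treatment propensity
    pi1 : P(S=1 | D=1, X)     selection probability under treatment
    pi0 : P(S=1 | D=0, X)     selection probability under control
    mu1 : E[Y | D=1, S=1, X]  outcome regression under treatment
    mu0 : E[Y | D=0, S=1, X]  outcome regression under control
    Hypothetical helper for illustration only.
    """
    total = 0.0
    for i in range(len(d)):
        # The outcome enters only when it is observed (s == 1).
        yi = y[i] if s[i] == 1 else 0.0
        w1 = d[i] * s[i] / (p[i] * pi1[i])
        w0 = (1 - d[i]) * s[i] / ((1.0 - p[i]) * pi0[i])
        total += (w1 * (yi - mu1[i]) + mu1[i]) - (w0 * (yi - mu0[i]) + mu0[i])
    return total / len(d)


# Made-up DGP with a true average treatment effect of 1.0.
rng = random.Random(0)
n = 20000
y, d, s, p, pi1, pi0, mu1, mu0 = ([] for _ in range(8))
for _ in range(n):
    x = rng.random()
    p_x = 0.3 + 0.4 * x
    di = int(rng.random() < p_x)
    sel1, sel0 = 0.7 + 0.2 * x, 0.5 + 0.2 * x
    si = int(rng.random() < (sel1 if di else sel0))
    # The outcome is recorded only for selected observations.
    yi = (1.0 * di + x + rng.gauss(0.0, 1.0)) if si else None
    y.append(yi); d.append(di); s.append(si); p.append(p_x)
    pi1.append(sel1); pi0.append(sel0)
    mu1.append(1.0 + x); mu0.append(x)

theta_hat = mar_score_ate(y, d, s, p, pi1, pi0, mu1, mu0)
print(f"estimated ATE: {theta_hat:.3f} (true value 1.0)")
```

With oracle nuisances the estimate should land close to the true effect. In practice the nuisance functions would be replaced by cross-fitted machine learning predictions.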
Comments
These estimators require a sample selection indicator to be present in the data (1 if the outcome is observed, 0 otherwise). The DoubleMLData interface does not have a selection indicator available yet, so the implementation uses the time indicator t in its place. The third estimator in the paper (identification under sequential conditional independence) is not implemented yet: it requires the covariates to be split into two parts (observed pre-treatment and observed post-treatment), which would mean interfering with the implementation of DoubleMLData.
Additional Changes
apply_crossfitting and dml_procedure
DoubleMLFramework
class DoubleMLData
PR Checklist
Please fill out this PR checklist (see our contributing guidelines for details).