Add sample selection models #235

Merged
merged 71 commits into from
May 29, 2024

@SvenKlaassen (Member) commented Mar 28, 2024

Description

This PR implements two estimators for sample selection models from Michela Bia, Martin Huber & Lukáš Lafférs (2023), Double Machine Learning for Sample Selection Models, Journal of Business & Economic Statistics, DOI: 10.1080/07350015.2023.2271071: identification under missingness at random and identification under nonignorable nonresponse. It also adds basic tests on simulated data; for these, the file conftest.py was modified to include the DGP for both models. The original implementation of these estimators is available in the R causalweight package (https://cran.r-project.org/web/packages/causalweight/index.html).

Reference to Issues or PRs

Implemented by @mychaelka. Original PR: #231

Comments

These estimators require a sample selection indicator in the data (1 if the outcome is observed, 0 otherwise). The DoubleMLData interface does not have a selection indicator yet, so the implementation uses the time indicator t in its place. The third estimator in the paper (identification under sequential conditional independence) is not implemented yet: it requires the covariates to be split into two parts, those observed pre-treatment and those observed post-treatment, which would mean changing the implementation of DoubleMLData.
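To make the role of the selection indicator concrete, here is a minimal sketch of toy data in which the outcome is only observed when the indicator equals 1. The DGP, the variable names (x, d, s, y) and the use of NaN for unobserved outcomes are assumptions made for illustration; this is not the DGP added to conftest.py, nor necessarily how the implementation stores unobserved outcomes.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical toy DGP, for illustration only.
x = rng.normal(size=n)                          # observed covariate
d = (x + rng.normal(size=n) > 0).astype(int)    # binary treatment

# Selection indicator: 1 if the outcome is observed, 0 otherwise.
# Here selection depends only on observed d and x, which mimics
# the missingness-at-random setting.
s = (0.5 * x + 0.5 * d + rng.normal(size=n) > 0).astype(int)

# Latent outcome for everyone; only units with s == 1 reveal it.
y_latent = 1.0 * d + x + rng.normal(size=n)
y = np.where(s == 1, y_latent, np.nan)

print("share of observed outcomes:", s.mean())
```

Any estimator working on such data sees y only for the rows with s == 1, which is why the indicator has to be part of the data object.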

Additional Changes

  • Remove apply_crossfitting and dml_procedure
  • Update to use DoubleMLFramework class
  • Add selection indicator to DoubleMLData
  • Implement external predictions (not yet)
  • Implement sensitivity analysis (not yet)

PR Checklist

Please fill out this PR checklist (see our contributing guidelines for details).

  • The title of the pull request summarizes the changes made.
  • The PR contains a detailed description of all changes and additions.
  • References to related issues or PRs are added.
  • The code passes all (unit) tests.
  • Enhancements or new features are equipped with unit tests.
  • The changes adhere to the PEP8 standards.

Michaela Kecskésová added 30 commits February 9, 2024 10:25
@SvenKlaassen
Member Author

@mychaelka I have updated the data class with a selection indicator s and extended the unit tests.
Further, I have added sampling stratification for both scores at the top of the class definition:

self._strata = self._dml_data.d.reshape(-1, 1) + 2 * self._dml_data.s.reshape(-1, 1)

Do you think the stratification by treatment and selection variable is fine for both scores?
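For readers following along, the encoding above maps each (d, s) pair to one of four strata. The sketch below reproduces that encoding on toy arrays and shows one possible way to spread each stratum evenly over folds. The helper stratified_folds is hypothetical and written only for this illustration; it is not the resampling code used in the package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
d = rng.integers(0, 2, size=n)  # binary treatment (toy values)
s = rng.integers(0, 2, size=n)  # selection indicator (toy values)

# Same encoding as in the line above:
# 0 -> (d=0, s=0), 1 -> (d=1, s=0), 2 -> (d=0, s=1), 3 -> (d=1, s=1)
strata = d.reshape(-1, 1) + 2 * s.reshape(-1, 1)


def stratified_folds(strata, n_folds, rng):
    """Hypothetical helper: assign fold ids so that each stratum
    is spread as evenly as possible over the folds."""
    labels = strata.ravel()
    folds = np.empty(labels.shape[0], dtype=int)
    for label in np.unique(labels):
        # Shuffle the members of this stratum, then deal them out round-robin.
        idx = rng.permutation(np.flatnonzero(labels == label))
        folds[idx] = np.arange(idx.size) % n_folds
    return folds


folds = stratified_folds(strata, n_folds=2, rng=rng)
print(np.column_stack([d, s, strata.ravel(), folds]))
```

With this kind of split, every fold contains treated and untreated units with both observed and unobserved outcomes, provided each stratum has at least as many members as there are folds.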

I would like to update the documentation over the next few weeks. Can you send me your simulation notebook (via mail if possible)?
I am sorry that the changes were quite slow; I was quite busy over the last month.

@mychaelka

@SvenKlaassen thank you, and yes, the stratification should be fine for both. I could send the notebook(s) tomorrow, but right now they only contain some simulations with only a few comments. Over the weekend I can adjust them to look similar to the example notebooks you already have available, and send the final version next week.
And no need to apologize for being busy, I am in a similar situation right now :)

@SvenKlaassen
Member Author

Thank you.
A slightly adjusted version would be great, but you can take your time. It doesn't need to be next week.

@SvenKlaassen SvenKlaassen merged commit f04fef0 into main May 29, 2024
11 checks passed