Implementation of sample selection estimators #231
Conversation
…and create new clean class
doubleml/double_ml_selection.py (outdated)
'preds': np.full(shape=self._dml_data.n_obs, fill_value=np.nan)
}
mu_hat_d0 = copy.deepcopy(mu_hat_d1)
pi_hat = copy.deepcopy(mu_hat_d1)
Check warning (Code scanning / CodeQL): Variable defined multiple times
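The snippet does not show where the duplicate definition occurs, but a pattern that both avoids repeatedly deep-copying a template dict and keeps each name bound exactly once is a small factory function. A minimal sketch only; the helper name _init_prediction_dict and the plain n_obs argument are illustrative and not part of the PR:

import numpy as np

def _init_prediction_dict(n_obs):
    # Hypothetical helper: build a fresh container per nuisance function
    # instead of deep-copying a shared template dict.
    return {'models': None,
            'targets': np.full(shape=n_obs, fill_value=np.nan),
            'preds': np.full(shape=n_obs, fill_value=np.nan)}

# one independent container per nuisance estimate, each name bound once
mu_hat_d1 = _init_prediction_dict(n_obs=100)
mu_hat_d0 = _init_prediction_dict(n_obs=100)
pi_hat = _init_prediction_dict(n_obs=100)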
doubleml/double_ml_selection.py (outdated)
from .double_ml_score_mixins import LinearScoreMixin

class DoubleMLS(LinearScoreMixin, DoubleML):
Check warning (Code scanning / CodeQL): Conflicting attributes in base classes
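This CodeQL check fires when two base classes provide an attribute or method with the same name: Python's method resolution order then silently picks the left-most definition. For a deliberate mixin design, as here, that is usually the intended behaviour, but the overlap is worth confirming. A toy illustration with generic class names, not the actual DoubleML code:

class ScoreMixin:
    def _est_coef(self):
        return 'mixin implementation'

class Base:
    def _est_coef(self):
        return 'base implementation'

class Model(ScoreMixin, Base):
    # MRO is (Model, ScoreMixin, Base, object), so ScoreMixin._est_coef wins
    # and Base._est_coef is shadowed, which is what the check points out.
    pass

print(Model.__mro__)
print(Model()._est_coef())  # -> 'mixin implementation'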
doubleml/double_ml_selection.py (outdated)
draw_sample_splitting,
apply_cross_fitting)

self._external_predictions_implemented = False
Check warning (Code scanning / CodeQL): Overwriting attribute in super-class or sub-class
doubleml/double_ml_selection.py (outdated)
apply_cross_fitting)

self._external_predictions_implemented = False
self._sensitivity_implemented = True
Check warning (Code scanning / CodeQL): Overwriting attribute in super-class or sub-class
doubleml/double_ml_selection.py (outdated)
self._learner = {'ml_mu': clone(ml_mu),
                 'ml_pi': clone(ml_pi),
                 'ml_p': clone(ml_p),
                 }
Check warning (Code scanning / CodeQL): Overwriting attribute in super-class or sub-class
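The three "Overwriting attribute" warnings follow the same pattern: the sub-class __init__ re-assigns an attribute that the super-class __init__ has already set (or vice versa). Overriding such defaults is the usual way model classes opt in or out of features, so the warnings mainly ask the author to confirm the overwrite is intentional. A stripped-down illustration with generic names, not the PR code:

class Base:
    def __init__(self):
        # default set in the super-class ...
        self._sensitivity_implemented = False

class Child(Base):
    def __init__(self):
        super().__init__()
        # ... and immediately overwritten here, which triggers the check;
        # the value set in Base.__init__ is never observable.
        self._sensitivity_implemented = True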
Thank you very much for the contribution to the package. I think the idea with using the time variable …
Thank you Sven, there still remains one estimator in the sample selection paper (identification under sequential conditional independence) that I have not yet implemented, as mentioned in the comments. This estimator accounts for the covariates X being observed both pre-treatment and post-treatment. Would it be possible to also add such an (optional) distinction into DoubleMLData?
I guess it would be possible. I will check out the paper and try to come up with a solution.
I was actually thinking about that, since I don't really like the class name that I am using now (DoubleMLS) …
Both … I have thought a bit about the additional implementation of covariates …
@SvenKlaassen I added model defaults and return type tests for the sample selection models into the existing files and created a new file with exception tests. I am also working on example notebooks with both simulated and real data.
Sorry for the late reply.
Sorry, I couldn't see any comments on the code, nor did I receive any notification. Until now I thought you had been busy and did not have time to go through the code, so I was working on the things you mentioned in email. I still cannot see any comments on the code though...
I mean the comments in this PR. I am not sure why you are not able to see them.
I can only see the comments in this conversation. I tried looking under changed files and commits, but I can only see the warnings from Codacy there. We can definitely do a short call -- I am currently away but will be available today around 4pm again. I will also be available during the weekend and for most of next week.
Sorry, it was completely my fault. I forgot to submit the review... If some points need discussion, we can still talk (just ask via mail).
Thank you, I will try to go over the comments during this weekend. I can already see that some of the comments might be a huge help for the parts that I was struggling with a bit.
…rmalization to false
…e of nonignorable nonresponse
…write tests accordingly
@SvenKlaassen I refactored the code to use only one sample splitting procedure for the nested estimation. I also fixed the ordering -- now the order of predictions should match the input data. If you come across any other issue (or find the refactored code to still have bugs), please let me know.
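For reference, the core idea behind such a fix can be sketched as follows: reuse one set of folds for every nested nuisance estimate and write each out-of-fold prediction back to the row positions it belongs to, so the prediction arrays stay aligned with the input data. This is a minimal, self-contained sketch with scikit-learn, not the PR code:

import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
s = rng.binomial(1, 0.7, size=200)  # stand-in selection indicator

# one shared splitting, reused for all nuisance estimates
folds = list(KFold(n_splits=5, shuffle=True, random_state=42).split(X))

pi_hat = np.full(200, np.nan)
for train_idx, test_idx in folds:
    learner = clone(LogisticRegression()).fit(X[train_idx], s[train_idx])
    # writing to test_idx keeps the predictions in the original row order
    pi_hat[test_idx] = learner.predict_proba(X[test_idx])[:, 1]

assert not np.isnan(pi_hat).any()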
Looks really good. Only very small comments.
If the models are included, we can merge this and I can then start to update the documentation. I might also start to add external predictions and sensitivity analysis before merging it onto main.
I will merge the PR onto the extra branch and try to merge it onto main.
Of course, thank you. If you need any help with the documentation or anything else, please let me know. I already have some example notebooks but I have to clean them up first (so far they have been only experimental).
Description
This PR contains an implementation of two estimators of sample selection models from Michela Bia, Martin Huber & Lukáš Lafférs (2023), "Double Machine Learning for Sample Selection Models", Journal of Business & Economic Statistics, DOI: 10.1080/07350015.2023.2271071 -- identification under missingness at random and under nonignorable nonresponse -- along with basic tests on simulated data. For testing, the file conftest.py was also modified to include the DGP for these models. The original implementation of these estimators is available in the R package causalweight (https://cran.r-project.org/web/packages/causalweight/index.html).

Reference to Issues or PRs
None

Comments
These estimators require a sample selection indicator to be present in the data (1 if the outcome is observed, 0 otherwise). The DoubleMLData interface does not have a selection indicator available yet, so the implementation uses the time indicator t in its place. The third estimator in the paper (identification under sequential conditional independence) is not implemented yet, as it would require changes to DoubleMLData: it needs the covariates to be split into two parts -- those observed pre-treatment and those observed post-treatment.
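To make the intended usage concrete, here is a hypothetical end-to-end sketch: a toy selection DGP where the outcome is only observed when s == 1, with the selection indicator passed through the time slot as described above. The class name DoubleMLS and the learner names ml_mu, ml_pi and ml_p are taken from the diff; the import path and constructor signature are assumptions about the final API and are therefore kept as comments.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# toy selection DGP: the outcome y is only observed when s == 1
n = 2000
rng = np.random.default_rng(3141)
x = rng.normal(size=(n, 5))
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-x[:, 0])))             # treatment
s = rng.binomial(1, 1.0 / (1.0 + np.exp(-(x[:, 1] + d))))       # selection indicator
y = np.where(s == 1, d + x[:, 0] + rng.normal(size=n), np.nan)  # outcome, missing if s == 0

df = pd.DataFrame(np.column_stack((x, d, s, y)),
                  columns=[f'x{i + 1}' for i in range(5)] + ['d', 's', 'y'])

# Assumed usage, following the diff; names and signature may change before merge:
# from doubleml import DoubleMLData
# from doubleml.double_ml_selection import DoubleMLS
# dml_data = DoubleMLData(df, y_col='y', d_cols='d',
#                         x_cols=[f'x{i + 1}' for i in range(5)], t_col='s')
# dml_sel = DoubleMLS(dml_data,
#                     ml_mu=RandomForestRegressor(),   # outcome regression
#                     ml_pi=RandomForestClassifier(),  # selection propensity
#                     ml_p=RandomForestClassifier())   # treatment propensity
# print(dml_sel.fit().summary)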
PR Checklist
Please fill out this PR checklist (see our contributing guidelines for details).