
Adapt nuisance est for IV-type score (PLR) & new score IV-type for PLIV #151



Merged: 55 commits merged on Jun 14, 2022


@MalteKurz (Member) commented May 20, 2022

Description

PLR

  • Nuisance estimation for the IV-type score: this PR adapts the nuisance estimation for the IV-type score in the PLR model to be in line with the DML paper Chernozhukov et al. (2018).
    • Results for the default score='partialling out' (Equation (4.4) in Chernozhukov et al. (2018)) are not affected by the changes in this PR. However, the learner for the nuisance function is renamed from ml_g to ml_l (analogously, the predictions g_hat are renamed to l_hat, etc.) to be in line with the notation of Chernozhukov et al. (2018). To make the transition to the new naming smooth, deprecation warnings have been added (see below for an overview of the API changes and examples of the deprecation warnings).
    • For score='IV-type' (Equation (4.3) in Chernozhukov et al. (2018)), the implementation now follows the approach described on pp. C31-C33 in Chernozhukov et al. (2018): an initial estimate of theta_0 is obtained via the 'partialling out' score, and an estimate of g_0(X) is then obtained by regressing Y - theta_0 * D on X. Therefore, an additional learner, which is not needed to evaluate the score itself, has to be provided: the nuisance function l_0(X) = E[Y|X] (needed for the preliminary theta_0 estimate) is estimated with learner ml_l, and g_0(X) with learner ml_g. To make the transition to the new API (additional learner) smooth, deprecation warnings have been added (see below for an overview of the API changes and examples of the deprecation warnings). In particular, if ml_l is specified but ml_g is not, ml_g = clone(ml_l) is used and a warning is issued.
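The two-step procedure for score='IV-type' can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not DoubleML's implementation: cross-fitted least-squares regressions stand in for the learners ml_l, ml_m and ml_g, the score is averaged over all folds at once, and the helper names (ols_fit_predict, cross_fit, plr_iv_type) as well as the simulated data are hypothetical.

```python
import numpy as np


def ols_fit_predict(x_train, y_train, x_test):
    """Least-squares fit with intercept; stands in for an ML learner (assumption)."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coef


def cross_fit(x, target, folds):
    """Out-of-fold predictions of E[target | X]."""
    pred = np.empty(len(target))
    for test in folds:
        train = np.setdiff1d(np.arange(len(target)), test)
        pred[test] = ols_fit_predict(x[train], target[train], x[test])
    return pred


def plr_iv_type(y, d, x, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    l_hat = cross_fit(x, y, folds)  # l_0(X) = E[Y | X], role of ml_l
    m_hat = cross_fit(x, d, folds)  # m_0(X) = E[D | X], role of ml_m
    v_hat = d - m_hat
    # Step 1: preliminary theta from the partialling-out score (Eq. (4.4))
    theta_init = np.mean(v_hat * (y - l_hat)) / np.mean(v_hat * v_hat)
    # Step 2: g_0(X) by regressing Y - theta_init * D on X, role of ml_g
    g_hat = cross_fit(x, y - theta_init * d, folds)
    # Step 3: IV-type score (Eq. (4.3)): psi = (Y - D * theta - g(X)) * (D - m(X))
    psi_a = -d * v_hat
    psi_b = (y - g_hat) * v_hat
    theta_iv = -np.mean(psi_b) / np.mean(psi_a)
    return theta_init, theta_iv


# Simulated data with true theta_0 = 0.5 and linear nuisance functions
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=(n, 3))
d = x @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)
y = 0.5 * d + x @ np.array([0.8, 0.2, -1.0]) + rng.normal(size=n)
theta_init, theta_iv = plr_iv_type(y, d, x)
print(round(theta_init, 3), round(theta_iv, 3))
```

Because the nuisance functions in this toy setup are linear, both estimates should be close to 0.5; in practice ml_l, ml_m and ml_g would be flexible learners such as the random forests used in the examples below.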

PLIV

  • This PR renames the nuisance learner of the existing score and implements a new score function for the PLIV model:
    • Results for the default score='partialling out' (Equation (4.8) in Chernozhukov et al. (2018)) are not affected by the changes in this PR. However, the learner for the nuisance function is renamed from ml_g to ml_l (analogously, the predictions g_hat are renamed to l_hat, etc.) to be in line with the notation of Chernozhukov et al. (2018). To make the transition to the new naming smooth, deprecation warnings have been added (see below for examples).
    • A new score='IV-type' (Equation (4.7) in Chernozhukov et al. (2018)) is now available for the PLIV model. The nuisance estimation follows the approach described on p. C33 in Chernozhukov et al. (2018): an initial estimate of theta_0 is obtained via the 'partialling out' score, and an estimate of g_0(X) is then obtained by regressing Y - theta_0 * D on X. Therefore, two additional learners, which are not needed to evaluate the score itself, have to be provided: the nuisance functions l_0(X) and r_0(X) (needed for the preliminary theta_0 estimate) are estimated with learners ml_l and ml_r, respectively, and g_0(X) is estimated with learner ml_g.
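The PLIV procedure can be sketched in NumPy as well. This is a plain illustration, not DoubleML's code: cross-fitted least-squares regressions stand in for ml_l (E[Y|X]), ml_m (E[Z|X]), ml_r (E[D|X]) and ml_g, and the helper names and simulated data (including an unobserved confounder u that makes D endogenous) are hypothetical.

```python
import numpy as np


def ols_fit_predict(x_train, y_train, x_test):
    """Least-squares fit with intercept; stands in for an ML learner (assumption)."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coef


def cross_fit(x, target, folds):
    """Out-of-fold predictions of E[target | X]."""
    pred = np.empty(len(target))
    for test in folds:
        train = np.setdiff1d(np.arange(len(target)), test)
        pred[test] = ols_fit_predict(x[train], target[train], x[test])
    return pred


def pliv_iv_type(y, d, z, x, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    l_hat = cross_fit(x, y, folds)  # l_0(X) = E[Y | X], role of ml_l
    m_hat = cross_fit(x, z, folds)  # m_0(X) = E[Z | X], role of ml_m
    r_hat = cross_fit(x, d, folds)  # r_0(X) = E[D | X], role of ml_r
    v_hat = z - m_hat
    # Step 1: preliminary theta from the partialling-out score (Eq. (4.8))
    theta_init = np.mean((y - l_hat) * v_hat) / np.mean((d - r_hat) * v_hat)
    # Step 2: g_0(X) by regressing Y - theta_init * D on X, role of ml_g
    g_hat = cross_fit(x, y - theta_init * d, folds)
    # Step 3: IV-type score (Eq. (4.7)): psi = (Y - D * theta - g(X)) * (Z - m(X))
    theta_iv = np.mean((y - g_hat) * v_hat) / np.mean(d * v_hat)
    return theta_init, theta_iv


# Simulated data with true theta_0 = 0.5; u confounds D and Y
rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=(n, 3))
u = rng.normal(size=n)
z = x @ np.array([0.5, 0.0, -0.5]) + rng.normal(size=n)
d = 0.8 * z + x @ np.array([1.0, -0.5, 0.3]) + u + rng.normal(size=n)
y = 0.5 * d + x @ np.array([0.8, 0.2, -1.0]) + u + rng.normal(size=n)
theta_init, theta_iv = pliv_iv_type(y, d, z, x)
print(round(theta_init, 3), round(theta_iv, 3))
```

Because u enters both D and Y, a plain regression of Y on D and X would be biased; both estimates here rely on Z being unrelated to u given X.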

API changes

PLR

  • API changed from DoubleMLPLR(obj_dml_data, ml_g, ml_m [, ...]) to DoubleMLPLR(obj_dml_data, ml_l, ml_m, ml_g [, ...]).
    • For score='partialling out', ml_l & ml_m are needed.
    • For score='IV-type', ml_l, ml_m & ml_g are needed.
    • For callable scores, ml_l & ml_m are mandatory and ml_g is optional.
  • The signature of callable scores changed from psi_a, psi_b = score(y, d, g_hat, m_hat, smpls) to psi_a, psi_b = score(y, d, l_hat, m_hat, g_hat, smpls).
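As a sketch of the new callable-score signature for DoubleMLPLR, the two functions below re-implement the partialling-out (Eq. (4.4)) and IV-type (Eq. (4.3)) scores as callables returning psi_a and psi_b of a score that is linear in theta. They assume y, d and the predictions arrive as 1-D NumPy arrays and ignore smpls; the function names and the oracle-nuisance check are illustrative, not part of DoubleML.

```python
import numpy as np


def plr_partialling_out_score(y, d, l_hat, m_hat, g_hat, smpls):
    """Partialling-out score, Eq. (4.4): psi = (Y - l(X) - theta (D - m(X))) (D - m(X))."""
    v_hat = d - m_hat
    psi_a = -v_hat * v_hat
    psi_b = v_hat * (y - l_hat)
    return psi_a, psi_b  # g_hat and smpls are accepted but unused


def plr_iv_type_score(y, d, l_hat, m_hat, g_hat, smpls):
    """IV-type score, Eq. (4.3): psi = (Y - D theta - g(X)) (D - m(X))."""
    v_hat = d - m_hat
    psi_a = -d * v_hat
    psi_b = (y - g_hat) * v_hat
    return psi_a, psi_b  # l_hat and smpls are accepted but unused


# Check with oracle nuisance values (the true functions, not estimates)
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
m0 = 0.7 * x            # m_0(X) = E[D | X]
g0 = np.sin(x)          # g_0(X)
d = m0 + rng.normal(size=n)
y = 0.5 * d + g0 + rng.normal(size=n)
l0 = 0.5 * m0 + g0      # l_0(X) = E[Y | X] = theta_0 * m_0(X) + g_0(X)
for score in (plr_partialling_out_score, plr_iv_type_score):
    psi_a, psi_b = score(y, d, l0, m0, g0, None)
    # The linear score psi_a * theta + psi_b has the root theta = -mean(psi_b) / mean(psi_a)
    print(score.__name__, round(-np.mean(psi_b) / np.mean(psi_a), 3))
```

Under the new API, such a callable would be passed via the score argument, e.g. dml.DoubleMLPLR(plr_data, ml_l, ml_m, ml_g, score=plr_iv_type_score); ml_g is only needed there because this particular score uses g_hat.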

PLIV

  • API changed from DoubleMLPLIV(obj_dml_data, ml_g, ml_m, ml_r [, ...]) to DoubleMLPLIV(obj_dml_data, ml_l, ml_m, ml_r, ml_g [, ...]).
    • For score='partialling out', ml_l, ml_m & ml_r are needed.
    • For score='IV-type', ml_l, ml_m, ml_r & ml_g are needed.
    • For callable scores, ml_l, ml_m & ml_r are mandatory and ml_g is optional.
  • The signature of callable scores changed from psi_a, psi_b = score(y, z, d, g_hat, m_hat, r_hat, smpls) to psi_a, psi_b = score(y, z, d, l_hat, m_hat, r_hat, g_hat, smpls).
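The PLIV counterpart of the callable-score signature can be sketched the same way. The callables below re-implement the partialling-out (Eq. (4.8)) and IV-type (Eq. (4.7)) scores, assuming 1-D NumPy inputs (a single instrument z) and ignoring smpls; the names and the oracle check with a simulated confounder u are illustrative, not part of DoubleML.

```python
import numpy as np


def pliv_partialling_out_score(y, z, d, l_hat, m_hat, r_hat, g_hat, smpls):
    """Eq. (4.8): psi = (Y - l(X) - theta (D - r(X))) (Z - m(X))."""
    v_hat = z - m_hat
    psi_a = -(d - r_hat) * v_hat
    psi_b = (y - l_hat) * v_hat
    return psi_a, psi_b  # g_hat and smpls are accepted but unused


def pliv_iv_type_score(y, z, d, l_hat, m_hat, r_hat, g_hat, smpls):
    """Eq. (4.7): psi = (Y - D theta - g(X)) (Z - m(X))."""
    v_hat = z - m_hat
    psi_a = -d * v_hat
    psi_b = (y - g_hat) * v_hat
    return psi_a, psi_b  # l_hat, r_hat and smpls are accepted but unused


# Check with oracle nuisance values; u confounds D and Y
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
u = rng.normal(size=n)
m0 = 0.5 * x                        # m_0(X) = E[Z | X]
z = m0 + rng.normal(size=n)
r0 = 0.8 * m0 + np.cos(x)           # r_0(X) = E[D | X]
d = 0.8 * z + np.cos(x) + u + rng.normal(size=n)
g0 = np.sin(x)                      # g_0(X)
y = 0.5 * d + g0 + u + rng.normal(size=n)
l0 = 0.5 * r0 + g0                  # l_0(X) = E[Y | X]
for score in (pliv_partialling_out_score, pliv_iv_type_score):
    psi_a, psi_b = score(y, z, d, l0, m0, r0, g0, None)
    print(score.__name__, round(-np.mean(psi_b) / np.mean(psi_a), 3))
```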

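Because ml_g was inserted directly after the existing learners, any argument that used to be passed positionally after them now lands in the ml_g slot. The stubs below are hypothetical and only mirror the argument order for PLR; they assume n_folds was the next positional parameter and omit everything else. inspect.Signature.bind shows how the same call binds under both orders.

```python
import inspect


# Hypothetical stubs mirroring the old and new argument order (not the real classes)
def plr_old(obj_dml_data, ml_g, ml_m, n_folds=5):
    pass


def plr_new(obj_dml_data, ml_l, ml_m, ml_g=None, n_folds=5):
    pass


args = ("data", "learner_a", "learner_b", 3)  # n_folds=3 passed positionally
old_bound = inspect.signature(plr_old).bind(*args).arguments
new_bound = inspect.signature(plr_new).bind(*args).arguments
print(dict(old_bound))
# {'obj_dml_data': 'data', 'ml_g': 'learner_a', 'ml_m': 'learner_b', 'n_folds': 3}
print(dict(new_bound))
# {'obj_dml_data': 'data', 'ml_l': 'learner_a', 'ml_m': 'learner_b', 'ml_g': 3}
```

Under the new order the integer 3 is bound to ml_g instead of n_folds, which a learner check would reject; passing everything after the learners as keyword arguments avoids this for both PLR and PLIV.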
Deprecation warnings for the API changes of DoubleMLPLR and DoubleMLPLIV

  • Initialization code for the following code examples:
import numpy as np
import doubleml as dml
from doubleml.datasets import make_plr_CCDDHNR2018, make_pliv_CHS2015
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone

learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
ml_l = clone(learner)
ml_m = clone(learner)
ml_r = clone(learner)
ml_g = clone(learner)
plr_data = make_plr_CCDDHNR2018(n_obs=500)
pliv_data = make_pliv_CHS2015(n_obs=500)
  • For PLR & PLIV with score='partialling out', nothing changes if the learners are provided as positional arguments.
dml_plr_obj = dml.DoubleMLPLR(plr_data, ml_l, ml_m, score='partialling out')
dml_pliv_obj = dml.DoubleMLPLIV(pliv_data, ml_l, ml_m, ml_r, score='partialling out')

    Note, however, that if other arguments besides the learners have also been provided as positional arguments, the changed API causes exceptions, because the additional learner ml_g was added as the fourth (PLR) / fifth (PLIV) positional argument.

  • For PLR with score='partialling out' and keyword arguments ml_g and ml_m (old API naming), the learner provided for ml_g is used for ml_l and a warning is issued.
dml_plr_obj = dml.DoubleMLPLR(plr_data, ml_g=ml_g, ml_m=ml_m, score='partialling out')
DeprecationWarning: The required positional argument ml_g was renamed to ml_l. Please adapt the argument name accordingly. ml_g is redirected to ml_l. The redirection will be removed in a future version.
  • For PLR with score='IV-type' and keyword arguments ml_g and ml_m (old API naming), the learner provided for ml_g is used for ml_l, a clone of it is used for ml_g, and both warnings are issued (the learner is first redirected to ml_l and then cloned to ml_g).
dml_plr_obj = dml.DoubleMLPLR(plr_data, ml_g=ml_g, ml_m=ml_m, score='IV-type')
DeprecationWarning: The required positional argument ml_g was renamed to ml_l. Please adapt the argument name accordingly. ml_g is redirected to ml_l. The redirection will be removed in a future version.
UserWarning: For score = 'IV-type', learners ml_l and ml_g should be specified. Set ml_g = clone(ml_l).
  • For PLR with score='IV-type' and only two learners provided as positional arguments, a clone of the learner provided for ml_l is used for ml_g and a warning is issued.
dml_plr_obj = dml.DoubleMLPLR(plr_data, ml_l, ml_m, score='IV-type')
UserWarning: For score = 'IV-type', learners ml_l and ml_g should be specified. Set ml_g = clone(ml_l).
  • For PLR & PLIV with score='partialling out', the methods set_ml_nuisance_params and tune redirect ml_g to ml_l.
dml_plr_obj = dml.DoubleMLPLR(plr_data, ml_l, ml_m, score='partialling out')
dml_plr_obj.set_ml_nuisance_params('ml_g', 'd', {'n_estimators': 100, 'max_features': 20})
DeprecationWarning: Learner ml_g was renamed to ml_l. Please adapt the argument learner accordingly. The provided parameters are set for ml_l. The redirection will be removed in a future version.
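The redirection behaviour listed above can be sketched with Python's warnings module. The functions resolve_plr_learners and redirect_param_learner and the DummyLearner class are hypothetical and only reproduce the described logic for PLR (they are not DoubleML's internals); copy.deepcopy stands in for sklearn.base.clone, and the warning messages are shortened.

```python
import copy
import warnings


def resolve_plr_learners(ml_l=None, ml_m=None, ml_g=None, score="partialling out"):
    """Hypothetical sketch of the learner handling described above."""
    if ml_l is None and ml_g is not None:
        # Old keyword naming: the learner passed as ml_g is meant for l_0(X)
        warnings.warn(
            "The required positional argument ml_g was renamed to ml_l. "
            "ml_g is redirected to ml_l.",
            DeprecationWarning,
        )
        ml_l, ml_g = ml_g, None
    if score == "IV-type" and ml_g is None:
        warnings.warn(
            "For score = 'IV-type', learners ml_l and ml_g should be specified. "
            "Set ml_g = clone(ml_l).",
            UserWarning,
        )
        # deepcopy stands in for sklearn.base.clone, which returns an unfitted copy
        ml_g = copy.deepcopy(ml_l)
    return ml_l, ml_m, ml_g


def redirect_param_learner(learner_name, score="partialling out"):
    """Hypothetical sketch of the ml_g -> ml_l redirect in set_ml_nuisance_params/tune."""
    if score == "partialling out" and learner_name == "ml_g":
        warnings.warn(
            "Learner ml_g was renamed to ml_l. The provided parameters are set for ml_l.",
            DeprecationWarning,
        )
        return "ml_l"
    return learner_name


class DummyLearner:
    """Placeholder for an sklearn estimator (hypothetical)."""

    def __init__(self, max_depth=5):
        self.max_depth = max_depth


# Old keyword naming combined with score='IV-type'
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ml_l, ml_m, ml_g = resolve_plr_learners(
        ml_g=DummyLearner(), ml_m=DummyLearner(), score="IV-type"
    )
print([w.category.__name__ for w in caught])  # ['DeprecationWarning', 'UserWarning']
print(ml_g is not ml_l, ml_g.max_depth)  # True 5
```

Cloning rather than reusing the same object keeps ml_l and ml_g independent, so fitting one learner cannot overwrite the fitted state of the other.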

Miscellaneous

PR Checklist

  • The title of the pull request summarizes the changes made.
  • The PR contains a detailed description of all changes and additions.
  • The code passes all (unit) tests.
  • Enhancements or new features are equipped with unit tests.
  • The changes adhere to the PEP8 standards.

@MalteKurz added the enhancement label (extension of existing feature) May 20, 2022
@PhilippBach (Member) left a comment


LGTM


2 participants