Enable external_predictions for short model in benchmarks #238

Merged

Conversation

@lucien1011 commented Apr 6, 2024

Description

This pull request adds an optional input argument fit_args to the method sensitivity_benchmark in the class DoubleML. Most importantly, this addition enables the use of external_predictions when fitting the short models for sensitivity analysis.

The new argument has to be a nested dictionary, as in the following example:

# assumes df is a pandas DataFrame with outcome column 'y', treatment column 'd'
# and covariate columns cov_cols
import doubleml as dml
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

dataset = dml.DoubleMLData(
    df,
    y_col='y',
    d_cols='d',
    x_cols=cov_cols,
    force_all_x_finite=False,
)

dml_irm = dml.DoubleMLIRM(
    dataset,
    ml_g=RandomForestRegressor(),   # dummy learners only;
    ml_m=RandomForestClassifier(),  # predictions are supplied externally
)

# some user-specific code computes the external predictions and stores them in
# the columns df['d_prop'], df['y_pred_d0'] and df['y_pred_d1']
external_predictions = dict(
    d=dict(
        ml_m=df['d_prop'].to_numpy().reshape(-1, 1),
        ml_g0=df['y_pred_d0'].to_numpy().reshape(-1, 1),
        ml_g1=df['y_pred_d1'].to_numpy().reshape(-1, 1),
    ),
)

# the model has to be fitted before benchmarking; here the same external
# predictions are reused for the full model
dml_irm.fit(external_predictions=external_predictions)

bm = dml_irm.sensitivity_benchmark(
    benchmarking_set=['covariate_to_be_tested'],
    fit_args=dict(external_predictions=external_predictions),
)

Reference to Issues or PRs

No related issues or PRs to my knowledge.

PR Checklist

Please fill out this PR checklist (see our contributing guidelines for details).

  • The title of the pull request summarizes the changes made.
  • The PR contains a detailed description of all changes and additions.
  • References to related issues or PRs are added.
  • The code passes all (unit) tests.
  • Enhancements or new features are equipped with unit tests.
  • The changes adhere to the PEP8 standards.

@SvenKlaassen
Member

Thanks @lucien1011. I really like this addition.

Maybe you can change the default value to None (to avoid mutable defaults).
And a small unit test would also be great. Maybe add a small comparison of the external predictions against fitted learners (as in https://github.com/DoubleML/doubleml-for-py/blob/main/doubleml/plm/tests/test_plr_external_predictions.py) to https://github.com/DoubleML/doubleml-for-py/blob/main/doubleml/tests/test_sensitivity.py.
Further, a unit test for exceptions on the input arguments in https://github.com/DoubleML/doubleml-for-py/blob/main/doubleml/tests/test_exceptions_ext_preds.py would be nice.
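
A minimal sketch of what such an exception test could look like (the fixture name, the benchmarking covariate 'X1' and the exact exception type are illustrative assumptions, not the test that was eventually added):

import pytest

def test_sensitivity_benchmark_fit_args_type(fitted_dml_irm):
    # fitted_dml_irm: hypothetical fixture providing an already fitted DoubleMLIRM object
    # fit_args has to be a dict; any other type should raise an exception
    with pytest.raises(TypeError):
        fitted_dml_irm.sensitivity_benchmark(
            benchmarking_set=['X1'],
            fit_args=['not', 'a', 'dict'],
        )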

@lucien1011
Author

@SvenKlaassen Thanks for the comments. I will implement those accordingly and update this PR.

@lucien1011
Author

@SvenKlaassen I have updated the PR with the following three items:

  1. Changed the default value of fit_args to None (a rough sketch of the resulting validation is shown below the list).
  2. Added a unit test for the type of fit_args (it must be a dictionary).
  3. Added a unit test comparing the values of delta_theta with and without external_predictions.
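
As an illustration of items 1 and 2, a minimal standalone sketch of the validation pattern (the helper name is hypothetical and the exact error message is an assumption; this is not the code merged in this PR):

def _validate_fit_args(fit_args=None):
    # a default of None avoids a mutable default argument;
    # an empty dict is substituted when nothing is passed
    if fit_args is None:
        return {}
    if not isinstance(fit_args, dict):
        raise TypeError(f"fit_args must be a dict. {type(fit_args).__name__} was passed.")
    return fit_args

# the validated dict can then be forwarded to the fit of the short model,
# e.g. dml_short.fit(**_validate_fit_args(fit_args))
print(_validate_fit_args())                              # {}
print(_validate_fit_args({'external_predictions': {}}))  # {'external_predictions': {}}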

@SvenKlaassen changed the base branch from main to s-ext-pred-benchmark on April 11, 2024 at 18:29
@SvenKlaassen
Member

I will check the coverage on a different branch and then merge into main.

@SvenKlaassen merged commit 3769f81 into DoubleML:s-ext-pred-benchmark on Apr 11, 2024