.. _heterogeneity:

Heterogeneous Treatment Effects
----------------------------------------

All implemented solutions focus on the :ref:`IRM <irm-model>` or :ref:`IIVM <iivm-model>` models, as for
the :ref:`PLR <plr-model>` and :ref:`PLIV <pliv-model>` models heterogeneous treatment effects can usually be modelled
via feature construction.
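
For the PLR model, for example, effect heterogeneity along a known group indicator can be captured by adding an
interaction of the treatment with that indicator as an additional treatment variable. The following is a minimal,
hypothetical sketch (the interaction column ``d_group``, the grouping rule, and the learner choices are illustrative
and not part of the package API):

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_plr_CCDDHNR2018
        from sklearn.ensemble import RandomForestRegressor

        np.random.seed(1234)
        data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        # illustrative feature construction: interact the treatment with a group indicator;
        # both columns enter as treatment variables, so the coefficient on d_group
        # measures the effect difference for that group
        data['d_group'] = data['d'] * (data['X1'] > 0)
        obj_dml_data = dml.DoubleMLData(data, 'y', ['d', 'd_group'])
        ml_g = RandomForestRegressor(n_estimators=100, max_depth=5, min_samples_leaf=2)
        ml_m = RandomForestRegressor(n_estimators=100, max_depth=5, min_samples_leaf=2)
        dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
        dml_plr_obj.fit().summary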

.. _gates:

Group Average Treatment Effects (GATEs)
+++++++++++++++++++++++++++++++++++++++++++++++

The ``DoubleMLIRM`` class contains the ``gate()`` method, which enables the estimation and construction of confidence intervals
for GATEs after fitting the ``DoubleMLIRM`` object. To estimate GATEs, the user has to specify a pandas ``DataFrame`` containing
the groups (dummy-coded or as a single column of strings).
This constructs and fits a ``DoubleMLBLP`` object. Confidence intervals can then be constructed via
the ``confint()`` method. Jointly valid confidence intervals are based on a Gaussian multiplier bootstrap.

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import pandas as pd
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
        np.random.seed(3333)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
        _ = dml_irm_obj.fit()

        # define groups
        np.random.seed(42)
        groups = pd.DataFrame(np.random.choice(3, 500), columns=['Group'], dtype=str)
        print(groups.head())

        gate_obj = dml_irm_obj.gate(groups=groups)
        ci = gate_obj.confint()
        print(ci)
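
        # jointly valid confidence intervals over the groups: a sketch, assuming
        # the joint and n_rep_boot arguments of DoubleMLBLP.confint(), which use
        # the Gaussian multiplier bootstrap mentioned above
        ci_joint = gate_obj.confint(level=0.95, joint=True, n_rep_boot=1000)
        print(ci_joint)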

A more detailed notebook on GATEs is available in the :ref:`example gallery <examplegallery>`.

.. _cates:

Conditional Average Treatment Effects (CATEs)
++++++++++++++++++++++++++++++++++++++++++++++

The ``DoubleMLIRM`` class contains the ``cate()`` method, which enables the estimation and construction of confidence intervals
for CATEs after fitting the ``DoubleMLIRM`` object. To estimate CATEs, the user has to specify a pandas ``DataFrame`` containing
the basis (e.g. B-splines) for the conditional treatment effects.
This constructs and fits a ``DoubleMLBLP`` object. Confidence intervals can then be constructed via
the ``confint()`` method. Jointly valid confidence intervals are based on a Gaussian multiplier bootstrap.

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import pandas as pd
        import patsy

        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
        np.random.seed(3333)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
        _ = dml_irm_obj.fit()

        # define a B-spline basis with respect to the first covariate
        design_matrix = patsy.dmatrix("bs(x, df=5, degree=2)", {"x": obj_dml_data.data["X1"]})
        spline_basis = pd.DataFrame(design_matrix)
        print(spline_basis.head())

        cate_obj = dml_irm_obj.cate(basis=spline_basis)
        ci = cate_obj.confint()
        print(ci.head())
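
        # joint confidence bands over the spline basis: a sketch, assuming the
        # joint and n_rep_boot arguments of DoubleMLBLP.confint()
        ci_joint = cate_obj.confint(joint=True, n_rep_boot=1000)
        print(ci_joint.head())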

A more detailed notebook on CATEs is available in the :ref:`example gallery <examplegallery>`.
The examples also include the construction of a two-dimensional basis with B-splines.

.. _qtes:

Quantiles
++++++++++++++++++++++++++++++++++++++++++++++

The :ref:`DoubleML <doubleml-package>` package includes (local) quantile estimation of potential outcomes in the
:ref:`IRM <irm-model>` and :ref:`IIVM <iivm-model>` models.

Potential Quantiles (PQs)
*******************************************

.. include:: ../shared/heterogeneity/pq.rst

``DoubleMLPQ`` implements potential quantile estimation. Estimation is conducted via its ``fit()`` method:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestClassifier

        np.random.seed(3141)
        ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        dml_pq_obj = dml.DoubleMLPQ(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
        dml_pq_obj.fit().summary
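
        # pointwise confidence interval for the potential quantile
        # (confint() is inherited from the DoubleML base class)
        dml_pq_obj.confint()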

``DoubleMLLPQ`` implements local potential quantile estimation. Estimation is conducted via its ``fit()`` method:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_iivm_data
        from sklearn.ensemble import RandomForestClassifier

        np.random.seed(3141)
        ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_iivm_data(theta=0.5, n_obs=1000, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd', z_cols='z')
        dml_lpq_obj = dml.DoubleMLLPQ(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
        dml_lpq_obj.fit().summary
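
        # pointwise confidence interval for the local potential quantile
        dml_lpq_obj.confint()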

Quantile Treatment Effects (QTEs)
*******************************************

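QTEs, i.e. differences between the potential quantiles under treatment and control, can be estimated with the
``DoubleMLQTE`` class. The following is a minimal sketch, assuming that the ``quantiles`` argument accepts a list of
quantile levels and that the default score corresponds to potential quantiles:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestClassifier

        np.random.seed(3141)
        ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        # one QTE per quantile level (here the quartiles and the median)
        dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, quantiles=[0.25, 0.5, 0.75])
        dml_qte_obj.fit().summary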

A detailed notebook on PQs and QTEs is available in the :ref:`example gallery <examplegallery>`.

Conditional Value at Risk (CVaR)
++++++++++++++++++++++++++++++++++++++++++++

All implemented solutions focus on the :ref:`IRM <irm-model>` model.

CVaR of Potential Outcomes
*******************************************

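``DoubleMLCVAR`` implements estimation of the conditional value at risk of a potential outcome. The following is a
minimal sketch, assuming that the ``treatment`` and ``quantile`` arguments mirror those of ``DoubleMLPQ`` and that
``ml_g`` is a regression learner:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

        np.random.seed(3141)
        # assumption: a regressor for the outcome nuisance, a classifier for the propensity score
        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        dml_cvar_obj = dml.DoubleMLCVAR(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
        dml_cvar_obj.fit().summary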

CVaR Treatment Effect
*******************************************

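Treatment effects on the conditional value at risk can be estimated via the ``DoubleMLQTE`` class. The following is a
minimal sketch, assuming that ``score='CVaR'`` is a valid score option and that the learner requirements match the
potential outcome case above:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

        np.random.seed(3141)
        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        # assumption: score='CVaR' switches the target from quantiles to CVaR
        dml_cvar_qte = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, score='CVaR', quantiles=0.5)
        dml_cvar_qte.fit().summary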

A detailed notebook on conditional value at risk estimation
is available in the :ref:`example gallery <examplegallery>`.