Commit cc4659a

extend guide for gate and cate
1 parent 5da0317 commit cc4659a

7 files changed: +192 -4 lines changed

doc/examples/index.rst

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 
 :parenttoc: True
 
+.. _examplegallery:
+
 Examples
 ==========

doc/examples/py_double_ml_cate.ipynb

Lines changed: 2 additions & 2 deletions
@@ -222,7 +222,7 @@
 "collapsed": false
 },
 "source": [
-"To estimate the parameters to calculate the CATE estimate call the ``cate`` method and supply the dataframe of basis elements."
+"To estimate the parameters to calculate the CATE estimate, call the ``cate()`` method and supply the dataframe of basis elements."
 ]
 },
 {
@@ -244,7 +244,7 @@
 "collapsed": false
 },
 "source": [
-"To obtain the confidence intervals for the CATE, we have to call the ``confint`` method and a supply a dataframe of basis elements.\n",
+"To obtain the confidence intervals for the CATE, we have to call the ``confint()`` method and supply a dataframe of basis elements.\n",
 "This could be the same basis as for fitting the CATE model or a new basis to e.g. evaluate the CATE model on a grid.\n",
 "Here, we will evaluate the CATE on a grid from 0.1 to 0.9 to plot the final results.\n",
 "Further, we construct uniform confidence intervals by setting the option ``joint`` and providing a number of bootstrap repetitions ``n_rep_boot``."

doc/examples/py_double_ml_gate.ipynb

Lines changed: 1 addition & 1 deletion
@@ -261,7 +261,7 @@
 "collapsed": false
 },
 "source": [
-"To calculate GATEs just call the ``gate`` method and supply the DataFrame with the group definitions and the ``level`` (with default of ``0.95``)."
+"To calculate GATEs, just call the ``gate()`` method and supply the DataFrame with the group definitions and the ``level`` (with default of ``0.95``)."
 ]
 },
 {

doc/guide/guide.rst

Lines changed: 2 additions & 1 deletion
@@ -12,6 +12,7 @@ User guide
 The basics of double/debiased machine learning <basics>
 The data-backend DoubleMLData <data_backend>
 Models <models>
+Heterogeneous Treatment Effects <heterogeneity>
 Score functions <scores>
 Double machine learning algorithms <algorithms>
 Learners, hyperparameters and hyperparameter tuning <learners>
@@ -21,7 +22,7 @@ User guide
 
 
 .. raw:: html
-
+
 <style>
 /* Border radius parameter */
 :root {

doc/guide/heterogeneity.rst

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
+.. _heterogeneity:
+
+Heterogeneous Treatment Effects
+----------------------------------------
+
+All implemented solutions focus on the :ref:`IRM <irm-model>` or :ref:`IIVM <iivm-model>` models, as for
+the :ref:`PLR <plr-model>` and :ref:`PLIV <pliv-model>` models heterogeneous treatment effects can usually be modelled
+via feature construction.
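
As an aside on the remark about feature construction: one common route for the PLR model is to interact the treatment with a covariate and include the interaction as an additional treatment variable, so that its coefficient captures how the effect varies with that covariate. The sketch below is a hypothetical illustration of this idea, not a line of the new file; the choice of ``make_plr_CCDDHNR2018`` and passing a list of treatment columns to ``DoubleMLData`` are my assumptions:

    import numpy as np
    import doubleml as dml
    from doubleml.datasets import make_plr_CCDDHNR2018
    from sklearn.ensemble import RandomForestRegressor

    np.random.seed(1234)
    data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20, return_type='DataFrame')

    # construct an interaction feature: the effect of d may vary linearly in X1
    data['d_X1'] = data['d'] * data['X1']

    # both 'd' and the constructed interaction enter as treatment variables
    obj_dml_data = dml.DoubleMLData(data, 'y', ['d', 'd_X1'])
    ml_l = RandomForestRegressor(n_estimators=100)
    ml_m = RandomForestRegressor(n_estimators=100)
    dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_l, ml_m)
    # the coefficient on 'd_X1' captures the heterogeneity; in this simulated
    # DGP the true effect is constant, so it should be close to zero
    print(dml_plr_obj.fit().summary)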
+
+
+.. _gates:
+
+Group Average Treatment Effects (GATEs)
+++++++++++++++++++++++++++++++++++++++++++++++
+
+The ``DoubleMLIRM`` class contains the ``gate()`` method, which enables the estimation and construction of confidence intervals
+for GATEs after fitting the ``DoubleMLIRM`` object. To estimate GATEs, the user has to specify a pandas ``DataFrame`` containing
+the groups (dummy coded or one column with strings).
+This will construct and fit a ``DoubleMLBLP`` object. Confidence intervals can then be constructed via
+the ``confint()`` method. Jointly valid confidence intervals will be based on a Gaussian multiplier bootstrap.
+
+.. tabbed:: Python
+
+    .. ipython:: python
+
+        import numpy as np
+        import pandas as pd
+        import doubleml as dml
+        from doubleml.datasets import make_irm_data
+        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
+
+        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
+        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
+        np.random.seed(3333)
+        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
+        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
+        dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
+        _ = dml_irm_obj.fit()
+
+        # define groups
+        np.random.seed(42)
+        groups = pd.DataFrame(np.random.choice(3, 500), columns=['Group'], dtype=str)
+        print(groups.head())
+
+        gate_obj = dml_irm_obj.gate(groups=groups)
+        ci = gate_obj.confint()
+        print(ci)
+
+
+A more detailed notebook on GATEs is available in the :ref:`example gallery <examplegallery>`.
+
+.. _cates:
+
+Conditional Average Treatment Effects (CATEs)
+++++++++++++++++++++++++++++++++++++++++++++++
+
+The ``DoubleMLIRM`` class contains the ``cate()`` method, which enables the estimation and construction of confidence intervals
+for CATEs after fitting the ``DoubleMLIRM`` object. To estimate CATEs, the user has to specify a pandas ``DataFrame`` containing
+the basis (e.g. B-splines) for the conditional treatment effects.
+This will construct and fit a ``DoubleMLBLP`` object. Confidence intervals can then be constructed via
+the ``confint()`` method. Jointly valid confidence intervals will be based on a Gaussian multiplier bootstrap.
+
+.. tabbed:: Python
+
+    .. ipython:: python
+
+        import numpy as np
+        import pandas as pd
+        import patsy
+
+        import doubleml as dml
+        from doubleml.datasets import make_irm_data
+        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
+
+        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
+        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
+        np.random.seed(3333)
+        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
+        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
+        dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
+        _ = dml_irm_obj.fit()
+
+        # define a basis with respect to the first variable
+        design_matrix = patsy.dmatrix("bs(x, df=5, degree=2)", {"x":obj_dml_data.data["X1"]})
+        spline_basis = pd.DataFrame(design_matrix)
+        print(spline_basis.head())
+
+        cate_obj = dml_irm_obj.cate(basis=spline_basis)
+        ci = cate_obj.confint()
+        print(ci.head())
+
+
+A more detailed notebook on CATEs is available in the :ref:`example gallery <examplegallery>`.
+The examples also include the construction of a two-dimensional basis with B-splines.
+
+.. _qtes:
+
+Quantiles
+++++++++++++++++++++++++++++++++++++++++++++++
+
+The :ref:`DoubleML <doubleml-package>` package includes (local) quantile estimation for potential outcomes for
+:ref:`IRM <irm-model>` and :ref:`IIVM <iivm-model>` models.
+
+Potential Quantiles (PQs)
+*******************************************
+
+.. include:: ../shared/heterogeneity/pq.rst
+
+``DoubleMLPQ`` implements potential quantile estimation. Estimation is conducted via its ``fit()`` method:
+
+.. tabbed:: Python
+
+    .. ipython:: python
+
+        import numpy as np
+        import doubleml as dml
+        from doubleml.datasets import make_irm_data
+        from sklearn.ensemble import RandomForestClassifier
+        np.random.seed(3141)
+        ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
+        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
+        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
+        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
+        dml_pq_obj = dml.DoubleMLPQ(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
+        dml_pq_obj.fit().summary
+
+``DoubleMLLPQ`` implements local potential quantile estimation. Estimation is conducted via its ``fit()`` method:
+
+.. tabbed:: Python
+
+    .. ipython:: python
+
+        import numpy as np
+        import doubleml as dml
+        from doubleml.datasets import make_iivm_data
+        from sklearn.ensemble import RandomForestClassifier
+        np.random.seed(3141)
+        ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
+        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
+        data = make_iivm_data(theta=0.5, n_obs=1000, dim_x=20, return_type='DataFrame')
+        obj_dml_data = dml.DoubleMLData(data, 'y', 'd', z_cols='z')
+        dml_lpq_obj = dml.DoubleMLLPQ(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
+        dml_lpq_obj.fit().summary
+
+Quantile Treatment Effects (QTEs)
+*******************************************
+
+A detailed notebook on PQs and QTEs is available in the :ref:`example gallery <examplegallery>`.
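
The QTE subsection above does not yet contain a code snippet. If the package exposes a ``DoubleMLQTE`` class with the same call pattern as ``DoubleMLPQ`` (this, and the ``quantiles`` argument accepting several levels, are assumptions on my part and not documented in the diff), a minimal sketch could look as follows:

    import numpy as np
    import doubleml as dml
    from doubleml.datasets import make_irm_data
    from sklearn.ensemble import RandomForestClassifier

    np.random.seed(3141)
    ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
    ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
    data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
    obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

    # assumed API: the QTE at each level is the difference of the two potential quantiles
    dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, quantiles=[0.25, 0.5, 0.75])
    dml_qte_obj.fit().summary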
+
+Conditional Value at Risk (CVaR)
+++++++++++++++++++++++++++++++++++++++++++++
+
+All implemented solutions focus on the :ref:`IRM <irm-model>` model.
+
+CVaR of Potential Outcomes
+*******************************************
+
+CVaR Treatment Effect
+*******************************************
+
+A detailed notebook on conditional value at risk estimation
+is available in the :ref:`example gallery <examplegallery>`.
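
The CVaR subsections are still stubs in this commit. As a hypothetical starting point, the sketch below assumes a ``DoubleMLCVAR`` class that mirrors the ``DoubleMLPQ`` call pattern (treatment indicator plus a ``quantile`` argument) with a regression learner for the outcome nuisance; none of this is documented in the diff above:

    import numpy as np
    import doubleml as dml
    from doubleml.datasets import make_irm_data
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    np.random.seed(3141)
    ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
    ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
    data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
    obj_dml_data = dml.DoubleMLData(data, 'y', 'd')

    # assumed API: CVaR of the potential outcome Y(1) at the 0.5 level
    dml_cvar_obj = dml.DoubleMLCVAR(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
    dml_cvar_obj.fit().summary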

doc/guide/models.rst

Lines changed: 5 additions & 0 deletions
@@ -3,6 +3,8 @@
 Models
 ----------
 
+The :ref:`DoubleML <doubleml-package>` package includes the following models.
+
 .. _plr-model:
 
 Partially linear regression model (PLR)
@@ -55,6 +57,8 @@ Estimation is conducted via its ``fit()`` method:
 print(dml_plr_obj)
 
 
+.. _pliv-model:
+
 Partially linear IV regression model (PLIV)
 +++++++++++++++++++++++++++++++++++++++++++
 
@@ -155,6 +159,7 @@ Estimation is conducted via its ``fit()`` method:
 print(dml_irm_obj)
 
 
+.. _iivm-model:
 
 Interactive IV model (IIVM)
 +++++++++++++++++++++++++++

doc/shared/heterogeneity/pq.rst

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+For a quantile :math:`\tau \in (0,1)` the target parameter :math:`\theta_0` of interest is the **potential quantile (PQ)**,
+
+.. math::
+
+    P(Y(d) \le \theta_0) = \tau,
+
+and the **local potential quantile (LPQ)**,
+
+.. math::
+
+    P(Y(d) \le \theta_0|\text{Compliers}) = \tau,
+
+where :math:`Y(d)` denotes the potential outcome with :math:`d \in \{0, 1\}`.
