.. _heterogeneity:

Heterogeneous Treatment Effects
----------------------------------------

All implemented solutions focus on the :ref:`IRM <irm-model>` or :ref:`IIVM <iivm-model>` models, as for
the :ref:`PLR <plr-model>` and :ref:`PLIV <pliv-model>` models heterogeneous treatment effects can usually be modelled
via feature construction.
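
For the PLR model, for example, effect heterogeneity along a known group indicator can be captured by adding an
interaction of the treatment with that indicator as an additional treatment variable. The following is a minimal,
hypothetical sketch (the interaction column ``d_group``, the grouping rule, and the learner choices are illustrative
and not part of the package API):

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_plr_CCDDHNR2018
        from sklearn.ensemble import RandomForestRegressor

        np.random.seed(1234)
        data = make_plr_CCDDHNR2018(alpha=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        # illustrative feature construction: interact the treatment with a group indicator;
        # both columns enter as treatment variables, so the coefficient on d_group
        # measures the effect difference for that group
        data['d_group'] = data['d'] * (data['X1'] > 0)
        obj_dml_data = dml.DoubleMLData(data, 'y', ['d', 'd_group'])
        ml_g = RandomForestRegressor(n_estimators=100, max_depth=5, min_samples_leaf=2)
        ml_m = RandomForestRegressor(n_estimators=100, max_depth=5, min_samples_leaf=2)
        dml_plr_obj = dml.DoubleMLPLR(obj_dml_data, ml_g, ml_m)
        dml_plr_obj.fit().summary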

.. _gates:

Group Average Treatment Effects (GATEs)
+++++++++++++++++++++++++++++++++++++++++++++++

The ``DoubleMLIRM`` class contains the ``gate()`` method, which enables the estimation and construction of confidence intervals
for GATEs after fitting the ``DoubleMLIRM`` object. To estimate GATEs, the user has to specify a pandas ``DataFrame`` containing
the groups (dummy-coded or as a single column of strings).
This constructs and fits a ``DoubleMLBLP`` object. Confidence intervals can then be constructed via
the ``confint()`` method. Jointly valid confidence intervals are based on a Gaussian multiplier bootstrap.

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import pandas as pd
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
        np.random.seed(3333)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
        _ = dml_irm_obj.fit()

        # define groups
        np.random.seed(42)
        groups = pd.DataFrame(np.random.choice(3, 500), columns=['Group'], dtype=str)
        print(groups.head())

        gate_obj = dml_irm_obj.gate(groups=groups)
        ci = gate_obj.confint()
        print(ci)
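
        # jointly valid confidence intervals over the groups: a sketch, assuming
        # the joint and n_rep_boot arguments of DoubleMLBLP.confint(), which use
        # the Gaussian multiplier bootstrap mentioned above
        ci_joint = gate_obj.confint(level=0.95, joint=True, n_rep_boot=1000)
        print(ci_joint)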

A more detailed notebook on GATEs is available in the :ref:`example gallery <examplegallery>`.

.. _cates:

Conditional Average Treatment Effects (CATEs)
++++++++++++++++++++++++++++++++++++++++++++++

The ``DoubleMLIRM`` class contains the ``cate()`` method, which enables the estimation and construction of confidence intervals
for CATEs after fitting the ``DoubleMLIRM`` object. To estimate CATEs, the user has to specify a pandas ``DataFrame`` containing
the basis (e.g. B-splines) for the conditional treatment effects.
This constructs and fits a ``DoubleMLBLP`` object. Confidence intervals can then be constructed via
the ``confint()`` method. Jointly valid confidence intervals are based on a Gaussian multiplier bootstrap.

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import pandas as pd
        import patsy

        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
        np.random.seed(3333)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
        _ = dml_irm_obj.fit()

        # define a B-spline basis with respect to the first covariate
        design_matrix = patsy.dmatrix("bs(x, df=5, degree=2)", {"x": obj_dml_data.data["X1"]})
        spline_basis = pd.DataFrame(design_matrix)
        print(spline_basis.head())

        cate_obj = dml_irm_obj.cate(basis=spline_basis)
        ci = cate_obj.confint()
        print(ci.head())
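
        # joint confidence bands over the spline basis: a sketch, assuming the
        # joint and n_rep_boot arguments of DoubleMLBLP.confint()
        ci_joint = cate_obj.confint(joint=True, n_rep_boot=1000)
        print(ci_joint.head())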

A more detailed notebook on CATEs is available in the :ref:`example gallery <examplegallery>`.
The examples also include the construction of a two-dimensional basis with B-splines.

.. _qtes:

Quantiles
++++++++++++++++++++++++++++++++++++++++++++++

The :ref:`DoubleML <doubleml-package>` package includes (local) quantile estimation of potential outcomes in the
:ref:`IRM <irm-model>` and :ref:`IIVM <iivm-model>` models.

Potential Quantiles (PQs)
*******************************************

.. include:: ../shared/heterogeneity/pq.rst

``DoubleMLPQ`` implements potential quantile estimation. Estimation is conducted via its ``fit()`` method:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestClassifier

        np.random.seed(3141)
        ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        dml_pq_obj = dml.DoubleMLPQ(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
        dml_pq_obj.fit().summary
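
        # pointwise confidence interval for the potential quantile
        # (confint() is inherited from the DoubleML base class)
        dml_pq_obj.confint()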

``DoubleMLLPQ`` implements local potential quantile estimation. Estimation is conducted via its ``fit()`` method:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_iivm_data
        from sklearn.ensemble import RandomForestClassifier

        np.random.seed(3141)
        ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_iivm_data(theta=0.5, n_obs=1000, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd', z_cols='z')
        dml_lpq_obj = dml.DoubleMLLPQ(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
        dml_lpq_obj.fit().summary
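
        # pointwise confidence interval for the local potential quantile
        dml_lpq_obj.confint()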

Quantile Treatment Effects (QTEs)
*******************************************

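QTEs, i.e. differences between the potential quantiles under treatment and control, can be estimated with the
``DoubleMLQTE`` class. The following is a minimal sketch, assuming that the ``quantiles`` argument accepts a list of
quantile levels and that the default score corresponds to potential quantiles:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestClassifier

        np.random.seed(3141)
        ml_g = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        # one QTE per quantile level (here the quartiles and the median)
        dml_qte_obj = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, quantiles=[0.25, 0.5, 0.75])
        dml_qte_obj.fit().summary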

A detailed notebook on PQs and QTEs is available in the :ref:`example gallery <examplegallery>`.

Conditional Value at Risk (CVaR)
++++++++++++++++++++++++++++++++++++++++++++

All implemented solutions focus on the :ref:`IRM <irm-model>` model.

CVaR of Potential Outcomes
*******************************************

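``DoubleMLCVAR`` implements estimation of the conditional value at risk of a potential outcome. The following is a
minimal sketch, assuming that the ``treatment`` and ``quantile`` arguments mirror those of ``DoubleMLPQ`` and that
``ml_g`` is a regression learner:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

        np.random.seed(3141)
        # assumption: a regressor for the outcome nuisance, a classifier for the propensity score
        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        dml_cvar_obj = dml.DoubleMLCVAR(obj_dml_data, ml_g, ml_m, treatment=1, quantile=0.5)
        dml_cvar_obj.fit().summary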

CVaR Treatment Effect
*******************************************

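Treatment effects on the conditional value at risk can be estimated via the ``DoubleMLQTE`` class. The following is a
minimal sketch, assuming that ``score='CVaR'`` is a valid score option and that the learner requirements match the
potential outcome case above:

.. tabbed:: Python

    .. ipython:: python

        import numpy as np
        import doubleml as dml
        from doubleml.datasets import make_irm_data
        from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

        np.random.seed(3141)
        ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=10, min_samples_leaf=2)
        data = make_irm_data(theta=0.5, n_obs=500, dim_x=20, return_type='DataFrame')
        obj_dml_data = dml.DoubleMLData(data, 'y', 'd')
        # assumption: score='CVaR' switches the target from quantiles to CVaR
        dml_cvar_qte = dml.DoubleMLQTE(obj_dml_data, ml_g, ml_m, score='CVaR', quantiles=0.5)
        dml_cvar_qte.fit().summary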

A detailed notebook on conditional value at risk estimation
is available in the :ref:`example gallery <examplegallery>`.