|
12 | 12 | ]
|
13 | 13 | },
|
14 | 14 | {
|
| 15 | + "attachments": {}, |
15 | 16 | "cell_type": "markdown",
|
16 | 17 | "metadata": {
|
17 | 18 | "collapsed": false
|
18 | 19 | },
|
19 | 20 | "source": [
|
20 | 21 | "## Data\n",
|
21 | 22 | "\n",
|
22 |
| - "We define a data generating process to create synthetic data to compare the estimates to the true effect. The data generating process is based on the Monte Carlo simulation from this [paper](https://arxiv.org/abs/1806.03467) and this implementation from [EconML](https://github.com/microsoft/EconML)." |
| 23 | + "We define a data generating process to create synthetic data to compare the estimates to the true effect. The data generating process is based on the Monte Carlo simulation from [Oprescu et al. (2019)](http://proceedings.mlr.press/v97/oprescu19a.html) and this [notebook](https://github.com/py-why/EconML/blob/main/notebooks/Causal%20Forest%20and%20Orthogonal%20Random%20Forest%20Examples.ipynb) from [EconML](https://github.com/py-why/EconML)." |
23 | 24 | ]
|
24 | 25 | },
|
25 | 26 | {
|
|
80 | 81 | " return te\n",
|
81 | 82 | "\n",
|
82 | 83 | "def create_synthetic_data(n_samples=200, n_w=30, support_size=5, n_x=1):\n",
|
83 |
| - " \"\"\"\n", |
84 |
| - " Creates a simple synthetic example for conditional treatment effects.\n", |
85 |
| - "\n", |
86 |
| - " Parameters\n", |
87 |
| - " ----------\n", |
88 |
| - " n_samples : int\n", |
89 |
| - " Number of samples.\n", |
90 |
| - " Default is ``200``.\n", |
91 |
| - "\n", |
92 |
| - " n_w : int\n", |
93 |
| - " Dimension of covariates.\n", |
94 |
| - " Default is ``30``.\n", |
95 |
| - "\n", |
96 |
| - " support_size : int\n", |
97 |
| - " Number of relevant covariates.\n", |
98 |
| - " Default is ``5``.\n", |
99 |
| - "\n", |
100 |
| - " n_x : int\n", |
101 |
| - " Dimension of treatment variable.\n", |
102 |
| - " Default is ``1``.\n", |
103 |
| - "\n", |
104 |
| - " Returns\n", |
105 |
| - " -------\n", |
106 |
| - " data : pd.DataFrame\n", |
107 |
| - " A data frame.\n", |
108 |
| - "\n", |
109 |
| - " \"\"\"\n", |
110 | 84 | " # Outcome support\n",
|
111 | 85 | " # With the next two lines we are effectively choosing the matrix gamma in the example\n",
|
112 | 86 | " support_y = np.random.choice(np.arange(n_w), size=support_size, replace=False)\n",
|
|
219 | 193 | ]
|
220 | 194 | },
|
221 | 195 | {
|
| 196 | + "attachments": {}, |
222 | 197 | "cell_type": "markdown",
|
223 | 198 | "metadata": {
|
224 | 199 | "collapsed": false
|
225 | 200 | },
|
226 | 201 | "source": [
|
227 |
| - "To estimate the CATE, we rely on the best-linear-predictor of the linear score as in [Semenova et al.](https://doi.org/10.1093/ectj/utaa027) To approximate the target function $g(x)$ with a linear form, we have to define a data frame of basis functions. Here, we rely on [patsy](https://patsy.readthedocs.io/en/latest/) to construct a suitable basis of [B-splines](https://en.wikipedia.org/wiki/B-spline)." |
| 202 | + "To estimate the CATE, we rely on the best-linear-predictor of the linear score as in [Semenova et al. (2021)](https://doi.org/10.1093/ectj/utaa027) To approximate the target function $g(x)$ with a linear form, we have to define a data frame of basis functions. Here, we rely on [patsy](https://patsy.readthedocs.io/en/latest/) to construct a suitable basis of [B-splines](https://en.wikipedia.org/wiki/B-spline)." |
228 | 203 | ]
|
229 | 204 | },
|
230 | 205 | {
|
|
241 | 216 | ]
|
242 | 217 | },
|
243 | 218 | {
|
| 219 | + "attachments": {}, |
244 | 220 | "cell_type": "markdown",
|
245 | 221 | "metadata": {
|
246 | 222 | "collapsed": false
|
|
262 | 238 | ]
|
263 | 239 | },
|
264 | 240 | {
|
| 241 | + "attachments": {}, |
265 | 242 | "cell_type": "markdown",
|
266 | 243 | "metadata": {
|
267 | 244 | "collapsed": false
|
|