
Commit 072bf44

Merge pull request #148 from StochasticTree/docs-update-0.1.1
Updating python package and docs
2 parents 8dc5a92 + 3906791 commit 072bf44

13 files changed: +90 -180 lines

NEWS.md

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+# stochtree 0.1.1
+
+* Fixed initialization bug in several R package code examples for random effects models
+
 # stochtree 0.1.0
 
 * Initial release on CRAN.

README.md

Lines changed: 7 additions & 1 deletion
@@ -90,7 +90,13 @@ pip install matplotlib seaborn jupyterlab
 
 # R Package
 
-The package can be installed in R via
+The R package can be installed from CRAN via
+
+```
+install.packages("stochtree")
+```
+
+The development version of `stochtree` can be installed from Github via
 
 ```
 remotes::install_github("StochasticTree/stochtree", ref="r-dev")

demo/notebooks/causal_inference.ipynb

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Causal Inference Demo Notebook"
+    "# Causal Inference"
    ]
   },
   {

demo/notebooks/causal_inference_feature_subsets.ipynb

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Causal Inference with Feature Subsets Demo Notebook\n",
+    "# Causal Inference with Feature Subsets\n",
     "\n",
     "This is a duplicate of the main causal inference demo which shows how a user might decide to use only a subset of covariates in the treatment effect forest. \n",
     "Why might we want to do that? Well, in many cases it is plausible that some covariates (for example age, income, etc...) influence the outcome of interest \n",

demo/notebooks/heteroskedastic_supervised_learning.ipynb

Lines changed: 7 additions & 11 deletions
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Supervised Learning with Heteroskedasticity Demo Notebook"
+    "# Heteroskedastic Supervised Learning"
    ]
   },
   {
@@ -118,13 +118,6 @@
     "s_x_test = s_x[test_inds]\n"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Demo 1: Using `W` in a linear leaf regression"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -139,9 +132,12 @@
    "outputs": [],
    "source": [
     "bart_model = BARTModel()\n",
-    "bart_params = {'num_trees_mean': 100, 'num_trees_variance': 50, 'sample_sigma_global': True, 'sample_sigma_leaf': False}\n",
+    "global_params = {'sample_sigma2_global': True}\n",
+    "mean_params = {'num_trees': 100, 'sample_sigma2_leaf': False}\n",
+    "variance_params = {'num_trees': 50}\n",
     "bart_model.sample(X_train=X_train, y_train=y_train, X_test=X_test, basis_train=basis_train, basis_test=basis_test,\n",
-    "                  num_gfr=10, num_mcmc=100, params=bart_params)"
+    "                  num_gfr=10, num_mcmc=100, general_params=global_params, mean_forest_params=mean_params, \n",
+    "                  variance_forest_params=variance_params)"
    ]
   },
   {
@@ -171,7 +167,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "forest_preds_s_x_mcmc = bart_model.sigma_x_test\n",
+    "forest_preds_s_x_mcmc = np.sqrt(bart_model.sigma2_x_test)\n",
     "s_x_avg_mcmc = np.squeeze(forest_preds_s_x_mcmc).mean(axis = 1, keepdims = True)\n",
     "s_x_df_mcmc = pd.DataFrame(np.concatenate((np.expand_dims(s_x_test,1), s_x_avg_mcmc), axis = 1), columns=[\"True standard deviation\", \"Average estimated standard deviation\"])\n",
     "sns.scatterplot(data=s_x_df_mcmc, x=\"Average estimated standard deviation\", y=\"True standard deviation\")\n",

demo/notebooks/multivariate_treatment_causal_inference.ipynb

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Causal Inference with Multivariate Treatments Demo Notebook"
+    "# Multivariate Treatment Causal Inference"
    ]
   },
   {
@@ -45,7 +45,7 @@
     "rng = np.random.default_rng()\n",
     "\n",
     "# Generate covariates and basis\n",
-    "n = 5000\n",
+    "n = 500\n",
     "p_X = 5\n",
     "X = rng.uniform(0, 1, (n, p_X))\n",
     "pi_X = np.c_[0.25 + 0.5*X[:,0], 0.75 - 0.5*X[:,1]]\n",

demo/notebooks/prototype_interface.ipynb

Lines changed: 12 additions & 16 deletions
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Demo of the `StochTree` Prototype Interface"
+    "# Low-Level Interface"
    ]
   },
   {
@@ -106,7 +106,7 @@
     "rng = np.random.default_rng(random_seed)\n",
     "\n",
     "# Generate covariates and basis\n",
-    "n = 1000\n",
+    "n = 500\n",
     "p_X = 10\n",
     "p_W = 1\n",
     "X = rng.uniform(0, 1, (n, p_X))\n",
@@ -383,14 +383,14 @@
     "rng = np.random.default_rng(random_seed)\n",
     "\n",
     "# Generate covariates and basis\n",
-    "n = 1000\n",
+    "n = 500\n",
     "p_X = 5\n",
     "X = rng.uniform(0, 1, (n, p_X))\n",
-    "pi_X = 0.25 + 0.5*X[:,0]\n",
+    "pi_X = 0.35 + 0.3*X[:,0]\n",
     "Z = rng.binomial(1, pi_X, n).astype(float)\n",
     "\n",
     "# Define the outcome mean functions (prognostic and treatment effects)\n",
-    "mu_X = pi_X*5\n",
+    "mu_X = (pi_X - 0.5)*30\n",
     "# tau_X = np.sin(X[:,1]*2*np.pi)\n",
     "tau_X = X[:,1]*2\n",
     "\n",
@@ -423,24 +423,24 @@
     "min_samples_leaf_mu = 1\n",
     "num_trees_mu = 200\n",
     "cutpoint_grid_size_mu = 100\n",
-    "tau_init_mu = 1/200\n",
+    "tau_init_mu = 1/num_trees_mu\n",
     "leaf_prior_scale_mu = np.array([[tau_init_mu]], order='C')\n",
     "a_leaf_mu = 3.\n",
-    "b_leaf_mu = 1/200\n",
+    "b_leaf_mu = 1/num_trees_mu\n",
     "leaf_regression_mu = False\n",
     "feature_types_mu = np.repeat(0, p_X).astype(int) # 0 = numeric\n",
     "var_weights_mu = np.repeat(1/(p_X + 1), p_X + 1)\n",
     "\n",
     "# Treatment forest parameters\n",
-    "alpha_tau = 0.25\n",
+    "alpha_tau = 0.75\n",
     "beta_tau = 3.\n",
     "min_samples_leaf_tau = 1\n",
     "num_trees_tau = 50\n",
     "cutpoint_grid_size_tau = 100\n",
-    "tau_init_tau = 1/50\n",
+    "tau_init_tau = 1/num_trees_tau\n",
     "leaf_prior_scale_tau = np.array([[tau_init_tau]], order='C')\n",
     "a_leaf_tau = 3.\n",
-    "b_leaf_tau = 1/50\n",
+    "b_leaf_tau = 1/num_trees_tau\n",
     "leaf_regression_tau = True\n",
     "feature_types_tau = np.repeat(0, p_X).astype(int) # 0 = numeric\n",
     "var_weights_tau = np.repeat(1/p_X, p_X)\n",
@@ -466,7 +466,7 @@
    "source": [
     "# Prognostic Forest Dataset (covariates)\n",
     "dataset_mu = Dataset()\n",
-    "dataset_mu.add_covariates(np.c_[X,pi_X])\n",
+    "dataset_mu.add_covariates(np.c_[X, pi_X])\n",
     "\n",
     "# Treatment Forest Dataset (covariates and treatment variable)\n",
     "dataset_tau = Dataset()\n",
@@ -521,7 +521,7 @@
    "outputs": [],
    "source": [
     "num_warmstart = 10\n",
-    "num_mcmc = 500\n",
+    "num_mcmc = 100\n",
     "num_samples = num_warmstart + num_mcmc\n",
     "global_var_samples = np.concatenate((np.array([global_variance_init]), np.repeat(0, num_samples)))\n",
     "leaf_scale_samples_mu = np.concatenate((np.array([tau_init_mu]), np.repeat(0, num_samples)))\n",
@@ -562,8 +562,6 @@
     "    forest_sampler_tau.sample_one_iteration(forest_container_tau, active_forest_tau, dataset_tau, residual, cpp_rng, \n",
     "                                            feature_types_tau, cutpoint_grid_size_tau, leaf_prior_scale_tau, var_weights_tau, \n",
     "                                            0.0, 0.0, global_var_samples[i], 1, True, True, False)\n",
-    "    # leaf_scale_samples_tau[i+1] = leaf_var_model_tau.sample_one_iteration(forest_container_tau, cpp_rng, a_leaf_tau, b_leaf_tau)\n",
-    "    # leaf_prior_scale_tau[0,0] = leaf_scale_samples_tau[i+1]\n",
     "    tau_x = np.squeeze(active_forest_tau.predict_raw(dataset_tau))\n",
     "    s_tt0 = np.sum(tau_x*tau_x*(Z==0))\n",
     "    s_tt1 = np.sum(tau_x*tau_x*(Z==1))\n",
@@ -606,8 +604,6 @@
     "    forest_sampler_tau.sample_one_iteration(forest_container_tau, active_forest_tau, dataset_tau, residual, cpp_rng, \n",
     "                                            feature_types_tau, cutpoint_grid_size_tau, leaf_prior_scale_tau, var_weights_tau, \n",
     "                                            0.0, 0.0, global_var_samples[i], 1, True, False, False)\n",
-    "    # leaf_scale_samples_tau[i+1] = leaf_var_model_tau.sample_one_iteration(forest_container_tau, cpp_rng, a_leaf_tau, b_leaf_tau, i)\n",
-    "    # leaf_prior_scale_tau[0,0] = leaf_scale_samples_tau[i+1]\n",
     "    tau_x = np.squeeze(active_forest_tau.predict_raw(dataset_tau))\n",
     "    s_tt0 = np.sum(tau_x*tau_x*(Z==0))\n",
     "    s_tt1 = np.sum(tau_x*tau_x*(Z==1))\n",

demo/notebooks/serialization.ipynb

Lines changed: 20 additions & 3 deletions
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Serialization Demo Notebook"
+    "# Model Serialization"
    ]
   },
   {
@@ -29,6 +29,7 @@
    "source": [
     "import json\n",
     "import numpy as np\n",
+    "import os\n",
     "import pandas as pd\n",
     "import seaborn as sns\n",
     "import matplotlib.pyplot as plt\n",
@@ -120,7 +121,7 @@
    "outputs": [],
    "source": [
     "bart_model = BARTModel()\n",
-    "bart_model.sample(X_train=X_train, y_train=y_train, basis_train=basis_train, X_test=X_test, basis_test=basis_test, num_gfr=10, num_mcmc=100)"
+    "bart_model.sample(X_train=X_train, y_train=y_train, basis_train=basis_train, X_test=X_test, basis_test=basis_test, num_gfr=10, num_mcmc=10)"
    ]
   },
   {
@@ -150,7 +151,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "sigma_df_mcmc = pd.DataFrame(np.concatenate((np.expand_dims(np.arange(bart_model.num_samples - bart_model.num_gfr),axis=1), np.expand_dims(bart_model.global_var_samples,axis=1)), axis = 1), columns=[\"Sample\", \"Sigma\"])\n",
+    "sigma_df_mcmc = pd.DataFrame(np.concatenate((np.expand_dims(np.arange(bart_model.num_samples),axis=1), np.expand_dims(bart_model.global_var_samples,axis=1)), axis = 1), columns=[\"Sample\", \"Sigma\"])\n",
     "sns.scatterplot(data=sigma_df_mcmc, x=\"Sample\", y=\"Sigma\")\n",
     "plt.show()"
    ]
@@ -321,6 +322,22 @@
     "plt.axline((0, 0), slope=1, color=\"black\", linestyle=(0, (3,3)))\n",
     "plt.show()"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Clean up JSON file"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "os.remove('bart.json')"
+   ]
   }
  ],
 "metadata": {

demo/notebooks/supervised_learning.ipynb

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Supervised Learning Demo Notebook"
+    "# Supervised Learning"
    ]
   },
   {

demo/notebooks/tree_inspection.ipynb

Lines changed: 29 additions & 139 deletions
Large diffs are not rendered by default.

include/stochtree/leaf_model.h

Lines changed: 1 addition & 1 deletion
@@ -239,7 +239,7 @@ namespace StochTree {
  * \beta \sim N\left(0, \tau\right)
  * \f]
  *
- * Allowing for case / variance weights $w_i$ as above, we derive a reduced log marginal likelihood of
+ * Allowing for case / variance weights \f$w_i\f$ as above, we derive a reduced log marginal likelihood of
  *
  * \f[
  * L(y) \propto \frac{1}{2} \log\left(\frac{\sigma^2}{s_{wxx,\ell} \tau + \sigma^2}\right) + \frac{\tau s_{wyx,\ell}^2}{2\sigma^2(s_{wxx,\ell} \tau + \sigma^2)}
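For readers of the doc comment, `s_{wxx,\ell}` and `s_{wyx,\ell}` are the weighted leaf sufficient statistics; under the weighted leaf regression model a plausible reading of the surrounding derivation (the hunk itself does not define them) is:

```latex
% Assumed definitions, with i ranging over the observations in leaf ell
% and w_i the case / variance weights:
s_{wxx,\ell} = \sum_{i \in \ell} w_i x_i^2, \qquad
s_{wyx,\ell} = \sum_{i \in \ell} w_i y_i x_i
```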

include/stochtree/mainpage.h

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@
  * - <b>Leaf Model</b>: `stochtree`'s data structures are generalized to support a wide range of models, which are defined via specialized classes in the \ref leaf_model_group "leaf model layer".
  * - <b>Sampler</b>: helper functions that sample forests from training data comprise the \ref sampling_group "sampling layer" of `stochtree`.
  *
- * \section extending-stochtree Extending `stochtree`
+ * \section extending-stochtree Extending stochtree
  *
  * \subsection custom-leaf-models Custom Leaf Models
  *

setup.py

Lines changed: 4 additions & 3 deletions
@@ -36,6 +36,7 @@ def build_extension(self, ext: CMakeExtension) -> None:
 
         debug = int(os.environ.get("DEBUG", 0)) if self.debug is None else self.debug
         cfg = "Debug" if debug else "Release"
+        use_dbg = "ON" if debug else "OFF"
 
         # CMake lets you override the generator - we need to check this.
         # Can be set with Conda-Build, for example.
@@ -48,8 +49,8 @@ def build_extension(self, ext: CMakeExtension) -> None:
             f"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={extdir}{os.sep}",
             f"-DPYTHON_EXECUTABLE={sys.executable}",
             f"-DCMAKE_BUILD_TYPE={cfg}",  # not used on MSVC, but no harm
-            "-DUSE_DEBUG=OFF",
-            "-DUSE_SANITIZER=OFF",
+            f"-DUSE_DEBUG={use_dbg}",
+            "-DUSE_SANITIZER=OFF",
             "-DBUILD_TEST=OFF",
             "-DBUILD_DEBUG_TARGETS=OFF",
             "-DBUILD_PYTHON=ON",
@@ -151,7 +152,7 @@ def run(self):
 
 # The information here can also be placed in setup.cfg - better separation of
 # logic and declaration, and simpler if you include description/version in a file.
-__version__ = "0.0.1"
+__version__ = "0.1.1"
 
 setup(
     name="stochtree",
