Commit 7379b5c

Edward J Kim authored and jarednielsen committed
Add xgboost documentation (aws#78)
* Add xgboost documentation
* Address comments by jarednielsen
* Fix typo
* Remove extra metrics row for xgboost
1 parent 5a8fbc0 commit 7379b5c

File tree

2 files changed: +105 -9 lines changed

docs/api.md

Lines changed: 8 additions & 8 deletions
@@ -100,18 +100,18 @@ will automatically place weights into the `smd.CollectionKeys.WEIGHTS` collection
| `GRADIENTS` | TensorFlow, PyTorch, MXNet | Matches all gradients tensors. In TensorFlow non-DLC, must use `hook.wrap_optimizer()`. |
| `LOSSES` | TensorFlow, PyTorch, MXNet | Matches all loss tensors. |
| `SCALARS` | TensorFlow, PyTorch, MXNet | Matches all scalar tensors, such as loss or accuracy. |
| `METRICS` | TensorFlow, XGBoost | Evaluation metrics computed by the algorithm. |
| `INPUTS` | TensorFlow | Matches all inputs to a layer (outputs of the previous layer). |
| `OUTPUTS` | TensorFlow | Matches all outputs of a layer (inputs of the following layer). |
| `SEARCHABLE_SCALARS` | TensorFlow | Scalars that will go to SageMaker Metrics. |
| `OPTIMIZER_VARIABLES` | TensorFlow | Matches all optimizer variables. |
| `HYPERPARAMETERS` | XGBoost | [Booster parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html) |
| `PREDICTIONS` | XGBoost | Predictions on the validation set (if provided) |
| `LABELS` | XGBoost | Labels of the validation set (if provided) |
| `FEATURE_IMPORTANCE` | XGBoost | Feature importance given by [get_score()](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.get_score) |
| `FULL_SHAP` | XGBoost | A matrix of shape (nsamples, nfeatures + 1), where each record holds the feature contributions ([SHAP values](https://github.com/slundberg/shap)) for that prediction. Computed on the training data with [predict()](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.predict) |
| `AVERAGE_SHAP` | XGBoost | The sum of SHAP value magnitudes over all samples. Represents the impact each feature has on the model output. |
| `TREES` | XGBoost | Boosted tree model given by [trees_to_dataframe()](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster.trees_to_dataframe) |
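The XGBoost collections above can be inspected after a training run with `smdebug`'s analysis API (see the Analysis page referenced below). A minimal sketch, assuming a job has already written tensors to a local output directory; the `/tmp/smdebug` path and step number are illustrative:

```python
from smdebug.trials import create_trial

# Load the tensors a finished training job wrote out (illustrative path).
trial = create_trial("/tmp/smdebug")

# List what landed in the XGBoost collections described above.
print(trial.tensor_names(collection="metrics"))
print(trial.tensor_names(collection="average_shap"))

# Read each feature-importance tensor's value at the first saved step.
for name in trial.tensor_names(collection="feature_importance"):
    print(name, trial.tensor(name).value(0))
```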
docs/xgboost.md

Lines changed: 97 additions & 1 deletion
@@ -1,3 +1,99 @@
# XGBoost

## Contents

- [SageMaker Example](#sagemaker-example)
- [Full API](#full-api)

## SageMaker Example

### Use XGBoost as a built-in algorithm

The XGBoost algorithm can be used 1) as a built-in algorithm, or 2) as a framework such as MXNet, PyTorch, or TensorFlow.
If SageMaker XGBoost is used as a built-in algorithm in container version `0.90-2` or later, Amazon SageMaker Debugger is available by default (i.e., a zero-code-change experience).
See the [XGBoost Algorithm AWS documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) for more information on how to use XGBoost as a built-in algorithm.
See the [Amazon SageMaker Debugger examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger) for sample notebooks that demonstrate the debugging and monitoring capabilities of Amazon SageMaker Debugger.
See the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) for more information on how to configure Amazon SageMaker Debugger from the Python SDK.
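As a rough sketch of that SDK-side configuration, the built-in algorithm can be launched with an explicit Debugger hook as below. This assumes SageMaker Python SDK v2 names (`image_uris.retrieve`, `DebuggerHookConfig`); the region, role ARN, bucket, and instance type are illustrative placeholders:

```python
import sagemaker
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig
from sagemaker.estimator import Estimator

region = "us-east-1"  # illustrative region
container = sagemaker.image_uris.retrieve("xgboost", region, "0.90-2")

estimator = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # illustrative role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    debugger_hook_config=DebuggerHookConfig(
        s3_output_path="s3://my-bucket/debugger",  # illustrative bucket
        collection_configs=[CollectionConfig(name="metrics")],
    ),
)
# estimator.fit({"train": "s3://my-bucket/train"})  # illustrative channels
```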
### Use XGBoost as a framework

When SageMaker XGBoost is used as a framework, it is recommended that the hook be configured from the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).
By using the SageMaker Python SDK, you can run different jobs (e.g., Processing jobs) on the SageMaker platform.
You can retrieve the hook as follows.
```python
import xgboost as xgb
from smdebug.xgboost import Hook

dtrain = xgb.DMatrix("train.libsvm")
dtest = xgb.DMatrix("test.libsvm")

params = {"objective": "binary:logistic"}  # booster parameters

hook = Hook.create_from_json_file()
hook.train_data = dtrain  # required
hook.validation_data = dtest  # optional
hook.hyperparameters = params  # optional

bst = xgb.train(
    params,
    dtrain,
    callbacks=[hook],
    evals=[(dtrain, "train"), (dtest, "validation")],
)
```
Alternatively, you can create the hook from `smdebug`'s Python API, as shown in the next section.

### Use the Debugger hook

If you are in a non-SageMaker environment, or if you want to configure the hook in a particular way in script mode on SageMaker, you can use the full Debugger hook API as follows.
```python
import xgboost as xgb
from smdebug.xgboost import Hook

dtrain = xgb.DMatrix("train.libsvm")
dvalid = xgb.DMatrix("validation.libsvm")

out_dir = "/tmp/smdebug"  # where tensors and metadata will be written
hyperparameters = {"objective": "binary:logistic"}  # booster parameters

hook = Hook(
    out_dir=out_dir,  # required
    train_data=dtrain,  # required
    validation_data=dvalid,  # optional
    hyperparameters=hyperparameters,  # optional
)
```
## Full API

```python
def __init__(
    self,
    out_dir,
    export_tensorboard=False,
    tensorboard_dir=None,
    dry_run=False,
    reduction_config=None,
    save_config=None,
    include_regex=None,
    include_collections=None,
    save_all=False,
    include_workers="one",
    hyperparameters=None,
    train_data=None,
    validation_data=None,
)
```
Initializes the hook. Pass this object as a callback to `xgboost.train()`.

* `out_dir` (str): A path into which tensors and metadata will be written.
* `export_tensorboard` (bool): Whether to also write TensorBoard logs.
* `tensorboard_dir` (str): Where to save the TensorBoard logs.
* `dry_run` (bool): If true, evaluations are not actually saved to disk.
* `reduction_config` (ReductionConfig object): Not supported in XGBoost and will be ignored.
* `save_config` (SaveConfig object): See the [Common API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md).
* `include_regex` (list[str]): List of additional regexes for tensors to save.
* `include_collections` (list[str]): List of collections to save.
* `save_all` (bool): Saves all tensors and collections. **WARNING: May be memory-intensive and slow.**
* `include_workers` (str): Used for distributed training; can also be "all".
* `hyperparameters` (dict): Booster params.
* `train_data` (DMatrix object): Training data.
* `validation_data` (DMatrix object): Validation set for which metrics will be evaluated during training.
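As a concrete example of combining a few of these arguments, here is a minimal sketch. It assumes `SaveConfig` is importable from `smdebug.xgboost` alongside `Hook` (per the Common API page), and the output path, save interval, and collection names are illustrative:

```python
import xgboost as xgb
from smdebug.xgboost import Hook, SaveConfig  # SaveConfig import assumed per Common API docs

dtrain = xgb.DMatrix("train.libsvm")

# Save only the listed collections, once every 10 steps (illustrative choices).
hook = Hook(
    out_dir="/tmp/smdebug",  # illustrative output path
    save_config=SaveConfig(save_interval=10),
    include_collections=["metrics", "feature_importance"],
    train_data=dtrain,
)

bst = xgb.train({"objective": "reg:squarederror"}, dtrain, callbacks=[hook])
```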
See the [Common API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md) page for details about Collection, SaveConfig, and ReductionConfig.
See the [Analysis](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/analysis.md) page for details about analyzing a training job.
