# XGBoost

## Contents

- [SageMaker Example](#sagemaker-example)
- [Full API](#full-api)

## SageMaker Example

### Use XGBoost as a built-in algorithm

The XGBoost algorithm can be used 1) as a built-in algorithm, or 2) as a framework such as MXNet, PyTorch, or TensorFlow.
If SageMaker XGBoost is used as a built-in algorithm in container version `0.90-2` or later, Amazon SageMaker Debugger is available by default (i.e., a zero-code-change experience).
See the [XGBoost Algorithm AWS documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) for more information on how to use XGBoost as a built-in algorithm.
See [Amazon SageMaker Debugger examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger) for sample notebooks that demonstrate the debugging and monitoring capabilities of Amazon SageMaker Debugger.
See the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) for more information on how to configure Amazon SageMaker Debugger from the Python SDK.
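
For reference, here is a minimal sketch of launching the built-in algorithm with Debugger enabled from the SageMaker Python SDK (assuming SDK v2; the role, instance type, and S3 paths below are placeholders you would replace):

```python
import sagemaker
from sagemaker.debugger import DebuggerHookConfig
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Built-in XGBoost container; Debugger is on by default in 0.90-2 and later.
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost", region=session.boto_region_name, version="0.90-2"
)

estimator = Estimator(
    image_uri=image_uri,
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
    # Tensors are saved to this S3 path without any training-script changes.
    debugger_hook_config=DebuggerHookConfig(s3_output_path="s3://my-bucket/debugger"),
)

estimator.fit({"train": "s3://my-bucket/data/train"})  # placeholder data location
```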

### Use XGBoost as a framework

When SageMaker XGBoost is used as a framework, it is recommended that you configure the hook from the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).
Using the SageMaker Python SDK, you can also run other kinds of jobs (e.g., Processing jobs) on the SageMaker platform.
Inside your training script, you can retrieve the hook as follows.
```python
import xgboost as xgb
from smdebug.xgboost import Hook

dtrain = xgb.DMatrix("train.libsvm")
dtest = xgb.DMatrix("test.libsvm")

params = {"objective": "binary:logistic"}  # example booster params

# In SageMaker, the hook configuration (out_dir, collections, save intervals,
# ...) is read from a JSON file that the platform places in the container.
hook = Hook.create_from_json_file()
hook.train_data = dtrain       # required
hook.validation_data = dtest   # optional
hook.hyperparameters = params  # optional

bst = xgb.train(
    params,
    dtrain,
    callbacks=[hook],
    evals=[(dtrain, "train"), (dtest, "validation")]
)
```

Alternatively, you can create the hook with `smdebug`'s Python API, as shown in the next section.

### Use the Debugger hook

If you are in a non-SageMaker environment, or if you want to configure the hook in a particular way in script mode within SageMaker, you can use the full Debugger hook API as follows.
```python
import xgboost as xgb
from smdebug.xgboost import Hook

dtrain = xgb.DMatrix("train.libsvm")
dvalid = xgb.DMatrix("validation.libsvm")

out_dir = "/tmp/smdebug-xgboost"  # placeholder: local path or S3 URI for tensors
hyperparameters = {"objective": "binary:logistic"}  # example booster params

hook = Hook(
    out_dir=out_dir,                  # required
    train_data=dtrain,                # required
    validation_data=dvalid,           # optional
    hyperparameters=hyperparameters,  # optional
)
```
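
After training runs with this hook, the saved tensors can be loaded back with `smdebug`'s trial API (covered on the Analysis page linked below). A minimal sketch reusing the names defined above; `"train-rmse"` is a hypothetical metric name:

```python
from smdebug.trials import create_trial

# Train with the hook attached so that tensors are written to out_dir.
bst = xgb.train(
    hyperparameters,
    dtrain,
    callbacks=[hook],
    evals=[(dtrain, "train"), (dvalid, "validation")]
)

# Load the saved tensors and inspect them.
trial = create_trial(out_dir)
print(trial.tensor_names())

tname = "train-rmse"  # hypothetical; pick a name printed above
for step in trial.tensor(tname).steps():
    print(step, trial.tensor(tname).value(step))
```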

## Full API

```python
def __init__(
    self,
    out_dir,
    export_tensorboard=False,
    tensorboard_dir=None,
    dry_run=False,
    reduction_config=None,
    save_config=None,
    include_regex=None,
    include_collections=None,
    save_all=False,
    include_workers="one",
    hyperparameters=None,
    train_data=None,
    validation_data=None,
)
```
Initializes the hook. Pass this object as a callback to `xgboost.train()`.
* `out_dir` (str): A path into which tensors and metadata will be written.
* `export_tensorboard` (bool): Whether to also export tensors in a TensorBoard-compatible format.
* `tensorboard_dir` (str): Where to save the TensorBoard logs.
* `dry_run` (bool): If `True`, the hook runs without actually saving anything to disk.
* `reduction_config` (ReductionConfig object): Not supported in XGBoost and will be ignored.
* `save_config` (SaveConfig object): See the [Common API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md).
* `include_regex` (list[str]): A list of additional regexes matching names of tensors to save.
* `include_collections` (list[str]): A list of names of collections to save.
* `save_all` (bool): Saves all tensors and collections. **WARNING: May be memory-intensive and slow.**
* `include_workers` (str): Which workers to save tensors from in distributed training: `"one"` (default) or `"all"`.
* `hyperparameters` (dict): The booster params passed to `xgboost.train()`.
* `train_data` (DMatrix object): The training data.
* `validation_data` (DMatrix object): The validation set for which metrics will be evaluated during training.
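
For example, here is a hedged sketch combining `save_config` and `include_collections`; the collection names below are assumptions (check the Common API page linked below for the actual list), and `SaveConfig` is assumed to be importable from the top-level `smdebug` package:

```python
from smdebug import SaveConfig
from smdebug.xgboost import Hook

hook = Hook(
    out_dir="/tmp/smdebug-xgboost",            # placeholder output path
    save_config=SaveConfig(save_interval=10),  # save tensors every 10 steps
    # Assumed collection names; consult the Common API page for the full list.
    include_collections=["metrics", "feature_importance"],
    train_data=dtrain,                         # DMatrix from your training script
)
```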

See the [Common API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md) page for details about Collection, SaveConfig, and ReductionConfig.\
See the [Analysis](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/analysis.md) page for details about analyzing a training job.