@@ -8,7 +8,7 @@
 "\n",
 "Kernel `Python 3 (Data Science)` works well with this notebook.\n",
 "\n",
-"_This notebook was created and tested on an ml.m5.large notebook instance._\n",
+"_This notebook was created and tested on an ml.m5.xlarge notebook instance._\n",
 "\n",
 "## Table of Contents\n",
 "\n",

@@ -101,9 +101,8 @@
 "source": [
 "import shap\n",
 "\n",
-"from kernel_explainer_wrapper import KernelExplainerWrapper\n",
+"from shap import KernelExplainer\n",
 "from shap import sample\n",
-"from shap.common import LogitLink, IdentityLink\n",
 "from scipy.special import expit\n",
 "\n",
 "# Initialize plugin to make plots interactive.\n",

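
The hunk above swaps the removed `shap.common` link classes for `scipy.special.expit`, the logistic sigmoid. As a quick sanity sketch (plain SciPy, independent of the notebook), `expit` is exactly the inverse of the `logit` link the notebook now selects by name:

```python
from scipy.special import expit, logit

# expit (sigmoid) inverts logit: round-tripping a probability through
# log-odds space recovers it exactly.
p = 0.8
log_odds = logit(p)  # ln(0.8 / 0.2) = ln(4)
assert abs(expit(log_odds) - p) < 1e-12
```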
@@ -235,7 +234,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"churn_data = pd.read_csv('./Data sets/churn.txt')\n",
+"churn_data = pd.read_csv('../Data sets/churn.txt')\n",
 "data_without_target = churn_data.drop(columns=['Churn?'])\n",
 "\n",
 "background_data = sample(data_without_target, 50)"

@@ -252,7 +251,10 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Next, we create the `KernelExplainer`. Note that since it's a black box explainer, `KernelExplainer` only requires a handle to the predict (or predict_proba) function and does not require any other information about the model. For classification it is recommended to derive feature importance scores in the log-odds space since additivity is a more natural assumption there thus we use `LogitLink`. For regression `IdentityLink` should be used."
+"Next, we create the `KernelExplainer`. Note that since it's a black box explainer, `KernelExplainer` only requires a handle to the\n",
+"predict (or predict_proba) function and does not require any other information about the model. For classification it is recommended to\n",
+"derive feature importance scores in the log-odds space since additivity is a more natural assumption there thus we use `logit`. For\n",
+"regression `identity` should be used."
 ]
 },
 {

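
The additivity point in that markdown cell can be sketched without SHAP at all: with the `logit` link, per-feature attributions sum in log-odds space, and `expit` maps the total back to a probability. The numbers below are illustrative stand-ins, not values from the notebook:

```python
from scipy.special import expit, logit

# Hypothetical attributions in log-odds space: they combine by simple
# addition, which is why the "logit" link is preferred for classification.
base_value = logit(0.5)           # 50/50 base rate -> 0.0 in log-odds
shap_total = 1.2                  # hypothetical sum of per-feature SHAP values

# Map the additive total back to probability space.
prob = expit(base_value + shap_total)
assert 0.5 < prob < 1.0           # positive attributions raise the probability
```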
@@ -263,17 +265,16 @@
 "source": [
 "# Derive link function \n",
 "problem_type = automl_job.describe_auto_ml_job(job_name=automl_job_name)['ResolvedAttributes']['ProblemType'] \n",
-"link_fn = IdentityLink if problem_type == 'Regression' else LogitLink \n",
+"link = \"identity\" if problem_type == 'Regression' else \"logit\"\n",
 "\n",
-"# the handle to predict_proba is passed to KernelExplainerWrapper since KernelSHAP requires the class probability\n",
-"explainer = KernelExplainerWrapper(automl_estimator.predict_proba, background_data, link=link_fn())"
+"# the handle to predict_proba is passed to KernelExplainer since KernelSHAP requires the class probability\n",
+"explainer = KernelExplainer(automl_estimator.predict_proba, background_data, link=link)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Currently, `shap.KernelExplainer` only supports numeric data. A version of SHAP that supports text will become available soon. A workaround is provided by our wrapper `KernelExplainerWrapper`. Once a new version of SHAP is released, `shap.KernelExplainer` should be used instead of `KernelExplainerWrapper`.\n",
 "\n",
 "By analyzing the background data `KernelExplainer` provides us with `explainer.expected_value` which is the model prediction with all features missing. Considering a customer for which we have no data at all (i.e. all features are missing) this should theoretically be the model prediction."
 ]

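
The `expected_value` described above is, in KernelSHAP, the average model output over the background sample. A minimal stand-in sketch (NumPy only; `predict` below is a toy substitute for `automl_estimator.predict_proba`, not the notebook's model):

```python
import numpy as np

# 50 background rows with 3 features, mirroring sample(data, 50) above.
rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))

def predict(X):
    # Toy probability model: sigmoid of the feature sum.
    return 1.0 / (1.0 + np.exp(-X.sum(axis=1)))

# The base/expected value is the mean prediction over the background data,
# i.e. the model output when every feature is treated as "missing".
expected_value = predict(background).mean()
assert 0.0 < expected_value < 1.0
```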
@@ -326,7 +327,7 @@
 "outputs": [],
 "source": [
 "# Since shap_values are provided in the log-odds space, we convert them back to the probability space by using LogitLink\n",
-"shap.force_plot(explainer.expected_value, shap_values, x, link=link_fn())"
+"shap.force_plot(explainer.expected_value, shap_values, x, link=link)"
 ]
 },
 {

@@ -348,7 +349,7 @@
 "source": [
 "with ManagedEndpoint(ep_name) as mep:\n",
 " shap_values = explainer.shap_values(x, nsamples='auto', l1_reg='num_features(5)')\n",
-"shap.force_plot(explainer.expected_value, shap_values, x, link=link_fn())"
+"shap.force_plot(explainer.expected_value, shap_values, x, link=link)"
 ]
 },
 {

@@ -396,7 +397,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"shap.force_plot(explainer.expected_value, shap_values, X, link=link_fn())"
+"shap.force_plot(explainer.expected_value, shap_values, X, link=link)"
 ]
 },
 {