add template notebook (#1570)

aaronmarkham · web-flow · commit 28b727f15595 · 2020-10-06T17:36:38.000-07:00
* add template notebook

* resolve comments
diff --git a/template.ipynb b/template.ipynb
@@ -0,0 +1,333 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Title\n",
+    "The title should be similar to the filename, but the filename should be very concise and compact, so people can read what it is when displayed in a list view in JupyterLab.\n",
+    "\n",
+    "Example title - **Amazon SageMaker Processing: pre-processing images with PyTorch using a GPU instance type**\n",
+    "\n",
+    "* Bad example filename: *amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb*  (too long & mixes case, dashes, and underscores)\n",
+    "* Good example filename: *processing_images_pytorch_gpu.ipynb*  (succinct, all lowercase, all underscores)\n",
+    "\n",
+    "**IMPORTANT:** Use only one maining heading with `#`, so your next subheading is `##` or `###` and so on.\n",
+    "\n",
+    "## Overview\n",
+    "1. What does this notebook do?\n",
+    "    - What will the user learn how to do?\n",
+    "1. Is this an end-to-end tutorial or it is a how-to (procedural) example?\n",
+    "    - Tutorial: add conceptual information, flowcharts, images\n",
+    "    - How to: notebook should be lean. More of a list of steps. No conceptual info, but links to resources for more info.\n",
+    "1. Who is the audience? \n",
+    "    - What should the user be familiar with before running this? \n",
+    "    - Link to other examples they should have run first.\n",
+    "1. How much will this cost?\n",
+    "    - Some estimate of both time and money is recommended.\n",
+    "    - List the instance types and other resources that are created.\n",
+    "\n",
+    "\n",
+    "## Prerequisites\n",
+    "1. Which environments does this notebook work in? Select all that apply.\n",
+    "   - Notebook Instances: Jupyter?\n",
+    "   - Notebook Instances: JupyterLab?\n",
+    "   - Studio?\n",
+    "1. Which conda kernel is required?\n",
+    "1. Is there a previous notebook that is required?\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup \n",
+    "\n",
+    "### Setup Dependencies\n",
+    "\n",
+    "1. Describe any pip or conda or apt installs or setup scripts that are needed.\n",
+    "1. Pin sagemaker if version <2 is required.\n",
+    "\n",
+    "    `%pip install \"sagemaker>=1.14.2,<2\"`\n",
+    "    \n",
+    "    \n",
+    "1. Upgrade sagemaker if version 2 is required, but rollback upgrades to packages that might taint the user's kernel and make other notebooks break. Do this at the end of the notebook in the cleanup cell.\n",
+    "\n",
+    "    ```python\n",
+    "    # setup\n",
+    "    import sagemaker\n",
+    "    version = sagemaker.__version__\n",
+    "    %pip install 'sagemaker>=2.0.0'\n",
+    "    ...\n",
+    "    # cleanup\n",
+    "    %pip install 'sagemaker=={}'.format(version)\n",
+    "    ```\n",
+    "    \n",
+    "\n",
+    "1. Use flags that facilitate automatic, end-to-end running without a user prompt, so that the notebook can run in CI without any updates or special configuration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# SageMaker Python SDK version 1.x is required\n",
+    "import sys\n",
+    "%pip install \"sagemaker>=1.14.2,<2\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# SageMaker Python SDK version 2.x is required\n",
+    "import sagemaker\n",
+    "import sys\n",
+    "original_version = sagemaker.__version__\n",
+    "%pip install 'sagemaker>=2.0.0'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Setup Python Modules\n",
+    "1. Import modules, set options, and activate extensions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2019-06-16T14:44:50.874881Z",
+     "start_time": "2019-06-16T14:44:38.616867Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# imports\n",
+    "import sagemaker\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "# options\n",
+    "pd.options.display.max_columns = 50\n",
+    "pd.options.display.max_rows = 30\n",
+    "\n",
+    "# visualizations\n",
+    "import plotly\n",
+    "import plotly.graph_objs as go\n",
+    "import plotly.offline as ply\n",
+    "plotly.offline.init_notebook_mode(connected=True)\n",
+    "\n",
+    "# extensions\n",
+    "if 'autoreload' not in get_ipython().extension_manager.loaded:\n",
+    "    %load_ext autoreload\n",
+    "    \n",
+    "%autoreload 2"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Parameters\n",
+    "1. Setup user supplied parameters like custom bucket names and roles in a separated cell and call out what their options are.\n",
+    "1. Use defaults, so the notebook will still run end-to-end without any user modification.\n",
+    "\n",
+    "For example, the following description & code block prompts the user to select the preferred dataset.\n",
+    "\n",
+    "~~~\n",
+    "\n",
+    "To do select a particular dataset, assign choosen_data_set below to be one of 'diabetes', 'california', or 'boston' where each name corresponds to the it's respective dataset.\n",
+    "\n",
+    "'boston' : boston house data\n",
+    "'california' : california house data\n",
+    "'diabetes' : diabetes data\n",
+    "\n",
+    "~~~\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_sets = {'diabetes': 'load_diabetes()', 'california': 'fetch_california_housing()', 'boston' : 'load_boston()'}\n",
+    "\n",
+    "# Change choosen_data_set variable to one of the data sets above. \n",
+    "choosen_data_set = 'california'\n",
+    "assert choosen_data_set in data_sets.keys()\n",
+    "print(\"I selected the '{}' dataset!\".format(choosen_data_set))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "## Data import\n",
+    "1. Look for the data that was stored by a previous notebook run `%store -r variableName`\n",
+    "1. If that doesn't exist, look in S3 in their default bucket\n",
+    "1. If that doesn't exist, download it from the [SageMaker dataset bucket](https://sagemaker-sample-files.s3.amazonaws.com/) \n",
+    "1. If that doesn't exist, download it from origin\n",
+    "\n",
+    "For example, the following code block will pull training and validation data that was created in a previous notebook. This allows the customer to experiment with features, re-run the notebook, and not have it pull the dataset over and over."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load relevant dataframes and variables from preprocessing_tabular_data.ipynb required for this notebook\n",
+    "%store -r X_train\n",
+    "%store -r X_test\n",
+    "%store -r X_val\n",
+    "%store -r Y_train\n",
+    "%store -r Y_test\n",
+    "%store -r Y_val\n",
+    "%store -r choosen_data_set"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Procedure or tutorial\n",
+    "1. Break up processes with Markdown blocks to explain what's going on.\n",
+    "1. Make use of visualizations to better demonstrate each step."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Cleanup\n",
+    "1. If you upgraded their `sagemaker` SDK, roll it back.\n",
+    "1. Delete any endpoints or other resources that linger and might cost the user money.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# rollback the SageMaker Python SDK to the kernel's original version\n",
+    "print(\"Original version: {}\".format(original_version))\n",
+    "print(\"Current version: {}\".format(sagemaker.__version__))\n",
+    "s = 'sagemaker=={}'.format(version)\n",
+    "print(\"Rolling back to... {}\".format(s))\n",
+    "%pip install {s}\n",
+    "import sagemaker\n",
+    "print(\"{} installed!\".format(sagemaker.__version__))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Next steps\n",
+    "\n",
+    "1. Wrap up with some conclusion or overview of what was accomplished.\n",
+    "1. Offer another notebook or more resources or some other call to action."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## References\n",
+    "1. author1, article1, journal1, year1, url1\n",
+    "2. author2, article2, journal2, year2, url2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "conda_python3",
+   "language": "python",
+   "name": "conda_python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.10"
+  },
+  "pycharm": {
+   "stem_cell": {
+    "cell_type": "raw",
+    "metadata": {
+     "collapsed": false
+    },
+    "source": []
+   }
+  },
+  "toc": {
+   "base_numbering": 1,
+   "nav_menu": {},
+   "number_sections": true,
+   "sideBar": true,
+   "skip_h1_title": false,
+   "title_cell": "Table of Contents",
+   "title_sidebar": "Contents",
+   "toc_cell": false,
+   "toc_position": {},
+   "toc_section_display": true,
+   "toc_window_display": false
+  },
+  "varInspector": {
+   "cols": {
+    "lenName": 16,
+    "lenType": 16,
+    "lenVar": 40
+   },
+   "kernels_config": {
+    "python": {
+     "delete_cmd_postfix": "",
+     "delete_cmd_prefix": "del ",
+     "library": "var_list.py",
+     "varRefreshCmd": "print(var_dic_list())"
+    },
+    "r": {
+     "delete_cmd_postfix": ") ",
+     "delete_cmd_prefix": "rm(",
+     "library": "var_list.r",
+     "varRefreshCmd": "cat(var_dic_list()) "
+    }
+   },
+   "types_to_exclude": [
+    "module",
+    "function",
+    "builtin_function_or_method",
+    "instance",
+    "_Feature"
+   ],
+   "window_display": false
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}