Commit 28b727f

add template notebook (#1570)
* add template notebook
* resolve comments
1 parent a907008 commit 28b727f

1 file changed

+333
-0
lines changed

template.ipynb

Lines changed: 333 additions & 0 deletions
@@ -0,0 +1,333 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Title\n",
8+
"The title should be similar to the filename, but the filename should be very concise and compact, so people can read what it is when displayed in a list view in JupyterLab.\n",
9+
"\n",
10+
"Example title - **Amazon SageMaker Processing: pre-processing images with PyTorch using a GPU instance type**\n",
11+
"\n",
12+
"* Bad example filename: *amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb* (too long & mixes case, dashes, and underscores)\n",
13+
"* Good example filename: *processing_images_pytorch_gpu.ipynb* (succinct, all lowercase, all underscores)\n",
14+
"\n",
15+
"**IMPORTANT:** Use only one main heading with `#`, so your next subheading is `##` or `###` and so on.\n",
16+
"\n",
17+
"## Overview\n",
18+
"1. What does this notebook do?\n",
19+
" - What will the user learn how to do?\n",
20+
"1. Is this an end-to-end tutorial or is it a how-to (procedural) example?\n",
21+
" - Tutorial: add conceptual information, flowcharts, images\n",
22+
" - How to: notebook should be lean. More of a list of steps. No conceptual info, but links to resources for more info.\n",
23+
"1. Who is the audience? \n",
24+
" - What should the user be familiar with before running this? \n",
25+
" - Link to other examples they should have run first.\n",
26+
"1. How much will this cost?\n",
27+
" - Some estimate of both time and money is recommended.\n",
28+
" - List the instance types and other resources that are created.\n",
29+
"\n",
30+
"\n",
31+
"## Prerequisites\n",
32+
"1. Which environments does this notebook work in? Select all that apply.\n",
33+
" - Notebook Instances: Jupyter?\n",
34+
" - Notebook Instances: JupyterLab?\n",
35+
" - Studio?\n",
36+
"1. Which conda kernel is required?\n",
37+
"1. Is there a previous notebook that is required?\n"
38+
]
39+
},
40+
{
41+
"cell_type": "markdown",
42+
"metadata": {},
43+
"source": [
44+
"## Setup \n",
45+
"\n",
46+
"### Setup Dependencies\n",
47+
"\n",
48+
"1. Describe any pip or conda or apt installs or setup scripts that are needed.\n",
49+
"1. Pin sagemaker if version <2 is required.\n",
50+
"\n",
51+
" `%pip install \"sagemaker>=1.14.2,<2\"`\n",
52+
" \n",
53+
" \n",
54+
"1. Upgrade sagemaker if version 2 is required, but roll back upgrades to packages that might taint the user's kernel and break other notebooks. Do the rollback at the end of the notebook in the cleanup cell.\n",
55+
"\n",
56+
" ```python\n",
57+
" # setup\n",
58+
" import sagemaker\n",
59+
" version = sagemaker.__version__\n",
60+
" %pip install 'sagemaker>=2.0.0'\n",
61+
" ...\n",
62+
" # cleanup\n",
63+
" %pip install 'sagemaker=={version}'\n",
64+
" ```\n",
65+
" \n",
66+
"\n",
67+
"1. Use flags that facilitate automatic, end-to-end running without a user prompt, so that the notebook can run in CI without any updates or special configuration."
68+
]
69+
},
70+
{
71+
"cell_type": "code",
72+
"execution_count": null,
73+
"metadata": {},
74+
"outputs": [],
75+
"source": [
76+
"# SageMaker Python SDK version 1.x is required\n",
77+
"import sys\n",
78+
"%pip install \"sagemaker>=1.14.2,<2\""
79+
]
80+
},
81+
{
82+
"cell_type": "code",
83+
"execution_count": null,
84+
"metadata": {},
85+
"outputs": [],
86+
"source": [
87+
"# SageMaker Python SDK version 2.x is required\n",
88+
"import sagemaker\n",
89+
"import sys\n",
90+
"original_version = sagemaker.__version__\n",
91+
"%pip install 'sagemaker>=2.0.0'"
92+
]
93+
},
94+
{
95+
"cell_type": "markdown",
96+
"metadata": {},
97+
"source": [
98+
"### Setup Python Modules\n",
99+
"1. Import modules, set options, and activate extensions."
100+
]
101+
},
102+
{
103+
"cell_type": "code",
104+
"execution_count": null,
105+
"metadata": {
106+
"ExecuteTime": {
107+
"end_time": "2019-06-16T14:44:50.874881Z",
108+
"start_time": "2019-06-16T14:44:38.616867Z"
109+
}
110+
},
111+
"outputs": [],
112+
"source": [
113+
"# imports\n",
114+
"import sagemaker\n",
115+
"import numpy as np\n",
116+
"import pandas as pd\n",
117+
"\n",
118+
"# options\n",
119+
"pd.options.display.max_columns = 50\n",
120+
"pd.options.display.max_rows = 30\n",
121+
"\n",
122+
"# visualizations\n",
123+
"import plotly\n",
124+
"import plotly.graph_objs as go\n",
125+
"import plotly.offline as ply\n",
126+
"plotly.offline.init_notebook_mode(connected=True)\n",
127+
"\n",
128+
"# extensions\n",
129+
"if 'autoreload' not in get_ipython().extension_manager.loaded:\n",
130+
" %load_ext autoreload\n",
131+
" \n",
132+
"%autoreload 2"
133+
]
134+
},
135+
{
136+
"cell_type": "markdown",
137+
"metadata": {},
138+
"source": [
139+
"## Parameters\n",
140+
"1. Set up user-supplied parameters, such as custom bucket names and roles, in a separate cell and call out what their options are.\n",
141+
"1. Use defaults, so the notebook will still run end-to-end without any user modification.\n",
142+
"\n",
143+
"For example, the following description and code block prompt the user to select the preferred dataset.\n",
144+
"\n",
145+
"~~~\n",
146+
"\n",
147+
"To select a particular dataset, assign choosen_data_set below to one of 'diabetes', 'california', or 'boston', where each name corresponds to its respective dataset.\n",
148+
"\n",
149+
"'boston' : boston house data\n",
150+
"'california' : california house data\n",
151+
"'diabetes' : diabetes data\n",
152+
"\n",
153+
"~~~\n"
154+
]
155+
},
156+
{
157+
"cell_type": "code",
158+
"execution_count": null,
159+
"metadata": {},
160+
"outputs": [],
161+
"source": [
162+
"data_sets = {'diabetes': 'load_diabetes()', 'california': 'fetch_california_housing()', 'boston' : 'load_boston()'}\n",
163+
"\n",
164+
"# Change choosen_data_set variable to one of the data sets above. \n",
165+
"choosen_data_set = 'california'\n",
166+
"assert choosen_data_set in data_sets.keys()\n",
167+
"print(\"I selected the '{}' dataset!\".format(choosen_data_set))"
168+
]
169+
},
170+
{
171+
"cell_type": "markdown",
172+
"metadata": {},
173+
"source": [
174+
"\n",
175+
"## Data import\n",
176+
"1. Look for the data that was stored by a previous notebook run, using `%store -r variableName`\n",
177+
"1. If that doesn't exist, look in S3 in the user's default bucket\n",
178+
"1. If that doesn't exist, download it from the [SageMaker dataset bucket](https://sagemaker-sample-files.s3.amazonaws.com/) \n",
179+
"1. If that doesn't exist, download it from origin\n",
180+
"\n",
181+
"For example, the following code block will pull training and validation data that was created in a previous notebook. This allows the customer to experiment with features, re-run the notebook, and not have it pull the dataset over and over."
182+
]
183+
},
184+
{
185+
"cell_type": "code",
186+
"execution_count": null,
187+
"metadata": {},
188+
"outputs": [],
189+
"source": [
190+
"# Load relevant dataframes and variables from preprocessing_tabular_data.ipynb required for this notebook\n",
191+
"%store -r X_train\n",
192+
"%store -r X_test\n",
193+
"%store -r X_val\n",
194+
"%store -r Y_train\n",
195+
"%store -r Y_test\n",
196+
"%store -r Y_val\n",
197+
"%store -r choosen_data_set"
198+
]
199+
},
200+
{
201+
"cell_type": "markdown",
202+
"metadata": {},
203+
"source": [
204+
"## Procedure or tutorial\n",
205+
"1. Break up processes with Markdown blocks to explain what's going on.\n",
206+
"1. Make use of visualizations to better demonstrate each step."
207+
]
208+
},
209+
{
210+
"cell_type": "markdown",
211+
"metadata": {},
212+
"source": [
213+
"## Cleanup\n",
214+
"1. If you upgraded the user's `sagemaker` SDK, roll it back.\n",
215+
"1. Delete any endpoints or other resources that linger and might cost the user money.\n"
216+
]
217+
},
218+
{
219+
"cell_type": "code",
220+
"execution_count": null,
221+
"metadata": {},
222+
"outputs": [],
223+
"source": [
224+
"# roll back the SageMaker Python SDK to the kernel's original version\n",
225+
"print(\"Original version: {}\".format(original_version))\n",
226+
"print(\"Current version: {}\".format(sagemaker.__version__))\n",
227+
"s = 'sagemaker=={}'.format(original_version)\n",
228+
"print(\"Rolling back to... {}\".format(s))\n",
229+
"%pip install {s}\n",
230+
"import sagemaker\n",
231+
"print(\"{} installed!\".format(sagemaker.__version__))"
232+
]
233+
},
234+
{
235+
"cell_type": "markdown",
236+
"metadata": {},
237+
"source": [
238+
"## Next steps\n",
239+
"\n",
240+
"1. Wrap up with some conclusion or overview of what was accomplished.\n",
241+
"1. Offer another notebook or more resources or some other call to action."
242+
]
243+
},
244+
{
245+
"cell_type": "markdown",
246+
"metadata": {},
247+
"source": [
248+
"## References\n",
249+
"1. author1, article1, journal1, year1, url1\n",
250+
"2. author2, article2, journal2, year2, url2"
251+
]
252+
},
253+
{
254+
"cell_type": "code",
255+
"execution_count": null,
256+
"metadata": {},
257+
"outputs": [],
258+
"source": []
259+
}
260+
],
261+
"metadata": {
262+
"kernelspec": {
263+
"display_name": "conda_python3",
264+
"language": "python",
265+
"name": "conda_python3"
266+
},
267+
"language_info": {
268+
"codemirror_mode": {
269+
"name": "ipython",
270+
"version": 3
271+
},
272+
"file_extension": ".py",
273+
"mimetype": "text/x-python",
274+
"name": "python",
275+
"nbconvert_exporter": "python",
276+
"pygments_lexer": "ipython3",
277+
"version": "3.6.10"
278+
},
279+
"pycharm": {
280+
"stem_cell": {
281+
"cell_type": "raw",
282+
"metadata": {
283+
"collapsed": false
284+
},
285+
"source": []
286+
}
287+
},
288+
"toc": {
289+
"base_numbering": 1,
290+
"nav_menu": {},
291+
"number_sections": true,
292+
"sideBar": true,
293+
"skip_h1_title": false,
294+
"title_cell": "Table of Contents",
295+
"title_sidebar": "Contents",
296+
"toc_cell": false,
297+
"toc_position": {},
298+
"toc_section_display": true,
299+
"toc_window_display": false
300+
},
301+
"varInspector": {
302+
"cols": {
303+
"lenName": 16,
304+
"lenType": 16,
305+
"lenVar": 40
306+
},
307+
"kernels_config": {
308+
"python": {
309+
"delete_cmd_postfix": "",
310+
"delete_cmd_prefix": "del ",
311+
"library": "var_list.py",
312+
"varRefreshCmd": "print(var_dic_list())"
313+
},
314+
"r": {
315+
"delete_cmd_postfix": ") ",
316+
"delete_cmd_prefix": "rm(",
317+
"library": "var_list.r",
318+
"varRefreshCmd": "cat(var_dic_list()) "
319+
}
320+
},
321+
"types_to_exclude": [
322+
"module",
323+
"function",
324+
"builtin_function_or_method",
325+
"instance",
326+
"_Feature"
327+
],
328+
"window_display": false
329+
}
330+
},
331+
"nbformat": 4,
332+
"nbformat_minor": 2
333+
}
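
The "Data import" cell in the template lists four sources to try in order: values stored by a previous notebook run, the user's default bucket, the SageMaker dataset bucket, and the origin. The following is a minimal sketch of that fallback order, not part of this commit. The helper `first_available` and the four `from_*` loaders are hypothetical stand-ins; a real notebook would replace them with `%store -r`, S3 downloads, and a download from the origin.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_available(
    sources: Iterable[Tuple[str, Callable[[], Optional[T]]]]
) -> Tuple[str, T]:
    """Try each (name, loader) pair in order and return the first that yields data."""
    errors = []
    for name, loader in sources:
        try:
            data = loader()
        except Exception as exc:
            # A failing source only means "try the next one".
            errors.append("{}: {}".format(name, exc))
            continue
        if data is not None:
            return name, data
        errors.append("{}: no data".format(name))
    raise LookupError("no source produced data ({})".format("; ".join(errors)))


# Hypothetical loaders standing in for the four sources named in the template.
def from_store():
    return None  # assumption: nothing was stored by a previous notebook run


def from_default_bucket():
    raise FileNotFoundError("object not found in the default bucket")


def from_sample_bucket():
    return [[1.0, 2.0], [3.0, 4.0]]  # placeholder rows, not a real dataset


def from_origin():
    return [[0.0, 0.0]]  # placeholder rows, not a real dataset


source_name, data = first_available(
    [
        ("store", from_store),
        ("default bucket", from_default_bucket),
        ("sample bucket", from_sample_bucket),
        ("origin", from_origin),
    ]
)
print("Loaded data from: {}".format(source_name))
```

Keeping the sources in an ordered list means the notebook still runs end-to-end when earlier sources are empty, and the collected error messages show which sources were tried if none of them has the data.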
