@@ -4,6 +4,7 @@ This tutorial was last given at SciPy 2020 which was a virtual conference.
[A video of the SciPy 2020 tutorial is available online](https://www.youtube.com/watch?v=EybGGLbLipI).
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dask/dask-tutorial/master?urlpath=lab)
+ [![Build Status](https://github.com/dask/dask-tutorial/workflows/CI/badge.svg)](https://github.com/dask/dask-tutorial/actions?query=workflow%3ACI)
Dask provides multi-core execution on larger-than-memory datasets.
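A minimal pure-Python sketch of the idea behind that claim: stream a dataset in fixed-size chunks so only one chunk is ever held in memory, then combine the per-chunk results. (Dask generalizes this pattern to arbitrary task graphs; the `chunked_sum` helper here is an illustration, not part of Dask's API.)

```python
# Sketch of larger-than-memory execution: never materialize the whole
# dataset, only one chunk at a time, then combine partial results.
def chunked_sum(numbers, chunk_size=1_000):
    total = 0
    chunk = []
    for x in numbers:
        chunk.append(x)
        if len(chunk) == chunk_size:
            total += sum(chunk)  # reduce this chunk, then discard it
            chunk.clear()
    return total + sum(chunk)    # fold in the final partial chunk

print(chunked_sum(range(1_000_000)))  # 499999500000
```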
@@ -35,13 +36,13 @@ schedulers (odd sections.)
and then install necessary packages.
There are three different ways to achieve this; pick the one that best suits you, and ***only pick one option***.
- They are, in order of preference:
+ They are, in order of preference:
#### 2a) Create a conda environment (preferred)
In the main repo directory
- conda env create -f binder/environment.yml
+ conda env create -f binder/environment.yml
conda activate dask-tutorial
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @bokeh/jupyter_bokeh
@@ -55,10 +56,10 @@ You will need the following core libraries
You may find the following libraries helpful for some exercises
conda install python-graphviz -c conda-forge
-
- Note that this option will alter your existing environment, potentially changing the versions of packages you already
- have installed.
-
+
+ Note that this option will alter your existing environment, potentially changing the versions of packages you already
+ have installed.
+
#### 2c) Use Dockerfile
You can build a docker image out of the provided Dockerfile.
@@ -69,7 +70,7 @@ Run a container, replacing the ID with the output of the previous command
$ docker run -it -p 8888:8888 -p 8787:8787 <container_id_or_tag>
- The above command will give a URL (like `http://(container_id or 127.0.0.1):8888/?token=<sometoken>`) which
+ The above command will give a URL (like `http://(container_id or 127.0.0.1):8888/?token=<sometoken>`) which
can be used to access the notebook from browser. You may need to replace the given hostname with "localhost" or
"127.0.0.1".
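The hostname substitution described above is a plain string replacement; a small sketch (the container id `a1b2c3d4e5f6` and token are made up for illustration):

```python
# A URL as the container might print it; the id and token are invented.
url = "http://a1b2c3d4e5f6:8888/?token=abc123"

# Swap the container hostname for 127.0.0.1 so the browser on the
# host machine can reach the notebook server.
local_url = url.replace("a1b2c3d4e5f6", "127.0.0.1")
print(local_url)  # http://127.0.0.1:8888/?token=abc123
```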
@@ -79,7 +80,7 @@ can be used to access the notebook from browser. You may need to replace the giv
From the repo directory
- jupyter notebook
+ jupyter notebook
Or
@@ -110,8 +111,8 @@ This was already done for method c) and does not need repeating.
2. [Bag](02_bag.ipynb) - the first high-level collection: a generalized iterator for use
with a functional programming style and to clean messy data.
-
- 3. [Array](03_array.ipynb) - blocked numpy-like functionality with a collection of
+
+ 3. [Array](03_array.ipynb) - blocked numpy-like functionality with a collection of
numpy arrays spread across your cluster.
4. [Dataframe](04_dataframe.ipynb) - parallelized operations on many pandas dataframes
@@ -120,7 +121,7 @@ spread across your cluster.
5. [Distributed](05_distributed.ipynb) - Dask's scheduler for clusters, with details of
how to view the UI.
- 6. [Advanced Distributed](06_distributed_advanced.ipynb) - further details on distributed
+ 6. [Advanced Distributed](06_distributed_advanced.ipynb) - further details on distributed
computing, including how to debug.
7. [Dataframe Storage](07_dataframe_storage.ipynb) - efficient ways to read and write