|
36 | 36 | "id": "HtysPAVSvcMg"
|
37 | 37 | },
|
38 | 38 | "source": [
|
39 |     | - "# 🌦️ Weather forecasting\n",
    | 39 | + "# 🌦️ Weather forecasting -- _Dataset_\n",
40 | 40 | "\n",
|
41 | 41 | "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GoogleCloudPlatform/python-docs-samples/blob/main/people-and-planet-ai/weather-forecasting/notebooks/2-dataset.ipynb)\n",
|
42 | 42 | "\n",
|
|
137 | 137 | },
|
138 | 138 | {
|
139 | 139 | "cell_type": "code",
|
140 |     | - "execution_count": 3,
    | 140 | + "execution_count": null,
141 | 141 | "metadata": {
|
142 | 142 | "id": "xGXRHJ9TFs24"
|
143 | 143 | },
|
|
282 | 282 | "\n",
|
283 | 283 | "Once we have bins for both precipitation and elevation, we combine them into a single \"unique\" bin value to make sure we get all the possible precipitation values for each elevation.\n",
|
284 | 284 | "\n",
|
285 |     | - "In [`create_dataset.py`](create_dataset.py) we defined a function called `sample_points` that gives us a balanced selection of `(longitude, latitude)` coordinates for a given date."
    | 285 | + "In [`create_dataset.py`](../create_dataset.py) we defined a function called `sample_points` that gives us a balanced selection of `(longitude, latitude)` coordinates for a given date."
286 | 286 | ],
|
287 | 287 | "id": "hWq2BMYMcAEj"
|
288 | 288 | },
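To make the combined-bin idea concrete, here is a minimal sketch (not the actual code in `create_dataset.py`) of how a precipitation bin and an elevation bin can be folded into one "unique" bin and then sampled evenly; the bin count, array shapes, and helper names are assumptions for illustration.

```python
import numpy as np

NUM_BINS = 10  # assumed; matches the --num-bins default mentioned later in the notebook

def combined_bins(precipitation: np.ndarray, elevation: np.ndarray) -> np.ndarray:
    """Fold per-pixel precipitation and elevation bins into a single bin value."""
    precip_edges = np.linspace(precipitation.min(), precipitation.max(), NUM_BINS)
    elev_edges = np.linspace(elevation.min(), elevation.max(), NUM_BINS)
    precip_bin = np.digitize(precipitation, precip_edges)  # values fall in [0, NUM_BINS]
    elev_bin = np.digitize(elevation, elev_edges)
    # Every (elevation bin, precipitation bin) pair gets its own combined value.
    return elev_bin * (NUM_BINS + 1) + precip_bin

def sample_balanced(points: np.ndarray, bins: np.ndarray, rng=np.random.default_rng(0)):
    """Pick one random (longitude, latitude) point from each combined bin."""
    return np.stack([
        points[rng.choice(np.flatnonzero(bins == b))]
        for b in np.unique(bins)
    ])
```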
|
289 | 289 | {
|
290 | 290 | "cell_type": "code",
|
291 |     | - "execution_count": 4,
    | 291 | + "execution_count": null,
292 | 292 | "metadata": {
|
293 | 293 | "colab": {
|
294 | 294 | "base_uri": "https://localhost:8080/"
|
|
369 | 369 | "We predefined that all our training examples would be 5 pixels wide by 5 pixels tall, but we could choose any size as long as the model accepts it.\n",
|
370 | 370 | "We also want all the training examples to be the same size so we can batch them.\n",
|
371 | 371 | "\n",
|
372 |     | - "In [`create_dataset.py`](create_dataset.py) we defined `get_training_example`, which fetches an `(inputs, labels)` pair for the given date and (longitude, latitude) coordinate.\n",
    | 372 | + "In [`create_dataset.py`](../create_dataset.py) we defined `get_training_example`, which fetches an `(inputs, labels)` pair for the given date and (longitude, latitude) coordinate.\n",
373 | 373 | "Let's see what a 64x64 patch looks like, since a 5x5 patch will only look like a bunch of random pixels to us."
|
374 | 374 | ],
|
375 | 375 | "id": "W5mr765Ahsd5"
|
376 | 376 | },
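As a side note on why the fixed patch size matters: equally sized `(inputs, labels)` patches can be stacked straight into a batch. The band counts and batch size below are made-up numbers, not values taken from `get_training_example`.

```python
import numpy as np

# Pretend we fetched 8 examples, each a 5x5 patch (band counts here are invented).
inputs = [np.random.rand(5, 5, 52) for _ in range(8)]
labels = [np.random.rand(5, 5, 2) for _ in range(8)]

# Same shape everywhere, so the examples stack cleanly into one batch.
inputs_batch = np.stack(inputs)  # (8, 5, 5, 52)
labels_batch = np.stack(labels)  # (8, 5, 5, 2)
print(inputs_batch.shape, labels_batch.shape)
```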
|
377 | 377 | {
|
378 | 378 | "cell_type": "code",
|
379 |     | - "execution_count": 5,
    | 379 | + "execution_count": null,
380 | 380 | "metadata": {
|
381 | 381 | "colab": {
|
382 | 382 | "base_uri": "https://localhost:8080/"
|
|
419 | 419 | },
|
420 | 420 | {
|
421 | 421 | "cell_type": "code",
|
422 |     | - "execution_count": 6,
    | 422 | + "execution_count": null,
423 | 423 | "metadata": {
|
424 | 424 | "colab": {
|
425 | 425 | "base_uri": "https://localhost:8080/",
|
|
488 | 488 | },
|
489 | 489 | {
|
490 | 490 | "cell_type": "code",
|
491 |     | - "execution_count": 7,
    | 491 | + "execution_count": null,
492 | 492 | "metadata": {
|
493 | 493 | "colab": {
|
494 | 494 | "base_uri": "https://localhost:8080/",
|
|
578 | 578 | },
|
579 | 579 | {
|
580 | 580 | "cell_type": "code",
|
581 |     | - "execution_count": 24,
    | 581 | + "execution_count": null,
582 | 582 | "metadata": {
|
583 | 583 | "colab": {
|
584 | 584 | "base_uri": "https://localhost:8080/",
|
|
624 | 624 | "outputId": "2a818de7-128e-4200-f196-f629e698d985"
|
625 | 625 | },
|
626 | 626 | "id": "tcD44qxkSSya",
|
627 |     | - "execution_count": 25,
    | 627 | + "execution_count": null,
628 | 628 | "outputs": [
|
629 | 629 | {
|
630 | 630 | "output_type": "stream",
|
|
713 | 713 | "Local testing works great for creating small datasets and making sure everything works, but to run on a large dataset at scale it's best to use a distributed runner like\n",
|
714 | 714 | "[Dataflow](https://cloud.google.com/dataflow).\n",
|
715 | 715 | "\n",
|
716 |     | - "We can run [`create_dataset.py`](create_dataset.py) as a script and launch it in [Dataflow](https://cloud.google.com/dataflow).\n",
    | 716 | + "We can run [`create_dataset.py`](../create_dataset.py) as a script and launch it in [Dataflow](https://cloud.google.com/dataflow).\n",
717 | 717 | "You can control the number of dates to sample with `--num-dates` _(default=100)_, and the number of bins to use for the stratified sampling with `--num-bins` _(default=10)_.\n",
|
718 | 718 | "\n",
|
719 | 719 | "We are using the same data extraction functions for both training and prediction.\n",
|
720 |     | - "This means our Dataflow pipeline needs access to the [`serving/weather-data`](serving/weather-data) module.\n",
    | 720 | + "This means our Dataflow pipeline needs access to the [`serving/weather-data`](../serving/weather-data) module.\n",
721 | 721 | "Since it's a local module that is not published on [PyPI](https://pypi.org), we first have to build it with [`build`](https://pypa-build.readthedocs.io/en/latest) and then include the built package in the Dataflow job."
|
722 | 722 | ],
|
723 | 723 | "id": "YWAI6AetcxRH"
|
|
748 | 748 | "outputId": "516fb9b4-328a-4d41-af2a-028448559882"
|
749 | 749 | },
|
750 | 750 | "id": "1NtAJBl0TKyE",
|
751 |     | - "execution_count": 17,
    | 751 | + "execution_count": null,
752 | 752 | "outputs": [
|
753 | 753 | {
|
754 | 754 | "output_type": "stream",
|
|
769 | 769 | },
|
770 | 770 | "outputs": [],
|
771 | 771 | "source": [
|
| 772 | + "data_path = f\"gs://{bucket}/weather/data\"\n", |
| 773 | + "\n", |
772 | 774 | "!python create_dataset.py \\\n",
|
773 |     | - " --data-path=\"gs://{bucket}/weather/data\" \\\n",
    | 775 | + " --data-path=\"{data_path}\" \\\n",
774 | 776 | " --runner=\"DataflowRunner\" \\\n",
|
775 | 777 | " --project=\"{project}\" \\\n",
|
776 | 778 | " --region=\"{location}\" \\\n",
|
|
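For a quick sanity check before launching at scale, the same script can be run locally on a handful of dates. The `--num-dates` and `--num-bins` flags come from the text above; using `DirectRunner` and a local `--data-path` are assumptions about how the script wires up its Beam pipeline options.

```python
# Small local run: few dates, default bins, Beam's in-process runner (assumed to be accepted by the script).
!python create_dataset.py \
    --data-path="/tmp/weather-data-sample" \
    --runner="DirectRunner" \
    --num-dates=2 \
    --num-bins=10
```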