
Added notebook and data for LGCP tutorial #4087


Merged: 6 commits into pymc-devs:master on Sep 13, 2020

Conversation

@ckrapu (Contributor) commented Sep 8, 2020

This PR refers to issue #4084 and adds a new notebook showing how to fit a log Gaussian Cox process using a discrete grid approximation. It also includes an extension to model marked points. A small dataset is included as well.
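
For readers skimming the thread, here is a minimal sketch of the kind of model the notebook fits; the window size, grid resolution, kernel, and prior values below are illustrative assumptions, not the notebook's actual code.

```python
import numpy as np
import pymc3 as pm

# Hypothetical point pattern on a 100 x 100 observation window.
xy = np.random.uniform(0, 100, size=(500, 2))

# Discrete grid approximation: count the events falling in each cell and
# model the counts as Poisson with a log-intensity given by a Gaussian
# process evaluated at the cell centroids.
edges = np.linspace(0, 100, 21)
counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=[edges, edges])
centers = (edges[:-1] + edges[1:]) / 2
cx, cy = np.meshgrid(centers, centers, indexing="ij")  # matches counts[i, j]
centroids = np.column_stack([cx.ravel(), cy.ravel()])
cell_area = np.diff(edges)[0] ** 2

with pm.Model() as lgcp_model:
    mean = pm.Normal("lam_mean", mu=0.0, sd=1.0)
    ls = pm.InverseGamma("length_scale", alpha=3.0, beta=20.0)
    cov = pm.gp.cov.Matern52(2, ls=ls)
    gp = pm.gp.Latent(cov_func=cov)
    log_intensity = gp.prior("log_intensity", X=centroids)
    pm.Poisson("counts", mu=cell_area * pm.math.exp(mean + log_intensity),
               observed=counts.ravel())
```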


codecov bot commented Sep 8, 2020

Codecov Report

Merging #4087 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4087   +/-   ##
=======================================
  Coverage   88.48%   88.48%           
=======================================
  Files          88       88           
  Lines       13865    13865           
=======================================
  Hits        12268    12268           
  Misses       1597     1597           

@junpenglao (Member) commented:

LGTM overall, some nitpicks: could you replace the nested for-loop count with a plt.hist2d? It should simplify the code a bit.

@AlexAndorra linked an issue on Sep 8, 2020 that may be closed by this pull request
@ckrapu (Contributor, Author) commented Sep 9, 2020

LGTM overall, some nitpicks: could you replace the nested for-loop count with a plt.hist2d? It should simplify the code a bit.

I didn't realize there was an easy way to do it in 2D. Good idea! I've made the update.
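
For illustration, this is roughly what the suggestion amounts to (the coordinates and bin count here are made up, not taken from the notebook):

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical event coordinates; in the notebook these come from the dataset.
x = np.random.uniform(0, 100, size=500)
y = np.random.uniform(0, 100, size=500)

# Instead of a nested for-loop that counts points cell by cell,
# plt.hist2d bins the points and draws the counts in one call.
counts, xedges, yedges, image = plt.hist2d(x, y, bins=20)
plt.colorbar(image, label="events per cell");
```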

@AlexAndorra (Contributor) left a comment:

This is a super nice advanced use case of GPs, thanks @ckrapu!
I think there are just a few tweaks and clarifications missing, as I wrote below. It shouldn't take long to add, and feel free to ask if anything is unclear 😉

review-notebook-app bot commented Sep 9, 2020

AlexAndorra commented on 2020-09-09T10:14:23Z
----------------------------------------------------------------

Typo: "In conjunction with a Bayesian analysis, can be used to..." -- word missing?


ckrapu commented on 2020-09-10T00:20:16Z
----------------------------------------------------------------

Yeah, I missed the rest of that statement.

review-notebook-app bot commented Sep 9, 2020

AlexAndorra commented on 2020-09-09T10:14:24Z
----------------------------------------------------------------

I think you can use more regularizing priors to improve sampling efficiency -- this is especially true for GP hyperparameters.

pm.Normal("lam_mean", sd=10) and pm.Uniform("length_scale", lower=0.1, upper=280) in particular are super wide


ckrapu commented on 2020-09-10T00:21:58Z
----------------------------------------------------------------

Agreed - now that I think about it, an SD of 10 on the log scale is way too wide. I'm not sure how to pick the length scale, though. I've tried using HalfNormal(sd=100) and various settings for an inverse-Gamma prior. None of those seemed to help cut down on the divergences. The rule of thumb I was taught was to make sure that the upper bound is less than the largest interpoint distance. Any ideas here?
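
For anyone following along, a hedged sketch of the kind of change being discussed; the distributions and parameter values are illustrative guesses, not the ones the notebook settled on.

```python
import pymc3 as pm

with pm.Model():
    # Original, very wide priors:
    # lam_mean = pm.Normal("lam_mean", sd=10)
    # length_scale = pm.Uniform("length_scale", lower=0.1, upper=280)

    # More regularizing alternatives: sd=1 on the log scale still allows
    # large intensity variation once exponentiated, and an InverseGamma
    # keeps the length scale away from both 0 and implausibly large values.
    lam_mean = pm.Normal("lam_mean", mu=0.0, sd=1.0)
    length_scale = pm.InverseGamma("length_scale", alpha=3.0, beta=20.0)
```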

review-notebook-app bot commented Sep 9, 2020

AlexAndorra commented on 2020-09-09T10:14:25Z
----------------------------------------------------------------

If you use return_inferencedata=True in pm.sample above, you don't need the model context here, and it'll make the calls to ArviZ functions drastically faster (pm.summary is actually just calling az.summary, so you can also call it directly).
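
A small sketch of the suggestion, assuming a model already defined as `model` (variable names are illustrative):

```python
import arviz as az
import pymc3 as pm

with model:
    # Returning an InferenceData object up front avoids converting the
    # trace every time an ArviZ function is called on it.
    trace = pm.sample(return_inferencedata=True)

# No model context is needed here, and az.summary can be called directly
# (pm.summary simply forwards to it).
az.summary(trace)
```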


review-notebook-app bot commented Sep 9, 2020

AlexAndorra commented on 2020-09-09T10:14:25Z
----------------------------------------------------------------

vars is deprecated, please use var_names
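
For example (a guess at the call in question; the exact function and variable names in the notebook may differ):

```python
# Deprecated keyword:
# ppc = pm.sample_posterior_predictive(trace, vars=[model["counts"]])

# Preferred keyword:
ppc = pm.sample_posterior_predictive(trace, var_names=["counts"])
```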


review-notebook-app bot commented Sep 9, 2020

AlexAndorra commented on 2020-09-09T10:14:26Z
----------------------------------------------------------------

I think this can be fixed by more regularizing priors upstream


review-notebook-app bot commented Sep 9, 2020

AlexAndorra commented on 2020-09-09T10:14:27Z
----------------------------------------------------------------

You can get rid of plt.tight_layout() and pass constrained_layout=True to plt.subplots(2, 3, figsize=(8, 5)) instead.

Also, add a ";" at the end of the last line of the cell to avoid matplotlib object printing in notebooks: plt.figlegend(...);


ckrapu commented on 2020-09-10T05:47:36Z
----------------------------------------------------------------

Got it!
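
A sketch of the combined suggestion; the plotting inside the loop is a placeholder for whatever the notebook actually draws:

```python
import matplotlib.pyplot as plt
import numpy as np

# constrained_layout replaces the separate plt.tight_layout() call.
fig, axes = plt.subplots(2, 3, figsize=(8, 5), constrained_layout=True)

for ax in axes.ravel():
    ax.plot(np.random.randn(50), label="posterior draw")  # placeholder

# The trailing semicolon suppresses the repr of the returned Legend object.
plt.figlegend(*axes[0, 0].get_legend_handles_labels());
```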

review-notebook-app bot commented Sep 9, 2020

AlexAndorra commented on 2020-09-09T10:14:28Z
----------------------------------------------------------------

Same comments on priors


review-notebook-app bot commented Sep 9, 2020

AlexAndorra commented on 2020-09-09T10:14:29Z
----------------------------------------------------------------

I think the last two lines deserve a bit more explanation, as this is not really a standard prior: IIUC, you're modeling the variation around the marks more than the marks themselves? If so, explaining why would be interesting.

Also, a line about what observed=mark_residuals is doing would be valuable for readers, as this is an advanced (and somewhat confusing) use of observed (mark_residuals are not actually observed).


ckrapu commented on 2020-09-10T05:48:25Z
----------------------------------------------------------------

This part didn't have to be expressed in a roundabout way; I cleaned that up.

AlexAndorra commented on 2020-09-10T08:06:25Z
----------------------------------------------------------------

Much clearer indeed, thanks!

review-notebook-app bot commented Sep 9, 2020

AlexAndorra commented on 2020-09-09T10:14:30Z
----------------------------------------------------------------

Same comment for model context



@AlexAndorra (Contributor) commented:

Thanks for the changes @ckrapu, this looks really neat now!

I'm not sure how to pick the length scale, though. I've tried using HalfNormal(sd=100) and various settings for an inverse-Gamma prior. None of those seemed to help cut down on the divergences. Any ideas here?

Yeah, that's always (very) tricky... HalfNormal(sd=100) seems already very wide, given that you exponentiate it. I think going with Gamma or InvGamma is indeed much safer, in addition to giving you more leeway to incorporate domain knowledge. I'm curious about any advice on this from the very knowledgeable @fonnesbeck and @bwengals!

Prior pred checks with spaghetti plots and ribbon plots can also be useful -- I'm doing exactly that in a NB right now; the code is still very messy and WIP, but hopefully this can help you 🤷‍♂️ There are also prior choices in the Mauna Loa NB.
Notice though that your changes already sped up the sampling by a factor of 2+!

The rule of thumb I was taught was to make sure that the upper bound is less than the largest interpoint distance.

Yeah but even that isn't that great IIUC, as 1) it picks priors based on the observed data, and 2) future data might imply larger interpoint distances.
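
As a concrete (and purely illustrative) example of the kind of prior predictive check mentioned above, one can plot the correlation decay implied by draws from a candidate length-scale prior; the kernel and parameter values here are assumptions, not the notebook's choices:

```python
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm

distances = np.linspace(0, 100, 200)

with pm.Model():
    pm.InverseGamma("length_scale", alpha=3.0, beta=20.0)  # candidate prior
    prior = pm.sample_prior_predictive(samples=50)

# Spaghetti plot: each curve is the correlation decay implied by one prior
# draw of the length scale (exponentiated-quadratic kernel for illustration).
for ls in prior["length_scale"]:
    plt.plot(distances, np.exp(-0.5 * (distances / ls) ** 2), color="C0", alpha=0.3)
plt.xlabel("distance between grid cells")
plt.ylabel("prior correlation");
```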

@MarcoGorelli (Contributor) commented:

Hi @ckrapu - a new CI check has been added; before your next commit, could you fetch and merge upstream/master? I'm pretty sure it will still pass, this is just to be sure.

@ckrapu (Contributor, Author) commented Sep 10, 2020

Notice though that your changes already sped up the sampling by a factor of 2+!

I spoke too soon about the priors. Those changes helped a ton with the marked model as you pointed out.

@ckrapu (Contributor, Author) commented Sep 10, 2020

Hi @ckrapu - a new CI check has been added; before your next commit, could you fetch and merge upstream/master? I'm pretty sure it will still pass, this is just to be sure.

Yup! Done.

@MarcoGorelli (Contributor) commented:

@ckrapu looks like some of the imports are out of order; can you make the following change, please?

+    "from itertools import product\n",
+    "\n",
     "import arviz as az\n",
     "import matplotlib.pyplot as plt\n",
     "import numpy as np\n",
     "import pandas as pd\n",
-    "import pymc3 as pm\n",
-    "\n",
-    "from itertools import product"
+    "import pymc3 as pm"


Or, just run isort via nbqa from the command-line, see the style guide:

nbqa isort docs/source/notebooks/log-gaussian-cox-process.ipynb --treat-comment-as-code '# %%' --lines-between-types 1 --nbqa-mutate

@AlexAndorra (Contributor) commented:

@MarcoGorelli is there a way to enforce this automatically in the GH action when the committer hasn't run nbqa (or, later, Black) on the NB?

@MarcoGorelli (Contributor) commented:

@MarcoGorelli is there a way to enforce this automatically in the GH action when the committer hasn't run nbqa (or, later, Black) on the NB?

As in, can we get the GitHub Action to push to the file? Maybe... I know that pre-commit's author is working on a paid product that would allow for that, but there may be a way to do this without it; I'll look into it.

@MarcoGorelli (Contributor) commented Sep 11, 2020

@AlexAndorra looks like it's not ready yet (https://pre-commit.ci/)... hopefully it'll be available for free for open-source projects. In the meantime I guess we can ask contributors to run this themselves, or a maintainer can add a commit to their branch.

@AlexAndorra (Contributor) commented:

As in, can we get the GitHub Action to push to the file?

Yes, as in: contributor pushes to PR --> GH action checking imports/Black on NB fails --> GH action runs nbqa itself and then pushes the changes automatically. In other words, all the formatting / style is taken care of automatically 🎉

looks like it's not ready yet https://pre-commit.ci/ ...hopefully it'll be available for free for open source projects

That'd be so awesome!

In the meantime I guess we can ask contributors to run this themselves

Definitely -- that's already what we do so it's actually no change

@ckrapu (Contributor, Author) commented Sep 12, 2020

The import order is fixed. Should we assume that the import order desired by nbqa is the default moving forward?

@MarcoGorelli (Contributor) commented:

Thanks for updating!

the import order desired by nbqa

Just to be clear - it's isort that determines the import order; nbqa is just a little adapter that lets it run on Jupyter notebooks (instead of Python files).

@AlexAndorra (Contributor) commented:

Should we assume that the import order desired by nbqa is the default moving forward?

Yep, but this was already the case (see the style guide) -- the only difference is that now it's checked automatically in CI 😉

@AlexAndorra (Contributor) left a comment:

Merging now, as we gave it a few days 🎉 Thanks @ckrapu for this nice addition to the docs!

@AlexAndorra merged commit 04ebd59 into pymc-devs:master on Sep 13, 2020
@ckrapu (Contributor, Author) commented Oct 27, 2020

This may be a dumb question, but I don't see a thumbnail for this notebook in the gallery. Did I forget to set something with this PR?

@twiecki (Member) commented Oct 27, 2020

@ckrapu it should be the last plot of the NB. @ColCarroll?

@MarcoGorelli (Contributor) commented:

@ckrapu are you referring to the "Gaussian Process" section here https://docs.pymc.io/nb_examples/index.html ?

If so, it seems that "conditional-autoregressive-model" is also missing

@ckrapu (Contributor, Author) commented Oct 28, 2020

Yup - I thought that putting its name in docs/source/notebooks/table_of_contents_examples.js would automatically place it in the gallery.

Successfully merging this pull request may close these issues:

Adding doc notebook for log Gaussian Cox process