Skip to content

fix index bug after spatial query #163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 10, 2023

Conversation

Sonja-Stockhaus
Copy link
Collaborator

@Sonja-Stockhaus Sonja-Stockhaus commented Sep 26, 2023

Bug in the spatial query notebook in spatialdata-notebooks (scverse/spatialdata-notebooks#53)

@codecov-commenter
Copy link

codecov-commenter commented Sep 26, 2023

Codecov Report

Merging #163 (841f0c7) into main (0e0ecb1) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #163      +/-   ##
==========================================
+ Coverage   78.98%   79.03%   +0.04%     
==========================================
  Files          11       11              
  Lines        1342     1345       +3     
==========================================
+ Hits         1060     1063       +3     
  Misses        282      282              
Files Coverage Δ
src/spatialdata_plot/pl/render.py 89.37% <100.00%> (+0.10%) ⬆️
src/spatialdata_plot/pl/utils.py 68.92% <100.00%> (+0.04%) ⬆️

@Sonja-Stockhaus Sonja-Stockhaus marked this pull request as ready for review September 26, 2023 10:29
@LucaMarconato
Copy link
Member

Thanks @Sonja-Stockhaus for the PR. This PR solves my specific use case for the notebook, but unfortunately during review I noticed that it leads to a bug. In short, the link between the table and the shapes is done by matching a pair of columns in the table (the region_key and instance_key columns, that together act as a multi-index), and the index key in the shapes. If you reset the index in the shape this link can be lost, especially after a query operation where one expect to discard some rows.

I have wrote this code that reproduces the bug.

import spatialdata as sd
from spatialdata.models import TableModel
from spatialdata.datasets import blobs
import spatialdata_plot
from napari_spatialdata import Interactive
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

sdata = blobs()

# let's fetch the table metadata
region = sdata.table.uns[TableModel.ATTRS_KEY]['region']
region_key = sdata.table.uns[TableModel.ATTRS_KEY]['region_key']
instance_key = sdata.table.uns[TableModel.ATTRS_KEY]['instance_key']

# let's see what the table annotates
print(region)

# it annotates the labels; let's make the table annotate the circles instead
new_table = sdata.table.copy()
sdata.table.obs[region_key] = 'blobs_circles'

# there were 21 labels, but now there are only 5 circles, we need to subset
new_table = sdata.table[:5]
new_table.uns[TableModel.ATTRS_KEY]['region'] = 'blobs_circles'

# double checking that the indices match
print(new_table.obs[instance_key].to_numpy())
print(sdata['blobs_circles'].index.to_numpy())

# they don't, let's fix this
new_table.obs[instance_key] = np.array(range(5))

# let's add some annotations
new_table.obs['annotation'] = ['a', 'b', 'c', 'd', 'e']
new_table.obs['annotation'] = new_table.obs['annotation'].astype('category')

# the new table is ready, let's replace the old one
del sdata.table
sdata.table = new_table

# subsetting the data
xmin = 100
ymin = 150
xmax = 400
ymax = 250
sdata_cropped = sdata.query.bounding_box(axes=('x', 'y'), min_coordinate=[xmin, ymin], max_coordinate=[xmax, ymax], target_coordinate_system='global', filter_table=True)
print(sdata_cropped['blobs_circles'].index)

# BUG: we get a table with 3 rows, but we have only one circle
print(sdata_cropped)

# workaround, bug with spatialdata-plot: https://github.com/scverse/spatialdata-plot/issues/167
del sdata_cropped.images['blobs_image']
del sdata_cropped.images['blobs_multiscale_image']
del sdata_cropped.labels['blobs_labels']
del sdata_cropped.labels['blobs_multiscale_labels']

# viz
axes = plt.subplots(1, 3, figsize=(10, 4))[1]

# ax 0
sdata.pl.render_shapes('blobs_circles', color='annotation').pl.show(ax=axes[0])

# ax 1
sdata.pl.render_shapes('blobs_circles', color='annotation').pl.show(ax=axes[1])
sdata_cropped.pl.render_shapes('blobs_circles').pl.show(ax=axes[1])
rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, linewidth=5, edgecolor="k", facecolor="none")
axes[1].add_patch(rect)

# ax 2
sdata_cropped.pl.render_shapes('blobs_circles', color='annotation').pl.show(ax=axes[2])

plt.tight_layout()
# Interactive(sdata)

and which creates this plot, which shows the problem:
image

@LucaMarconato LucaMarconato marked this pull request as draft September 29, 2023 14:01
@Sonja-Stockhaus
Copy link
Collaborator Author

Thanks for pointing out @LucaMarconato! I'll have a look at it.

@Sonja-Stockhaus
Copy link
Collaborator Author

@LucaMarconato Not sure if I get the problem in your code, bc when I run it I end up with 3 circles and the table has 3 rows which is what we want, right? The other bug should be due to e.g. a multi-scale image being passed to _robust_transform (#167 (comment))

Still, I think you're right that resetting the index might be problematic in other cases so I'll try to find a different solution.

@LucaMarconato
Copy link
Member

LucaMarconato commented Oct 1, 2023

You are right, the code that I posted is correct, I got naively fooled by the colors.

Anyway, I would make a test for the following cases, if/when they pass I think we are good to merge:

  • performing a random permutation of the rows of the table and of the rows of the shapes and then checking if the plot is consistent
  • as above, but also doing the query by passing filter_table=False
  • as before, but now having the table mapping to two shapes elements (both shapes elements and the table are with random rows order, both shapes elements are filtered).

My guess is that if you are plotting based on row order, then the first and last test should still work, because after the query I match the order of the rows when filter_table=True (if I remember well), but then the second test should fail.

@Sonja-Stockhaus
Copy link
Collaborator Author

@LucaMarconato I experimented with randomly shuffled tables a bit and so far everything worked. I'm pretty sure that it should be fine as it is, because the actual assignment of colors (using the indices) happens somewhere else and here we only have a color vector that contains the colors of the elements in the shapes table in the same order as the shapes. So dropping the index won't hurt here (anymore). And we only drop the index of a copy of the shapes table, so the original table should be unchanged. One could do everything without using drop_index() but with it you can use iterrows() which makes the code shorter.
Does that sound logical to you? Hope I didn't overlook anything

@LucaMarconato LucaMarconato marked this pull request as ready for review October 9, 2023 21:26
@LucaMarconato
Copy link
Member

Thanks for the experiments and for the updates. I would still add the tests in the repository with random permutations (by saving the plot results) to be sure that nothing breaks with future PRs. Can you do it please?

@Sonja-Stockhaus
Copy link
Collaborator Author

Sure

Copy link
Member

@LucaMarconato LucaMarconato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic!

@Sonja-Stockhaus Sonja-Stockhaus merged commit 4fc2b04 into main Oct 10, 2023
@Sonja-Stockhaus Sonja-Stockhaus deleted the bugfix/sd-notebooks-spatial-query branch October 10, 2023 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants