-
Notifications
You must be signed in to change notification settings - Fork 17
fix index bug after spatial query #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #163 +/- ##
==========================================
+ Coverage 78.98% 79.03% +0.04%
==========================================
Files 11 11
Lines 1342 1345 +3
==========================================
+ Hits 1060 1063 +3
Misses 282 282
|
Thanks @Sonja-Stockhaus for the PR. This PR solves my specific use case for the notebook, but unfortunately during review I noticed that it leads to a bug. In short, the link between the table and the shapes is done by matching a pair of columns in the table (the region_key and instance_key columns, that together act as a multi-index), and the index key in the shapes. If you reset the index in the shape this link can be lost, especially after a query operation where one expect to discard some rows. I have wrote this code that reproduces the bug. import spatialdata as sd
from spatialdata.models import TableModel
from spatialdata.datasets import blobs
import spatialdata_plot
from napari_spatialdata import Interactive
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
sdata = blobs()
# let's fetch the table metadata
region = sdata.table.uns[TableModel.ATTRS_KEY]['region']
region_key = sdata.table.uns[TableModel.ATTRS_KEY]['region_key']
instance_key = sdata.table.uns[TableModel.ATTRS_KEY]['instance_key']
# let's see what the table annotates
print(region)
# it annotates the labels; let's make the table annotate the circles instead
new_table = sdata.table.copy()
sdata.table.obs[region_key] = 'blobs_circles'
# there were 21 labels, but now there are only 5 circles, we need to subset
new_table = sdata.table[:5]
new_table.uns[TableModel.ATTRS_KEY]['region'] = 'blobs_circles'
# double checking that the indices match
print(new_table.obs[instance_key].to_numpy())
print(sdata['blobs_circles'].index.to_numpy())
# they don't, let's fix this
new_table.obs[instance_key] = np.array(range(5))
# let's add some annotations
new_table.obs['annotation'] = ['a', 'b', 'c', 'd', 'e']
new_table.obs['annotation'] = new_table.obs['annotation'].astype('category')
# the new table is ready, let's replace the old one
del sdata.table
sdata.table = new_table
# subsetting the data
xmin = 100
ymin = 150
xmax = 400
ymax = 250
sdata_cropped = sdata.query.bounding_box(axes=('x', 'y'), min_coordinate=[xmin, ymin], max_coordinate=[xmax, ymax], target_coordinate_system='global', filter_table=True)
print(sdata_cropped['blobs_circles'].index)
# BUG: we get a table with 3 rows, but we have only one circle
print(sdata_cropped)
# workaround, bug with spatialdata-plot: https://github.com/scverse/spatialdata-plot/issues/167
del sdata_cropped.images['blobs_image']
del sdata_cropped.images['blobs_multiscale_image']
del sdata_cropped.labels['blobs_labels']
del sdata_cropped.labels['blobs_multiscale_labels']
# viz
axes = plt.subplots(1, 3, figsize=(10, 4))[1]
# ax 0
sdata.pl.render_shapes('blobs_circles', color='annotation').pl.show(ax=axes[0])
# ax 1
sdata.pl.render_shapes('blobs_circles', color='annotation').pl.show(ax=axes[1])
sdata_cropped.pl.render_shapes('blobs_circles').pl.show(ax=axes[1])
rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, linewidth=5, edgecolor="k", facecolor="none")
axes[1].add_patch(rect)
# ax 2
sdata_cropped.pl.render_shapes('blobs_circles', color='annotation').pl.show(ax=axes[2])
plt.tight_layout()
# Interactive(sdata) |
Thanks for pointing out @LucaMarconato! I'll have a look at it. |
@LucaMarconato Not sure if I get the problem in your code, bc when I run it I end up with 3 circles and the table has 3 rows which is what we want, right? The other bug should be due to e.g. a multi-scale image being passed to Still, I think you're right that resetting the index might be problematic in other cases so I'll try to find a different solution. |
You are right, the code that I posted is correct, I got naively fooled by the colors. Anyway, I would make a test for the following cases, if/when they pass I think we are good to merge:
My guess is that if you are plotting based on row order, then the first and last test should still work, because after the query I match the order of the rows when |
@LucaMarconato I experimented with randomly shuffled tables a bit and so far everything worked. I'm pretty sure that it should be fine as it is, because the actual assignment of colors (using the indices) happens somewhere else and here we only have a color vector that contains the colors of the elements in the shapes table in the same order as the shapes. So dropping the index won't hurt here (anymore). And we only drop the index of a copy of the shapes table, so the original table should be unchanged. One could do everything without using drop_index() but with it you can use iterrows() which makes the code shorter. |
Thanks for the experiments and for the updates. I would still add the tests in the repository with random permutations (by saving the plot results) to be sure that nothing breaks with future PRs. Can you do it please? |
Sure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic!
Bug in the spatial query notebook in spatialdata-notebooks (scverse/spatialdata-notebooks#53)