BIAP8

BIAP8 - always load image data as floating point

Status

In discussion.

See this mailing list thread for discussion on an earlier version of this proposal.

Background

Summary

The problem with our current get_data method is that the returned data type is difficult to predict, and can switch between integer and floating point types depending on values in the image header.

The underlying problem is that the author and the user of a given NIfTI image would be unlikely to expect that the scalefactors of the NIfTI header (which the user will probably not be aware of) will affect the calculations done on the image data after loading into memory.

In detail

At the moment, if you do this:

img = nib.load('my_image.nii')
data = img.get_data()

then the data type (dtype) of the returned data array depends on the values in the header of my_image.nii. Specifically, if the raw on-disk data type is np.int16 (it is often is) and the header scalefactor values are default (1 for slope, 0 for intercept) then you will get back an array of the on disk data type - here - np.int16.

This is very efficient on memory, but it can be a real trap unless you are careful.

For example, let's say you had a pipeline where you did this:

sum = img.get_data().sum()

That would work fine most of the time, when the data on disk is floating point, or the scalefactors are not default (1, 0). Then one day, you get an image with int16 data type on disk and 1, 0 scalefactors, and your sum calculation silently overflows. I (MB) ran into this when teaching - I had to cast some image arrays to floating point to get sensible answers.

Current implementation

get_data has the following implementation, at time of writing:

def get_data(self):
    """ Return image data from image with any necessary scalng applied

    If the image data is a array proxy (data not yet read from disk) then
    read the data, and store in an internal cache.  Future calls to
    ``get_data`` will return the cached copy.

    Returns
    -------
    data : array
        array of image data
    """
    if self._data_cache is None:
        self._data_cache = np.asanyarray(self._dataobj)
    return self._data_cache

Note that:

self._dataobj may well be an array proxy object;
np.asanyarray forces the read of an array proxy object into a numpy array;
the read also fills an internal cache.

Proposal - add, prefer `get_fdata` method

The future default behavior of nibabel should be to do the thing least likely to trip you up by accident. But - we do not want the result of get_data to change between nibabel versions.

step 1: now - add get_fdata method:

def get_fdata(self, dtype=np.float64):
    """ Return floating point image data with any necessary scalng applied

    If the image data is a array proxy (data not yet read from disk) then
    read the data, and store in an internal cache.  Future calls to
    ``get_fdata`` on the same instance will return the cached copy.

    Parameters
    ----------
    dtype : numpy dtype specifier
        A numpy dtype specifier specifying a floating point type.  Data is
        returned as this floating point type.

    Returns
    -------
    fdata : array
        array of image data of data type `dtype`.
    """
    dtype = np.dtype(dtype)
    if not issubclass(dtype, np.inexact):
        raise ValueError('{} should be floating point type'.format(dtype))
    if self._fdata_cache is None:
        self._fdata_cache = np.asanyarray(self._dataobj).astype(dtype)
    return self._fdata_cache

Change all instances of get_data in documentation to get_fdata.

Add warning about pending deprecation in get_data method, with suggestion to use get_fdata or np.asanyarray(img.dataobj) if you want the previous behavior.

Add floating point cache self._fdata_cache to cache cleared by uncache method.

step 2: around one year from now - deprecate get_data method;
step 3: around three years from now - remove get_data method and associated self._data_cache attribute.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

BIAP8

BIAP8 - always load image data as floating point

Status

Background

Summary

In detail

Current implementation

Proposal - add, prefer `get_fdata` method

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally

BIAP8

BIAP8 - always load image data as floating point

Status

Background

Summary

In detail

Current implementation

Proposal - add, prefer get_fdata method

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally

Proposal - add, prefer `get_fdata` method