Processing of large datasets

eftychios pnevmatikakis edited this page Mar 14, 2016 · 8 revisions

This page explains how to use the constrained NMF algorithm to process large datasets that would otherwise cause memory issues. The main steps of this process, which is implemented in the file demo_memmap.m, are explained below:

Reading the file and saving it in a memory mapped .mat file

This time-consuming process has to be executed only once for every new dataset. The data is loaded into memory (either the whole dataset at once, or in chunks) and then saved in a .mat file that contains the following entries:

  • Y: The data in its native dimensions as a 3D array (or a 4D array if 3D imaging is performed), where the last dimension is time and each entry along it corresponds to a different timestep.
  • Yr: The data reshaped as a 2D array, with rows corresponding to the different pixels and columns corresponding to the different timesteps.
  • nY: The minimum value over the whole dataset (in case we want to subtract it prior to processing).
  • sizY: The dimensions of Y.
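
The four entries above can be illustrated with a minimal sketch on a small synthetic array. It assumes that the .mat file is written in the v7.3 format and later accessed through MATLAB's matfile function, which allows parts of a variable to be read without loading the rest. The array sizes and the output file name are hypothetical, and the sketch is not taken from demo_memmap.m.

```matlab
% Synthetic stand-in for an imaging dataset: 4 x 5 pixels, 6 timesteps.
Y = reshape(single(1:120), 4, 5, 6);

sizY = size(Y);                            % dimensions of Y, here [4 5 6]
Yr = reshape(Y, prod(sizY(1:end-1)), []);  % 2D array: pixels x timesteps
nY = min(Yr(:));                           % minimum over the whole dataset

% Assumption: v7.3 format, so that variables can be partially loaded later.
filename = [tempname, '.mat'];             % hypothetical output location
save(filename, 'Y', 'Yr', 'nY', 'sizY', '-v7.3');

% Access the file without loading it fully: read only timesteps 2 to 3.
data = matfile(filename);
Yr_part = data.Yr(:, 2:3);
```

Here each column of Yr is one frame of Y unrolled into a vector, so Yr_part contains the second and third frames only.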

Storing both Y and Yr may seem wasteful, since it doubles the required space. However, we do so (at least for the time being) because empirically it leads to much faster loading and processing of the data in the subsequent steps.

There are two functions for performing this memory mapping procedure:

  • memmap_file.m: This function assumes that the whole dataset is stored in a single .tif file. It reads the file and saves it in .mat format in the same folder and with the same name. If the file is too large to fit in memory, it can be read in chunks by setting the variable chunksize to an appropriate value. As input, the user needs to specify the path to the file.
  • memmap_file_sequence.m: This function assumes that the dataset is stored in a sequence of sequentially named files, where different files correspond to different timesteps (or sets of timesteps). It reads the files and saves them in .mat format in the same folder and with the same name. As input, the user needs to specify the path to the folder that contains the files.
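
The idea of reading in chunks can be sketched as follows. This is only an illustration of the pattern under stated assumptions, not the code of memmap_file.m: a synthetic array stands in for the .tif file, the output is written through a writable matfile object, and all names other than chunksize, Y, Yr, nY and sizY are hypothetical.

```matlab
% Hypothetical dimensions: 4 x 5 pixels, 10 timesteps, read 3 at a time.
d1 = 4; d2 = 5; T = 10;
chunksize = 3;
source = reshape(single(1:d1*d2*T), d1, d2, T);  % stands in for the .tif file

filename = [tempname, '.mat'];                   % hypothetical output file
data = matfile(filename, 'Writable', true);      % creates a v7.3 MAT-file
data.sizY = [d1, d2, T];

nY = Inf;
for t0 = 1:chunksize:T
    t1 = min(t0 + chunksize - 1, T);
    chunk = source(:, :, t0:t1);                 % in practice: read frames t0..t1 from disk
    data.Y(1:d1, 1:d2, t0:t1) = chunk;           % data in its native dimensions
    data.Yr(1:d1*d2, t0:t1) = reshape(chunk, d1*d2, []);  % pixels x timesteps
    nY = min(nY, min(chunk(:)));                 % running minimum over all chunks
end
data.nY = nY;
```

Only one chunk is held in memory at a time, so the peak memory use is governed by chunksize rather than by the total number of timesteps.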