Processing of large datasets
The purpose of this page is to explain how to use the constrained NMF algorithm to process large datasets that create memory issues. The main steps of this process, which is implemented in the file `demo_memmap.m`, are explained below.
Memory mapping
This (time consuming) process has to be executed once for every new dataset. The data is loaded into memory (either the whole dataset at once, or in chunks) and then saved in a `.mat` file that contains the following entries (a sketch of how such a file can be built is shown after the list):
- `Y`: The data in its native dimensions as a 3d array (or a 4d array if 3d imaging is performed), where the last dimension is time and each entry corresponds to a different timestep.
- `Yr`: The data reshaped as a 2d array, with rows corresponding to the different pixels and columns corresponding to the different timesteps.
- `nY`: The minimum value over the whole dataset (in case we want to subtract it prior to processing).
- `sizY`: The dimensions of `Y`.
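As a rough illustration, the following sketch builds these four entries and saves them to disk; the helper `read_file` and the filename `movie.tif` are placeholders (in practice, `demo_memmap.m` and the functions described below take care of this):

```matlab
% Sketch only: read_file stands in for whatever routine loads your raw movie.
Y = read_file('movie.tif');        % e.g., a d1 x d2 x T array of frames
nY = min(Y(:));                    % global minimum over the whole dataset
sizY = size(Y);                    % native dimensions, e.g., [d1, d2, T]
Yr = reshape(Y, prod(sizY(1:end-1)), sizY(end));   % pixels x timesteps

% Saving with -v7.3 produces an HDF5-based MAT-file, which is what makes
% partial (memory-mapped) reads possible later on.
save('movie.mat', 'Yr', 'Y', 'nY', 'sizY', '-v7.3');
```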
Intuitively it does not make sense to store both `Y` and `Yr`, since this requires double the space. However, we do so (at least for the time being) because empirically it leads to much faster loading and processing of the data in the subsequent steps.
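The reason is that each layout suits a different access pattern. With MATLAB's `matfile`, both can be read lazily from the saved file (the filename below is illustrative):

```matlab
data = matfile('movie.mat');   % lazy handle; nothing is loaded yet
sizY = data.sizY;              % small variables load in full immediately
chunk = data.Yr(:, 1:100);     % all pixels over the first 100 timesteps (2d view)
frame = data.Y(:, :, 50);      % a single frame in native dimensions (3d view)
```

Note that this kind of partial loading only works for MAT-files saved with the `-v7.3` flag.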
There are two functions for performing this memory mapping procedure (a brief usage sketch follows the list):
- `memmap_file.m`: This function assumes that the whole dataset is stored in a single `.tif` file. It reads the file and saves it in `.mat` format in the same folder and with the same name. If the file is too large to fit in memory, it can be read in chunks by setting the variable `chunksize` to an appropriate value. As input, the user needs to specify the path to the file.
- `memmap_file_sequence.m`: This function assumes that the dataset is stored in a sequence of files, where different files correspond to different (sets of) timesteps and are named sequentially. It reads the files and saves the result in `.mat` format in the same folder. As input, the user needs to specify the path to the folder that contains the files.
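A minimal usage sketch, assuming the single-path inputs described above (the paths are illustrative, and `chunksize` is set inside `memmap_file.m` rather than passed as an argument; check the function headers and `demo_memmap.m` for the exact interface):

```matlab
% Case 1: the whole dataset is a single .tif file.
% This should produce /data/experiment1/movie.mat alongside it.
memmap_file('/data/experiment1/movie.tif');

% Case 2: the dataset is a folder of sequentially named files.
memmap_file_sequence('/data/experiment1/tiff_sequence/');
```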