
Parallelization of multi start optimization

adrianhauber edited this page May 27, 2019 · 3 revisions

In its current state, Data2Dynamics only supports parallelization over experimental conditions, not over multi-start optimization runs. An easy way to parallelize the latter is to start multiple instances of MATLAB on one machine and have each of them run a small batch of the total number of multi-start optimization runs. This article introduces this straightforward approach with a small example.

Let's first initialize D2D, load model and data, compile and save the workspace:

arInit;
arLoadModel('model');
arLoadData('data_for_model');
arCompileAll;

arSave('my_workspace');

Now, let's say we have four processor cores available and want to fit with 15 multi-start runs:

multistart_runs = 15;
parallel_instances = 4;

The basic idea is to have a bash script startup.sh that is called once per instance and starts a MATLAB instance, which in turn calls a function doWork.m. For the latter, we need to store some variables in a configuration struct:

conf.pwd = pwd;
conf.d2dpath = fileparts(which('arInit.m'));
conf.workspace = ar.config.savepath;
conf.parIn = parallel_instances; % number of matlab instances
conf.totNum = ceil(multistart_runs/parallel_instances)*...
    parallel_instances; % extend total number of multistart runs to the next multiple of parallel_instances without loss of computation time

save('parallel_conf.mat', 'conf');
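
The rounding in conf.totNum can be checked quickly on the command line. The following is a minimal sketch in plain shell integer arithmetic, using the numbers from this example; it only illustrates the calculation and is not part of D2D:

```shell
#!/bin/sh
# Worked example of the rounding used for conf.totNum (integers only).
multistart_runs=15
parallel_instances=4

# For positive integers, ceil(a/b) equals (a + b - 1) / b in integer division.
runs_per_instance=$(( (multistart_runs + parallel_instances - 1) / parallel_instances ))

# Total number of runs after extending to the next multiple of parallel_instances.
total_runs=$(( runs_per_instance * parallel_instances ))

echo "runs per instance: $runs_per_instance"
echo "total runs: $total_runs"
```

With 15 requested runs and 4 instances, each instance performs 4 fits, so 16 fits are computed in total. The extra fit costs no additional wall-clock time because the instance with the largest batch would need 4 fits anyway.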

Now call startup.sh:

for icall = 1:parallel_instances
    system(sprintf('cd "%s" && sh startup.sh %d', conf.pwd, icall)); % quote the path; only launch if cd succeeded
end

Each call of startup.sh has to open one detached instance of MATLAB, which can be done with the screen command:

screen -d -m /Applications/MATLAB_R2019a.app/bin/matlab -nodisplay -r "addpath('~/Projekte/d2d/arFramework3'); doWork('$1'); exit;"
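
To avoid editing the script on every machine, the two paths can be pulled out into variables. The following is a minimal sketch of such a startup.sh variant; the environment variable names MATLAB_BIN and D2D_PATH and the helper build_matlab_cmd are hypothetical and not part of D2D, and the defaults are simply the paths from the line above:

```shell
#!/bin/sh
# startup.sh variant (sketch): paths are taken from the environment.
# MATLAB_BIN and D2D_PATH are hypothetical names chosen for this example.
MATLAB_BIN="${MATLAB_BIN:-/Applications/MATLAB_R2019a.app/bin/matlab}"
D2D_PATH="${D2D_PATH:-$HOME/Projekte/d2d/arFramework3}"

# Build the MATLAB command string for the instance number given as $1.
build_matlab_cmd() {
    printf "addpath('%s'); doWork('%s'); exit;" "$D2D_PATH" "$1"
}

# Launch a detached MATLAB session only if an instance number was passed.
if [ -n "$1" ]; then
    screen -d -m "$MATLAB_BIN" -nodisplay -r "$(build_matlab_cmd "$1")"
fi
```

Called as `sh startup.sh 3`, this would behave like the one-line version with doWork('3'), provided the two paths are valid on the machine.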

Make sure to insert the correct paths to MATLAB and D2D. The function doWork.m can look like this:

function doWork(icall)
    icall = str2double(icall); % the instance number arrives as a string from the command line
    
    load('parallel_conf.mat', 'conf'); % load config 

    cd(conf.pwd);
    addpath(conf.d2dpath);
    
    arInit;
    arLoad(conf.workspace);

    arFitLHS(conf.totNum/conf.parIn , icall); 
    % conf.totNum/conf.parIn is the number of fits each instance of MATLAB has to do
    % use icall as a random seed to make sure every instance fits for
    % different initial parameter vectors

    arSave(['par_result_' num2str(icall) '.mat'], 'ar');
end

After all calculations have finished, there should be parallel_instances = 4 folders named *_par_result_* in Results/, each containing the results of ceil(multistart_runs/parallel_instances) = 4 multi-start runs (16 runs in total instead of the requested 15). These can be conveniently collected by using arMergeFitsCluster('par_result').
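
Because the MATLAB instances run detached, it is not obvious when they are done. One possible way to check, sketched here under the assumption that the result folders appear directly inside Results/ with names matching *_par_result_*, is to count those folders. The function names count_results and all_done are hypothetical helpers for this example:

```shell
#!/bin/sh
# Count the directories directly inside $1 whose names contain "_par_result_".
count_results() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -name '*_par_result_*' | wc -l | tr -d ' '
}

# Succeed once at least $2 result folders exist in directory $1.
all_done() {
    [ "$(count_results "$1")" -ge "$2" ]
}
```

A call such as `until all_done Results 4; do sleep 60; done` would then block until four result folders exist. Note that a folder may already exist while arSave is still writing into it, so it is safer to wait a little longer before merging.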
