-
Notifications
You must be signed in to change notification settings - Fork 532
Discuss: Nipype boilerplate simplification/ v2 API #2539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
instead of 3, we should also be able to do:
would use the cwl library to parse the file and convert to a nipype workflow. and correspondingly |
also for
options could be |
How about this as a Python-based DSL? class Workflow1(Workflow):
a = BET(...)
wf2 = Workflow2(..., inputs={'node2.in1': a.out1})
b = Map(antsRegistration(..., in1=a.out1), 'in1', propagate_failures=True)
x = SomeFunc(..., in1=b.out1)
c = Join(ImageMaths(..., in1=wf2.node1.out1), 'b.in1', failure_mode='allow_any')
wf1 = Workflow1()
wf1.run() This seems like it would have to be something of syntactic sugar, and |
@effigies - i can live with that and construct a graph from this. |
I feel that @effigies' suggestion would also allow to easily set up conditional nodes and other decision making within the definition of the Workflow |
@oesteban - indeed I was about to bring up conditionals and loops. I'm not sure how that would fit in to the dsl yet. We should flesh that piece out before settling on the API. |
@effigies How much overhead is that going to add to interface creation? |
@satra I would be interested in conditionals and loops in the format suggested in (2), as well. And Also, it would be good to translate a real workflow - that uses iterables, @Shotgunosine Overhead in what sense? |
I mean when you say that init would need to do the work of turning the class variables into instance variables. Is that work that people adding interfaces need to do? |
My two cents: I really like the compact @satra outlines. I'm a big fan of functional programming so it seems very natural to me. I think that would be a pleasing format to teach to a complete beginner too though. |
I do like the compact form of 2 above as well. |
Oh, I see. No, I meant that we would need to code the base That said, often workflow setup does need to be dynamically adjusted at runtime, which could be handled like so: class MyWorkflow(Workflow):
...
def __init__(self, ...):
super(MyWorkflow, self).__init__(self, ...)
# Workflow is fully initialized, and any runtime
# additions/modifications can be done here We could also approach my syntax as sugar on top of form 2, so post-initializing setup could be done outside the workflow itself: wf = MyWorkflow(distribute=True)
if inputs_look_like_this:
wf.add('y', SomeOtherNode(...))
else:
wf.add('z', YetAnother(...)) To be clear, I think form 2 looks reasonable, and I only arrived at my approach by looking at it and thinking what could be cut further. This replaces (perhaps optionally?) |
@effigies - i'll try to apply @djarecka's api + this discussion to init_bold_surf_wf |
here is an attempt. there are several details that need to be sorted out, but something like this should work in principle. wf = Workflow(name, mem_gb_node=DEFAULT_MEMORY_MIN_GB,
inputs=['source_file', 't1_preproc', 'subject_id',
'subjects_dir', 't1_2_fsnative_forward_transform',
'mem_gb', 'output_spaces', 'medial_surface_nan'],
outputs='surfaces')
@interface
def select_target(subject_id, space):
""" Given a source subject ID and a target space, get the target subject ID """
return subject_id if space == 'fsnative' else space
wf.add('targets', select_target(subject_id=wf.inputs.subject_id))
.map('space', space=[space for space in wf.inputs.output_spaces
if space.startswith('fs')])
wf.add('rename_src', Rename(format_string='%(subject)s',
keep_ext=True,
in_file=wf.inputs.source_file))
.map('subject')
wf.add('resampling_xfm',
fs.utils.LTAConvert(in_lta='identity.nofile',
out_lta=True,
source_file=wf.inputs.source_file,
target_file=wf.inputs.t1_preproc)
.add('set_xfm_source', ConcatenateLTA(out_type='RAS2RAS',
in_lta2=wf.inputs.t1_2_fsnative_forward_transform,
in_lta1=wf.resampling_xfm.out_lta))
wf.add('sampler',
fs.SampleToSurface(sampling_method='average', sampling_range=(0, 1, 0.2),
sampling_units='frac', interp_method='trilinear',
cortex_mask=True, override_reg_subj=True,
out_type='gii',
subjects_dir=wf.inputs.subjects_dir,
subject_id=wf.inputs.subject_id,
reg_file=wf.set_xfm_source.out_file,
target_subject=wf.targets.out,
source_file=wf.rename_src.out_file),
mem_gb=mem_gb * 3)
.map([('source_file', 'target_subject'), 'hemi'], hemi=['lh', 'rh'])
wf.add_cond('cond1',
condition=wf.inputs.medial_surface_nan,
iftrue=wf.add('medial_nans', MedialNaNs(subjects_dir=wf.inputs.subjects_dir,
in_file=wf.sampler.out_file,
target_subject=wf.targets.out))
.set_output('out', wf.median_nans.out),
elseclause=wf.set_output('out', wf.sampler.out_file))
wf.add('merger', niu.Merge(1, ravel_inputs=True,
in1=wf.cond1.out),
run_without_submitting=True)
.join('sampler.hemi')
wf.add('update_metadata',
GiftiSetAnatomicalStructure(in_file=wf.merger.out))
wf.outputs.surfaces = wf.update_metadata.out_file |
I believe we already support 1), I like 2), and I agree that support for other established DSL would be nice. On a more general note I am afraid the inner circle of nipype veterans might have limited insight into everyday user struggles. I wonder if we could set up a survey and/or virtual focus groups to figure out what are the pain points for most users. |
I personally like the Satra's suggestions 2, don't see too much difference between compact or the original version. I'm struggling to see the workflow structure in the DSL example. |
@chrisfilo - i like the idea of doing a survey. it's been a while since we did something like that. how about we brainstorm a little bit internally (although publicly), and propose a working draft. i do want to borrow some ideas from tensorflow, which may allow a more pythonic style.
@djarecka - could you please clarify what you mean by that? |
Here's my approach, which is admittedly not very clear on how class BOLDSurfaceWorkflow(pe.Workflow):
inputs = Inputs(source_file=File(exists=True),
t1_preproc=File(exists=True),
subject_id=traits.Str(),
subjects_dir=Directory(exists=True)
t1_2_fsnative_forward_transform=File(exists=True))
@interface
def select_target(subject_id, space):
""" Given a source subject ID and a target space, get the target subject ID """
return subject_id if space == 'fsnative' else space
targets = Join(Map(select_target(subject_id=inputs.subject_id), 'space'))
rename_src = Join(Map(Rename(format_string='%(subject)s', keep_ext=True,
in_file=inputs.source_file), 'subject'))
resampling_xfm = LTAConvert(in_lta='identity.nofile', out_lta=True,
source_file=inputs.source_file,
target_file=inputs.t1_preproc)
set_xfm_source = ConcatenateLTA(out_type='RAS2RAS', in_lta1=resampling_xfm.out_lta,
in_lta2=inputs.t1_2_fsnative_forward_transform)
sampler = Join(
Map(
SampleToSurface(sampling_method='average', sampling_range=(0, 1, 0.2),
sampling_units='frac', interp_method='trilinear',
cortex_mask=True, override_reg_subj=True, out_type='gii',
subjects_dir=inputs.subjects_dir, subject_id=inputs.subject_id,
reg_file=set_xfm_source.out_file, target_subject=targets.out,
source_file=rename_src.out_file),
[('source_file', 'target_subject'), 'hemi'], hemi=['lh', 'rh']),
'source_file')
merger = Join(Merge(1, ravel_inputs=True, in1=sampler.out_file), 'sampler.hemi')
update_metadata = Join(
Map(GiftiSetAnatomicalStructure(in_file=merger.out), 'in_file'),
'in_file')
outputs = Outputs(surfaces=update_metadata.out_file)
def __init__(self, mem_gb, output_spaces, medial_surface_nan,
name='bold_surf_wf', *args, **kwargs):
super(BOLDSurfaceWorkflow, self).__init__(name, *args, **kwargs)
spaces = [space for space in output_spaces if space.startswith('fs')]
self.targets.space = spaces
self.rename_src.subject = spaces |
@satra - well, this was just my subjective impression reading the syntax. But in general, I like this additional piece of syntax that tells you to I was a bit afraid that in DSL syntax, you can put a lot of completely unrelated calculations without harm to the syntax, I mean it would still work, but won't be readable. If you have this explicit links (e.g. |
Some thoughts on branching. This is super provisional. While loopsI expect the body of a loop would be most easily thought of as a sub-workflow: class Iterable(Workflow):
inputs = Inputs(['a', 'b', 'c', 'condition'])
node1 = ...
node2 = ...
outputs = Outputs(['a', 'b', 'c', 'condition']) (Certainly one could imagine an interface with the outputs being a superset of the inputs and including a condition, so we don't need to limit ourselves to a workflow.) We could then introduce class MyWorkflow(Workflow):
refine = While(Iterable(...), 'condition', lambda x, tol=0.0001: x < tol) We might also want to consider some syntax like To translate this into @satra's 2-form: my_wf.add('refine', Iterable(...)).while('condition') There are two approaches to the output of a ConditionalI see @satra above suggested: wf.add_cond('cond1',
condition=wf.inputs.condition,
iftrue=wf.<operation>,
elseclause=wf.<operation>) This is a pretty different approach, as it involves manipulating the workflow structure during workflow-runtime. To be clear, I'm not rejecting this idea, but it's not how I've been thinking about workflows. I think I would again talk in terms of a conditional sub-workflow (or node), but instead of a loop which requires a state, it's a simple on-off. To make it an class Either(Workflow):
inputs = Inputs(['a', 'b', 'c'], condition=traits.Bool)
left = WorkflowA(in1=inputs.a, in2=inputs.b, enable=inputs.condition)
right = WorkflowB(in1=inputs.a, in2=inputs.c, enable=not inputs.condition)
outputs = Outputs(out1=ifelse(inputs.condition, left.out1, right.out1),
out2=ifelse(inputs.condition, left.out2, right.out2)) To adapt this to @satra's notation, I might render this as: wf.add('left', WorkflowA(...), enable=inputs.condition)
wf.add('right', WorkflowB(...), enable=not inputs.condition)
wf.add('final_out', Identity(out1=ifelse(inputs.condition, 'left.out1'), ...) RecursionGiven this approach for class Iterate(Workflow):
inputs = Inputs(['a', 'b'], enable=traits.Bool)
body = WorkflowBody(a=inputs.a, b=inputs.b)
iterate = Iterate(a=body.out1, b=body.out2, enable=not body.completed)
outputs = Outputs(out1=ifelse(body.completed, body.out1, iterate.out1),
out2=ifelse(body.completed, body.out2, iterate.out2)) I'm not sure that this is at all a good idea, but seems worth considering. |
@chrisfilo I agree that getting feedback on the difficult points of the learning curve would be good. I am curious though whether surveys or focus groups are considered particularly effective ways of discovering these, in organizations that have the resources to regularly research UX. In my experience, nipype is somewhat like git, in that it's hard to put your finger on what's difficult to learn as you learn it, and it's hard to retrospectively say what was difficult as well. But perhaps others are better at introspection. Nipype workshops (and to a lesser extent, Brainhacks) seem like a good place to get feedback. You have the opportunity for one-on-one training, which gives the more experienced user a chance to observe the sticking points, and can finish the workshop with a focus group where you explicitly ask for user assessment of the difficulties. (Do we have any such data from previous workshops?) Anyway, I think it's probably a good idea to distribute a survey, too. |
One thing I would also like to note: for nipype 2.0 we agreed that subworkflows are also nodes, correct? From outside, a node wrapping a workflow and a node wrapping an Interface should be fundamentally the same, shouldn't they? |
That's my understanding. |
Okay, so in all examples above |
Ah, yeah, I didn't mean to leave |
@satra On the topic of executing CWL on Nipype, you might be interested in the CWL support in Toil (https://github.com/BD2KGenomics/toil/blob/master/src/toil/cwl/cwltoil.py) and Airflow (https://github.com/Barski-lab/cwl-airflow/) both of which are workflow engines with a Python API. They use cwltool to load the document, then convert to the native workflow graph and execute the graph. This only requires a few hundred lines of code. |
Also on the topic of CWL, how do they represent conditionals and loops, if they do? It may be good to provide a familiar syntax. |
Moving away from the workflow perspective - for the Nipype 2.0 I would very much like to see a couple of features inter-related:
Opinions? |
I think this is reasonable, especially given that interfaces can't generally be reused. And if we do successfully merge interfaces and nodes... Which is actually a question I have: We've had a useful functional difference between interfaces and nodes, which is that the interface makes no attempt to change the behavior of the wrapped functionality, while the node works to keep the effects of the interface contained. What's the proposed way of achieving this dual functionality in 2.0, or is the un-wrapped interface to be a thing of the past?
I do like something like |
this depends on whether we merge nodes and interfaces into a common base. perhaps to satisfy the however, i would be willing to consider unwrapped interfaces a thing of the past. perhaps it would be good to hear other opinions on this.
i think everything in nipype is along those lines (like On related boilerplate notes, @djarecka and i discussed two operations that the node/workflow will have: |
@oesteban - going back to your earlier question about nodes and workflows class Node/BaseInterface:
pass
class Workflow(Node):
pass |
i like @satra's second example in the first comment. it's similar to keras's sequential model api. i was going to suggest something similar to keras's functional model api, but the sequential-model-like api is easier to use. |
…ges in deepcopy of node
Uh oh!
There was an error while loading. Please reload this page.
Summary
When one writes a nipype interface or workflow there is a fair bit of boilerplate. The intent of this discussion is to figure out how we reduce/simplify this, and to think about the v2 api.
should we:
except for special interfaces (e.g., Function)
or more compactly:
.add
will return a composite reference to both the workflow and the interface added@djarecka @Shotgunosine @effigies @chrisfilo @oesteban - open for discussion.
The text was updated successfully, but these errors were encountered: