Skip to content

Commit f9b88ba

Browse files
athewseyVerdi March
andcommitted
feat: kwargs in ProcessingStep
Accepting source_dir but not the range of other FrameworkProcessor run() args to ProcessingStep would constitute a confusing half-fix. Insisting on get_run_args() would introduce behaviour differences vs ScriptProcessor and also a more confusing API (why is 'code' present but not 'source_dir'? Why are the 'inputs' here different from the 'inputs' when I call run?) in the medium term. kwargs allows us to move towards a more concise, logical ProcessingStep API (just supply the same args you would to .run()) and aim to simplify SparkProcessor's isolated get_run_args() situation in later commits. Co-authored-by: Verdi March <[email protected]>
1 parent 2a092e9 commit f9b88ba

File tree

2 files changed

+18
-29
lines changed

2 files changed

+18
-29
lines changed

src/sagemaker/workflow/steps.py

Lines changed: 18 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -485,44 +485,39 @@ def __init__(
485485
outputs: List[ProcessingOutput] = None,
486486
job_arguments: List[str] = None,
487487
code: str = None,
488-
source_dir: str = None,
489488
property_files: List[PropertyFile] = None,
490489
cache_config: CacheConfig = None,
491490
depends_on: Union[List[str], List[Step]] = None,
492491
retry_policies: List[RetryPolicy] = None,
493-
kms_key=None,
492+
**kwargs,
494493
):
495494
"""Construct a ProcessingStep, given a `Processor` instance.
496495
497496
In addition to the processor instance, the other arguments are those that are supplied to
498-
the `process` method of the `sagemaker.processing.Processor`.
497+
the `run()` method of the :class:`~sagemaker.processing.Processor`.
499498
500499
Args:
501500
name (str): The name of the processing step.
502-
processor (Processor): A `sagemaker.processing.Processor` instance.
501+
processor (Processor): A :class:`~sagemaker.processing.Processor` instance.
503502
display_name (str): The display name of the processing step.
504503
description (str): The description of the processing step.
505-
inputs (List[ProcessingInput]): A list of `sagemaker.processing.ProcessorInput`
506-
instances. Defaults to `None`.
507-
outputs (List[ProcessingOutput]): A list of `sagemaker.processing.ProcessorOutput`
508-
instances. Defaults to `None`.
509-
job_arguments (List[str]): A list of strings to be passed into the processing job.
510-
Defaults to `None`.
511-
code (str): S3 URI or local path to a file with the user script to run. If
512-
``source_dir`` is specified (for ``processor``s that support it), then ``code``
513-
must be a path relative to the root of ``source_dir``. Defaults to `None`.
514-
source_dir (str): S3 URI or local path to a folder with any other containing processing
515-
source code dependencies aside from the entry point ``code`` file. This parameter
516-
is only supported when using a 'processor' based on ``FrameworkProcessor``. If
517-
an S3 URI is provided, it must point to a tar.gz file. Defaults to `None`.
504+
inputs (List[ProcessingInput]): A list of inputs to the processing job. Defaults to
505+
`None`.
506+
outputs (List[ProcessingOutput]): A list of outputs from the processing job. Defaults
507+
to `None`.
508+
job_arguments (List[str]): A list of command line arguments to be passed into the
509+
processing job. Defaults to `None`.
510+
code (str): This can be an S3 URI or a local path to a file with the framework
511+
script to run. Defaults to `None`.
518512
property_files (List[PropertyFile]): A list of property files that workflow looks
519513
for and resolves from the configured processing output list.
520-
cache_config (CacheConfig): A `sagemaker.workflow.steps.CacheConfig` instance.
521-
depends_on (List[str] or List[Step]): A list of step names or step instance
522-
this `sagemaker.workflow.steps.ProcessingStep` depends on
514+
cache_config (CacheConfig): Step result caching configuration.
515+
depends_on (List[str] or List[Step]): A list of step names or step instances this
516+
`sagemaker.workflow.steps.ProcessingStep` depends on.
523517
retry_policies (List[RetryPolicy]): A list of retry policy
524-
kms_key (str): The ARN of the KMS key that is used to encrypt the
525-
user code file. Defaults to `None`.
518+
**kwargs: Additional arguments as per ``processor.run()``, depending on your processor
519+
type. For example may include ``source_dir`` for processors based on
520+
:class:`~sagemaker.processing.FrameworkProcessor`.
526521
"""
527522
super(ProcessingStep, self).__init__(
528523
name, StepTypeEnum.PROCESSING, display_name, description, depends_on, retry_policies
@@ -534,11 +529,7 @@ def __init__(
534529
self.code = code
535530
self.property_files = property_files
536531
self.job_name = None
537-
538-
self.processor_kwargs = dict(kms_key=kms_key)
539-
# Optional args supported by only a subset of Processor classes:
540-
if source_dir is not None:
541-
self.processor_kwargs["source_dir"] = source_dir
532+
self.processor_kwargs = kwargs
542533

543534
# Examine why run method in sagemaker.processing.Processor mutates the processor instance
544535
# by setting the instance's arguments attribute. Refactor Processor.run, if possible.

tests/unit/sagemaker/workflow/test_steps.py

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -616,7 +616,6 @@ def test_processing_step_normalizes_args_with_local_code(mock_normalize_args, sc
616616
inputs=step.inputs,
617617
outputs=step.outputs,
618618
code=step.code,
619-
kms_key=None,
620619
)
621620

622621

@@ -690,7 +689,6 @@ def test_processing_step_normalizes_args_with_no_code(mock_normalize_args, scrip
690689
inputs=step.inputs,
691690
outputs=step.outputs,
692691
code=None,
693-
kms_key=None,
694692
)
695693

696694

0 commit comments

Comments
 (0)