Restore SKLearn FrameworkProcessor via _normalize_args #2633

Closed
wants to merge 109 commits into from
109 commits
087482c
Framework processor: first port
verdimrc Mar 25, 2021
b8972cc
Subclasses to go to their respective submodule
verdimrc Mar 29, 2021
8d554fa
FrameworkProcessor: source_dir defaults to None
verdimrc Mar 29, 2021
b83b0d0
Remove type annotations from public APIs
verdimrc Mar 29, 2021
547e1e6
Fix circular dependency between processing.py and estimator.py
verdimrc Mar 29, 2021
28a3a44
Disable type-checker on line sagemaker.estimator.Framework
verdimrc Mar 29, 2021
3a1907f
Fix pylint errors & warnings
verdimrc Mar 30, 2021
7a4d2f5
Merge branch 'master' into pr-framework-processor
ajaykarpur Apr 21, 2021
d05a51c
Bugfix: FrameworkProcessor propagates error to SageMaker
verdimrc May 5, 2021
82836b0
One-liner docstring update on FrameworkProcessor
verdimrc May 5, 2021
fb80bae
Directly refactor SKLearnProcessor (with breaking change in its API)
verdimrc May 5, 2021
5256d02
Updated docstring in subclasses of FrameworkProcessor
verdimrc May 5, 2021
1a28de5
Bugfix: FrameworkProcessor did not pass sagemaker_session to estimato…
verdimrc May 12, 2021
047e049
mock_create_tar_file() in test_processing.py
verdimrc May 14, 2021
163c668
Updated test_sklearn_processor_with_required_parameters() for framewo…
May 15, 2021
e884ba0
Fix all tests that rely on _get_expected_args()
verdimrc May 18, 2021
85dd50f
Merge branch 'test-sklearn-modular' into pr-framework-processor-round-02
May 18, 2021
dac7ed3
Fixed test_sklearn_with_all_parameters()
May 18, 2021
24534b3
Fixed 2 tests: test_sklearn_with_all_parameters_via_run_args{,_called…
May 18, 2021
dd7a869
Merge branch 'test-sklearn-all-modular' into pr-framework-processor-r…
May 18, 2021
2926db1
Merge remote-tracking branch 'verdi/pr-framework-processor-round-02' …
athewsey May 18, 2021
9b98e50
Fix FrameworkProcessor to call underlying estimator with instance_cou…
May 18, 2021
8ec3226
Merge remote-tracking branch 'verdi/pr-framework-processor-round-02' …
athewsey May 18, 2021
34f95bd
refactor: SKLearn fix comment & unused param
athewsey May 18, 2021
7dc43dc
change(processing): refactor s3_prefix & payload
athewsey May 18, 2021
ffcfce6
Merge pull request #4 from athewsey/feat/fw-processor
verdimrc May 19, 2021
0987f2b
Updated documents on internal framework processor payload
May 19, 2021
7aeffc5
framework processor: fix linter warning
May 19, 2021
51464dd
fix: restore 'command' for FrameworkProcessors
athewsey May 19, 2021
ca8490a
Merge pull request #6 from athewsey/feat/fw-processor
verdimrc May 19, 2021
4a1c53c
Unit test framework processor with source_dir and dependencies
May 19, 2021
690b8ae
FrameworkProcessor: revert entry_point back to code
May 19, 2021
ea95f87
Merge pull request #9 from verdimrc/fp-revert-to-code
verdimrc May 19, 2021
3e531dd
Added FrameworkProcessor.get_run_args()
May 20, 2021
05bf6d7
FrameworkProcessor: runproc.sh to deal with possibly broken pip on so…
May 20, 2021
dd48ee6
Merge pull request #11 from verdimrc/fp-get-run-args
verdimrc May 20, 2021
1030bdf
test(processing): SKLearn requirements.txt test
athewsey May 21, 2021
782bbd4
Rename test_sklearn.py to test_processing_sklearn.py
verdimrc May 21, 2021
7afca7e
Merge pull request #15 from athewsey/feat/fw-processor
verdimrc May 21, 2021
0b1efdd
added unit test pytorch required params
May 21, 2021
ee0a7e5
Merge pull request #20 from verdimrc/fp-pytorch-unit-test
verdimrc May 21, 2021
8b9ff09
added unit test for xgboost processor
May 21, 2021
d06ff5e
Bugfix: reverted back to integ/test_sklearn.py
May 21, 2021
6721bfe
Merge branch 'master' into pr-framework-processor
ajaykarpur May 21, 2021
4312db9
Merge pull request #21 from verdimrc/fp-xgboost-unit-test
verdimrc May 21, 2021
bb9613b
feat(processing): add HuggingFaceProcessor
athewsey May 21, 2021
d3d41b8
added a unit test for mxnet processor
May 21, 2021
1ed9349
fix: un-break local mode on FrameworkProcessor
athewsey May 22, 2021
e9cdf75
Merge pull request #22 from athewsey/feat/fw-processor
verdimrc May 23, 2021
75e7645
Merge pull request #23 from verdimrc/fp-mxnet-uni-test
verdimrc May 23, 2021
e9ad1ce
Fix pylint W0102 (+few others) on farmework processors
May 23, 2021
0e2bf64
Fix pylint W0102 (+few others) on huggingface processors
May 23, 2021
b8fcd1b
Merge pull request #24 from verdimrc/fp-fix-pylint
verdimrc May 23, 2021
52ca20b
Added unit test for tf processor
May 24, 2021
2fe9ede
remove unused import
May 24, 2021
7da027c
Merge pull request #25 from verdimrc/pr-tf-processor-unit-test
verdimrc May 25, 2021
3229b0f
Integration test for FrameworkProces and SageMaker workflow
May 26, 2021
00c8d57
Merge pull request #27 from verdimrc/fp-workflow
verdimrc May 26, 2021
e648c78
removed unnecessary import
May 26, 2021
195e3f9
Merge pull request #29 from verdimrc/fp-fix-unit-tests
verdimrc May 26, 2021
a7ea9db
Made test local processor to not depend on region setting
May 27, 2021
3525a4b
Merge pull request #30 from verdimrc/fp-fix-test-local
verdimrc May 27, 2021
7ae31fd
Added integration test for MXNetProcessor
May 28, 2021
0de3a8f
Added integration test for PyTorchProcessor
May 28, 2021
69d791b
Merge pull request #35 from verdimrc/fp-integ-test-non-tf
verdimrc May 28, 2021
b41b4ae
fix: Rename XGBoostEstimator->XGBoostProcessor
athewsey May 28, 2021
bd39ff5
test: Add integ test for TensorFlowProcessor
athewsey May 28, 2021
072b84f
Update tests/integ/test_xgboost.py
verdimrc May 28, 2021
c91ab64
Merge pull request #36 from athewsey/feat/fw-processor
verdimrc May 28, 2021
5ec5378
Fix XGBoost unit test
May 28, 2021
408d5a3
Merge branch 'master' into pr-framework-processor-round-02
May 28, 2021
d78850e
Merge branch 'pr-framework-processor' into pr-framework-processor-rou…
verdimrc May 28, 2021
5aa0335
Merge pull request #37 from verdimrc/pr-framework-processor-round-02
verdimrc May 28, 2021
e38da07
Fix linter errors
May 28, 2021
fa0c309
Merge branch 'master' into pr-framework-processor
ahsan-z-khan Jun 9, 2021
dd0294d
Merge branch 'master' into pr-framework-processor
verdimrc Jun 11, 2021
f537cc8
Merge remote-tracking branch 'upstream/master' into feat/fw-processor
athewsey Jul 16, 2021
4ded19c
Merge remote-tracking branch 'upstream/master'
athewsey Jul 16, 2021
37c9a7d
change: fix mxnet integration test (in the test)
athewsey Jul 16, 2021
c24a800
Merge remote-tracking branch 'verdi/pr-framework-processor' into feat…
athewsey Jul 16, 2021
d9c6560
Merge remote-tracking branch 'origin/master' into pr-framework-processor
Jul 16, 2021
caa3d3e
Merge branch 'master' into pr-framework-processor
ahsan-z-khan Jul 16, 2021
80d6013
Merge branch 'master' into pr-framework-processor
ahsan-z-khan Jul 19, 2021
750dfa2
Remove stale codes
Jul 20, 2021
0ff6b5c
Merge branch 'master' into pr-framework-processor
shreyapandit Jul 21, 2021
2e6d4d2
Merge branch 'master' into feat/fw-processor
athewsey Sep 7, 2021
eed868b
Merge branch 'master' into feat/fw-processor
athewsey Sep 13, 2021
0b13d13
Merge remote-tracking branch 'upstream/master' into feat/fw-processor
athewsey Sep 14, 2021
7de6b1a
doc(processing): remove 'new arg' comments
athewsey Sep 14, 2021
801a87a
fix: FrameworkProcessor to work with pipelines
athewsey Sep 14, 2021
872bf50
fix: FrameworkProcessor default code_location
athewsey Sep 14, 2021
6003037
Merge remote-tracking branch 'upstream/master' into feat/fw-processor…
athewsey Sep 20, 2021
74f5436
fix: HuggingFaceProcessor estimator kwargs
athewsey Sep 20, 2021
49c584c
fix: restore SKLearnProcessor positional args
athewsey Sep 20, 2021
6f3f57c
fix: SKLearnProcessor super() misuse
athewsey Sep 20, 2021
cf03e3a
fix(processing): resolve existing unit tests
athewsey Sep 20, 2021
d55aa82
style(processing): fix linting errors
athewsey Sep 20, 2021
ab921b8
change: restore SKLearn entrypoint customization
athewsey Sep 20, 2021
7eb4b8a
change(tests): delete unreachable line
athewsey Sep 20, 2021
03e7cee
fix: PySparkProcessor args pass-through to parent
athewsey Sep 20, 2021
0fd7f1d
change(tests): fix failing processing unit tests
athewsey Sep 20, 2021
5fc171b
doc: improve FrameworkProcessor guide docs
athewsey Sep 20, 2021
ff67bb3
change: add unit for framework ProcessingStep
athewsey Sep 20, 2021
051c502
feat: swap FrameworkProcessor bash shell to sh
athewsey Sep 21, 2021
2903f39
style: fix linting errors
athewsey Sep 21, 2021
d326c9f
Merge remote-tracking branch 'upstream/master' into feat/fw-processor…
athewsey Sep 21, 2021
0c07814
fix(DataWranglerProcessor): add source_dir arg
athewsey Sep 23, 2021
02e0db1
Merge remote-tracking branch 'upstream/master' into feat/fw-processor…
athewsey Sep 24, 2021
00d86ba
Merge branch 'master' into feat/fw-processor-normargs
jeniyat Oct 9, 2021
29 changes: 29 additions & 0 deletions doc/amazon_sagemaker_processing.rst
@@ -62,6 +62,22 @@ Then you can run a scikit-learn script ``preprocessing.py`` in a processing job.

preprocessing_job_description = sklearn_processor.jobs[-1].describe()

Instead of a single script file, you can submit an entire folder of code for a processing job, and optionally specify additional dependencies to be installed at job start-up by including a ``requirements.txt`` file.

To do this, also specify the ``source_dir`` parameter in ``.run()`` calls for :class:`SKLearnProcessor` or any other :class:`FrameworkProcessor`-based processor:

.. code:: python

sklearn_processor.run(
code="run.py", # 'processing/run.py' is the main script to run
source_dir="processing", # Upload the whole contents of 'processing/'

# If 'processing/requirements.txt' exists, the dependencies it specifies
# will be automatically installed before 'run.py' is started.
inputs=[...],
outputs=[...],
)

For an in-depth look, please see the `Scikit-learn Data Processing and Model Evaluation`_ example notebook.

.. _Scikit-learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
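
The folder layout described in this hunk can be checked locally before a job is submitted. The following is a minimal sketch, not part of the SageMaker SDK: `check_source_dir` is a hypothetical helper. It assumes, as the comments in the snippet above state, that `code` is resolved relative to `source_dir` and that a `requirements.txt` at the top of that folder is what triggers dependency installation.

```python
from pathlib import Path


def check_source_dir(source_dir, code):
    """Validate a local folder intended for use as ``source_dir``.

    Hypothetical pre-flight helper (not an SDK function). Returns the resolved
    entry-point path and whether a top-level requirements.txt is present.
    """
    root = Path(source_dir)
    entry_point = root / code  # 'code' is assumed to be relative to 'source_dir'
    if not entry_point.is_file():
        raise FileNotFoundError(f"{code!r} not found inside {str(root)!r}")
    return {
        "entry_point": str(entry_point),
        "has_requirements": (root / "requirements.txt").is_file(),
    }
```

Calling `check_source_dir("processing", "run.py")` before `sklearn_processor.run(...)` would surface a misplaced entry script locally instead of at job start-up.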
@@ -220,6 +236,13 @@ For an in-depth look, please see the `Feature Transformation with Spark`_ exampl

.. _Feature Transformation with Spark: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/feature_transformation_with_sagemaker_processing/feature_transformation_with_sagemaker_processing.ipynb

Data Processing with Other Frameworks
=====================================

:class:`FrameworkProcessor`-based classes are also provided for a range of other ML frameworks, for example PyTorch, TensorFlow, and MXNet.

You can use these to run data processing jobs in pre-built container environments, similarly to model training with :class:`Framework`-based Estimators.
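
As a rough illustration of the paragraph above, the sketch below builds constructor arguments for `MXNetProcessor` using the parameter names visible in this PR's `src/sagemaker/mxnet/processing.py` hunk. The role ARN, framework version, and instance type passed in are placeholder assumptions, and the helper names `mxnet_processor_kwargs` and `make_mxnet_processor` are hypothetical. The SDK import is deferred so that the argument-building part runs without the SageMaker SDK or AWS credentials.

```python
def mxnet_processor_kwargs(role, framework_version,
                           instance_type="ml.m5.xlarge", instance_count=1):
    """Collect constructor arguments for MXNetProcessor.

    Parameter names follow the __init__ signature shown in this PR; the
    default instance type here is only a placeholder assumption.
    """
    return {
        "framework_version": framework_version,
        "role": role,
        "instance_count": instance_count,
        "instance_type": instance_type,
        "py_version": "py3",  # default shown in the PR's signature
    }


def make_mxnet_processor(**kwargs):
    """Instantiate the processor; requires the SageMaker SDK to be installed."""
    # Imported lazily so the rest of this sketch works without the SDK.
    from sagemaker.mxnet.processing import MXNetProcessor

    return MXNetProcessor(**kwargs)
```

With the SDK installed and valid credentials, `make_mxnet_processor(**kwargs).run(code="run.py", source_dir="processing", inputs=[...], outputs=[...])` would then mirror the `SKLearnProcessor` call shown earlier in this file.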


Learn More
==========
@@ -229,12 +252,18 @@ Processing class documentation

- :class:`sagemaker.processing.Processor`
- :class:`sagemaker.processing.ScriptProcessor`
- :class:`sagemaker.processing.FrameworkProcessor`
- :class:`sagemaker.sklearn.processing.SKLearnProcessor`
- :class:`sagemaker.spark.processing.PySparkProcessor`
- :class:`sagemaker.spark.processing.SparkJarProcessor`
- :class:`sagemaker.processing.ProcessingInput`
- :class:`sagemaker.processing.ProcessingOutput`
- :class:`sagemaker.processing.ProcessingJob`
- :class:`sagemaker.huggingface.processing.HuggingFaceProcessor`
- :class:`sagemaker.mxnet.processing.MXNetProcessor`
- :class:`sagemaker.pytorch.processing.PyTorchProcessor`
- :class:`sagemaker.tensorflow.processing.TensorFlowProcessor`
- :class:`sagemaker.xgboost.processing.XGBoostProcessor`


Further documentation
12 changes: 10 additions & 2 deletions doc/frameworks/huggingface/sagemaker.huggingface.rst
@@ -17,10 +17,18 @@ Hugging Face Model
:undoc-members:
:show-inheritance:

-HuggingFace Predictor
----------------------
+Hugging Face Predictor
+----------------------

.. autoclass:: sagemaker.huggingface.model.HuggingFacePredictor
:members:
:undoc-members:
:show-inheritance:
+
+Hugging Face Processor
+----------------------
+
+.. autoclass:: sagemaker.huggingface.processing.HuggingFaceProcessor
+:members:
+:undoc-members:
+:show-inheritance:
8 changes: 8 additions & 0 deletions doc/frameworks/mxnet/sagemaker.mxnet.rst
@@ -25,3 +25,11 @@ MXNet Predictor
:members:
:undoc-members:
:show-inheritance:

MXNet Processor
---------------------------

.. autoclass:: sagemaker.mxnet.processing.MXNetProcessor
:members:
:undoc-members:
:show-inheritance:
8 changes: 8 additions & 0 deletions doc/frameworks/pytorch/sagemaker.pytorch.rst
@@ -24,3 +24,11 @@ PyTorch Predictor
:members:
:undoc-members:
:show-inheritance:

PyTorch Processor
-----------------

.. autoclass:: sagemaker.pytorch.processing.PyTorchProcessor
:members:
:undoc-members:
:show-inheritance:
8 changes: 8 additions & 0 deletions doc/frameworks/tensorflow/sagemaker.tensorflow.rst
@@ -25,3 +25,11 @@ TensorFlow Serving Predictor
:members:
:undoc-members:
:show-inheritance:

TensorFlow Processor
--------------------

.. autoclass:: sagemaker.tensorflow.processing.TensorFlowProcessor
:members:
:undoc-members:
:show-inheritance:
5 changes: 5 additions & 0 deletions doc/frameworks/xgboost/xgboost.rst
@@ -16,3 +16,8 @@ The Amazon SageMaker XGBoost open source framework algorithm.
:members:
:undoc-members:
:show-inheritance:

.. autoclass:: sagemaker.xgboost.processing.XGBoostProcessor
:members:
:undoc-members:
:show-inheritance:
4 changes: 4 additions & 0 deletions src/sagemaker/huggingface/processing.py
@@ -105,6 +105,8 @@ def _create_estimator(
source_dir=None,
dependencies=None,
git_config=None,
output_kms_key=None,
volume_kms_key=None,
):
"""Override default estimator factory function for HuggingFace's different parameters

@@ -121,6 +123,8 @@ def _create_estimator(
dependencies=dependencies,
git_config=git_config,
code_location=self.code_location,
output_kms_key=output_kms_key,
volume_kms_key=volume_kms_key,
enable_network_isolation=False,
image_uri=self.image_uri,
role=self.role,
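
The effect of this change can be sketched without the SDK. In the minimal example below, `RecordingEstimator` and `SketchProcessor` are hypothetical stand-ins, not SageMaker classes; only the keyword names are taken from the hunk above. It shows a factory method accepting `output_kms_key` and `volume_kms_key` and forwarding them to the estimator it creates.

```python
class RecordingEstimator:
    """Hypothetical stand-in for a framework estimator; stores its kwargs."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class SketchProcessor:
    """Hypothetical processor showing only the keyword pass-through pattern."""

    def __init__(self, code_location=None):
        self.code_location = code_location

    def _create_estimator(self, source_dir=None, dependencies=None,
                          git_config=None, output_kms_key=None,
                          volume_kms_key=None):
        # Forward the KMS keys explicitly so they reach the estimator.
        return RecordingEstimator(
            source_dir=source_dir,
            dependencies=dependencies,
            git_config=git_config,
            code_location=self.code_location,
            output_kms_key=output_kms_key,
            volume_kms_key=volume_kms_key,
            enable_network_isolation=False,
        )
```

Keeping the forwarding explicit, rather than relying on `**kwargs`, makes it easy to see in a diff which settings reach the estimator.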
6 changes: 3 additions & 3 deletions src/sagemaker/mxnet/processing.py
@@ -28,17 +28,17 @@ class MXNetProcessor(FrameworkProcessor):

def __init__(
self,
-framework_version, # New arg
+framework_version,
role,
instance_count,
instance_type,
-py_version="py3", # New kwarg
+py_version="py3",
image_uri=None,
command=None,
volume_size_in_gb=30,
volume_kms_key=None,
output_kms_key=None,
-code_location=None, # New arg
+code_location=None,
max_runtime_in_seconds=None,
base_job_name=None,
sagemaker_session=None,