Sagemaker Config for SDK Defaults #3757

Merged: 40 commits, Mar 29, 2023
cc090d4
feature: Added Config parser for SageMaker Python SDK (#840)
balajisankar15 Feb 21, 2023
6c9bc6a
intelligent defaults - tags and encryption (#842)
rubanh Feb 21, 2023
67e8d94
intelligent defaults - custom parameters and small fixes (#845)
rubanh Feb 21, 2023
67fc282
feature: Added support for VPC Config, EnableNetworkIsolation, KMS Ke…
balajisankar15 Feb 23, 2023
669e5a6
fix: Make Key, Value as required fields for each "Tags" entry in the …
Feb 23, 2023
29c9728
fix: Make 'role' as Optional for ModelQualityMonitor and DefaultModel…
balajisankar15 Feb 24, 2023
09cc530
Fix: Certain unit tests aren't passing sagemaker_session. Modify the …
balajisankar15 Feb 24, 2023
ce90eda
fix: Sagemaker Config - KeyError: 'MonitoringJobDefinition' in model_…
Feb 27, 2023
24b681f
change: Sagemaker Config - improved readability of print statements a…
Feb 27, 2023
6b6e44c
fix: Sagemaker Config - Reduce duplicate and misleading config-relate…
Feb 28, 2023
cb66d7b
fix: Sagemaker Config - add function description
Mar 6, 2023
ceecae5
fix: Sagemaker Config - Fix failing Integ tests, fix backwards incomp…
Mar 2, 2023
018d682
change: new integ test for sagemaker_config
Mar 4, 2023
514618b
fix: Sagemaker Config - fleshed out unit tests and fixed bugs
Mar 8, 2023
3dac2d3
fix: Sagemaker Config - Removed hard coded config values in the unit …
Mar 13, 2023
e62a378
fix: inject from config into existing ProductionVariants inside creat…
Mar 14, 2023
2baeab0
change: added unit test for verifying yaml safe_load method
Mar 15, 2023
43492a6
change: addressed PR comments for SageMaker Config
Mar 15, 2023
53ca7fc
change: Sagemaker Config - minor clarification
Mar 15, 2023
70056aa
change: ModelMonitoring and Processing now use helper methods for upd…
Mar 15, 2023
51190cc
change: Refactoring session.py and added additional schema validation…
Mar 15, 2023
dba07fa
update: expand one unit test
Mar 18, 2023
95bc7de
update: new integ test for cross context injection
Mar 18, 2023
19e185a
change: remove unwanted method and replace it with a different method…
Mar 20, 2023
3bd3a94
fix: Address documentation errors and removed unnecessary properties …
Mar 20, 2023
d4905c9
fix: moving certain config file helper methods to utils.py
Mar 21, 2023
f02131b
change: Add a separate helper to merge list of objects
Mar 21, 2023
ada7ddc
fix: Documentation updates for SageMakerConfig
Mar 21, 2023
1ec0563
fix: bubble up exceptions from S3 while fetching the Config
Mar 27, 2023
263594a
fix: Added additional test cases for config helper methods. Also made…
Mar 27, 2023
cd2181b
fix: small bug fix to print statements for update_list_of_dicts_with_…
Mar 27, 2023
6086451
fix: Replace SageMakerConfig class with just method invocations
Mar 27, 2023
d73bb97
fix: fix broken unit tests due to refactoring
Mar 28, 2023
fc18316
fix: bug where a user-provided sagemaker_config wasn't set
Mar 28, 2023
d8e33ea
change: rename fetch_sagemaker_config to load_sagemaker_config
Mar 28, 2023
523d01d
fix: update Schema to match exactly with APIs
Mar 28, 2023
99660ca
add documentation for default configuration support
Mar 28, 2023
17f290b
fix linting errors
Mar 28, 2023
dc38ce5
fix link lint
Mar 28, 2023
ee93e3f
fix lint
Mar 28, 2023

7 changes: 7 additions & 0 deletions doc/api/utility/config.rst
@@ -0,0 +1,7 @@
Config
-------

.. automodule:: sagemaker.config.config
    :members:
    :undoc-members:
    :show-inheritance:
572 changes: 572 additions & 0 deletions doc/overview.rst

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions setup.py
@@ -59,6 +59,9 @@ def read_requirements(filename):
"pandas",
"pathos",
"schema",
+ "PyYAML==5.4.1",
+ "jsonschema",
+ "platformdirs",
]

# Specific use case dependencies
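
The new dependencies hint at the config flow: PyYAML to parse the file, jsonschema to validate it, platformdirs to locate default directories. As a dependency-free sketch (not the validation code from this PR), the snippet below starts from the dictionary a YAML parser might return and hand-checks one rule named in the commit list: every `Tags` entry needs both `Key` and `Value`. The key layout (`SchemaVersion`, `SageMaker`, `TrainingJob`) and the helper `find_invalid_tags` are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical parsed config: the shape a YAML loader might return.
# The key names are assumptions, not the schema shipped in this PR.
parsed_config: Dict[str, Any] = {
    "SchemaVersion": "1.0",
    "SageMaker": {
        "TrainingJob": {
            "Tags": [
                {"Key": "team", "Value": "ml-platform"},
                {"Key": "cost-center"},  # invalid: no "Value"
            ],
        },
    },
}


def find_invalid_tags(tags: List[Dict[str, Any]]) -> List[int]:
    """Return the indexes of tag entries missing "Key" or "Value"."""
    return [
        index
        for index, tag in enumerate(tags)
        if "Key" not in tag or "Value" not in tag
    ]


tags = parsed_config["SageMaker"]["TrainingJob"]["Tags"]
invalid_indexes = find_invalid_tags(tags)
print(invalid_indexes)
```

A schema library expresses the same rule declaratively; the hand-written check is used here only to keep the example free of third-party imports.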
2 changes: 1 addition & 1 deletion src/sagemaker/algorithm.py
@@ -46,7 +46,7 @@ class AlgorithmEstimator(EstimatorBase):
def __init__(
self,
algorithm_arn: str,
- role: str,
+ role: str = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
volume_size: Union[int, PipelineVariable] = 30,
2 changes: 1 addition & 1 deletion src/sagemaker/amazon/amazon_estimator.py
@@ -50,7 +50,7 @@ class AmazonAlgorithmEstimatorBase(EstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
data_location: Optional[str] = None,
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/factorization_machines.py
@@ -87,7 +87,7 @@ class FactorizationMachines(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
num_factors: Optional[int] = None,
@@ -326,7 +326,7 @@ class FactorizationMachinesModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/ipinsights.py
@@ -63,7 +63,7 @@ class IPInsights(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
num_entity_vectors: Optional[int] = None,
@@ -229,7 +229,7 @@ class IPInsightsModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/kmeans.py
@@ -62,7 +62,7 @@ class KMeans(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
k: Optional[int] = None,
@@ -255,7 +255,7 @@ class KMeansModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/knn.py
@@ -73,7 +73,7 @@ class KNN(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
k: Optional[int] = None,
@@ -246,7 +246,7 @@ class KNNModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/lda.py
@@ -52,7 +52,7 @@ class LDA(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
num_topics: Optional[int] = None,
alpha0: Optional[float] = None,
@@ -230,7 +230,7 @@ class LDAModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/linear_learner.py
@@ -145,7 +145,7 @@ class LinearLearner(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
predictor_type: Optional[str] = None,
@@ -499,7 +499,7 @@ class LinearLearnerModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/ntm.py
@@ -74,7 +74,7 @@ class NTM(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
num_topics: Optional[int] = None,
@@ -263,7 +263,7 @@ class NTMModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/object2vec.py
@@ -153,7 +153,7 @@ class Object2Vec(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
epochs: Optional[int] = None,
@@ -361,7 +361,7 @@ class Object2VecModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/pca.py
@@ -60,7 +60,7 @@ class PCA(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
num_components: Optional[int] = None,
@@ -243,7 +243,7 @@ class PCAModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
4 changes: 2 additions & 2 deletions src/sagemaker/amazon/randomcutforest.py
@@ -54,7 +54,7 @@ class RandomCutForest(AmazonAlgorithmEstimatorBase):

def __init__(
self,
- role: str,
+ role: Optional[Union[str, PipelineVariable]] = None,
instance_count: Optional[Union[int, PipelineVariable]] = None,
instance_type: Optional[Union[str, PipelineVariable]] = None,
num_samples_per_tree: Optional[int] = None,
@@ -216,7 +216,7 @@ class RandomCutForestModel(Model):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
+ role: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
**kwargs
):
46 changes: 38 additions & 8 deletions src/sagemaker/automl/automl.py
@@ -19,9 +19,16 @@

from sagemaker import Model, PipelineModel
from sagemaker.automl.candidate_estimator import CandidateEstimator
+ from sagemaker.config import (
+ AUTO_ML_ROLE_ARN_PATH,
+ AUTO_ML_KMS_KEY_ID_PATH,
+ AUTO_ML_VPC_CONFIG_PATH,
+ AUTO_ML_VOLUME_KMS_KEY_ID_PATH,
+ AUTO_ML_INTER_CONTAINER_ENCRYPTION_PATH,
+ )
from sagemaker.job import _Job
from sagemaker.session import Session
- from sagemaker.utils import name_from_base
+ from sagemaker.utils import name_from_base, resolve_value_from_config
from sagemaker.workflow.entities import PipelineVariable
from sagemaker.workflow.pipeline_context import runnable_by_pipeline

@@ -98,15 +105,15 @@ class AutoML(object):

def __init__(
self,
- role: str,
- target_attribute_name: str,
+ role: Optional[str] = None,
+ target_attribute_name: str = None,
output_kms_key: Optional[str] = None,
output_path: Optional[str] = None,
base_job_name: Optional[str] = None,
compression_type: Optional[str] = None,
sagemaker_session: Optional[Session] = None,
volume_kms_key: Optional[str] = None,
- encrypt_inter_container_traffic: Optional[bool] = False,
+ encrypt_inter_container_traffic: Optional[bool] = None,
vpc_config: Optional[Dict[str, List]] = None,
problem_type: Optional[str] = None,
max_candidates: Optional[int] = None,
@@ -176,14 +183,10 @@ def __init__(
Returns:
AutoML object.
"""
self.role = role
self.output_kms_key = output_kms_key
self.output_path = output_path
self.base_job_name = base_job_name
self.compression_type = compression_type
self.volume_kms_key = volume_kms_key
self.encrypt_inter_container_traffic = encrypt_inter_container_traffic
self.vpc_config = vpc_config
self.problem_type = problem_type
self.max_candidate = max_candidates
self.max_runtime_per_training_job_in_seconds = max_runtime_per_training_job_in_seconds
@@ -204,6 +207,31 @@ def __init__(
self._auto_ml_job_desc = None
self._best_candidate = None
self.sagemaker_session = sagemaker_session or Session()
+ self.vpc_config = resolve_value_from_config(
+ vpc_config, AUTO_ML_VPC_CONFIG_PATH, sagemaker_session=self.sagemaker_session
+ )
+ self.volume_kms_key = resolve_value_from_config(
+ volume_kms_key, AUTO_ML_VOLUME_KMS_KEY_ID_PATH, sagemaker_session=self.sagemaker_session
+ )
+ self.output_kms_key = resolve_value_from_config(
+ output_kms_key, AUTO_ML_KMS_KEY_ID_PATH, sagemaker_session=self.sagemaker_session
+ )
+ self.role = resolve_value_from_config(
+ role, AUTO_ML_ROLE_ARN_PATH, sagemaker_session=self.sagemaker_session
+ )
+ if not self.role:
+ # Originally IAM role was a required parameter.
+ # Now we marked that as Optional because we can fetch it from SageMakerConfig
+ # Because of marking that parameter as optional, we should validate if it is None, even
+ # after fetching the config.
+ raise ValueError("An AWS IAM role is required to create an AutoML job.")
+
+ self.encrypt_inter_container_traffic = resolve_value_from_config(
+ direct_input=encrypt_inter_container_traffic,
+ config_path=AUTO_ML_INTER_CONTAINER_ENCRYPTION_PATH,
+ default_value=False,
+ sagemaker_session=self.sagemaker_session,
+ )
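
The hunk above resolves each setting in a fixed order: an explicitly passed argument, then the config, then a default. That ordering appears to be why the signature default for `encrypt_inter_container_traffic` moves from `False` to `None`, so that "not passed" can be told apart from an explicit `False`. A minimal sketch of that precedence follows; `resolve_value`, `get_nested`, `require_role`, the dotted path strings, and the config layout are hypothetical stand-ins, not the SDK's `resolve_value_from_config` or its path constants.

```python
from typing import Any, Dict, Optional


def get_nested(config: Dict[str, Any], path: str) -> Any:
    """Walk a dotted path through nested dicts; return None if a key is missing."""
    node: Any = config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def resolve_value(direct_input: Any, config_path: str, default_value: Any,
                  config: Dict[str, Any]) -> Any:
    """Hypothetical stand-in: explicit argument, then config, then default."""
    if direct_input is not None:
        return direct_input
    from_config = get_nested(config, config_path)
    if from_config is not None:
        return from_config
    return default_value


# Assumed config layout and paths, for illustration only.
ROLE_PATH = "SageMaker.AutoMLJob.RoleArn"
ENCRYPTION_PATH = "SageMaker.AutoMLJob.EnableInterContainerTrafficEncryption"
config = {
    "SageMaker": {
        "AutoMLJob": {
            "RoleArn": "arn:aws:iam::111122223333:role/ExampleRole",
            "EnableInterContainerTrafficEncryption": True,
        }
    }
}


def require_role(role: Optional[str], cfg: Dict[str, Any]) -> str:
    """Resolve the role, then fail if neither the caller nor the config set it."""
    resolved = resolve_value(role, ROLE_PATH, None, cfg)
    if not resolved:
        raise ValueError("An AWS IAM role is required to create an AutoML job.")
    return resolved


role = require_role(None, config)                                  # taken from config
explicit_off = resolve_value(False, ENCRYPTION_PATH, False, config)  # explicit False wins
from_config = resolve_value(None, ENCRYPTION_PATH, False, config)    # config value used
fallback = resolve_value(None, ENCRYPTION_PATH, False, {})           # default used
```

The `is not None` test is the important design choice in this sketch: a truthiness test would let a config value of `True` override a caller who deliberately passed `False`.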

self._check_problem_type_and_job_objective(self.problem_type, self.job_objective)

@@ -276,6 +304,8 @@ def attach(cls, auto_ml_job_name, sagemaker_session=None):
volume_kms_key=auto_ml_job_desc.get("AutoMLJobConfig", {})
.get("SecurityConfig", {})
.get("VolumeKmsKeyId"),
+ # Do not override encrypt_inter_container_traffic from config because this info
+ # is pulled from an existing automl job
encrypt_inter_container_traffic=auto_ml_job_desc.get("AutoMLJobConfig", {})
.get("SecurityConfig", {})
.get("EnableInterContainerTrafficEncryption", False),
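
In `attach`, the values come from the description of an existing job, so the config is deliberately not consulted. The chained `.get(..., {})` calls in the hunk tolerate missing sections of that description. A small sketch with a made-up, trimmed job description (the data is invented; the field names are the ones used in the diff):

```python
from typing import Any, Dict

# Made-up, trimmed job description for illustration.
auto_ml_job_desc: Dict[str, Any] = {
    "AutoMLJobName": "example-job",
    "AutoMLJobConfig": {"SecurityConfig": {"VolumeKmsKeyId": "example-key-id"}},
}

# Each .get(..., {}) substitutes an empty dict when a section is absent,
# so the next .get() in the chain never runs against None.
security = auto_ml_job_desc.get("AutoMLJobConfig", {}).get("SecurityConfig", {})
volume_kms_key = security.get("VolumeKmsKeyId")
encrypt = security.get("EnableInterContainerTrafficEncryption", False)

# With an empty description the chain still works and yields None.
empty_desc: Dict[str, Any] = {}
missing = (
    empty_desc.get("AutoMLJobConfig", {})
    .get("SecurityConfig", {})
    .get("VolumeKmsKeyId")
)
```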
37 changes: 30 additions & 7 deletions src/sagemaker/automl/candidate_estimator.py
@@ -14,10 +14,14 @@
from __future__ import absolute_import

from six import string_types

- from sagemaker import Session
+ from sagemaker.config import (
+ TRAINING_JOB_VPC_CONFIG_PATH,
+ TRAINING_JOB_VOLUME_KMS_KEY_ID_PATH,
+ TRAINING_JOB_INTER_CONTAINER_ENCRYPTION_PATH,
+ )
+ from sagemaker.session import Session
from sagemaker.job import _Job
- from sagemaker.utils import name_from_base
+ from sagemaker.utils import name_from_base, resolve_value_from_config


class CandidateEstimator(object):
@@ -72,7 +76,8 @@ def fit(
inputs,
candidate_name=None,
volume_kms_key=None,
- encrypt_inter_container_traffic=False,
+ # default of False for training job, checked inside function
+ encrypt_inter_container_traffic=None,
vpc_config=None,
wait=True,
logs=True,
@@ -87,7 +92,8 @@ def fit(
volume_kms_key (str): The KMS key id to encrypt data on the storage volume attached to
the ML compute instance(s).
encrypt_inter_container_traffic (bool): To encrypt all communications between ML compute
- instances in distributed training. Default: False.
+ instances in distributed training. If not passed, will be fetched from
+ sagemaker_config if a value is defined there. Default: False.
vpc_config (dict): Specifies a VPC that jobs and hosted models have access to.
Control access to and from training and model containers by configuring the VPC
wait (bool): Whether the call should wait until all jobs completes (default: True).
@@ -99,7 +105,14 @@ def fit(
"""Logs can only be shown if wait is set to True.
Please either set wait to True or set logs to False."""
)

+ vpc_config = resolve_value_from_config(
+ vpc_config, TRAINING_JOB_VPC_CONFIG_PATH, sagemaker_session=self.sagemaker_session
+ )
+ volume_kms_key = resolve_value_from_config(
+ volume_kms_key,
+ TRAINING_JOB_VOLUME_KMS_KEY_ID_PATH,
+ sagemaker_session=self.sagemaker_session,
+ )
self.name = candidate_name or self.name
running_jobs = {}

@@ -131,12 +144,22 @@ def fit(
base_name = "sagemaker-automl-training-rerun"
step_name = name_from_base(base_name)
step["name"] = step_name

+ # Check training_job config not auto_ml_job config because this function calls
+ # training job API
+ _encrypt_inter_container_traffic = resolve_value_from_config(
+ direct_input=encrypt_inter_container_traffic,
+ config_path=TRAINING_JOB_INTER_CONTAINER_ENCRYPTION_PATH,
+ default_value=False,
+ sagemaker_session=self.sagemaker_session,
+ )

train_args = self._get_train_args(
desc,
channels,
step_name,
volume_kms_key,
- encrypt_inter_container_traffic,
+ _encrypt_inter_container_traffic,
vpc_config,
)
self.sagemaker_session.train(**train_args)
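
The comment in this hunk notes that `fit` reads the training-job section of the config rather than the AutoML section, because it calls the training job API. The sketch below illustrates why that matters: a value set only for AutoML jobs should not leak into a training-job rerun. The section names, the flat layout, and `encryption_for` are assumptions for illustration, not the SDK's schema or helpers.

```python
from typing import Any, Dict, Optional

# Assumed layout: one section per API, each with its own settings.
config: Dict[str, Any] = {
    "AutoMLJob": {"EnableInterContainerTrafficEncryption": True},
    "TrainingJob": {},  # nothing configured for training jobs
}


def encryption_for(api_section: str, direct_input: Optional[bool] = None,
                   default: bool = False) -> bool:
    """Hypothetical helper: explicit argument, then that API's section, then default."""
    if direct_input is not None:
        return direct_input
    value = config.get(api_section, {}).get("EnableInterContainerTrafficEncryption")
    return default if value is None else value


automl_setting = encryption_for("AutoMLJob")       # set in the AutoML section
training_setting = encryption_for("TrainingJob")   # falls back to the default
explicit_setting = encryption_for("TrainingJob", direct_input=True)
```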
4 changes: 2 additions & 2 deletions src/sagemaker/chainer/model.py
@@ -82,8 +82,8 @@ class ChainerModel(FrameworkModel):
def __init__(
self,
model_data: Union[str, PipelineVariable],
- role: str,
- entry_point: str,
+ role: Optional[str] = None,
+ entry_point: Optional[str] = None,
image_uri: Optional[Union[str, PipelineVariable]] = None,
framework_version: Optional[str] = None,
py_version: Optional[str] = None,