documentation: add dataset_definition to processing page #2589

Merged
merged 3 commits into from
Aug 24, 2021
6 changes: 5 additions & 1 deletion doc/api/utility/inputs.rst
@@ -5,4 +5,8 @@ Inputs
:members:
:undoc-members:
:show-inheritance:
:noindex:

.. automodule:: sagemaker.dataset_definition.inputs
:members:
:undoc-members:
:show-inheritance:
13 changes: 6 additions & 7 deletions src/sagemaker/dataset_definition/inputs.py
@@ -27,7 +27,7 @@ class RedshiftDatasetDefinition(ApiObject):

With this input, SQL queries will be executed using Redshift to generate datasets to S3.

- Attributes:
+ Parameters:
cluster_id (str): The Redshift cluster Identifier.
database (str): The name of the Redshift database used in Redshift query execution.
db_user (str): The database user name used in Redshift query execution.
@@ -60,7 +60,7 @@ class AthenaDatasetDefinition(ApiObject):

With this input, SQL queries will be executed using Athena to generate datasets to S3.

- Attributes:
+ Parameters:
catalog (str): The name of the data catalog used in Athena query execution.
database (str): The name of the database used in the Athena query execution.
query_string (str): The SQL query statements, to be executed.
@@ -87,7 +87,7 @@ class AthenaDatasetDefinition(ApiObject):
class DatasetDefinition(ApiObject):
"""DatasetDefinition input.

- Attributes:
+ Parameters:
data_distribution_type (str): Whether the generated dataset is FullyReplicated or
ShardedByS3Key (default).
input_mode (str): Whether to use File or Pipe input mode. In File (default) mode, Amazon
@@ -98,9 +98,8 @@ class DatasetDefinition(ApiObject):
local_path (str): The local path where you want Amazon SageMaker to download the Dataset
Definition inputs to run a processing job. LocalPath is an absolute path to the input
data. This is a required parameter when `AppManaged` is False (default).
- redshift_dataset_definition
- (:class:`~sagemaker.dataset_definition.inputs.RedshiftDatasetDefinition`): Redshift
- dataset definition.
+ redshift_dataset_definition (:class:`~sagemaker.dataset_definition.inputs.RedshiftDatasetDefinition`):
+ Configuration for Redshift Dataset Definition input.
athena_dataset_definition (:class:`~sagemaker.dataset_definition.inputs.AthenaDatasetDefinition`):
Configuration for Athena Dataset Definition input.
"""
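The `DatasetDefinition` docstring above says `local_path` must be an absolute path and is required when `AppManaged` is False. A minimal sketch of that rule follows. It deliberately uses stand-in dataclasses instead of importing the SDK, so `AthenaQuery`, `DatasetInput`, and `check_local_path` are hypothetical names invented for illustration, not part of `sagemaker`. The defaults (`ShardedByS3Key`, `File`) are taken from the docstring. The SDK itself may enforce this rule differently, or only on the service side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AthenaQuery:
    """Hypothetical stand-in for the fields AthenaDatasetDefinition documents."""
    catalog: str
    database: str
    query_string: str


@dataclass
class DatasetInput:
    """Hypothetical stand-in mirroring the documented DatasetDefinition parameters."""
    data_distribution_type: str = "ShardedByS3Key"  # default per the docstring
    input_mode: str = "File"                        # default per the docstring
    local_path: Optional[str] = None
    athena: Optional[AthenaQuery] = None


def check_local_path(dataset: DatasetInput, app_managed: bool = False) -> None:
    """Raise ValueError if local_path is missing or relative while app_managed is False."""
    if app_managed:
        # The application handles the data itself, so no download path is needed.
        return
    if not dataset.local_path:
        raise ValueError("local_path is required when app_managed is False")
    if not dataset.local_path.startswith("/"):
        raise ValueError("local_path must be an absolute path")


# Example values (catalog, database, query, and path) are made up.
dataset = DatasetInput(
    local_path="/opt/ml/processing/input/athena",
    athena=AthenaQuery("AwsDataCatalog", "default", "SELECT 1"),
)
check_local_path(dataset)  # passes silently
```

Checking the path before submitting a job surfaces a missing `local_path` locally rather than as a failed processing job.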
@@ -126,7 +125,7 @@ class S3Input(ApiObject):
S3 list operations are not strongly consistent.
Use ManifestFile if strong consistency is required.

- Attributes:
+ Parameters:
s3_uri (str): the path to a specific S3 object or a S3 prefix
local_path (str): the path to a local directory. If not provided, skips data download
by SageMaker platform.
2 changes: 1 addition & 1 deletion src/sagemaker/inputs.py
@@ -127,7 +127,7 @@ def __init__(self, seed):
class CreateModelInput(object):
"""A class containing parameters which can be used to create a SageMaker Model

- Attributes:
+ Parameters:
instance_type (str): type of EC2 instance that will be used for model deployment.
accelerator_type (str): elastic inference accelerator type.
"""