
Wrong notebook being run in pipeline when creating NotebookJobStep with shared environment_variables dict #4856

Closed
@fakio

Description


Describe the bug
When I create a Pipeline with two NotebookJobStep steps and both steps are created with the same dict as the environment_variables parameter, the first step runs with the second step's input notebook instead of its own.

To reproduce

import json

from sagemaker.workflow.notebook_job_step import NotebookJobStep
from sagemaker.workflow.pipeline import Pipeline

env_vars = {
    'test': 'test',
}
steps = [
    NotebookJobStep(
        image_uri="885854791233.dkr.ecr.us-east-1.amazonaws.com/sagemaker-distribution-prod:1-cpu",
        kernel_name="python3",
        input_notebook="job1.ipynb",
        initialization_script="setup.sh",
        environment_variables=env_vars,
    ),
    NotebookJobStep(
        image_uri="885854791233.dkr.ecr.us-east-1.amazonaws.com/sagemaker-distribution-prod:1-cpu",
        kernel_name="python3",
        input_notebook="job2.ipynb",
        initialization_script="setup.sh",
        environment_variables=env_vars,
    ),
]
pipeline = Pipeline(
    name="pipeline",
    steps=steps,
)
pipeline.upsert(role_arn=role)
execution = pipeline.start()

The problem shows up in the environment variables generated for each step:

print(json.loads(pipeline.definition())["Steps"][0]["Arguments"]["Environment"])
{
'test': 'test',
'AWS_DEFAULT_REGION': 'us-east-1',
'SM_JOB_DEF_VERSION': '1.0',
'SM_ENV_NAME': 'sagemaker-default-env',
'SM_SKIP_EFS_SIMULATION': 'true',
'SM_EXECUTION_INPUT_PATH': '/opt/ml/input/data/sagemaker_headless_execution_pipelinestep',
'SM_KERNEL_NAME': 'python3',
'SM_INPUT_NOTEBOOK_NAME': 'job2.ipynb', <<==== wrong input, expected job1.ipynb
'SM_OUTPUT_NOTEBOOK_NAME': 'job2-ipynb-2024-08-29-15-04-49-575.ipynb',
'SM_INIT_SCRIPT': 'setup.sh'
}

print(json.loads(pipeline.definition())["Steps"][1]["Arguments"]["Environment"])
{
'test': 'test',
'AWS_DEFAULT_REGION': 'us-east-1',
'SM_JOB_DEF_VERSION': '1.0',
'SM_ENV_NAME': 'sagemaker-default-env',
'SM_SKIP_EFS_SIMULATION': 'true',
'SM_EXECUTION_INPUT_PATH': '/opt/ml/input/data/sagemaker_headless_execution_pipelinestep',
'SM_KERNEL_NAME': 'python3',
'SM_INPUT_NOTEBOOK_NAME': 'job2.ipynb',
'SM_OUTPUT_NOTEBOOK_NAME': 'job2-ipynb-2024-08-29-15-04-49-575.ipynb',
'SM_INIT_SCRIPT': 'setup.sh'
}
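
This looks like a dict-aliasing issue: my assumption (not confirmed against the SDK source) is that the step adds the SM_* keys to the caller's dict in place instead of to a copy, so the second step overwrites the values written by the first. A minimal sketch of that behaviour in plain Python:

# Hypothetical sketch of the suspected aliasing problem (plain Python, not the
# actual SDK code): both steps hold a reference to the same dict object, so the
# last writer wins for every SM_* key.
shared = {'test': 'test'}

step1_env = shared                                   # no copy is taken
step1_env['SM_INPUT_NOTEBOOK_NAME'] = 'job1.ipynb'

step2_env = shared                                   # same object again
step2_env['SM_INPUT_NOTEBOOK_NAME'] = 'job2.ipynb'

print(step1_env['SM_INPUT_NOTEBOOK_NAME'])           # prints 'job2.ipynb'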

Expected behavior
Each step runs its own notebook: the first step runs job1.ipynb and the second runs job2.ipynb.
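
A possible workaround until this is fixed (assuming the root cause really is the shared dict being mutated in place) is to give each step its own copy of the dict:

# Hypothetical workaround: pass a fresh copy of env_vars to each step so keys
# added for one step cannot leak into the other.
steps = [
    NotebookJobStep(
        image_uri="885854791233.dkr.ecr.us-east-1.amazonaws.com/sagemaker-distribution-prod:1-cpu",
        kernel_name="python3",
        input_notebook="job1.ipynb",
        initialization_script="setup.sh",
        environment_variables=dict(env_vars),  # copy per step
    ),
    NotebookJobStep(
        image_uri="885854791233.dkr.ecr.us-east-1.amazonaws.com/sagemaker-distribution-prod:1-cpu",
        kernel_name="python3",
        input_notebook="job2.ipynb",
        initialization_script="setup.sh",
        environment_variables=dict(env_vars),  # copy per step
    ),
]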

Screenshots or logs

Screenshot of notebook jobs in Studio UI:


System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.226.1
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans):
  • Framework version:
  • Python version: 3.8.18
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N
