This repository was archived by the owner on Jun 15, 2023. It is now read-only.

update debugger Lambda function in doc, as in sm-examples #99

Closed
wants to merge 2 commits into from
38 changes: 29 additions & 9 deletions doc_source/debugger-cloudwatch-lambda.md
@@ -63,27 +63,43 @@ The following figure shows an example of the **Create function** page with the i
import json
import boto3
import logging


logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    training_job_name = event.get("detail").get("TrainingJobName")
    logging.info(f'Evaluating Debugger rules for training job: {training_job_name}')

    eval_statuses = event.get("detail").get("DebugRuleEvaluationStatuses", None)

    if eval_statuses is None or len(eval_statuses) == 0:
        logging.info("Couldn't find any debug rule statuses, skipping...")
        return {
            'statusCode': 200,
            'body': json.dumps('Nothing to do')
        }


    # should only attempt stopping jobs with InProgress status
    training_job_status = event.get("detail").get("TrainingJobStatus", None)
    if training_job_status != 'InProgress':
        logging.debug(f"Current Training job status({training_job_status}) is not 'InProgress'. Exiting")
        return {
            'statusCode': 200,
            'body': json.dumps('Nothing to do')
        }

    client = boto3.client('sagemaker')

    for status in eval_statuses:
        logging.info(status.get("RuleEvaluationStatus") + ', RuleEvaluationStatus=' + str(status))
        if status.get("RuleEvaluationStatus") == "IssuesFound":
            secondary_status = event.get("detail").get("SecondaryStatus", None)
            logging.info(
-               'Evaluation of rule configuration {} resulted in "IssuesFound". '
-               'Attempting to stop training job {}'.format(
-                   status.get("RuleConfigurationName"), training_job_name
-               )
+               f'About to stop training job, since evaluation of rule configuration {status.get("RuleConfigurationName")} resulted in "IssuesFound". ' +
+               f'\ntraining job "{training_job_name}" status is "{training_job_status}", secondary status is "{secondary_status}"' +
+               f'\nAttempting to stop training job "{training_job_name}"'
            )
            try:
                client.stop_training_job(
@@ -102,6 +118,10 @@ The following figure shows an example of the **Create function** page with the i

For more information about the Lambda code editor interface, see [Creating functions using the AWS Lambda console editor](https://docs.aws.amazon.com/lambda/latest/dg/code-editor.html)\.

1. Create a new execution role for the Lambda function\. In the IAM console, search for the role and attach the **AmazonSageMakerFullAccess** policy to it\. The code in your Lambda function needs this permission to stop the training job\.

1. Under **Basic settings**, set **Timeout** to 30 seconds instead of the default 3 seconds\.

1. Skip all other settings and choose **Save** at the top of the configuration page\.
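
Before saving, it can help to check the stop decision locally\. The following is a minimal sketch, not part of the Lambda function itself: the helper `rules_with_issues` and the `sample_event` payload are hypothetical, and the event only includes the fields that the handler reads \(`TrainingJobStatus`, `SecondaryStatus`, and `DebugRuleEvaluationStatuses`\)\. It makes no AWS calls\.

```python
def rules_with_issues(event):
    """Return the rule configuration names that would trigger a stop.

    Hypothetical helper that mirrors the handler's checks: the job must be
    InProgress and at least one rule must report IssuesFound.
    """
    detail = event.get("detail") or {}
    # Only jobs that are still running can be stopped.
    if detail.get("TrainingJobStatus") != "InProgress":
        return []
    statuses = detail.get("DebugRuleEvaluationStatuses") or []
    return [
        s.get("RuleConfigurationName")
        for s in statuses
        if s.get("RuleEvaluationStatus") == "IssuesFound"
    ]


# Hypothetical event containing only the fields the handler reads.
sample_event = {
    "detail": {
        "TrainingJobName": "example-job",
        "TrainingJobStatus": "InProgress",
        "SecondaryStatus": "Training",
        "DebugRuleEvaluationStatuses": [
            {"RuleConfigurationName": "VanishingGradient",
             "RuleEvaluationStatus": "NoIssuesFound"},
            {"RuleConfigurationName": "LossNotDecreasing",
             "RuleEvaluationStatus": "IssuesFound"},
        ],
    }
}

print(rules_with_issues(sample_event))  # ['LossNotDecreasing']
```

An empty list means the handler would return without calling `stop_training_job`\.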

### Step 3: Create a CloudWatch Events Rule and Link to the Lambda Function for Debugger<a name="debugger-cloudwatch-events"></a>
@@ -150,4 +170,4 @@ You can run the following example notebooks, which are prepared for experimentin

## Disable the CloudWatch Events Rule to Stop Using the Automated Training Job Termination<a name="debugger-disable-cw"></a>

To turn off the automated training job termination, disable the CloudWatch Events rule\. In the Lambda **Designer** panel, choose the **EventBridge \(CloudWatch Events\)** block linked to the Lambda function\. This shows an **EventBridge** panel below the **Designer** panel \(for example, see the previous screenshot\)\. Select the check box next to **EventBridge \(CloudWatch Events\): debugger\-cw\-event\-rule**, and then choose **Disable**\. If you want to use the automated termination functionality later, you can enable the CloudWatch Events rule again\.
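
The same toggle can also be done programmatically\. The following is a minimal sketch, assuming the rule is named `debugger-cw-event-rule` as in this guide and lives on the default event bus; the helper `set_rule_enabled` is hypothetical, and it expects a boto3 `events` client to be passed in\.

```python
RULE_NAME = "debugger-cw-event-rule"  # rule name used in this guide


def set_rule_enabled(events_client, enabled, rule_name=RULE_NAME):
    """Enable or disable a CloudWatch Events rule; return the action taken.

    Hypothetical helper: events_client is expected to be a boto3 "events"
    client, which provides enable_rule and disable_rule.
    """
    if enabled:
        events_client.enable_rule(Name=rule_name)
        return "enabled"
    events_client.disable_rule(Name=rule_name)
    return "disabled"


# Usage (requires boto3 and AWS credentials with permission on the rule):
#   import boto3
#   set_rule_enabled(boto3.client("events"), enabled=False)
```

Disabling the rule leaves the Lambda function and the rule definition in place, so calling the helper with `enabled=True` restores the automated termination\.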