Skip to content

doc: Support for generation of Jumpstart model table on build #2888

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Feb 8, 2022

Conversation

bencrabtree
Copy link
Collaborator

Issue #, if available:

Description of changes:
This PR augments the sphinx build process to add a script that generates an up-to-date table of available JumpStart model_ids in documentation. This is done by adding a setup function to docs/conf.py. This script will run each time the documentation is built and will pull the current models from JumpStarts S3 bucket (which is publicly readable).

Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 4d0bfed
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 4d0bfed
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 4d0bfed
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

doc/conf.py Outdated
@@ -15,6 +15,10 @@

import pkg_resources
from datetime import datetime
import sys
import os
sys.path.append(os.path.join(os.path.dirname(__file__), '/doc/'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sphinx has issues importing modules from the file system. You have to explicitly define where it should look sphinx-doc/sphinx#2390

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 1f9b88f
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 1f9b88f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 1f9b88f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@JGuinegagne JGuinegagne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking: a fast follow, could you add unit tests for doc/jumpstart/utils.py ?

def get_jumpstart_sdk_manifest():
s3_client = boto3.client("s3")
studio_metadata = (
s3_client.get_object(Bucket=JUMPSTART_BUCKET, Key=SDK_MANIFEST_FILE)["Body"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a text-book case of:

  • [PM/customer]: "it doesn't work"
  • [engineer]: "it works in my local environment"

Taking a step back: boto3 relies on botocore to make the http calls to the AWS service endpoints. botocore automatically adds identification headers to these requests. To populate these headers, botocore needs to access AWS credentials, generally from environment variables. Then the AWS services will use these headers to (1) authenticate and (2) authorize, ie check the IAM permissions the principals authenticated in (1) has against the permissions required by the request being made.

So here, you are implicitly making the assumption that (a) the execution environment of readthedocs where this code will be executed has AWS credentials, and (b) that those AWS credentials have S3:GetObject permission. If (a) is false, then the authentication will fail, most likely botocore will raise an exception even before sending the request. If (b) is false, the authorization will fail, and you will get a 403 error.

Luckily, our distribution buckets are world-readable, and for S3 you can use the public URL of the object, instead of going through the service endpoint. You could use wget, but the Pythonic way of doing this is to rely on a fetching library, most likely urllib. It would look something like:

from urllib import request

url = 'https://jumpstart-cache-prod-us-west-2.s3.us-west-2.amazonaws.com/models_manifest.json'
with request.openurl(url) as f:
  models_manifest = f.read().decode("utf-8")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, this makes plenty of sense. Unfortunately there's not a great way to test in the production environment but I should've caught this.


def get_jumpstart_sdk_manifest():
s3_client = boto3.client("s3")
studio_metadata = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this is a fine misnomer, since this file is not the models metadata used in Studio.

Could you rename models_manifest?

@@ -0,0 +1,77 @@
from __future__ import absolute_import
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we need the legal comment at the top?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably


f = open("jumpstart/jumpstart.rst", "w")

f.write("==================================\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we write to the file each time, or can we instead build a list of strings, and do something like

lines_of_file = []
lines_of_file.append("==================================")
lines_of_file.append("JumpStart Available Model Table")
...
...
...
file_content = '\n'.join(lines_of_file)
f.write(file_content)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be necessary, the http calls to download the metadata are probably orders of magnitude longer to execute.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is personal preference- I'm not positive what the pythonic best practice here would be

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: f94245c
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@JGuinegagne JGuinegagne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking comments

import json
from packaging.version import Version

JUMPSTART_REGION = "eu-west-2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking: if you have time, please add a comment as to why we use DUB so that someone doesn't just change it back to us-west-2. Also, could you rename JUMPSTART_METADATA_REGION?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


JUMPSTART_REGION = "eu-west-2"
SDK_MANIFEST_FILE = "models_manifest.json"
JUMPSTART_BUCKET_BASE_URL = 'https://jumpstart-cache-prod-{}.s3.{}.amazonaws.com'.format(JUMPSTART_REGION, JUMPSTART_REGION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking: f-string please

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: f94245c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: f94245c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 858a0dc
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 858a0dc
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 858a0dc
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 858a0dc
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

mufaddal-rohawala and others added 3 commits February 3, 2022 20:39
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Mohamed Ali Jamaoui <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: sreedes <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Ameen Khan <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Xiaoguang Chen <[email protected]>
Co-authored-by: Jonathan Guinegagne <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Yifei Zhu <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Xinghan Chen <[email protected]>
Co-authored-by: Ivy Bazan <[email protected]>
Co-authored-by: IvyBazan <[email protected]>
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: f05b04e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: f05b04e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: f05b04e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mufaddal-rohawala mufaddal-rohawala force-pushed the dev branch 2 times, most recently from 96c01d7 to c437191 Compare February 5, 2022 00:08
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 7932372
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 11aea41
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 44ad05e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 11aea41
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 7932372
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 7932372
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 44ad05e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 11aea41
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 44ad05e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 11aea41
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

if model["model_id"] not in sdk_manifest_top_versions_for_models:
sdk_manifest_top_versions_for_models[model["model_id"]] = model
else:
if Version(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can use elif

sdk_manifest = get_jumpstart_sdk_manifest()
sdk_manifest_top_versions_for_models = {}

for model in sdk_manifest:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future improvement (non-blocking):
Do we know how much time is added to read and update read the docs for all the jumpstart models (I heard no of models are 300+)?
If it takes a significant time, alternatively we could have checked for jumpstart manifest hash (md5 or something) and compared against one stored in sagemaker defalt bucket and only update if hash mismatches.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this take a max of 5 minutes as it stands. I agree we should definitely look to optimize this call, but we wanted to get P0 into production ASAP so customers could access the model table. We can discuss the implementation further at our next sync.

@shreyapandit shreyapandit changed the title doc: added support for generation of jumpstart model table on build doc: Support for generation of Jumpstart model table on build Feb 8, 2022
@shreyapandit shreyapandit merged commit 7ea4b80 into aws:dev Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.