fix: ModelReference deployment for Alt Configs models #4813

Merged
merged 11 commits into from
Aug 1, 2024

malav-shastri
Collaborator

@malav-shastri malav-shastri commented Jul 30, 2024

Issue #, if available:
(Please check the internal ticket for the latest updates)

Description of changes:
ModelReference deployment in CuratedHub is currently broken. The cause appears to be a general incompatibility between the new QS code and the CuratedHub code path, plus some schema parsing issues:

  • Parser issue: the is_hub_content flag is not set during alt config parsing, so parsing the alt config schema fails.

  • Parser issue: inference_configs_dict is not fetched correctly from the JSON object, which results in AttributeError: 'JumpStartScriptScope' object has no attribute 'get'.

  • hub_arn is not passed to get_jumpstart_configs(), so hub_arn is empty and the CuratedHub code path is never executed.

  • When get_top_config_from_ranking() returns None, _add_config_name_to_init_kwargs() does not handle the None response, which results in a KeyError.
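
For illustration only, here is a minimal sketch of the kind of None guard the last item calls for. It is not the code from this PR: `FakeConfig`, `pick_top_config`, and `add_config_name` are hypothetical stand-ins, and the real SDK signatures are assumed to differ.

```python
from typing import Any, Dict, Optional


class FakeConfig:
    """Hypothetical stand-in for a resolved config; it only carries a name."""

    def __init__(self, config_name: str) -> None:
        self.config_name = config_name


def pick_top_config(ranked: Dict[str, FakeConfig]) -> Optional[FakeConfig]:
    """Hypothetical ranking lookup: returns the first config, or None if empty."""
    return next(iter(ranked.values()), None)


def add_config_name(
    kwargs: Dict[str, Any], ranked: Dict[str, FakeConfig]
) -> Dict[str, Any]:
    """Return a copy of kwargs with config_name filled in when a config exists."""
    result = dict(kwargs)
    top = pick_top_config(ranked)
    if top is None:
        # No ranked config was found: return kwargs unchanged instead of
        # dereferencing the missing result.
        return result
    # Keep a caller-supplied config_name; otherwise use the top-ranked one.
    result.setdefault("config_name", top.config_name)
    return result
```

The guard returns early, so callers that have no alternative configs get their keyword arguments back unchanged.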

Testing done:

  • Notebook testing: tried deploying the ModelReference with the local changes

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@malav-shastri malav-shastri requested a review from a team as a code owner July 30, 2024 18:36
@malav-shastri malav-shastri requested a review from nargokul July 30, 2024 18:36
if json_obj.get("inference_configs")
else None
)
inference_configs_dict: Optional[Dict[str, JumpStartMetadataConfig]] = json_obj[
Collaborator

I'm not sure this works. Now that inference_configs_dict becomes a JSON object, how would we construct the JumpStartMetadataConfig class without parsing the JSON?

Collaborator

+1. Also curious whether we should use .get here as well, to mitigate future errors.

Contributor

If we reintroduce the code above, please create a dedicated utility and unit test it separately.

A dict comprehension coupled with a ternary operator is cool, but it makes the code much harder to read and maintain.
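
As a rough sketch of what such a dedicated utility could look like (an assumption for illustration, not the code merged in this PR): `MetadataConfigSketch` and `parse_configs` are hypothetical names, and the real JumpStartMetadataConfig constructor is assumed to take different arguments.

```python
from typing import Any, Dict, Optional


class MetadataConfigSketch:
    """Hypothetical stand-in for a parsed config object."""

    def __init__(
        self, name: str, spec: Dict[str, Any], is_hub_content: bool = False
    ) -> None:
        self.name = name
        self.spec = spec
        self.is_hub_content = is_hub_content


def parse_configs(
    json_obj: Dict[str, Any], key: str, is_hub_content: bool = False
) -> Optional[Dict[str, MetadataConfigSketch]]:
    """Parse json_obj[key] into config objects, or return None if absent."""
    # .get avoids a KeyError when the key is missing from the JSON object.
    raw = json_obj.get(key)
    if not raw:
        return None
    parsed: Dict[str, MetadataConfigSketch] = {}
    for name, spec in raw.items():
        # Propagate the hub-content flag to every parsed config.
        parsed[name] = MetadataConfigSketch(name, spec, is_hub_content=is_hub_content)
    return parsed
```

An explicit loop with an early return is longer than a comprehension with a ternary, but each branch can be unit tested on its own.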

else None
)

if self._is_hub_content:
Collaborator

non-blocking: since we are here, it would be good if we could update line 1792 for training configs, but it doesn't have to be in this PR.

Collaborator Author

Thanks for the callout, I added those changes as well.

Collaborator

@Captainia Captainia left a comment

Please add unit test coverage for this in types.py; we should catch issues like this in the future.

@malav-shastri
Collaborator Author

Please add unit test coverage for this in types.py; we should catch issues like this in the future.

Sure, I'll have a follow-up PR for that right after this one. I believe we should unblock the customers here first.

Collaborator

@nileshvd nileshvd left a comment

I don't have context on this change, but since Captainia@ has already approved it and this is for an active sev2, I am approving it.

@nileshvd nileshvd merged commit 1b4dc7c into aws:master Aug 1, 2024
13 of 14 checks passed
7 participants