Skip to content

Bug fix for getting dataframes in TrainingJobAnalytics. #441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Nov 14, 2018
Merged

Bug fix for getting dataframes in TrainingJobAnalytics. #441

merged 30 commits into from
Nov 14, 2018

Conversation

piyushadlakha
Copy link
Contributor

@piyushadlakha piyushadlakha commented Oct 25, 2018

Issue #, if available:

Description of changes:
Bug fix for getting dataframes in TrainingJobAnalytics.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor

@laurenyu laurenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a unit test?

@@ -246,7 +246,12 @@ def _determine_timeinterval(self):
"""
description = self._sage_client.describe_training_job(TrainingJobName=self.name)
start_time = description[u'TrainingStartTime'] # datetime object
end_time = description.get(u'TrainingEndTime', datetime.datetime.utcnow())
# Incrementing end time by 1 min since cloud watch drops seconds before finding the logs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/cloud watch/CloudWatch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# Incrementing end time by 1 min since cloud watch drops seconds before finding the logs.
# This results in logs being searched in the time range in which the correct log line was not present.
# Example - Log time - 2018-10-22 08:25:55
# Here calculated end time would also be 2018-10-22 08:25:55 (without 1 min addition)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: there's an extra space after "be"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@codecov-io
Copy link

codecov-io commented Oct 25, 2018

Codecov Report

Merging #441 into master will increase coverage by 0.27%.
The diff coverage is 94.18%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #441      +/-   ##
==========================================
+ Coverage   93.75%   94.02%   +0.27%     
==========================================
  Files          55       57       +2     
  Lines        4034     4269     +235     
==========================================
+ Hits         3782     4014     +232     
- Misses        252      255       +3
Impacted Files Coverage Δ
src/sagemaker/cli/tensorflow.py 50% <ø> (ø) ⬆️
src/sagemaker/transformer.py 100% <ø> (ø) ⬆️
src/sagemaker/cli/mxnet.py 100% <ø> (ø) ⬆️
src/sagemaker/fw_utils.py 100% <100%> (ø) ⬆️
src/sagemaker/predictor.py 96.77% <100%> (+0.17%) ⬆️
src/sagemaker/analytics.py 91.94% <100%> (ø) ⬆️
src/sagemaker/pytorch/estimator.py 100% <100%> (ø) ⬆️
src/sagemaker/tensorflow/predictor.py 95.83% <100%> (ø) ⬆️
src/sagemaker/chainer/estimator.py 100% <100%> (ø) ⬆️
src/sagemaker/amazon/kmeans.py 100% <100%> (ø) ⬆️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a5595af...25038d1. Read the comment docs.

@piyushadlakha
Copy link
Contributor Author

Unit test for this are already in place. We are in all cases mocking the calls to CW client and that mock is independent of the namespace.

@laurenyu
Copy link
Contributor

given the change in logic for calculating end time, it would be good to have a unit test for that

@piyushadlakha
Copy link
Contributor Author

UT's added.

laurenyu
laurenyu previously approved these changes Oct 26, 2018
mvsusp
mvsusp previously requested changes Oct 26, 2018
Copy link
Contributor

@mvsusp mvsusp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check our PR guidelines in description of the PR. Make sure that you provided unit tests and updated the Changelog.

Thanks

RodrigoAtAWS and others added 16 commits November 13, 2018 17:20
…#450)

* Add incremental training model parameters to Estimator, and bump library version to 1.13.0

* Update README.rst

Removed unnecessary comma from sentence.

* Update estimator.py

Removed duplicated line
output_path is now honored:
Can be either file:// or s3:// - This also changes the default
behavior of local mode to use the SDK provided default S3 bucket
if nothing is passed. This makes it easier for customers to create
models in SageMaker too since their Model Artifacts will already be
a tarfile in S3.

input_channel content_type is now honored in the same way as SageMaker. If it is not provided it is not passed to the container. Before we were always passing 'application/octet-stream'
* Make InputDataConfig optional for training.

* Update boto3 dependency to make sure boto support no InputDataConfig.

* Update changelog.

* Add missing assertion for chainer failure script test.
* add tensorflow serving container support
* add failure test case

* fix flaky assert
Add Pylint checking. Fixed all the current errors and warnings,
any future PRs will fail if they introduce any pylint error/warnings.
docker-compose 1.23 has moved to a newer version of requests
which allows us to remove the upper bound on urllib3 that
was causing a lot of problems for everyone.
* add tensorflow serving docs

* add content_type to tensorflow.serving.Predictor

* support CustomAttributes in local mode
* Better documentation comment on DeferredError.

* Using Napolean-style docstring formatting for example code.

* Fixup flake8 trailing whitespace.
@laurenyu
Copy link
Contributor

can you check the unit tests? four are failing with:

E       RemovedInPytest4Warning: Fixture "sagemaker_session" called directly. Fixtures are not meant to be called directly, are created automatically when test functions request them as parameters. See https://docs.pytest.org/en/latest/fixture.html for more information.

@laurenyu laurenyu dismissed mvsusp’s stale review November 14, 2018 23:13

talked to @mvsusp offline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.