Bug fix for getting dataframes in TrainingJobAnalytics. #441

piyushadlakha · 2018-10-25T22:13:32Z

Issue #, if available:

Description of changes:
Bug fix for getting dataframes in TrainingJobAnalytics.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

I have read the CONTRIBUTING doc
I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have updated the changelog with a description of my changes (if appropriate)
I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

laurenyu

can you add a unit test?

laurenyu · 2018-10-25T22:16:38Z

src/sagemaker/analytics.py

@@ -246,7 +246,12 @@ def _determine_timeinterval(self):
        """
        description = self._sage_client.describe_training_job(TrainingJobName=self.name)
        start_time = description[u'TrainingStartTime']  # datetime object
-        end_time = description.get(u'TrainingEndTime', datetime.datetime.utcnow())
+        # Incrementing end time by 1 min since cloud watch drops seconds before finding the logs.


s/cloud watch/CloudWatch

laurenyu · 2018-10-25T22:16:55Z

src/sagemaker/analytics.py

+        # Incrementing end time by 1 min since cloud watch drops seconds before finding the logs.
+        # This results in logs being searched in the time range in which the correct log line was not present.
+        # Example - Log time - 2018-10-22 08:25:55
+        #           Here calculated end time would also be  2018-10-22 08:25:55 (without 1 min addition)


nitpick: there's an extra space after "be"
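The reasoning in the code comment under review can be illustrated with a minimal sketch. It assumes, as the comment states, that CloudWatch drops the seconds from the query's end time. The helper names `truncate_to_minute` and `padded_end_time` are hypothetical and are not taken from the SDK; the actual change in `_determine_timeinterval` is not fully shown in this thread and may differ.

```python
import datetime


def truncate_to_minute(ts):
    """Drop seconds and microseconds, mimicking the truncation described in the comment."""
    return ts.replace(second=0, microsecond=0)


def padded_end_time(end_time, padding=datetime.timedelta(minutes=1)):
    """Hypothetical helper: push the end time forward by one minute."""
    return end_time + padding


# Example from the review comment: the last log line and the job end time coincide.
log_time = datetime.datetime(2018, 10, 22, 8, 25, 55)
end_time = log_time

# Without padding, the truncated end time (08:25:00) falls before the log line.
in_range_without_padding = log_time <= truncate_to_minute(end_time)

# With padding, the truncated end time (08:26:00) covers the log line.
in_range_with_padding = log_time <= truncate_to_minute(padded_end_time(end_time))
```

Under this assumption the unpadded query window ends before the final log line, which is consistent with the missing dataframe rows the PR describes.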

codecov-io · 2018-10-25T22:18:44Z

Codecov Report

Merging #441 into master will increase coverage by 0.27%.
The diff coverage is 94.18%.

@@            Coverage Diff             @@
##           master     #441      +/-   ##
==========================================
+ Coverage   93.75%   94.02%   +0.27%     
==========================================
  Files          55       57       +2     
  Lines        4034     4269     +235     
==========================================
+ Hits         3782     4014     +232     
- Misses        252      255       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/sagemaker/cli/tensorflow.py | `50% <ø> (ø)` | ⬆️ |
| src/sagemaker/transformer.py | `100% <ø> (ø)` | ⬆️ |
| src/sagemaker/cli/mxnet.py | `100% <ø> (ø)` | ⬆️ |
| src/sagemaker/fw_utils.py | `100% <100%> (ø)` | ⬆️ |
| src/sagemaker/predictor.py | `96.77% <100%> (+0.17%)` | ⬆️ |
| src/sagemaker/analytics.py | `91.94% <100%> (ø)` | ⬆️ |
| src/sagemaker/pytorch/estimator.py | `100% <100%> (ø)` | ⬆️ |
| src/sagemaker/tensorflow/predictor.py | `95.83% <100%> (ø)` | ⬆️ |
| src/sagemaker/chainer/estimator.py | `100% <100%> (ø)` | ⬆️ |
| src/sagemaker/amazon/kmeans.py | `100% <100%> (ø)` | ⬆️ |
| ... and 20 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a5595af...25038d1.

piyushadlakha · 2018-10-25T22:25:09Z

Unit tests for this are already in place. In all cases we mock the calls to the CW client, and that mock is independent of the namespace.

laurenyu · 2018-10-25T22:27:03Z

given the change in logic for calculating end time, it would be good to have a unit test for that
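A unit test for the end-time calculation could look roughly like the sketch below. The function `determine_time_interval` is a simplified stand-in written for this example, assuming the fix adds one minute to the end time; it is not the SDK's `_determine_timeinterval`, and the returned dictionary keys are hypothetical. The mocked client mirrors the `describe_training_job(TrainingJobName=...)` call visible in the diff.

```python
import datetime
from unittest import mock


def determine_time_interval(sage_client, job_name):
    """Stand-in for the method under review (assumed logic, not the SDK's code)."""
    description = sage_client.describe_training_job(TrainingJobName=job_name)
    start_time = description["TrainingStartTime"]
    # Fall back to the current time only while the job is still running.
    end_time = description.get("TrainingEndTime") or datetime.datetime.utcnow()
    # Assumed fix: pad the end time by one minute.
    return {
        "start_time": start_time,
        "end_time": end_time + datetime.timedelta(minutes=1),
    }


def test_end_time_is_padded_by_one_minute():
    client = mock.Mock()
    client.describe_training_job.return_value = {
        "TrainingStartTime": datetime.datetime(2018, 10, 22, 8, 0, 0),
        "TrainingEndTime": datetime.datetime(2018, 10, 22, 8, 25, 55),
    }

    interval = determine_time_interval(client, "my-job")

    assert interval["start_time"] == datetime.datetime(2018, 10, 22, 8, 0, 0)
    assert interval["end_time"] == datetime.datetime(2018, 10, 22, 8, 26, 55)
    client.describe_training_job.assert_called_once_with(TrainingJobName="my-job")
```

Because the client is mocked, a test of this shape exercises only the interval arithmetic and needs no AWS credentials.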

piyushadlakha · 2018-10-25T23:48:21Z

UTs added.

mvsusp

Please check our PR guidelines in the description of the PR. Make sure that you have provided unit tests and updated the changelog.

Thanks

* output_path is now honored: it can be either file:// or s3://. This also changes the default behavior of local mode to use the SDK-provided default S3 bucket if nothing is passed. This makes it easier for customers to create models in SageMaker too, since their model artifacts will already be a tarfile in S3.
* input_channel content_type is now honored in the same way as SageMaker. If it is not provided, it is not passed to the container. Before, we were always passing 'application/octet-stream'.

laurenyu · 2018-11-14T19:13:59Z

can you check the unit tests? four are failing with:

E       RemovedInPytest4Warning: Fixture "sagemaker_session" called directly. Fixtures are not meant to be called directly, are created automatically when test functions request them as parameters. See https://docs.pytest.org/en/latest/fixture.html for more information.
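One common way this warning arises is a test or helper that invokes the fixture function itself instead of letting pytest inject it. The sketch below shows the parameter-based form. The fixture body and the helper `describe_job` are hypothetical and are not taken from this PR's tests, so the actual cause of the four failures may be different.

```python
from unittest.mock import Mock

import pytest


@pytest.fixture()
def sagemaker_session():
    """Fixture returning a mocked session (the name mirrors the one in the warning)."""
    return Mock(name="sagemaker_session")


def describe_job(session, job_name):
    """Hypothetical code under test: looks up a training job through the session."""
    return session.sagemaker_client.describe_training_job(TrainingJobName=job_name)


# Problematic pattern (left as a comment so this block still runs):
#     session = sagemaker_session()   # calls the fixture function directly


def test_describe_job(sagemaker_session):
    # pytest injects the fixture's return value because the parameter name matches.
    describe_job(sagemaker_session, "my-job")
    sagemaker_session.sagemaker_client.describe_training_job.assert_called_once_with(
        TrainingJobName="my-job"
    )
```

If a plain helper needs the same mocked object outside a test, one option is to move the construction into an ordinary function and have the fixture call that function.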

@mvsusp

talked to @mvsusp offline

Bug fix for getting dataframes in TrainingJobAnalytics.

249e0d6

laurenyu reviewed Oct 25, 2018

Addressing comments for Bug fix for getting dataframes in TrainingJobAnalytics

8401afa

Unit tests for Bug fix for getting dataframes in TrainingJobAnalytics.

42d2741

laurenyu previously approved these changes Oct 26, 2018

Merge branch 'master' into TrainingJobAnalytics

1640264

mvsusp previously requested changes Oct 26, 2018

updating change log for Bug fix for getting dataframes in TrainingJobAnalytics.

d43047d

piyushadlakha dismissed laurenyu’s stale review via d43047d November 4, 2018 09:32

laurenyu mentioned this pull request Nov 13, 2018

Updating Cloudwatch namespace for metrics in TrainingJobsAnalytics #473

Merged

4 tasks

RodrigoAtAWS and others added 16 commits November 13, 2018 17:20

Add model parameters to Estimator, and bump library version to 1.13.0 (#450)

c55e7ca

* Add incremental training model parameters to Estimator, and bump library version to 1.13.0
* Update README.rst Removed unnecessary comma from sentence.
* Update estimator.py Removed duplicated line

Add image URIs for built-in Algorithms for SIN/LHR/BOM/SFO/YUL (#456)

3d103b4

Support MXNet 1.3 with its training script format changes (#446)

23851b1

Make InputDataConfig optional for training. (#459)

bd1f43b

* Make InputDataConfig optional for training.
* Update boto3 dependency to make sure boto support no InputDataConfig.
* Update changelog.
* Add missing assertion for chainer failure script test.

add tfs container support (#460)

2edcc3a

* add tensorflow serving container support

simplify create_image_uri function (#462)

10b7e42

* add failure test case
* fix flaky assert

bump version to 1.14.0 (#463)

e8bf717

Skip gpu tests in regions without ml.p2.xlarge (#461)

eea2ad5

Adding Object2Vec support to SageMaker Python SDK (#467)

c8147a9

fix readme rendering (#464)

52d1ec4

Add Pylint (#465)

3a69cf6

Add Pylint checking. Fixed all the current errors and warnings, any future PRs will fail if they introduce any pylint error/warnings.

Support optional input channels in local mode. (#466)

9a997a5

build: upgrade docker-compose to 1.23 (#470)

6a1d93c

docker-compose 1.23 has moved to a newer version of requests which allows us to remove the upper bound on urllib3 that was causing a lot of problems for everyone.

add tensorflow serving docs (#468)

876287e

* add tensorflow serving docs
* add content_type to tensorflow.serving.Predictor
* support CustomAttributes in local mode

Very minor: Better documentation comment on DeferredError. (#469)

0c4ad9c

* Better documentation comment on DeferredError.
* Using Napolean-style docstring formatting for example code.
* Fixup flake8 trailing whitespace.

laurenyu and others added 8 commits November 13, 2018 17:20

Update empty framework_version warning (#472)

1df9317

This is in accordance with our new strategy around not making framework_version completely mandatory.

Remove hardcoded 'training' in error message for checking job status (#474)

a9ed02e

Bump version to 1.14.2 (#477)

3fd5d9a

Add missing changelog entry for 1.14.2 (#479)

ce1de18

Bug fix for getting dataframes in TrainingJobAnalytics.

aaf59b9

Resolving conflits in analytics

25038d1

Merge branch 'master' into TrainingJobAnalytics

2f425ab

Updating change log for version 1.14.2

93e3e48

Merge branch 'master' into TrainingJobAnalytics

c88611b

laurenyu approved these changes Nov 14, 2018

laurenyu merged commit 5db2603 into aws:master Nov 14, 2018

ChoiByungWook pushed a commit that referenced this pull request Dec 8, 2020

feature: add model parallelism support (#441)

bbb3f16

icywang86rui mentioned this pull request Jan 7, 2021

cannot recognize num_gpus for more than 1 gpu per instance aws/sagemaker-pytorch-training-toolkit#222

Closed
