
Update unit tests of kmeans, pca, factorization machines, lda and ntm #103


Merged (7 commits, Mar 23, 2018)

Conversation

@yangaws (Contributor) commented Mar 20, 2018

Description:
1. Remove unused code in the ntm and lda unit tests
2. Add more tests to the factorization machines unit tests
3. Add pca and kmeans unit tests
4. Fix the type of the hyperparameter tol in kmeans
5. Add the missing hyperparameter eval_metrics in kmeans
6. Remove some tests from test_amazon_estimator since they are covered by the pca unit tests
7. Use a validator function in pca instead of a lambda
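Item 7 refers to the SDK's reusable validator factories. A minimal sketch of the pattern (the `isin` name mirrors the diff further down, but this implementation is illustrative, not the SDK source):

```python
# Illustrative validator factory in the style of the SDK's isin() helper.
# It returns a predicate checking membership in a fixed set of values,
# replacing an inline lambda like `lambda x: x in [...]`.
def isin(*allowed):
    def validate(value):
        return value in allowed
    return validate

# The same factory can be reused across hyperparameters.
validate_mode = isin('regular', 'randomized')
```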

Test:
tox tests/unit passed

@codecov-io commented Mar 20, 2018

Codecov Report

Merging #103 into master will increase coverage by 1.57%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #103      +/-   ##
==========================================
+ Coverage   89.74%   91.32%   +1.57%     
==========================================
  Files          34       34              
  Lines        2039     2040       +1     
==========================================
+ Hits         1830     1863      +33     
+ Misses        209      177      -32
Impacted Files Coverage Δ
src/sagemaker/amazon/pca.py 100% <100%> (+17.14%) ⬆️
src/sagemaker/amazon/linear_learner.py 100% <100%> (ø) ⬆️
src/sagemaker/amazon/kmeans.py 100% <100%> (+42.55%) ⬆️
src/sagemaker/amazon/factorization_machines.py 100% <0%> (+8.45%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update da14a6b...19f4640. Read the comment docs.

num_trials = hp('local_lloyd_num_trials', gt(0), 'An integer greater-than 0', int)
local_init_method = hp('local_lloyd_init_method', isin('random', 'kmeans++'), 'One of "random", "kmeans++"', str)
half_life_time_size = hp('half_life_time_size', ge(0), 'An integer greater-than-or-equal-to 0', int)
epochs = hp('epochs', gt(0), 'An integer greater-than 0', int)
center_factor = hp('extra_center_factor', gt(0), 'An integer greater-than 0', int)
eval_metrics = hp(name='eval_metrics', validation_message='A comma separated list of "msd" or "ssd"',
Contributor:

Where did you get comma separated list from? The API docs seem to imply just one value: https://docs.aws.amazon.com/sagemaker/latest/dg/k-means-api-config.html

If the API docs are wrong, can you ask the algorithms team to fix the docs?

Contributor Author:

I checked with alg owner. This 'eval_metrics' should be a list. I have asked them to update the api doc.
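For context, a hedged sketch of list-style handling for eval_metrics; the 'msd'/'ssd' values come from the hp() snippet above, but the helper itself is hypothetical, not the SDK's serialization code:

```python
# Hypothetical helper: validate and serialize a list-valued eval_metrics
# hyperparameter. 'msd' and 'ssd' are the values named in the hp() snippet.
def serialize_eval_metrics(metrics):
    allowed = {'msd', 'ssd'}
    bad = [m for m in metrics if m not in allowed]
    if bad:
        raise ValueError('eval_metrics entries must be "msd" or "ssd", got {}'.format(bad))
    return '[' + ', '.join('"{}"'.format(m) for m in metrics) + ']'
```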

algorithm_mode = hp(name='algorithm_mode', validate=lambda x: x in ['regular', 'stable', 'randomized'],
validation_message='Value must be one of "regular", "stable", "randomized"', data_type=str)
num_components = hp('num_components', gt(0), 'Value must be an integer greater than zero', int)
algorithm_mode = hp('algorithm_mode', isin('regular', 'stable', 'randomized'),
Contributor Author:

I checked with alg owner. The 'stable' value is not supported for now. I have removed it.

subtract_mean = hp(name='subtract_mean', validation_message='Value must be a boolean', data_type=bool)
extra_components = hp(name='extra_components', validate=lambda x: x >= 0,
validation_message="Value must be an integer greater than or equal to 0", data_type=int)
extra_components = hp('extra_components', ge(0), "Value must be an integer greater than or equal to 0", int)
Contributor:

Maybe edit the description to state that you should leave this unset if you want the behavior for -1: https://docs.aws.amazon.com/sagemaker/latest/dg/PCA-reference.html

Contributor Author @yangaws commented Mar 22, 2018:

Thanks. I removed the validator to allow for the -1 behavior.

@@ -94,3 +108,282 @@ def test_all_hyperparameters(sagemaker_session):
def test_image(sagemaker_session):
fm = FactorizationMachines(sagemaker_session=sagemaker_session, **ALL_REQ_ARGS)
assert fm.train_image() == registry(REGION) + '/factorization-machines:1'


def test_num_factors_validation_fail_type(sagemaker_session):
Contributor:

I'd recommend using parameterized test functions to make these tests significantly more concise for each: https://docs.pytest.org/en/latest/parametrize.html

We can chat about the best way to do this if you like.

Contributor Author:

As discussed offline, all hyper-parameter related unit tests are now in parametrized style.
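As a rough illustration of the parametrized style (stand-in code, not the PR's tests; it assumes a simple type check and that pytest is available):

```python
# Sketch of collapsing per-hyperparameter "fail type" tests into a single
# parametrized test. make_estimator is a hypothetical stand-in for
# constructing FactorizationMachines(**ALL_REQ_ARGS, **kwargs).
import pytest

def make_estimator(**kwargs):
    int_params = ('num_factors', 'epochs')
    for name in int_params:
        if name in kwargs and not isinstance(kwargs[name], int):
            raise ValueError('{} must be an integer'.format(name))
    return kwargs

@pytest.mark.parametrize('param', ['num_factors', 'epochs'])
def test_hyperparameter_fail_type(param):
    with pytest.raises(ValueError):
        make_estimator(**{param: 'not-an-int'})
```

One parametrized function then replaces a pair of near-identical tests, and adding a new hyperparameter case is a one-line change to the parameter list.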

assert base_fit.call_args[0][1] == MINI_BATCH_SIZE


def test_call_fit_none_mini_batch_size(sagemaker_session):
Contributor:

What is this test asserting on? And does the base fit need to be patched? (Same question about patch applies to the tests below as well)

Contributor Author:

This test doesn't assert anything. It checks that fit runs successfully (no exception) when no mini_batch_size is given; for this algorithm, mini_batch_size has a default value.

The other tests below are similar; each covers a different case for the mini_batch_size passed to fit(). For a given algorithm there are usually several cases to cover: does it have a default value? Is it restricted to a valid range? Is it required?
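A hedged sketch of the behavior these tests exercise (the function name and the default value are assumptions for illustration, not the SDK source):

```python
# Hypothetical resolution logic for mini_batch_size: fall back to a default
# when None is given, otherwise enforce a valid range, mirroring the cases
# the tests above cover (default value, valid range).
ASSUMED_DEFAULT_MINI_BATCH_SIZE = 1000  # illustrative default, not the SDK's

def resolve_mini_batch_size(mini_batch_size=None):
    if mini_batch_size is None:
        return ASSUMED_DEFAULT_MINI_BATCH_SIZE
    if mini_batch_size < 1:
        raise ValueError('mini_batch_size must be at least 1')
    return mini_batch_size
```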

winstonaws
winstonaws previously approved these changes Mar 22, 2018
@yangaws yangaws merged commit 447197e into aws:master Mar 23, 2018