healthcare API: update v1beta1 datasets samples to v1 #3300


Merged
merged 23 commits
Apr 15, 2020

Conversation


@noerog noerog commented Apr 7, 2020

No description provided.

@noerog noerog requested a review from a team as a code owner April 7, 2020 23:54
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Apr 7, 2020
@tmatsuo tmatsuo self-requested a review April 8, 2020 17:07

noerog commented Apr 8, 2020

All healthcare tests are now passing; the remaining failures are in firestore/cloud-client.

@tmatsuo tmatsuo left a comment

I went through the first pass and added some comments.

project_id = os.environ['GOOGLE_CLOUD_PROJECT']
service_account_json = os.environ['GOOGLE_APPLICATION_CREDENTIALS']

dataset_id = 'test_dataset-{}'.format(int(time.time()))
Contributor:

Can you use uuid.uuid4() instead of time.time()?
time.time() can easily conflict between multiple builds.

Contributor Author:

Done.
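The suggested fix can be sketched as follows; the `test_dataset-` prefix mirrors the sample, and the exact ID format is up to the author:

```python
import uuid

# uuid.uuid4() is collision-resistant, unlike int(time.time()),
# which can produce the same ID for two builds started in the same second.
dataset_id = 'test_dataset-{}'.format(uuid.uuid4())
```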

bucket)

# Clean up
dicom_stores.delete_dicom_store(
Contributor:

I don't think this cleanup is always called. Can you move it to the teardown part of another fixture?

Contributor Author:

Created new test_dicom_store fixture.

'serviceAccount:[email protected]',
'roles/viewer')

# Clean up
Contributor:

ditto

Contributor Author:

Created new test_dicom_store fixture.



# [START healthcare_dicom_store_set_iam_policy]
def set_dicom_store_iam_policy(
Contributor:

Here and elsewhere, you have service_account_json as the first argument. Do Application Default Credentials (ADC) not work?

Contributor Author:

I think it's for historical reasons in case ADC wasn't set. Would you prefer for it to be removed?

Contributor:

I prefer ADC because it's easier for users, but it's up to you. I won't block on this.

Contributor Author:

Changed all files to use ADC. I'm inclined to do the same thing with project ID. WDYT?
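A minimal sketch of the ADC-based client construction this thread converges on; the function name is illustrative, and the real samples define similar helpers:

```python
def get_client():
    """Return a Cloud Healthcare API client using Application Default
    Credentials (ADC); no service_account_json argument is needed.

    ADC is resolved automatically from GOOGLE_APPLICATION_CREDENTIALS,
    gcloud user credentials, or the GCE/GKE metadata server.
    """
    from googleapiclient import discovery  # google-api-python-client

    api_version = 'v1'
    service_name = 'healthcare'
    # No explicit credentials argument: discovery.build falls back
    # to google.auth.default(), i.e. ADC.
    return discovery.build(service_name, api_version)
```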

project_id = os.environ['GOOGLE_CLOUD_PROJECT']
service_account_json = os.environ['GOOGLE_APPLICATION_CREDENTIALS']

dataset_id = 'test_dataset-{}'.format(int(time.time()))
Contributor:

uuid

Contributor Author:

Done.


blob.upload_from_filename(resource_file)

time.sleep(10) # Give new blob time to propagate
Contributor:

This leads to test flakiness. Consider a more reliable approach.

Contributor Author:

Using retrying package to check for HttpError with backoff.
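The idea, shown here as a plain-stdlib sketch rather than the `retrying` decorators the PR actually uses, is to replace the fixed `time.sleep(10)` with exponential backoff that retries on the error class until the blob has propagated:

```python
import time


class HttpError(Exception):
    """Stand-in for googleapiclient.errors.HttpError."""


def retry_with_backoff(fn, retriable=(HttpError,), max_attempts=10,
                       initial_delay=1.0, max_delay=10.0):
    """Call fn(), retrying on `retriable` errors with exponential backoff."""
    delay = initial_delay
    for attempt in range(max_attempts):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
```

A test would wrap the operation that depends on the new blob in `retry_with_backoff` instead of sleeping for a fixed interval.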

import_object,
)

# Clean up
Contributor:

I'm concerned about having clean-up code in the test body.

Contributor Author:

Created new test_fhir_store fixture.

"roles/viewer",
)

# Clean up
Contributor:

ditto

Contributor Author:

Created new test_fhir_store fixture.

hl7v2_message_file)

hl7v2_message_id = ""
@eventually_consistent.call
Contributor:

FYI, we released gcp-devrel-py-tools 0.0.16. It lets you pass a tries argument to the decorator.

@eventually_consistent.call(tries=5)

This will shorten the test time on failures.

Contributor Author:

Do you recommend adding it here?

Contributor:

It's up to you. It's mostly for convenience during local development.

@tmatsuo tmatsuo Apr 10, 2020

@noerog We decided to remove gcp-devrel-py-tools from this repo. Sorry about the back and forth. Can you replace it with the retrying module? I think you know how (other code here uses it), but feel free to ask anything.

Also, if the sample code needs retrying, the sample command-line tool is also flaky. Does it make sense to put the retry logic in the sample code itself so that the command-line tool almost always works?

Contributor Author:

I don't think we even need to retry the hl7v2_messages_list test. It was broken when you added the decorator because the API surface changed, not because it was flaky.

@@ -0,0 +1,3 @@
pytest==5.3.2
gcp-devrel-py-tools==0.0.15
Contributor:

maybe:
gcp-devrel-py-tools==0.0.16

Contributor Author:

Done.

Comment on lines 36 to 37
discovery_url = '{}?version={}'.format(
discovery_api, api_version)
Contributor:

It looks like this API is publicly listed at https://www.googleapis.com/discovery/v1/apis, so you shouldn't need a discovery_url.

Providing the service name and version should be sufficient for building the client.

discovery.build(service_name, api_version, credentials=scoped_credentials)

Contributor Author:

Done in all files.

@tmatsuo tmatsuo left a comment

Just responded to your comment. I'll review all the files again tomorrow.

@tmatsuo tmatsuo left a comment

We decided to remove gcp-devrel-py-tools from this repo. Can you use retrying instead?

@kurtisvg FYI



tmatsuo commented Apr 10, 2020

@noerog Thanks, this PR is big. I'll go through all the files first thing next week. Sorry for the delay; I hope that works for you.


noerog commented Apr 11, 2020

No problem at all. Just so you know, this is basically a branch off the current v1beta1 code in https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/healthcare/api-client, updated for v1. I'm very happy to make improvements, but the vast majority of this code has already gone through past reviews. Thanks.

I also plan to delete the vast majority of the v1beta1 code once this PR is in and create a v1beta1 directory like the v1 branch here. There are a few features that didn't make it from v1beta1 to v1 that I'd like to keep though.

@tmatsuo tmatsuo left a comment

I reviewed the files in the datasets directory. I haven't reviewed other files in detail, but I can imagine there are similar issues in other directories too, so I decided to send the review now, rather than later.

I also have a suggestion: how about separating this PR into multiple PRs, one per subdirectory?

@@ -0,0 +1,425 @@
# Copyright 2018 Google LLC All Rights Reserved.
Contributor:

Is this file just copied from the v1beta directory?
Even if that's the case, I think the copyright year should be 2020 because it's a new file. I may be wrong; if you're not sure which is right, do you mind asking the OSPO folks?

Contributor Author:

Updated all .py files to 2020.

api_version = 'v1'
service_name = 'healthcare'

credentials = service_account.Credentials.from_service_account_file(
Contributor:

Do you still need to create the credentials manually?

If ADC just works with the client, the client can be created with just:

return discovery.build(service_name, api_version)

If you're sure that you need to pass the credentials manually, please ignore this comment.

Contributor Author:

Fixed here and in all other files where possible. Also updated the doc string to clarify that the credentials come from ADC.

In fhir_resources.py and dicomweb.py we have to use the requests package which requires passing in credentials manually, AFAIK.

dataset_parent = 'projects/{}/locations/{}'.format(
project_id, cloud_region)

body = {}
Contributor:

Do you need to create the body local variable?

Can you just do:

    request = client.projects().locations().datasets().create(
        parent=dataset_parent, body={}, datasetId=dataset_id)

?

Contributor Author:

Fixed here and in dicom_stores.py.

return response
except HttpError as e:
print('Error, dataset not created: {}'.format(e))
return ""
Contributor:

This function returns the response when it succeeds, and an empty string upon failure.

Does it make sense to just omit the return "" statement so that it returns None?

Contributor Author:

Done here and in all other relevant files.

return response
except HttpError as e:
print('Error, dataset not deleted: {}'.format(e))
return ""
Contributor:

ditto

Contributor Author:

Done.

cloud_region = 'us-central1'
project_id = os.environ['GOOGLE_CLOUD_PROJECT']

dataset_id = 'test-dataset-{}'.format(int(time.time()))
Contributor:

Can you use uuid4 instead?

Contributor Author:

Done, sorry for not catching it in this file.

time_zone)

# Clean up
datasets.delete_dataset(
Contributor:

Ditto.

Can you make another fixture that creates a temporary dataset and deletes it upon cleanup?

You may need a different dataset_id for each of those fixtures.

Contributor Author:

Done.

assert 'UTC' in out


def test_deidentify_dataset(capsys):
Contributor:

Please utilize a fixture.

@noerog noerog Apr 13, 2020

Using one fixture for the source dataset and another for the de-identification destination dataset.

assert 'De-identified data written to' in out


def test_get_set_dataset_iam_policy(capsys):
Contributor:

fixture

Contributor Author:

Done.


noerog commented Apr 13, 2020

I removed the FHIR files from this PR; they're now in #3384. I will update after I've done the same with DICOM and HL7v2. This PR needs to go in first because the other directories depend on files in datasets/.

@tmatsuo tmatsuo left a comment

Again I only reviewed files in datasets.

api_version = 'v1'
service_name = 'healthcare'

return discovery.build(
Contributor:

Nit: You can have this in one line.

Contributor Author:

Done.

def create_dataset(
project_id,
cloud_region,
dataset_id):
Contributor:

Does this signature fit within 79 chars? If so, why not have it on one line?

Contributor Author:

Done.



# [START healthcare_delete_dataset]
def delete_dataset(
Contributor:

ditto

Contributor Author:

Done.



# [START healthcare_get_dataset]
def get_dataset(
Contributor:

also this should fit in one line



# [START healthcare_patch_dataset]
def patch_dataset(
Contributor:

also fit within one line

Contributor Author:

Both done.



# [START healthcare_dataset_get_iam_policy]
def get_dataset_iam_policy(
Contributor:

Also fit within one line


# Delete the destination_dataset_id which
# is created as part of the de-id test.
datasets.delete_dataset(
Contributor:

This is unreliable

Contributor Author:

I added crud_dataset_id and dest_dataset_id fixtures with a retry check for non-404 errors. I only special-cased 404 because I can't think of another error code we'd want to treat the same way. Is this what you were looking for?


noerog commented Apr 13, 2020

hl7v2/ PR: #3388
dicom/ PR: #3387
fhir/ PR: #3384

@noerog noerog changed the title healthcare API: update all v1beta1 samples to v1 and create v1 directory healthcare API: update v1beta1 datasets samples to v1 Apr 13, 2020
@tmatsuo tmatsuo left a comment

Thanks, almost there


@pytest.fixture(scope="module")
def test_dataset():
dataset = datasets.create_dataset(
Contributor:

Can you add retries for creating/deleting here too?

Contributor Author:

Done. Interestingly I didn't need to yield the dataset to get the fixture value because "dataset" isn't used anywhere.

@tmatsuo tmatsuo left a comment

@noerog Thanks!

Sorry I didn't catch it in the previous pass, but I think the callback you're using is not right. See the comment below.

wait_exponential_max=10000,
stop_max_attempt_number=10)
def create():
datasets.create_dataset(project_id, cloud_region, dataset_id)
Contributor:

Can you also ignore 409 conflict here?

Contributor Author:

Done.

Contributor:

Oh, sorry, I was not clear.

The try-except clause looks good, but I'd like it to be retried. Can you keep @retry and def create(), then add the try-except clause inside create()?

Contributor Author:

Ahh sorry, fixed.

Contributor:

Can you keep the try-except block in all the setup/cleanup code?

stop_max_attempt_number=10,
retry_on_exception=retry_if_server_exception)
def clean_up():
datasets.delete_dataset(project_id, cloud_region, dataset_id)
Contributor:

ignore 404

Contributor Author:

Done.

Contributor:

please keep @retry

Contributor Author:

Done.

stop_max_attempt_number=10,
retry_on_exception=retry_if_server_exception)
def clean_up():
datasets.delete_dataset(
Contributor:

ignore 404

Contributor Author:

Done.

Contributor:

ditto

Contributor Author:

Done.

stop_max_attempt_number=10,
retry_on_exception=retry_if_server_exception)
def clean_up():
datasets.delete_dataset(project_id, cloud_region, dataset_id)
Contributor:

ignore 404

Contributor Author:

Done.

Contributor:

ditto

Contributor Author:

Done.

@tmatsuo tmatsuo left a comment

Sorry I was not super clear. I'd like to keep @retry.


@tmatsuo tmatsuo left a comment

Thanks! It looks great now.

@tmatsuo tmatsuo left a comment

Sorry, one last change from me.

I think we should remove the try-except from at least create_dataset and delete_dataset for robust test setup and cleanup.

Also, maybe we should remove try-except from all the sample code, because the current code just throws away all the error information. That is not good practice anyway.

@tmatsuo tmatsuo left a comment

Sorry, I'm afraid you misunderstood my request.
What I asked was to remove try-except from datasets.py, not from datasets_test.py.

I think we need both try-except and retry for the setup/cleanup code in datasets_test.py.

If we only retry, it's problematic because:

  • creation fails if the first create_dataset call failed but the creation actually succeeded on the server side
  • deletion fails if the test code already successfully deleted the CRUD dataset
  • deletion fails if the first delete_dataset call failed but the deletion actually succeeded on the server side
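The combination the review settles on, a retry wrapped around a body that swallows "already done" errors, can be sketched as follows. The `retry` decorator and `HttpError` class here are simplified stand-ins for the `retrying` package and `googleapiclient.errors.HttpError`:

```python
import functools
import time


class HttpError(Exception):
    """Stand-in for googleapiclient.errors.HttpError."""
    def __init__(self, status):
        super().__init__("HTTP {}".format(status))
        self.status = status


def retry(max_attempts=10, delay=0.001):
    """Minimal stand-in for retrying.retry with exponential backoff."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(delay * 2 ** attempt)
        return wrapper
    return deco


def clean_up(delete_dataset):
    # Retry transient failures, but treat 404 as success: the dataset
    # may already be gone if an earlier delete reached the server.
    @retry(max_attempts=5)
    def _delete():
        try:
            delete_dataset()
        except HttpError as err:
            if err.status == 404:
                print('Dataset already deleted.')
            else:
                raise
    _delete()
```

The create side is symmetric: retry around a body that swallows 409 Conflict, so a create that reached the server despite a failed response does not break setup.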


noerog and others added 2 commits April 15, 2020 14:20
more robust test setup and cleanup

tmatsuo commented Apr 15, 2020

@noerog @busunkim96
I added a commit to noerog's branch. PTAL


tmatsuo commented Apr 15, 2020

The test is taking a long time because there is a significant diff between master and the change's base.

@noerog
FYI, rebasing onto master often shortens the build time; you may want to do that for upcoming PRs.

@tmatsuo tmatsuo merged commit 5a3cd08 into GoogleCloudPlatform:master Apr 15, 2020