healthcare API: update v1beta1 datasets samples to v1 #3300

Conversation
All healthcare tests are now passing; the remaining failures are in firestore/cloud-client.
I went through the first pass and added some comments.
project_id = os.environ['GOOGLE_CLOUD_PROJECT']
service_account_json = os.environ['GOOGLE_APPLICATION_CREDENTIALS']

dataset_id = 'test_dataset-{}'.format(int(time.time()))
Can you use uuid.uuid4() instead of time.time()? time.time() can easily conflict between multiple builds.
Done.
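As a sketch of the suggestion above (the helper name `unique_dataset_id` is illustrative, not part of the PR), a collision-resistant test ID can be built with uuid:

```python
import uuid

def unique_dataset_id(prefix='test_dataset'):
    # Hypothetical helper: uuid4 draws 122 random bits, so IDs generated
    # by concurrent CI builds will not collide the way a second-resolution
    # int(time.time()) suffix can.
    return '{}-{}'.format(prefix, uuid.uuid4().hex)
```

uuid4().hex also keeps the ID to lowercase hex characters, which stays within typical resource-ID character restrictions.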
bucket)

# Clean up
dicom_stores.delete_dicom_store(
I don't think this clean-up is always called. Can you move it to the teardown part of another fixture?
Created new test_dicom_store fixture.
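For readers unfamiliar with the pattern, here is a minimal sketch of why a yield-style fixture is preferred over clean-up code in the test body. The create/delete helpers below are stand-ins, not the PR's real dicom_stores module:

```python
calls = []  # records the order of setup/teardown actions

def create_dicom_store(store_id):
    # Stand-in for dicom_stores.create_dicom_store(...)
    calls.append(('create', store_id))

def delete_dicom_store(store_id):
    # Stand-in for dicom_stores.delete_dicom_store(...)
    calls.append(('delete', store_id))

def dicom_store():
    # In the PR's test file, a generator like this is registered as the
    # test_dicom_store fixture with @pytest.fixture(scope='module').
    # pytest runs the code after `yield` as teardown even when the test
    # body fails, unlike a clean-up call at the end of the test itself.
    store_id = 'test-dicom-store'
    create_dicom_store(store_id)
    yield store_id
    delete_dicom_store(store_id)
```

The key property: teardown after `yield` is driven by the framework, so a failing assertion mid-test no longer leaks the DICOM store.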
'serviceAccount:[email protected]',
'roles/viewer')

# Clean up
ditto
Created new test_dicom_store fixture.


# [START healthcare_dicom_store_set_iam_policy]
def set_dicom_store_iam_policy(
Here and elsewhere, you have service_account_json as the first argument. Do Application Default Credentials not work?
I think it's for historical reasons, in case ADC wasn't set. Would you prefer it to be removed?
I have a preference for ADC because it will be easier for users, but it's up to you. I won't block on this.
Changed all files to use ADC. I'm inclined to do the same thing with project ID. WDYT?
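A sketch of what the ADC-based client construction might look like. The `build` parameter exists only so the sketch can be exercised without real credentials; the actual samples would call discovery.build directly:

```python
def get_client(build=None):
    """Return a Cloud Healthcare API client using Application Default
    Credentials instead of an explicit service_account_json path."""
    api_version = 'v1'
    service_name = 'healthcare'
    if build is None:
        # Imported lazily so the sketch runs without the package installed.
        from googleapiclient import discovery
        build = discovery.build
    # With ADC, the client library resolves credentials automatically from
    # GOOGLE_APPLICATION_CREDENTIALS or the ambient service account, so no
    # service_account_json argument is needed.
    return build(service_name, api_version)
```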
project_id = os.environ['GOOGLE_CLOUD_PROJECT']
service_account_json = os.environ['GOOGLE_APPLICATION_CREDENTIALS']

dataset_id = 'test_dataset-{}'.format(int(time.time()))
uuid
Done.
blob.upload_from_filename(resource_file)

time.sleep(10)  # Give new blob time to propagate
This leads to test flakiness. Consider a more reliable way.
Using retrying package to check for HttpError with backoff.
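To illustrate the approach, here is a hand-rolled equivalent of what the retrying package's @retry(retry_on_exception=..., wait_exponential_multiplier=..., stop_max_attempt_number=...) provides; the PR itself uses the retrying package rather than this sketch:

```python
import time

def retry_with_backoff(should_retry, tries=10, base_delay_s=0.5,
                       max_delay_s=10.0):
    """Retry the wrapped call with exponential backoff while
    should_retry(exc) returns True for the raised exception."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            delay = base_delay_s
            for attempt in range(tries):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on non-retriable errors.
                    if attempt == tries - 1 or not should_retry(exc):
                        raise
                    time.sleep(delay)
                    delay = min(delay * 2, max_delay_s)
        return wrapper
    return decorator
```

Polling with backoff like this replaces the fixed time.sleep(10), which either wastes time when the blob propagates quickly or still races when it propagates slowly.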
import_object,
)

# Clean up
I'm concerned about having clean-up code in the test body.
Created new test_fhir_store fixture.
"roles/viewer",
)

# Clean up
ditto
Created new test_fhir_store fixture.
hl7v2_message_file)

hl7v2_message_id = ""
@eventually_consistent.call
FYI we released gcp-devrel-py-tools 0.0.16. It allows you to pass a tries argument in the decorator:
@eventually_consistent.call(tries=5)
This will shorten the test time on failures.
Do you recommend adding it here?
It's up to you. It's mostly for convenience during local development.
@noerog We decided to remove gcp-devrel-py-tools from this repo. Sorry about the back and forth. Can you replace it with the retrying module? I think you know how (other code already uses it), but feel free to ask anything.
Also, if the sample code needs retrying, the sample command-line tool is also flaky. Does it make sense to have retry logic in the sample code so that the command-line tool will almost always work?
I don't think we even need to retry the hl7v2_messages_list test. It was broken when you added the decorator because the API surface changed, not because it was flaky.
@@ -0,0 +1,3 @@
pytest==5.3.2
gcp-devrel-py-tools==0.0.15
maybe:
gcp-devrel-py-tools==0.0.16
Done.
discovery_url = '{}?version={}'.format(
    discovery_api, api_version)
It looks like this API is publicly listed at https://www.googleapis.com/discovery/v1/apis, so you shouldn't need a discovery_url. Providing the service name and version should be sufficient for building the client:
discovery.build(service_name, api_version, credentials=scoped_credentials)
Done in all files.
Just responded to your comment. I'll review all the files again tomorrow.
We decided to remove gcp-devrel-py-tools from this repo. Can you use retrying instead?
@kurtisvg FYI
@noerog Thanks, this PR is big. I'll go through all the files first thing next week. Sorry for the delay, but I hope that works for you.
No problem at all. Just want to make sure you know that this is basically a branch off the current v1beta1 code in https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/healthcare/api-client, updated for v1. I'm very happy to make improvements, but wanted to let you know that the vast majority of this code has already gone through past reviews. Thanks. I also plan to delete the vast majority of the v1beta1 code once this PR is in, and to create a v1beta1 directory like the v1 directory here. There are a few features that didn't make it from v1beta1 to v1 that I'd like to keep, though.
I reviewed the files in the datasets directory. I haven't reviewed other files in detail, but I can imagine there are similar issues in other directories too, so I decided to send the review now rather than later.
I also have a suggestion: how about separating this PR into multiple PRs, one per subdirectory?
@@ -0,0 +1,425 @@
# Copyright 2018 Google LLC All Rights Reserved.
Is this file just copied from the v1beta directory? Even if that's the case, I think the copyright year should be 2020 because it's a new file. I may be wrong; if you're not sure which is right, do you mind asking the OSPO folks?
Updated all .py files to 2020.
api_version = 'v1'
service_name = 'healthcare'

credentials = service_account.Credentials.from_service_account_file(
Do you still need to create the credentials manually? If ADC just works with the client, the client can be created simply by:
return discovery.build(service_name, api_version)
If you're sure that you need to pass the credentials manually, please ignore this comment.
Fixed here and in all other files where possible. Also updated the docstring to clarify that the credentials come from ADC.
In fhir_resources.py and dicomweb.py we have to use the requests package, which requires passing in credentials manually, AFAIK.
dataset_parent = 'projects/{}/locations/{}'.format(
    project_id, cloud_region)

body = {}
Do you need the body local variable? Can you just do the following?
request = client.projects().locations().datasets().create(
    parent=dataset_parent, body={}, datasetId=dataset_id)
Fixed here and in dicom_stores.py.
    return response
except HttpError as e:
    print('Error, dataset not created: {}'.format(e))
    return ""
This function returns the response when it succeeds, and an empty string upon failure. Does it make sense to omit the return "" statement so that it returns None?
Done here and in all other relevant files.
    return response
except HttpError as e:
    print('Error, dataset not deleted: {}'.format(e))
    return ""
ditto
Done.
cloud_region = 'us-central1'
project_id = os.environ['GOOGLE_CLOUD_PROJECT']

dataset_id = 'test-dataset-{}'.format(int(time.time()))
Can you use uuid4 instead?
Done, sorry for not catching it in this file.
time_zone)

# Clean up
datasets.delete_dataset(
Ditto. Can you make another fixture which actually creates a temporary dataset and deletes it upon cleanup? You may need a different dataset_id for those fixtures.
Done.
assert 'UTC' in out


def test_deidentify_dataset(capsys):
Please utilize a fixture.
Using one fixture for the source dataset and another for the de-id destination dataset.
assert 'De-identified data written to' in out


def test_get_set_dataset_iam_policy(capsys):
fixture
Done.
…are only in v1beta1
…f notificationConfigs
I removed the FHIR files from this PR; they're now in #3384. Will update after I've done the same with DICOM and HL7v2. This PR needs to go in first because the other directories depend on files in datasets/.
Again I only reviewed files in datasets.
api_version = 'v1'
service_name = 'healthcare'

return discovery.build(
Nit: You can have this in one line.
Done.
def create_dataset(
        project_id,
        cloud_region,
        dataset_id):
Does this function fit within 79 chars? If so, why not put it on one line?
Done.


# [START healthcare_delete_dataset]
def delete_dataset(
ditto
Done.


# [START healthcare_get_dataset]
def get_dataset(
This should also fit on one line.


# [START healthcare_patch_dataset]
def patch_dataset(
This should also fit within one line.
Both done.


# [START healthcare_dataset_get_iam_policy]
def get_dataset_iam_policy(
This should also fit within one line.

# Delete the destination_dataset_id which
# is created as part of the de-id test.
datasets.delete_dataset(
This is unreliable
I added crud_dataset_id and dest_dataset_id fixtures with a retry check for non-404 errors. Only checked for non-404 because I can't think of any other error code we'd want to do the same for. Is this what you were looking for?
Thanks, almost there

@pytest.fixture(scope="module")
def test_dataset():
    dataset = datasets.create_dataset(
Can you add retries for creating/deleting here too?
Done. Interestingly I didn't need to yield the dataset to get the fixture value because "dataset" isn't used anywhere.
@noerog Thanks!
Sorry I didn't catch it in the previous pass, but I think the callback you're using is not right. See the comment below.
wait_exponential_max=10000,
stop_max_attempt_number=10)
def create():
    datasets.create_dataset(project_id, cloud_region, dataset_id)
Can you also ignore 409 conflict here?
Done.
Oh sorry, I was not clear. The try-except clause looks good, but I'd like it to be retried. Can you keep the @retry and def create(), and add the try-except clause inside create()?
Ahh sorry, fixed.
Can you keep the try-except block in all the setup/cleanup code?
stop_max_attempt_number=10,
retry_on_exception=retry_if_server_exception)
def clean_up():
    datasets.delete_dataset(project_id, cloud_region, dataset_id)
ignore 404
Done.
please keep @retry
Done.
stop_max_attempt_number=10,
retry_on_exception=retry_if_server_exception)
def clean_up():
    datasets.delete_dataset(
ignore 404
Done.
ditto
Done.
stop_max_attempt_number=10,
retry_on_exception=retry_if_server_exception)
def clean_up():
    datasets.delete_dataset(project_id, cloud_region, dataset_id)
ignore 404
Done.
ditto
Done.
Sorry, I was not super clear. I'd like to keep @retry.
Thanks! It looks great now.
Sorry, one last change from me. I think we should remove try-except from at least create_dataset and delete_dataset for robust test setup and cleanup.
Also, maybe we should remove try-except from all the sample code, because the current code just throws away all the error information. This is not good practice anyway.
Sorry, I'm afraid you misunderstood my request. What I asked was to remove try-except from datasets.py, not from datasets_test.py. I think we need both try-except and retry for the setup/cleanup code in datasets_test.py.
If we only retry, it's problematic because:
- creation fails if the first create_dataset failed but the server-side creation actually succeeded
- deletion fails if the test code successfully deleted the CRUD dataset
- deletion fails if the first delete_dataset failed but the server-side deletion actually succeeded
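A sketch of the combined pattern being asked for, using a stand-in exception class (the real code catches googleapiclient.errors.HttpError, whose status lives on resp.status, and also wraps these helpers in the retrying package's @retry):

```python
class FakeHttpError(Exception):
    # Illustrative stand-in for googleapiclient.errors.HttpError.
    def __init__(self, status):
        super().__init__('HTTP {}'.format(status))
        self.status = status

def create_dataset_tolerant(create_fn):
    # Treat 409 Conflict as success: the first attempt may have succeeded
    # server-side even though the client saw an error, in which case a
    # plain retry would fail forever.
    try:
        create_fn()
    except FakeHttpError as err:
        if err.status == 409:
            print('Dataset already exists, continuing')
        else:
            raise

def delete_dataset_tolerant(delete_fn):
    # Same idea for cleanup: 404 means the dataset is already gone.
    try:
        delete_fn()
    except FakeHttpError as err:
        if err.status == 404:
            print('Dataset already deleted, continuing')
        else:
            raise
```

The try-except makes a half-succeeded first attempt idempotent, while the retry handles genuinely transient server errors; each mechanism covers a failure mode the other cannot.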
more robust test setup and cleanup
@noerog @busunkim96
The test is taking a long time because we have a significant diff between master and the change base. @noerog
No description provided.