add grpcio grpcio-status to support spark connect #2017


Merged — 12 commits merged into jupyter:main on Oct 30, 2023

Conversation

@xieshuaihu (Contributor) commented on Oct 25, 2023

Describe your changes

Add the grpcio and grpcio_status packages to pyspark-notebook.

Before this PR, pyspark-notebook:spark-3.5.0 throws an exception when connecting to Spark Connect:
[screenshot of the exception]
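A minimal reproduction sketch (the endpoint is a placeholder; 15002 is the default Spark Connect port):

```python
# Reproduction sketch: in pyspark-notebook:spark-3.5.0 without
# grpcio/grpcio-status, building a Spark Connect session fails with an
# ImportError complaining that grpcio is not installed (exact message
# may vary).
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
```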

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

@mathbunnyru (Member)

I sorted the install entries and fixed the package name; it should be grpcio-status.

@mathbunnyru (Member)

PySpark mentions these packages as required for Spark Connect, so it seems reasonable to install them.
https://spark.apache.org/docs/latest/api/python/getting_started/install.html#dependencies

@bjornjorgensen may I ask you to take a look here?

@mathbunnyru (Member) left a review.

@xieshuaihu changed the title from "add grpcio grpcio_status to support spark connect" to "add grpcio grpcio-status to support spark connect" on Oct 26, 2023
@xieshuaihu (Contributor, Author)

@mathbunnyru the docs are updated

@bjornjorgensen (Contributor)

It seems like we need to add versions and googleapis-common-protos:

| Package | Supported version | Note |
| --- | --- | --- |
| grpcio | >=1.48,<1.57 | Required for Spark Connect |
| grpcio-status | >=1.48,<1.57 | Required for Spark Connect |
| googleapis-common-protos | ==1.56.4 | Required for Spark Connect |
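A quick sanity check (a sketch using only the standard library) to compare the installed versions against these ranges:

```python
# Sketch: print the installed versions of the Spark Connect dependencies
# listed above, to compare against the documented ranges.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("grpcio", "grpcio-status", "googleapis-common-protos"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```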

@xieshuaihu (Contributor, Author)

@bjornjorgensen grpcio and grpcio-status versions are added. There is no need to depend on googleapis-common-protos directly, since it is a transitive dependency of grpcio-status.
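A quick way to confirm this (a sketch, assuming grpcio-status is installed in the current environment):

```python
# Sketch: show that googleapis-common-protos appears among the declared
# dependencies of grpcio-status, so it is pulled in transitively and
# needs no direct pin.
from importlib.metadata import requires

deps = requires("grpcio-status") or []
print([d for d in deps if d.startswith("googleapis-common-protos")])
```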

@mathbunnyru (Member)

@xieshuaihu (Contributor, Author)

@mathbunnyru thanks for your tips

@mathbunnyru (Member) left a review:

LGTM

@xieshuaihu a small tip for the future: if a list is already sorted, keep it sorted when you add new items to it.

@xieshuaihu (Contributor, Author)

Sorry about that, I didn't know the lists were sorted. Thanks again for your patience and the tips.

@mathbunnyru (Member) commented on Oct 29, 2023

@xieshuaihu please fix the build failure.
To build images locally, run make build/<somestack>, starting with docker-stacks-foundation (so you have an up-to-date base) and then building the images up the hierarchy to pyspark-notebook.
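For example, the sequence might look like this (a sketch; the intermediate stacks are assumed from the docker-stacks image hierarchy):

```sh
# Sketch of a local build sequence up to pyspark-notebook; intermediate
# stacks are assumed from the image hierarchy and may differ.
make build/docker-stacks-foundation
make build/base-notebook
make build/minimal-notebook
make build/scipy-notebook
make build/pyspark-notebook
```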

@xieshuaihu (Contributor, Author)

I am resolving the dependency conflict.

@xieshuaihu (Contributor, Author)

@mathbunnyru I inspected PySpark's setup.py; the required grpcio version is >=1.56, so I removed the version pin.
https://github.com/apache/spark/blob/branch-3.5/python/setup.py#L135
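For reference, the constraint is roughly of this shape (a paraphrased sketch; names and exact values should be checked against the linked line):

```python
# Paraphrased sketch of the Spark Connect requirement in Spark 3.5's
# python/setup.py (see the link above for the exact text). The >=1.56
# floor is what conflicts with the <1.57 pin quoted from the docs.
_minimum_grpc_version = "1.56.0"  # assumed variable name

connect_extra = [
    f"grpcio>={_minimum_grpc_version}",
    f"grpcio-status>={_minimum_grpc_version}",
]
```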

@xieshuaihu (Contributor, Author)

> It seems like we need to add versions and googleapis-common-protos
>
> grpcio >=1.48,<1.57 Required for Spark Connect
> grpcio-status >=1.48,<1.57 Required for Spark Connect
> googleapis-common-protos ==1.56.4 Required for Spark Connect

It seems like that doc is out of date:
https://github.com/apache/spark/blob/branch-3.5/python/setup.py#L135

@mathbunnyru (Member)

> @mathbunnyru I inspected PySpark's setup.py; the required grpcio version is >=1.56, so I removed the version pin. https://github.com/apache/spark/blob/branch-3.5/python/setup.py#L135

Thanks, it makes sense.

@xieshuaihu (Contributor, Author)

I don't know why the build failed again...

@xieshuaihu (Contributor, Author)

I inspected this build error; it seems test-sparklyr in all-spark-notebook failed. I don't know sparklyr, so help would be appreciated!

@mathbunnyru (Member)

This most likely has nothing to do with your PR.
Unfortunately, some tests seem to be flaky 😢
I restarted the failed builds.

@mathbunnyru (Member)

Restart helped, merging this PR.

@mathbunnyru merged commit 66f7bef into jupyter:main on Oct 30, 2023
@mathbunnyru (Member)

@xieshuaihu images have been updated; could you please test that there is no error now?

@xieshuaihu (Contributor, Author)

Thanks very much, @mathbunnyru. I will test it today.

@xieshuaihu (Contributor, Author)

> @xieshuaihu images have been updated; could you please test that there is no error now?

I pulled the latest image and ran the following code; it works:

```python
from pyspark.sql import SparkSession

# Connect to a Spark Connect server (user-specific endpoint) and run a query.
spark = SparkSession.builder.remote("sc://host:port/;user_id=test").getOrCreate()
spark.sql("show databases").show()
```
