Upgrade Base Image: colab_20250404-060113_RC00 #1484
Conversation
@jaesong-colab Any thoughts about this diff from the new image?
Dockerfile.tmpl (Outdated)
```dockerfile
# b/408284435: Keras 3.6 broke test_keras.py > test_train > keras.datasets.mnist.load_data()
# See https://github.com/keras-team/keras/commit/dcefb139863505d166dd1325066f329b3033d45a
# Colab base is on Keras 3.8, we have to install the package separately
RUN uv pip install --system google-cloud-automl==1.0.1 google-cloud-aiplatform google-cloud-translate==3.12.1 \
    google-cloud-videointelligence google-cloud-vision google-genai "keras<3.6"
RUN uv pip install --system "keras<3.6"
```
Why are we stuck on old Keras?
Same error as the original: load_data doesn't let you install the dataset in a specific dir, only in the Keras cache dir.
I think we can change the test and upgrade Keras if we like. WDYT?
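For context on the cache-dir limitation: `keras.datasets.mnist.load_data()` only takes a filename relative to the Keras cache, and the cache root is controlled by the `KERAS_HOME` environment variable (falling back to `~/.keras`). A minimal sketch of that resolution logic, assuming the documented `KERAS_HOME` behavior (the helper name here is ours, not a Keras API):

```python
import os

def keras_cache_dir() -> str:
    """Mimic how Keras picks its cache root: KERAS_HOME if set,
    otherwise ~/.keras. Downloaded datasets land under <root>/datasets."""
    root = os.environ.get("KERAS_HOME") or os.path.expanduser("~/.keras")
    return os.path.join(root, "datasets")

# With KERAS_HOME set, downloads are redirected there instead of the home dir.
os.environ["KERAS_HOME"] = "/tmp/keras-cache"
print(keras_cache_dir())  # /tmp/keras-cache/datasets
```

So a test that needs the dataset in a specific directory can point `KERAS_HOME` at it before importing Keras, rather than pinning the package version.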
Yeah I presume updating a test for a package upgrade is reasonable.
Is there a reason not to update the test? Is it validating something that shouldn't change and will break something in Kaggle Notebooks?
Dockerfile.tmpl (Outdated)
```dockerfile
@@ -44,8 +47,10 @@ RUN uv pip install --no-build-isolation --system "git+https://github.com/Kaggle/
# b/408281617: Torch is adamant that it can not install cudnn 9.3.x, only 9.1.x, but Tensorflow can only support 9.3.x.
# This conflict causes a number of package downgrades, which are handled in this command
# b/302136621: Fix eli5 import for learntools
# b/416137032: cuda 12.9.0 breaks datashader 1.18.0
```
What's datashader required for?
It doesn't seem required for learntools, and usage isn't high, so we can remove it.
```dockerfile
@@ -7,6 +7,10 @@ FROM gcr.io/kaggle-images/python-lightgbm-whl:${BASE_IMAGE_TAG}-${LIGHTGBM_VERSI
{{ end }}
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG}

# b/415358342: UV reports missing requirements files https://github.com/googlecolab/colabtools/issues/5237
ENV UV_CONSTRAINT= \
    UV_BUILD_CONSTRAINT=
```
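Note that the `ENV` lines set `UV_CONSTRAINT` and `UV_BUILD_CONSTRAINT` to the empty string rather than unsetting them, which neutralizes any path inherited from the base image. A toy illustration of that empty-vs-unset distinction (the reader function is hypothetical, not uv's actual code):

```python
import os

def constraint_files(var: str = "UV_CONSTRAINT") -> list:
    """Treat an empty or missing env var as 'no constraint files',
    mirroring what the Dockerfile's `ENV UV_CONSTRAINT=` achieves."""
    raw = os.environ.get(var, "")
    return [p for p in raw.split() if p]

os.environ["UV_CONSTRAINT"] = "/etc/colab/constraints.txt"  # inherited from base image
print(constraint_files())  # ['/etc/colab/constraints.txt']
os.environ["UV_CONSTRAINT"] = ""                            # Dockerfile override
print(constraint_files())  # []
```

The `/etc/colab/constraints.txt` path is illustrative; the point is only that an empty value behaves like an absent one.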
Thanks for adding this and including the issue!
Dockerfile.tmpl (Outdated)
```dockerfile
RUN uv pip install --system --force-reinstall --extra-index-url https://pypi.nvidia.com "cuml-cu12==25.2.1" \
    "nvidia-cudnn-cu12==9.3.0.75" scipy tsfresh scikit-learn==1.2.2 category-encoders eli5
    "nvidia-cudnn-cu12==9.3.0.75" cuda-bindings==12.8.0 cuda-python==12.8.0 \
```
nvidia-cudnn-cu12 is already at 9.3.0.75. We do not install "cuda-bindings", "cuda-python", "tsfresh", "category-encoders", or "eli5"; perhaps they can be moved to requirements.txt.
Yeah, these packages were being problematic when we did this fix; it seems they can be re-added to requirements.txt after testing locally with learntools.
```dockerfile
# b/408284143: google-cloud-automl 2.0.0 introduced incompatible API changes, need to pin to 1.0.1
RUN uv pip install --system --force-reinstall --prerelease=allow kagglehub[pandas-datasets,hf-datasets,signing]>=0.3.12 \
    google-cloud-automl==1.0.1
```
We do not install "google-cloud-automl"; perhaps it can be moved to requirements.txt.
Can't; it breaks the build due to a conflicting protobuf version requirement. I added a comment to make that clear and ensure it is revisited.
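The protobuf conflict is the classic case of two requirements whose version ranges do not intersect, so no single install can satisfy both. A stdlib-only toy check for whether two `[lo, hi)` version ranges overlap (the version numbers below are illustrative, not the actual pins in this image):

```python
def parse(v: str) -> tuple:
    """Turn '4.21' into (4, 21) for numeric comparison."""
    return tuple(int(part) for part in v.split("."))

def ranges_overlap(req_a, req_b) -> bool:
    """True if two [lo, hi) version ranges share at least one version."""
    lo = max(parse(req_a[0]), parse(req_b[0]))
    hi = min(parse(req_a[1]), parse(req_b[1]))
    return lo < hi

# e.g. one library effectively needs protobuf >=3.0,<4.0 and another >=4.21,<5.0
print(ranges_overlap(("3.0", "4.0"), ("4.21", "5.0")))   # False -> unresolvable
print(ranges_overlap(("3.20", "5.0"), ("4.21", "5.0")))  # True
```

Real resolvers (uv, pip) do this over full specifier sets, but the failure mode is the same: an empty intersection aborts the install.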
Summary of the remaining changes:
- This particular image had issues with UV installs; however, it does highlight a solution that will be included in the next image: googlecolab/colabtools#5237.
- The base image also removes Gensim due to SciPy 1.14.1; we included a fix to install both, since Gensim is a popular package (~200 users per day).
- Updated mocks for GCS-related tests, since the latest library version causes issues.
- Added a few packages back into requirements.txt that had been removed for workarounds that have since been resolved.
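On the GCS mock update: when the real client library changes behavior between versions, stubbing it keeps tests stable. A minimal sketch with the stdlib's `unittest.mock` (the bucket and blob names are hypothetical; the actual tests live in this repo's test suite):

```python
from unittest import mock

# Stand-in for google.cloud.storage.Client; no real GCS library needed.
client = mock.MagicMock(name="storage.Client")
client.bucket.return_value.blob.return_value.download_as_bytes.return_value = b"fixture-data"

# Code under test calls the mocked chain exactly like the real client.
data = client.bucket("kaggle-test").blob("sample.csv").download_as_bytes()
print(data)  # b'fixture-data'

# The mock also lets the test verify which bucket was requested.
client.bucket.assert_called_once_with("kaggle-test")
```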