build: replace rockylinux with chainguard/wolfi as a base image #423

Merged 14 commits on Jun 17, 2024

6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
@@ -112,3 +112,9 @@ jobs:
         source .venv/bin/activate
         make docker-build
         make docker-test
+    - name: Scan image
+      uses: anchore/scan-action@v3
+      with:
+        image: "pipeline-family-${{ env.PIPELINE_FAMILY }}-dev"
+        # NOTE(robinson) - revert this to medium when we bump libreoffice
+        severity-cutoff: high
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
+## 0.0.71-dev0
+
+* replace rockylinux with chainguard/wolfi as a base image for `amd64`
+
 ## 0.0.70

 * Bump to `unstructured` 0.14.6
42 changes: 42 additions & 0 deletions Dockerfile-amd64
@@ -0,0 +1,42 @@
+# syntax=docker/dockerfile:experimental
+FROM quay.io/unstructured-io/base-images:wolfi-base@sha256:6c00a236c648ffdaf196ccbc446f5c6cc9eb4e3ab9e437178abcfac710b2b373 as base
+
+# NOTE(crag): NB_USER ARG for mybinder.org compat:
+# https://mybinder.readthedocs.io/en/latest/tutorials/dockerfile.html
+ARG NB_USER=notebook-user
+ARG NB_UID=1000
+ARG PIP_VERSION
+ARG PIPELINE_PACKAGE
+ARG PYTHON_VERSION="3.11"
+
+# Set up environment
+ENV PYTHON python${PYTHON_VERSION}
+ENV PIP ${PYTHON} -m pip
+
+WORKDIR ${HOME}
+USER ${NB_USER}
+
+ENV PYTHONPATH="${PYTHONPATH}:${HOME}"
+ENV PATH="/home/${NB_USER}/.local/bin:${PATH}"
+
+FROM base as python-deps
+COPY --chown=${NB_USER}:${NB_USER} requirements/base.txt requirements-base.txt
+RUN ${PIP} install pip==${PIP_VERSION}
+RUN ${PIP} install --no-cache -r requirements-base.txt
+
+FROM python-deps as model-deps
+RUN ${PYTHON} -c "import nltk; nltk.download('punkt')" && \
+  ${PYTHON} -c "import nltk; nltk.download('averaged_perceptron_tagger')" && \
+  ${PYTHON} -c "from unstructured.partition.model_init import initialize; initialize()"
+
+FROM model-deps as code
+COPY --chown=${NB_USER}:${NB_USER} CHANGELOG.md CHANGELOG.md
+COPY --chown=${NB_USER}:${NB_USER} logger_config.yaml logger_config.yaml
+COPY --chown=${NB_USER}:${NB_USER} prepline_${PIPELINE_PACKAGE}/ prepline_${PIPELINE_PACKAGE}/
+COPY --chown=${NB_USER}:${NB_USER} exploration-notebooks exploration-notebooks
+COPY --chown=${NB_USER}:${NB_USER} scripts/app-start.sh scripts/app-start.sh
+
+ENTRYPOINT ["scripts/app-start.sh"]
+# Expose a default port of 8000. Note: The EXPOSE instruction does not actually publish the port,
+# but some tooling will inspect containers and perform work contingent on networking support declared.
+EXPOSE 8000
File renamed without changes.
2 changes: 1 addition & 1 deletion prepline_general/api/app.py
@@ -13,7 +13,7 @@
 app = FastAPI(
     title="Unstructured Pipeline API",
     summary="Partition documents with the Unstructured library",
-    version="0.0.70",
+    version="0.0.71",
     docs_url="/general/docs",
     openapi_url="/general/openapi.json",
     servers=[
4 changes: 2 additions & 2 deletions prepline_general/api/general.py
@@ -713,7 +713,7 @@ def return_content_type(filename: str):


 @router.get("/general/v0/general", include_in_schema=False)
-@router.get("/general/v0.0.70/general", include_in_schema=False)
+@router.get("/general/v0.0.71/general", include_in_schema=False)
 async def handle_invalid_get_request():
     raise HTTPException(
         status_code=status.HTTP_405_METHOD_NOT_ALLOWED, detail="Only POST requests are supported."
@@ -728,7 +728,7 @@ async def handle_invalid_get_request():
     description="Description",
     operation_id="partition_parameters",
 )
-@router.post("/general/v0.0.70/general", include_in_schema=False)
+@router.post("/general/v0.0.71/general", include_in_schema=False)
 def general_partition(
     request: Request,
     # cannot use annotated type here because of a bug described here:
2 changes: 1 addition & 1 deletion preprocessing-pipeline-family.yaml
@@ -1,2 +1,2 @@
 name: general
-version: 0.0.70
+version: 0.0.71
19 changes: 11 additions & 8 deletions scripts/docker-build.sh
@@ -9,14 +9,17 @@ DOCKER_IMAGE="${DOCKER_IMAGE:-pipeline-family-${PIPELINE_FAMILY}-dev}"
 DOCKER_PLATFORM="${DOCKER_PLATFORM:-}"


-DOCKER_BUILD_CMD=(docker buildx build --load -f Dockerfile \
-  --build-arg PIP_VERSION="$PIP_VERSION" \
-  --build-arg BUILDKIT_INLINE_CACHE=1 \
-  --build-arg PIPELINE_PACKAGE="$PIPELINE_PACKAGE" \
-  --progress plain \
-  --target code \
-  --cache-from "$DOCKER_REPOSITORY":latest \
-  -t "$DOCKER_IMAGE" .)
+DOCKER_BUILD_CMD=(
+  docker buildx build --load -f Dockerfile-amd64
+  --build-arg PIP_VERSION="$PIP_VERSION"
+  --build-arg BUILDKIT_INLINE_CACHE=1
+  --build-arg PIPELINE_PACKAGE="$PIPELINE_PACKAGE"
+  --progress plain
+  --platform linux/amd64
+  --cache-from "$DOCKER_REPOSITORY:latest"
+  -t "$DOCKER_IMAGE"
+  .
+)

 # only build for specific platform if DOCKER_PLATFORM is set
 if [ -n "${DOCKER_PLATFORM:-}" ]; then
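Note on the `scripts/docker-build.sh` hunk: the rewritten `DOCKER_BUILD_CMD` holds one argument per array element, so the trailing backslashes are no longer needed and a flag can be appended conditionally. The sketch below illustrates that pattern in isolation. It is not the script from this PR: the array name `BUILD_CMD`, the default values, and the choice to append the build context last are assumptions made for illustration, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder defaults for illustration only; the real script sets these elsewhere.
PIP_VERSION="${PIP_VERSION:-23.2.1}"
DOCKER_IMAGE="${DOCKER_IMAGE:-pipeline-family-general-dev}"
DOCKER_PLATFORM="${DOCKER_PLATFORM:-}"

# One array element per argument: no line continuations needed,
# and values containing spaces stay intact when the array is expanded.
BUILD_CMD=(
  docker buildx build --load -f Dockerfile-amd64
  --build-arg PIP_VERSION="$PIP_VERSION"
  -t "$DOCKER_IMAGE"
)

# Append the platform flag only when DOCKER_PLATFORM is non-empty.
if [ -n "$DOCKER_PLATFORM" ]; then
  BUILD_CMD+=(--platform "$DOCKER_PLATFORM")
fi

# In this sketch the build context is appended last (an assumption, not the PR's layout).
BUILD_CMD+=(.)

# Print the command with shell quoting instead of running it.
printf '%q ' "${BUILD_CMD[@]}"
printf '\n'
```

Running it with `DOCKER_PLATFORM=linux/arm64` set in the environment adds `--platform linux/arm64` to the printed command; without it, no platform flag appears. Replacing the final `printf` lines with `"${BUILD_CMD[@]}"` would execute the command.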