[Win][Config] Enable full support of UnstructuredIO API features on Windows #2

Merged
merged 17 commits into from
Jul 11, 2024

tjtanaa
Member

@tjtanaa tjtanaa commented Jul 11, 2024

Changes

  1. Update unstructuredio_api.spec.
  2. Update unstructuredio_api.py.
  3. Add additional setup dependencies to the docs/Windows.md.

christinestraub and others added 14 commits June 14, 2024 15:16
### Summary
Version bumps for regular maintenance and to address moderate CVEs from
security scans.
- bump `unstructured` to `0.14.6`
- bump `unstructured-inference` to `0.7.35`
…ructured-IO#423)

### Summary
Updates the Dockerfile to use the Chainguard wolfi-base image to reduce
CVEs. Also adds a step in the docker publish job that scans the images
and checks for CVEs before publishing.

### Testing
Run `make docker-build` and `make docker-start-api`, then try:
```
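from unstructured.partition.api import partition_via_api

# Placeholder path: replace with any local document you want to partition.
filename = "<PATH-TO-FILE>"

# Send the file to the API container started by `make docker-start-api`.
elements = partition_via_api(
    filename=filename,
    api_url="http://localhost:8000/general/v0/general",
    api_key="<API-KEY>",
    strategy="hi_res",
)

# Print each returned element, separated by a blank line.
print("\n\n".join(str(el) for el in elements))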
```
…dx build` command (Unstructured-IO#425)

I noticed that images on the main branch are failing to build (and push)
because the `docker buildx build` command is missing the `-f` parameter.
By default it expects a file named `Dockerfile` to exist, but we only
have `Dockerfile-amd64` and `Dockerfile-arm64`.


![image](https://github.com/Unstructured-IO/unstructured-api/assets/64484917/4527165a-909e-498d-b0ee-8bba4b1a13e4)

---------

Co-authored-by: christinestraub <[email protected]>
unnecessary SHA update introduced in
Unstructured-IO#427 that needs
to be reverted
…-IO#429)

shell syntax error occurs in docker-publish.yml workflow
…ed-IO#430)

bug introduced in previous PR causing build failure on main
### Summary

Bumps dependency versions for the API. Closes Unstructured-IO#432.
…ctured-IO#436)

# Changes
**Fix for docx and other office files returning `{"detail":"File type
None is not supported."}`**
After moving to the wolfi base image, the `mimetypes` lib no longer
knows about these file extensions. To avoid issues like this, let's add
an explicit mapping for all the file extensions we care about. I added a
`filetypes.py` and moved `get_validated_mimetype` over. When this file
is imported, we'll call `mimetypes.add_type` for all file extensions we
support.
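
The explicit mapping described above could look roughly like the following. This is a minimal sketch, not the actual contents of `filetypes.py`: the `EXTENSION_TO_MIMETYPE` dictionary, its three entries, and the `register_mimetypes` helper are illustrative assumptions. Only `mimetypes.add_type` and `mimetypes.guess_type` come from the Python standard library.

```python
import mimetypes

# Hypothetical subset of extensions, for illustration only.
# The real list of supported file types is defined by the PR itself.
EXTENSION_TO_MIMETYPE = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def register_mimetypes() -> None:
    """Register every known extension so lookups do not depend on the base image."""
    for extension, mimetype in EXTENSION_TO_MIMETYPE.items():
        mimetypes.add_type(mimetype, extension)


# Running the registration at import time mirrors the behavior described above.
register_mimetypes()

# The lookup now succeeds even if the system mime database lacks office types.
print(mimetypes.guess_type("fake.docx")[0])
```

Registering at import time means any module that imports the mapping gets consistent lookups, regardless of which mime database the container image ships.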

**Update smoke test coverage**
This bug snuck past because we were already providing the mimetype in
the docker smoke test. I updated `test_happy_path` to test against the
container with and without passing `content_type`. I added some missing
filetypes, and sorted the test params by extension so we can see when
new types are missing.
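
The parameter layout described above can be pictured with a small sketch. The file names, the `SMOKE_TEST_FILES` list, and the `expand_cases` helper are hypothetical and are not taken from the actual `test_happy_path`; the step that sends each case to the container is left out.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical sample files and content types, for illustration only.
SMOKE_TEST_FILES: List[Tuple[str, str]] = [
    ("fake-text.txt", "text/plain"),
    ("layout-parser-paper.pdf", "application/pdf"),
    ("fake.docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
]


def expand_cases(files: List[Tuple[str, str]]) -> List[Tuple[str, Optional[str]]]:
    """Sort files by extension and emit each one with and without a content type."""
    ordered = sorted(files, key=lambda item: Path(item[0]).suffix)
    cases: List[Tuple[str, Optional[str]]] = []
    for filename, content_type in ordered:
        cases.append((filename, content_type))  # client supplies the mimetype
        cases.append((filename, None))          # server must infer the mimetype
    return cases


for filename, content_type in expand_cases(SMOKE_TEST_FILES):
    print(filename, content_type)
```

Running every file both ways exercises the server-side mimetype lookup, which is the path the original smoke test never reached.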

# Testing
The new smoke test will verify that all filetypes are working. You can
also run `make docker-build && make docker-start-api` and test the docx
in the sample docs dir. On `main`, this file will give you the error
above.
```
curl 'http://localhost:8000/general/v0/general' \
--form 'files=@"fake.docx"'
```
@tjtanaa tjtanaa added the `priority: high` (High priority request), `type: documentation` (Improvements or additions to documentation), and `type: enhancement / feature` (New feature or request) labels Jul 11, 2024
@tjtanaa tjtanaa self-assigned this Jul 11, 2024
@tjtanaa tjtanaa merged commit 392a12e into win-tj Jul 11, 2024
@tjtanaa tjtanaa deleted the merge-main-tj branch July 11, 2024 06:15
5 participants