feat: Parameter to send custom page range when splitting pdf #125
Merged
When the client prepares the request, it turns each list parameter into multiple instances of the same key. For instance, `extract_image_block_types=["Image", "Table"]` becomes two form fields: `extract_image_block_types[]="Image"` and `extract_image_block_types[]="Table"`. We need to account for this in our `parse_form_data` helper if we want to use list params in our hooks. Likewise, we need to go the other way and re-expand the lists when recreating the request in `create_request_body`.
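The round trip described above can be sketched as follows. This is a minimal illustration only. It assumes the form data is available as a list of `(name, value)` pairs, and `collapse_list_fields` and `expand_list_fields` are hypothetical helpers, not the actual `parse_form_data` / `create_request_body` code.

```python
from typing import Any


def collapse_list_fields(fields: list[tuple[str, str]]) -> dict[str, Any]:
    """Gather repeated `key[]` form fields into one list stored under `key`.

    Hypothetical helper: scalar fields are kept as-is.
    """
    parsed: dict[str, Any] = {}
    for name, value in fields:
        if name.endswith("[]"):
            # Strip the "[]" suffix and append to the list for that key
            parsed.setdefault(name[:-2], []).append(value)
        else:
            parsed[name] = value
    return parsed


def expand_list_fields(params: dict[str, Any]) -> list[tuple[str, str]]:
    """Inverse of collapse_list_fields: emit one `key[]` field per list item."""
    fields: list[tuple[str, str]] = []
    for name, value in params.items():
        if isinstance(value, list):
            fields.extend((f"{name}[]", str(item)) for item in value)
        else:
            fields.append((name, str(value)))
    return fields


form = [
    ("strategy", "fast"),
    ("extract_image_block_types[]", "Image"),
    ("extract_image_block_types[]", "Table"),
]
params = collapse_list_fields(form)
print(params)  # {'strategy': 'fast', 'extract_image_block_types': ['Image', 'Table']}
```

Form values arrive as strings, so a hook reading a list param such as `split_pdf_page_range` would still need to convert each item to `int` after collapsing.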
pawel-kmiecik approved these changes on Jul 12, 2024
LGTM!
awalker4 added a commit to Unstructured-IO/unstructured-js-client that referenced this pull request on Aug 7, 2024:
To match the Python feature: Unstructured-IO/unstructured-python-client#125. Add a client-side param called `splitPdfPageRange` which takes a list of two integers, `[start, end]`. If `splitPdfPage` is `true` and a range is set, slice the doc from `start` up to and including `end`. Only this page range will be sent to the API. The subset of pages is still split up as needed. If `[start, end]` is out of bounds, throw an error to the user.
awalker4 added a commit to Unstructured-IO/unstructured-js-client that referenced this pull request on Aug 9, 2024:
To match the Python feature: Unstructured-IO/unstructured-python-client#125

# New parameter
Add a client-side param called `splitPdfPageRange` which takes a list of two integers, `[start, end]`. If `splitPdfPage` is `true` and a range is set, slice the doc from `start` up to and including `end`. Only this page range will be sent to the API. The subset of pages is still split up as needed. If `[start, end]` is out of bounds, throw an error to the user.

# Testing
Check out this branch and set up a request to your local API:

```
import * as fs from "fs";
import { UnstructuredClient } from "unstructured-client";
import { PartitionResponse } from "unstructured-client/sdk/models/operations";
import { Strategy } from "unstructured-client/sdk/models/shared";

const key = "YOUR-API-KEY"; // placeholder for your API key

const client = new UnstructuredClient({
  serverURL: "http://localhost:8000",
  security: {
    apiKeyAuth: key,
  },
});

const filename = "layout-parser-paper.pdf";
const data = fs.readFileSync(filename);

client.general.partition({
  partitionParameters: {
    files: {
      content: data,
      fileName: filename,
    },
    strategy: Strategy.Fast,
    splitPdfPage: true,
    // Only pages 4 through 8 (inclusive) are sent to the API
    splitPdfPageRange: [4, 8],
  }
}).then((res: PartitionResponse) => {
  if (res.statusCode === 200) {
    console.log(res.elements);
  }
}).catch((e) => {
  if (e.statusCode) {
    console.log(e.statusCode);
    console.log(e.body);
  } else {
    console.log(e);
  }
});
```

Test out various page ranges and confirm that the returned elements are within the range. Invalid ranges should throw a useful Error (pages are out of bounds, or end_page < start_page).
New parameter
Add a client-side param called `split_pdf_page_range`, which takes a list of two integers, `[start_page, end_page]`. If `split_pdf_page` is `True` and a range is set, slice the doc from `start_page` up to and including `end_page`. Only this page range will be sent to the API. The subset of pages is still split up as needed.
Other changes
Allow our custom hooks to properly access list parameters, so we're able to intercept `split_pdf_page_range`. We need extra handling to get list params out of the request in `parse_form_data`, and to rebuild the payload in `create_request_body`.
Testing
Check out this branch and set up a request to your local API:
Test out various page ranges and confirm that the returned elements are within the range. Invalid ranges should throw a `ValueError` (pages are out of bounds, or `end_page < start_page`).
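The range handling described in this PR (validate, then slice, then split the subset as needed) can be sketched as follows. This is a minimal illustration, not the client's implementation. `resolve_page_range` and `split_into_batches` are hypothetical helpers. The sketch assumes pages are 1-indexed and that the selected pages are split into fixed-size batches controlled by an assumed `pages_per_batch` value.

```python
from typing import Optional, Sequence


def resolve_page_range(page_range: Optional[Sequence[int]], num_pages: int) -> tuple[int, int]:
    """Return a validated (start_page, end_page) pair, 1-indexed and inclusive.

    With no range set, the whole document is selected.
    """
    if page_range is None:
        return 1, num_pages
    if len(page_range) != 2:
        raise ValueError(f"split_pdf_page_range must have two items, got {list(page_range)}")
    start_page, end_page = page_range
    if not (1 <= start_page <= num_pages and 1 <= end_page <= num_pages):
        raise ValueError(
            f"Page range {list(page_range)} is out of bounds for a {num_pages}-page document"
        )
    if end_page < start_page:
        raise ValueError(f"end_page ({end_page}) must not be less than start_page ({start_page})")
    return start_page, end_page


def split_into_batches(start_page: int, end_page: int, pages_per_batch: int) -> list[tuple[int, int]]:
    """Split the inclusive range [start_page, end_page] into consecutive batches."""
    return [
        (first, min(first + pages_per_batch - 1, end_page))
        for first in range(start_page, end_page + 1, pages_per_batch)
    ]


start, end = resolve_page_range([4, 8], num_pages=16)
print(split_into_batches(start, end, pages_per_batch=2))  # [(4, 5), (6, 7), (8, 8)]
```

Validating before slicing means a bad range fails fast on the client, before any pages are sent to the API.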