fix: Allow split page logic to process files concurrently #175
## The Issue

`split_pdf_hook.py` does not support multiple concurrent files. This is because we store the split request tasks in `self.coroutines_to_execute[operation_id]`, where `operation_id` is just the string `"partition"`. Therefore, if we send two concurrent docs using the same SDK, they'll both try to await the same list of coroutines. This could result in interleaved results, but mostly it breaks with `RuntimeError: coroutine is being awaited already` as the second request gets ready to await its requests. This will block anyone trying to use the new `partition_async` to fan out their PDFs. Note that the js/ts client also has this issue.
## The fix
We need to use an actual id to index into `coroutines_to_execute`. In `before_request`, let's make a uuid and build up the list of coroutines for this doc. We need to pass this id to `after_success` in order to retrieve the results, so we can set it as a header on our "dummy" request that's returned to the SDK.
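A rough sketch of the keying change (illustrative only, not the actual hook code; `FakeSplitHook`, the dict-shaped "dummy" response, and the `X-Operation-Id` header name are placeholders invented for this example):

```python
import asyncio
import uuid


class FakeSplitHook:
    """Stand-in for the hook, keyed by a per-document uuid instead of "partition"."""

    def __init__(self) -> None:
        self.coroutines_to_execute: dict[str, list] = {}

    def before_request(self, pages: list[str]) -> dict:
        # One fresh id per document, so concurrent docs never share a list.
        operation_id = str(uuid.uuid4())
        self.coroutines_to_execute[operation_id] = [self._send(p) for p in pages]
        # In the SDK this id would ride along as a header on the "dummy"
        # request; here the dummy response is just a dict.
        return {"headers": {"X-Operation-Id": operation_id}}

    async def after_success(self, dummy_response: dict) -> list[str]:
        # Look up this document's coroutines by its own id.
        operation_id = dummy_response["headers"]["X-Operation-Id"]
        coroutines = self.coroutines_to_execute.pop(operation_id)
        return [await coro for coro in coroutines]

    @staticmethod
    async def _send(page: str) -> str:
        await asyncio.sleep(0.01)  # placeholder for the real per-page request
        return f"partitioned:{page}"


async def main() -> None:
    hook = FakeSplitHook()
    # Two documents in flight at once no longer collide on a shared key.
    dummy_a = hook.before_request(["a-page-1", "a-page-2"])
    dummy_b = hook.before_request(["b-page-1"])
    results = await asyncio.gather(hook.after_success(dummy_a), hook.after_success(dummy_b))
    print(results)  # [['partitioned:a-page-1', 'partitioned:a-page-2'], ['partitioned:b-page-1']]


asyncio.run(main())
```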
## Testing

See the new integration test. We can verify this by sending two docs serially, and then with `asyncio.gather`, and confirming that the results are the same.
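The shape of that check, sketched against a hypothetical `partition_file` stand-in rather than the real SDK call:

```python
import asyncio


async def partition_file(path: str) -> list[str]:
    """Hypothetical stand-in for partitioning one document via partition_async."""
    await asyncio.sleep(0.01)  # placeholder for the real split-page fan-out
    return [f"element-from-{path}"]


async def main() -> None:
    files = ["doc_a.pdf", "doc_b.pdf"]

    # Serial baseline: one document at a time.
    serial = [await partition_file(f) for f in files]

    # Concurrent run: both documents in flight at once.
    concurrent = await asyncio.gather(*(partition_file(f) for f in files))

    # With per-document ids in the hook, both orderings return the same results.
    assert serial == list(concurrent)
    print("serial and concurrent results match")


asyncio.run(main())
```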