Even when only metadata is requested with parser.from_file(x, service = 'meta'), Tika still extracts the content. For files that need OCR this can take a long time.
parser.from_file(x, service = 'meta')
Tika offers several ways to turn off OCR here. Since tika-python talks to a Tika Server, the last one, a request header, can be used:
parser.from_file(x, service = 'meta', headers = {"X-Tika-OCRskipOcr": 'true'})
This also works with service = 'all'. In that case Tika returns whatever content can be extracted without OCR.
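For anyone who wants to apply this everywhere, here is a minimal sketch of a wrapper that always sends the skip-OCR header. The names `parse_without_ocr`, `parse_fn`, and `extra_headers` are hypothetical and not part of tika-python; the only library call assumed is `parser.from_file` with its `service` and `headers` keyword arguments, as used above.

```python
from typing import Any, Callable, Dict, Optional

# Header from the workaround above; tells the Tika Server to skip OCR.
SKIP_OCR_HEADER = {"X-Tika-OCRskipOcr": "true"}


def parse_without_ocr(
    path: str,
    service: str = "meta",
    parse_fn: Optional[Callable[..., Any]] = None,
    extra_headers: Optional[Dict[str, str]] = None,
) -> Any:
    """Call a tika-python style parse function with OCR disabled.

    `parse_fn` is a hypothetical hook so the helper can be exercised
    without a running Tika Server; by default it is parser.from_file.
    """
    if parse_fn is None:
        # Imported lazily so that importing this module does not require tika.
        from tika import parser
        parse_fn = parser.from_file

    # Merge caller-supplied headers, making sure the skip-OCR flag wins.
    headers = dict(extra_headers or {})
    headers.update(SKIP_OCR_HEADER)
    return parse_fn(path, service=service, headers=headers)


# Usage (needs a reachable Tika Server):
#   parsed = parse_without_ocr("scan.pdf")                 # metadata only
#   parsed = parse_without_ocr("scan.pdf", service="all")  # plus non-OCR content
```

Keeping the header in one place means a scanned PDF cannot accidentally trigger OCR because one call site forgot it.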