Remove words from DQA output #1041

Closed
wants to merge 1 commit into from

Conversation


@Wauplin Wauplin commented Nov 18, 2024

Related to an (unfortunately) private Slack convo (here).

originally from @Rocketknight1 :

Hey, for document-question-answering, the JS spec has "words", which is:
The index of each word/box pair that is in the answer
In transformers we don't output this at all. I'm not really sure how it's generated/used in the spec, since transformers already has `start`, `end`, `answer`, etc.

and then

Update: I think the `words` output never exists because of a pipeline bug. The pipeline has two code paths in several functions: one for VisionEncoderDecoder models and one for everything else.
- In `preprocess()`, the VisionEncoderDecoder path always sets `words` to `None`, which means it can never be passed through to the output.
- In `postprocess()`, the non-VisionEncoderDecoder path calls `postprocess_extractive_qa`. However, this function rewrites the answers dict without a `words` key.

In other words, the `preprocess()` method deletes `words` for VisionEncoderDecoder models and the `postprocess()` method deletes it for everything else, so it always gets deleted! The right solution is just to remove it from the docstring and the JS spec.
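To make the diagnosis concrete, here is a minimal, illustrative sketch (not the actual transformers source; function shapes and field names are simplified assumptions) of how the two code paths each end up dropping a `words` field:

```python
# Illustrative sketch of the described bug, NOT the real pipeline code.

def preprocess(inputs, is_vision_encoder_decoder):
    if is_vision_encoder_decoder:
        # VisionEncoderDecoder path: "words" is always set to None,
        # so it can never be passed through to the output.
        return {**inputs, "words": None}
    return inputs

def postprocess(model_output, is_vision_encoder_decoder):
    if is_vision_encoder_decoder:
        # "words" was already None'd in preprocess()
        return model_output
    # Non-VisionEncoderDecoder path: the answers dict is rebuilt
    # without a "words" key (mirroring postprocess_extractive_qa).
    return {
        "score": model_output.get("score"),
        "start": model_output.get("start"),
        "end": model_output.get("end"),
        "answer": model_output.get("answer"),
    }

sample = {"score": 0.9, "start": 3, "end": 7, "answer": "42", "words": [3, 4]}
for ved in (True, False):
    out = postprocess(preprocess(sample, ved), ved)
    # "words" is None on one path and absent on the other,
    # so no caller ever sees a usable value.
    print(ved, out.get("words"))
```

Either way, `out.get("words")` is `None`, which matches the observation that the documented `words` output never actually appears.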


Also worth mentioning: DQA is only served via transformers in the Inference API.

Wauplin commented Nov 18, 2024

(closing as duplicate of #1040)

@Wauplin Wauplin closed this Nov 18, 2024