[Feature] SemanticF1: Consider decomposing LLM scoring from numerical P/R calculation #8283
Labels: enhancement
What feature would you like to see?
Hi DSPy team,
While exploring the SemanticF1 module, I noticed that it asks the LLM to directly output `recall: float` and `precision: float` values. LLMs excel at semantic understanding, but they may be less reliable at generating precise, calibrated numerical scores.

Current approach:
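If I'm reading the code correctly, the scoring signature looks roughly like this (paraphrased; exact field names, docstrings, and descriptions may differ from the current source):

```python
import dspy


class SemanticRecallPrecision(dspy.Signature):
    """Compare a system's response to the ground truth to compute recall and precision."""

    question: str = dspy.InputField()
    ground_truth: str = dspy.InputField()
    system_response: str = dspy.InputField()
    # The LLM is asked to produce calibrated floats directly:
    recall: float = dspy.OutputField(desc="fraction (out of 1.0) of ground truth covered by the system response")
    precision: float = dspy.OutputField(desc="fraction (out of 1.0) of system response covered by the ground truth")
```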
Suggestion:
Consider a more decomposed approach where the LLM handles only the semantic judgments (for example, identifying the key claims in the ground truth and the system response, and deciding whether each claim is covered by the other text), while precision and recall are computed numerically in code from those per-claim judgments; see the sketch below.
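A minimal sketch of what this could look like, using hypothetical signature and field names (`ExtractKeyClaims`, `JudgeClaimSupport`, etc.) that are not part of DSPy today and are only meant to illustrate the decomposition:

```python
import dspy


class ExtractKeyClaims(dspy.Signature):
    """Break a response into short, self-contained key claims."""

    text: str = dspy.InputField()
    claims: list[str] = dspy.OutputField(desc="short, self-contained factual claims")


class JudgeClaimSupport(dspy.Signature):
    """Decide whether a single claim is supported by the reference text."""

    claim: str = dspy.InputField()
    reference: str = dspy.InputField()
    supported: bool = dspy.OutputField()


class DecomposedSemanticF1(dspy.Module):
    def __init__(self):
        super().__init__()
        self.extract = dspy.ChainOfThought(ExtractKeyClaims)
        self.judge = dspy.ChainOfThought(JudgeClaimSupport)

    def forward(self, example, pred, trace=None):
        gold_claims = self.extract(text=example.response).claims
        pred_claims = self.extract(text=pred.response).claims

        # The LLM only makes binary, per-claim judgments ...
        recall_hits = sum(self.judge(claim=c, reference=pred.response).supported for c in gold_claims)
        precision_hits = sum(self.judge(claim=c, reference=example.response).supported for c in pred_claims)

        # ... and precision, recall, and F1 are computed numerically in plain Python.
        recall = recall_hits / max(len(gold_claims), 1)
        precision = precision_hits / max(len(pred_claims), 1)
        return 2 * precision * recall / max(precision + recall, 1e-9)
```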
This could provide more fine-grained, reliable, and interpretable semantic evaluation.
Would love to hear the team's thoughts on the design philosophy here and whether there's interest in exploring alternative semantic evaluation approaches.
Thanks for the great work!
Would you like to contribute?
Additional Context
relevant twitter thread