
[Feature] SemanticF1: Consider decomposing LLM scoring from numerical P/R calculation #8283


Open
1 of 2 tasks
komikat opened this issue May 26, 2025 · 1 comment
Labels
enhancement New feature or request

Comments


komikat commented May 26, 2025

What feature would you like to see?

Hi DSPy team,

While exploring the SemanticF1 module, I noticed that it asks the LLM to output recall: float and precision: float values directly. LLMs excel at semantic understanding, but they may be less reliable at generating precise, calibrated numerical scores.

Current approach:

  • LLM directly outputs precision/recall floats (0-1)
  • Potential for subjective or "impressionistic" scoring

Suggestion:
Consider a more decomposed approach where:

  1. LLM extracts and matches key ideas/claims from both texts
  2. A separate step (algorithmic or constrained LLM call) calculates P/R from these mappings

This could provide more fine-grained, reliable, and interpretable semantic evaluation.
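To make step 2 concrete, here is a minimal sketch of the purely algorithmic variant. It assumes step 1 has already split both texts into claims and returned index pairs linking predicted claims to reference claims. The names `ClaimMatch` and `scores_from_matches` are hypothetical and are not part of DSPy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimMatch:
    """Hypothetical output of step 1: predicted claim i covers reference claim j."""
    predicted_idx: int
    reference_idx: int


def scores_from_matches(num_predicted, num_reference, matches):
    """Compute (precision, recall, f1) from claim-level matches, with no LLM involved."""
    # Ignore pairs whose indices fall outside either claim list.
    valid = [
        m for m in matches
        if 0 <= m.predicted_idx < num_predicted
        and 0 <= m.reference_idx < num_reference
    ]
    # Use sets so a claim that is matched several times is counted once.
    covered_predicted = {m.predicted_idx for m in valid}
    covered_reference = {m.reference_idx for m in valid}

    # Precision: share of predicted claims supported by the reference.
    precision = len(covered_predicted) / num_predicted if num_predicted else 0.0
    # Recall: share of reference claims covered by the prediction.
    recall = len(covered_reference) / num_reference if num_reference else 0.0

    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Example: 4 predicted claims, 3 reference claims, 3 matches from step 1.
matches = [ClaimMatch(0, 0), ClaimMatch(1, 0), ClaimMatch(2, 2)]
precision, recall, f1 = scores_from_matches(4, 3, matches)
```

In this example three of the four predicted claims are supported and two of the three reference claims are covered, so precision is 0.75 and recall is 2/3. The list of matches also serves as an inspectable record of why the score came out as it did.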

Would love to hear the team's thoughts on the design philosophy here and whether there's interest in exploring alternative semantic evaluation approaches.

Thanks for the great work!

Would you like to contribute?

  • Yes, I'd like to help implement this.
  • No, I just want to request it.

Additional Context

relevant twitter thread

@komikat komikat added the enhancement New feature or request label May 26, 2025
@akshit-deccan akshit-deccan marked this as a duplicate of #8282 May 26, 2025
Collaborator

okhat commented May 26, 2025

Yes, that is fair!
