[Feature] SemanticF1: Consider decomposing LLM scoring from numerical P/R calculation #8283
Labels: enhancement
What feature would you like to see?
Hi DSPy team,
While exploring the SemanticF1 module, I noticed that it asks the LLM to directly output `recall: float` and `precision: float` values. LLMs excel at semantic understanding, but they may be less reliable at generating precise, calibrated numerical scores.

Current approach:
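If I'm reading the code correctly, the scoring signature looks roughly like this (paraphrased; exact field names, docstrings, and descriptions may differ from the current source):

```python
import dspy


class SemanticRecallPrecision(dspy.Signature):
    """Compare a system's response to the ground truth to compute recall and precision."""

    question: str = dspy.InputField()
    ground_truth: str = dspy.InputField()
    system_response: str = dspy.InputField()
    # The LLM is asked to produce calibrated floats directly:
    recall: float = dspy.OutputField(desc="fraction (out of 1.0) of ground truth covered by the system response")
    precision: float = dspy.OutputField(desc="fraction (out of 1.0) of system response covered by the ground truth")
```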
Suggestion:
Consider a more decomposed approach where the LLM handles only the semantic judgments (for example, identifying the key claims in the ground truth and the system response, and deciding whether each claim is covered by the other text), while precision and recall are computed numerically in code from those per-claim judgments; see the sketch below.
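A minimal sketch of what this could look like, using hypothetical signature and field names (`ExtractKeyClaims`, `JudgeClaimSupport`, etc.) that are not part of DSPy today and are only meant to illustrate the decomposition:

```python
import dspy


class ExtractKeyClaims(dspy.Signature):
    """Break a response into short, self-contained key claims."""

    text: str = dspy.InputField()
    claims: list[str] = dspy.OutputField(desc="short, self-contained factual claims")


class JudgeClaimSupport(dspy.Signature):
    """Decide whether a single claim is supported by the reference text."""

    claim: str = dspy.InputField()
    reference: str = dspy.InputField()
    supported: bool = dspy.OutputField()


class DecomposedSemanticF1(dspy.Module):
    def __init__(self):
        super().__init__()
        self.extract = dspy.ChainOfThought(ExtractKeyClaims)
        self.judge = dspy.ChainOfThought(JudgeClaimSupport)

    def forward(self, example, pred, trace=None):
        gold_claims = self.extract(text=example.response).claims
        pred_claims = self.extract(text=pred.response).claims

        # The LLM only makes binary, per-claim judgments ...
        recall_hits = sum(self.judge(claim=c, reference=pred.response).supported for c in gold_claims)
        precision_hits = sum(self.judge(claim=c, reference=example.response).supported for c in pred_claims)

        # ... and precision, recall, and F1 are computed numerically in plain Python.
        recall = recall_hits / max(len(gold_claims), 1)
        precision = precision_hits / max(len(pred_claims), 1)
        return 2 * precision * recall / max(precision + recall, 1e-9)
```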
This could provide more fine-grained, reliable, and interpretable semantic evaluation.
Would love to hear the team's thoughts on the design philosophy here and whether there's interest in exploring alternative semantic evaluation approaches.
Thanks for the great work!
Would you like to contribute?
Additional Context
relevant twitter thread