
Quantifying factual correctness with multiple modes at the same time #2065

Open
@Razvanip13

Description

Describe the Feature
Rather than choosing a single mode for quantifying factual correctness, a user should be able to request multiple modes at the same time.

Why is the feature important for you?
Each mode (f1_score, precision, recall) interprets correctness from a different angle. For a clear overview of answer correctness, I believe a user should look at all three results.
At the same time, computing factual correctness once per mode is redundant, considering that every mode is quantified from the same statements processed by the LLM.
By providing this feature, we can save computational time and API costs.
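A minimal sketch of why one pass is enough, assuming the metric reduces the LLM output to one boolean verdict per claim in two directions: response claims checked against the reference, and reference claims checked against the response. `ClaimVerdicts` and `score_modes` are hypothetical names used only for illustration, not an existing library API. The counts follow the usual claim-level definitions (TP, FP, FN), and every requested mode is derived from them without any further LLM calls.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


# Hypothetical container (not an existing API): one boolean verdict per claim,
# assumed to come from a single LLM decomposition + verification pass.
@dataclass
class ClaimVerdicts:
    response_supported: Sequence[bool]   # response claims checked against the reference
    reference_supported: Sequence[bool]  # reference claims checked against the response


def score_modes(
    v: ClaimVerdicts,
    modes: Iterable[str] = ("precision", "recall", "f1"),
) -> Dict[str, float]:
    """Derive every requested mode from one set of verdicts (no extra LLM calls)."""
    modes = tuple(modes)  # materialize so a generator can be iterated twice

    tp = sum(v.response_supported)                 # response claims backed by the reference
    fp = len(v.response_supported) - tp            # response claims not backed by the reference
    fn = len(v.reference_supported) - sum(v.reference_supported)  # reference claims missing from the response

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    available = {"precision": precision, "recall": recall, "f1": f1}
    unknown = set(modes) - set(available)
    if unknown:
        raise ValueError(f"unknown mode(s): {sorted(unknown)}")
    # Return only the requested modes, in the order they were requested.
    return {m: available[m] for m in modes}
```

For example, with three of four response claims supported and two of three reference claims missing from the response, `score_modes(ClaimVerdicts([True, True, True, False], [True, False, False]))` yields precision 0.75, recall 0.6 and F1 of about 0.667, all from the same verdicts.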

Additional context

Papers such as https://arxiv.org/abs/2307.16877 and https://arxiv.org/pdf/2503.16161 discuss the trade-offs between the modes.

If the proposal is approved, I would like to be the one who implements it.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests