Description
Describe the Feature
Rather than choosing a single mode for quantifying factual evaluation, a user should have the freedom to request multiple modes at the same time
Why is the feature important for you?
Each mode (f1_score, precision, recall) allows you to interpret the correctness from a different angle. I consider that for a clear overview of the answer correctness, a user should interpret all results.
At the same time, calling the factual correctness multiple times is redundant, cosidering that we use the same statements processed by an LLM for quantifying the score in each mode.
By providing this feature, we can save computational time and API costs.
Additional context
Papers like https://arxiv.org/abs/2307.16877 and https://arxiv.org/pdf/2503.16161 discussed about the tradeoffs between each mode.
If the proposal is approved, I would like to be the one who implements it.