evaluate supporting replicates #2052

Open
@jamesbraza

Description

Describe the Feature

It would be nice to do something like evaluate(..., num_replicates=30) so I can calculate the mean and standard deviation of accuracy on a benchmark EvaluationDataset.

What I mean by replicates is basically running the task N times in parallel, and computing aggregate metrics across the parallel runs.
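To make the request concrete, here is a minimal sketch of the replicate idea in plain Python. It does not use the library's actual API: `run_replicates`, `run_once`, and `fake_eval` are hypothetical names, and `fake_eval` only stands in for one full evaluation pass that returns an accuracy.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def run_replicates(
    run_once: Callable[[int], float], num_replicates: int
) -> Tuple[float, float]:
    """Run `run_once` num_replicates times in parallel.

    Returns (mean, sample standard deviation) of the per-replicate scores.
    `run_once` is a hypothetical callable that takes the replicate index
    and returns one accuracy value.
    """
    if num_replicates < 2:
        raise ValueError("need at least 2 replicates to estimate a std dev")
    # One worker per replicate; threads suit I/O-bound work such as LLM calls.
    with ThreadPoolExecutor(max_workers=num_replicates) as pool:
        scores = list(pool.map(run_once, range(num_replicates)))
    return statistics.mean(scores), statistics.stdev(scores)


def fake_eval(replicate_index: int) -> float:
    """Stand-in for a real evaluation run (assumption, not a library call)."""
    rng = random.Random(replicate_index)  # seeded so the demo is repeatable
    return 0.8 + rng.uniform(-0.05, 0.05)


mean_acc, std_acc = run_replicates(fake_eval, num_replicates=30)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f} over 30 replicates")
```

A built-in num_replicates option would presumably do something similar internally, with each replicate being a complete run over the dataset.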

Why is the feature important for you?

Statistical significance is important.

Additional context

I have a custom task, and am trying to compare trained models' performance on that task.

    Labels

    enhancement (New feature or request), module-metrics (this is part of metrics module)
