improve docs for attaching evaluator to dataset #837
base: main
Conversation
@@ -2,7 +2,7 @@
 sidebar_position: 2
 ---

-# How to bind an evaluator to a dataset in the UI
+# Automatically run evaluators on experiments
While you can specify evaluators to grade the results of your experiments programmatically (see [this guide](./evaluate_llm_application) for more information), you can also bind evaluators to a dataset in the UI.
Suggested change:
-While you can specify evaluators to grade the results of your experiments programmatically (see [this guide](./evaluate_llm_application) for more information), you can also bind evaluators to a dataset in the UI.
+You can grade your experiment results in two ways: **programmatically**, by specifying evaluators in your code (see [this guide](./evaluate_llm_application) for details), or **automatically**, by binding evaluators to the dataset in the UI so they run on every subsequent experiment. These UI-configured evaluators complement any you've set up via the SDK. We support both LLM-based and custom Python code evaluators.
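For context, the programmatic route the suggestion refers to looks roughly like the sketch below. This is a minimal, hypothetical example assuming the langsmith Python SDK's `evaluate()` helper; the dataset name `my-dataset`, the `my_app` target, and the `correctness` evaluator are placeholders, not part of this PR.

```python
from langsmith import evaluate

# Placeholder target function: stands in for a call into your application.
def my_app(inputs: dict) -> dict:
    return {"answer": f"Echo: {inputs['question']}"}

# Custom code evaluator: compares the run's output to the dataset example's
# reference output and returns a named score.
def correctness(run, example) -> dict:
    predicted = run.outputs["answer"]
    expected = example.outputs["answer"]
    return {"key": "correctness", "score": int(predicted == expected)}

evaluate(
    my_app,
    data="my-dataset",          # dataset name in LangSmith (placeholder)
    evaluators=[correctness],   # code-defined evaluators passed via the SDK
    experiment_prefix="docs-example",
)
```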
Once you have saved your new evaluator, **subsequent** experiment run from the dataset will now be evaluated by the evaluator you configured. Note that in the below image, each run in the experiment has a "correctness" score.
Suggested change:
-Once you have saved your new evaluator, **subsequent** experiment run from the dataset will now be evaluated by the evaluator you configured. Note that in the below image, each run in the experiment has a "correctness" score.
+Once you have created an evaluator, **subsequent** experiments on that dataset will be automatically graded by the evaluator(s) you configured.
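To illustrate what "automatically graded" means in practice, here is a hypothetical sketch: once a "correctness" evaluator is bound to the dataset in the UI, an experiment started from the SDK needs no `evaluators` argument for that score to appear on each run. `my-dataset` and `my_app` are placeholders, as above.

```python
from langsmith import evaluate

# Placeholder target, standing in for your application.
def my_app(inputs: dict) -> dict:
    return {"answer": f"Echo: {inputs['question']}"}

# No evaluators are passed here: the "correctness" evaluator bound to the
# dataset in the UI is applied to each run of this experiment automatically.
evaluate(
    my_app,
    data="my-dataset",  # placeholder dataset name
    experiment_prefix="auto-graded",
)
```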
Would be good to add a "Next Steps" section at the end of the doc. Can link out to: