Closed as not planned
Description
For Ragas 0.2, we released our third iteration of synthetic test generation for RAG. While developing this new approach, we kept in mind some important feedback gathered from earlier versions:
- Ability to generalize over more data formats and domains.
- Ability to customize and create scenarios related to one's own domain.
- Ability to persist and resample Q&A pairs from the same set of documents without redoing preprocessing.
- Lower cost and faster iteration.
We have already noted some feedback from the community and would like to welcome more, hence this discussion thread. Please feel free to share your thoughts and questions here. This will help us greatly improve the feature in the coming weeks.
Known issues
Feature enhancement
- Seeding test generation with queries: Seed questions for guiding test generation #1642
Quality enhancement
- Testset generation feedback #1568
- Creating Testdataset from dictionary/json file #1477
- Add support for synthesising multiturn conversation by @ahgraber
- [R-312] support adding metadata into testset generation #1614
Documentation improvements
- Language adaptation for test generation
- Custom prompts for test generation (also noted in Testset generation feedback #1568)
- Custom True/False query generation: Can the generated dataset be filled in the blank or true/false questions #1620
TODO
- Improve quality of generated questions
- Add node filter to avoid query creation from nodes with poor information quality.
- Heading splitter: add minimum and maximum chunk size
- Make sure default settings work on small document sets (e.g., PubMed summaries, <500 tokens per doc)
- Fix long context issues with extraction (OOC)
- Adjust transforms/query generation for small number of documents (n=1)
- Improve the sampling algorithm to select diverse scenarios from those available.
- Simplify code for writing custom scenarios
- Add documentation for prompt adaptation in test set generation
- Add documentation for writing custom scenarios and query types, for example Yes/No questions
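As a rough illustration of the heading-splitter item, here is a minimal sketch of a heading-based splitter that enforces a minimum and maximum chunk size. It is not Ragas code: the function name `split_by_headings`, its `min_chars`/`max_chars` parameters, and the use of characters as a stand-in for tokens are all assumptions made for this example.

```python
import re
import textwrap
from typing import List

# Markdown heading lines (# through ######) mark section boundaries.
HEADING = re.compile(r"^#{1,6}\s", re.MULTILINE)


def split_by_headings(text: str, min_chars: int = 200, max_chars: int = 2000) -> List[str]:
    """Hypothetical heading splitter; sizes are in characters as a proxy for tokens."""
    if min_chars > max_chars:
        raise ValueError("min_chars must not exceed max_chars")

    # 1. Cut the document at every heading line (keep any preamble before the first heading).
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    sections = [text[a:b].strip() for a, b in zip(starts, starts[1:] + [len(text)])]
    sections = [s for s in sections if s]

    # 2. Enforce the minimum: accumulate short sections until the buffer reaches min_chars.
    merged: List[str] = []
    buffer = ""
    for sec in sections:
        buffer = f"{buffer}\n\n{sec}" if buffer else sec
        if len(buffer) >= min_chars:
            merged.append(buffer)
            buffer = ""
    if buffer:
        # A short trailing remainder is attached to the previous chunk, if any.
        if merged:
            merged[-1] = f"{merged[-1]}\n\n{buffer}"
        else:
            merged.append(buffer)

    # 3. Enforce the maximum: wrap oversized chunks at word boundaries.
    # Note: textwrap collapses newlines to spaces inside the chunks it wraps,
    # and the last piece of a wrapped chunk may fall below min_chars.
    chunks: List[str] = []
    for chunk in merged:
        if len(chunk) <= max_chars:
            chunks.append(chunk)
        else:
            chunks.extend(textwrap.wrap(chunk, width=max_chars))
    return chunks
```

Merging before splitting means tiny sections (for example a heading followed by a single line) never become standalone nodes, which is one possible way to address both the minimum-size item and the node-quality filter item above.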