[R-310] Test set generation improvements #1577

For Ragas 0.2, we released our third iteration of [synthetic test generation for RAG](https://docs.ragas.io/en/stable/concepts/test_data_generation/rag/#example_1). While developing this new approach, we kept in mind some important feedback gathered from earlier versions:

1. Ability to generalize over more data formats and domains. 
2. Ability to customize and create scenarios related to one's own domain. 
3. Ability to persist and resample Q&A from the same list of documents without redoing preprocessing.
4. Lower cost and faster iteration. 
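
To make point 3 concrete, here is a minimal sketch of the persist-and-resample idea in plain Python. It does not use the Ragas API: `preprocess`, `load_or_build_nodes`, `sample_nodes`, and the JSON cache file are hypothetical names chosen for illustration, and the "preprocessing" is just a paragraph split standing in for the expensive step.

```python
import json
import random
from pathlib import Path


def preprocess(documents):
    """Stand-in for the expensive step (chunking, extraction, embedding).

    Hypothetical: here each document is simply split into paragraphs.
    """
    nodes = []
    for doc_id, text in enumerate(documents):
        for para in text.split("\n\n"):
            para = para.strip()
            if para:
                nodes.append({"doc_id": doc_id, "text": para})
    return nodes


def load_or_build_nodes(documents, cache_path):
    """Reuse the cached nodes if present; otherwise preprocess and save them."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return json.loads(cache_path.read_text(encoding="utf-8"))
    nodes = preprocess(documents)
    cache_path.write_text(json.dumps(nodes), encoding="utf-8")
    return nodes


def sample_nodes(nodes, k, seed):
    """Draw up to k nodes to turn into Q&A pairs; a new seed gives a new sample."""
    rng = random.Random(seed)
    return rng.sample(nodes, min(k, len(nodes)))
```

With this shape, a second run over the same documents only pays for sampling and generation, since the preprocessed nodes are read back from disk.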

We have already noted some feedback from the community and would welcome more, hence this discussion thread. Please feel free to share your thoughts and questions here. This will help us greatly improve the feature in the coming weeks.

<sub>[R-310](https://linear.app/exploding-gradients/issue/R-310/test-set-generation-improvements)</sub>

## Known issues

**Feature enhancement**
- Seeding test generation with queries #1642 

**Quality enhancement**  
- #1568
- #1477
- Add support for synthesising multi-turn conversations by @ahgraber 
- https://github.com/explodinggradients/ragas/issues/1614

**Documentation improvements**
- Language adaptation for test generation
    - #1488
    - #1485
- Custom prompts for test generation
    - also noted in #1568
    - custom True/False query generation #1620


## TODO

- [x] Improve quality of generated questions
- [x] Add node filter to avoid query creation from nodes with poor information quality. 
- [x] Heading splitter - add minimum and maximum chunk size
- [x] Make sure default settings work on small document sets (PubMed summaries, etc.; <500 tokens per doc)
- [x] Fix long context issues with extraction (OOC)
- [ ] Adjust transforms/query generation for small number of documents (n=1)
- [ ] Improve sampling algorithm to sample diverse scenarios from available scenarios.
- [ ] Simplify code for writing custom scenarios 
- [ ] Add documentation for prompt adaptation in test set generation
- [ ] Add documentation for writing custom scenarios and query types, for example Yes/No questions
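
As an illustration of the heading splitter item above, a splitter with minimum and maximum chunk sizes could look roughly like the sketch below. This is not the Ragas implementation: the function names are hypothetical, sizes are counted in whitespace-separated words rather than tokens, and only Markdown `#` headings are recognised.

```python
import re

# Assumption: headings are Markdown ATX headings ("#" to "######").
HEADING = re.compile(r"^#{1,6}\s")


def split_by_headings(text):
    """Split Markdown text into sections, each starting at a heading line."""
    sections, current = [], []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def enforce_chunk_sizes(sections, min_words, max_words):
    """Merge sections shorter than min_words into the following one, then
    cut every chunk into pieces of at most max_words words."""
    merged, buffer = [], ""
    for section in sections:
        buffer = f"{buffer}\n{section}" if buffer else section
        if len(buffer.split()) >= min_words:
            merged.append(buffer)
            buffer = ""
    if buffer:  # trailing short section: attach it to the previous chunk
        if merged:
            merged[-1] = f"{merged[-1]}\n{buffer}"
        else:
            merged.append(buffer)

    chunks = []
    for chunk in merged:
        words = chunk.split()  # note: this normalises whitespace
        for start in range(0, len(words), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
    return chunks
```

In this simplified version the last piece cut from an oversized chunk can still fall below `min_words`; a fuller implementation would need to rebalance those leftovers.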

