
[R-310] Test set generation improvements  #1577

Closed as not planned
@shahules786

For Ragas 0.2, we released our third iteration of synthetic test generation for RAG. While developing this new approach, we kept in mind some important feedback gathered from earlier versions:

  1. Ability to generalize over more data formats and domains.
  2. Ability to customize and create scenarios related to one's own domain.
  3. Ability to persist and resample Q&A from the same list of documents without redoing preprocessing.
  4. Lower cost and faster iteration.

We have already noted some feedback from the community and would welcome more, hence this discussion thread. Please feel free to share your thoughts and questions here. This will help us greatly improve the feature in the coming weeks.

R-310

Known issues

Feature enhancement

Quality enhancement

Documentation improvements

TODO

  • Improve the quality of generated questions
  • Add a node filter to avoid query creation from nodes with poor information quality
  • Heading splitter - add minimum and maximum chunk size
  • Make sure default settings work on small document sets (PubMed summaries, etc., <500 tokens per doc)
  • Fix long context issues with extraction (OOC)
  • Adjust transforms/query generation for a small number of documents (n=1)
  • Improve the sampling algorithm to sample diverse scenarios from the available scenarios
  • Simplify code for writing custom scenarios
  • Add documentation for prompt adaptation in test set generation
  • Add documentation for writing custom scenarios and query types, for example Yes/No questions
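
To make the heading splitter item concrete, here is a minimal sketch of what minimum and maximum chunk sizes could mean in practice. It is not the Ragas implementation: the function name `split_by_headings`, its parameters, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for illustration. Oversized sections are cut into windows, and undersized chunks are merged into their neighbor when the result still fits.

```python
import re
from typing import List


def split_by_headings(text: str, min_tokens: int = 50, max_tokens: int = 500) -> List[str]:
    """Hypothetical helper: split markdown at headings, then bound chunk sizes.

    Token counts are approximated by whitespace-separated words (an assumption;
    a real splitter would likely use the model's tokenizer).
    """
    # Split before every line that starts with 1-6 '#' characters.
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6}\s)", text) if s.strip()]

    # Pass 1: cut oversized sections into windows of at most max_tokens words.
    bounded: List[str] = []
    for section in sections:
        words = section.split()
        for i in range(0, len(words), max_tokens):
            bounded.append(" ".join(words[i:i + max_tokens]))

    # Pass 2: merge a chunk into the previous one when either of the two is
    # below min_tokens and the merged chunk still respects max_tokens.
    # A small chunk can remain if no merge fits within max_tokens.
    chunks: List[str] = []
    for chunk in bounded:
        n = len(chunk.split())
        prev_n = len(chunks[-1].split()) if chunks else 0
        if chunks and (n < min_tokens or prev_n < min_tokens) and prev_n + n <= max_tokens:
            chunks[-1] = chunks[-1] + " " + chunk
        else:
            chunks.append(chunk)
    return chunks
```

Note that this sketch joins words with single spaces, so line breaks inside a section are lost. A short trailing chunk can also survive when merging it would exceed the maximum, which is one of the edge cases a real implementation would need to decide on.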
