I want to use a Hugging Face model for test data generation, but I get an error: ERROR:ragas.testset.transforms.engine:unable to apply transformation: Node 3da710f8-fc20-40ef-97bf-7f11fb7be538 has no summary_embedding #1720

Open
@wanjeakshay

Describe the Feature
Because the cost of test data generation is too high, I want to use an open-source Hugging Face model for it. However, this does not appear to be compatible with the current version.

Why is the feature important for you?
from langchain.llms import HuggingFacePipeline
from transformers import pipeline

# Load HuggingFace model using transformers pipeline

generator = pipeline("text-generation", model="gpt2")

# Create a HuggingFacePipeline LLM instance

llm = HuggingFacePipeline(pipeline=generator)

from langchain.embeddings import HuggingFaceEmbeddings
from sentence_transformers import SentenceTransformer

# Load the HuggingFace embedding model (e.g., a sentence transformer model)

model_name = "all-MiniLM-L6-v2" # A common model for sentence embeddings

# Create an embedding instance using HuggingFaceEmbeddings, providing the model_name

embedding = HuggingFaceEmbeddings(model_name=model_name)

from langchain_core.language_models import BaseLanguageModel
from langchain_core.embeddings import Embeddings

# make sure to wrap them with wrappers

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

langchain_llm = LangchainLLMWrapper(llm)
langchain_embeddings = LangchainEmbeddingsWrapper(embedding)

! git clone https://huggingface.co/datasets/explodinggradients/prompt-engineering-guide-papers

from langchain_community.document_loaders import DirectoryLoader
loader = DirectoryLoader("./prompt-engineering-guide-papers/", glob="*.pdf")
documents = loader.load()

for document in documents:
    document.metadata["filename"] = document.metadata["source"]

docs = [doc for doc in documents if len(doc.page_content.split()) > 5000]

from ragas.testset import TestsetGenerator

# generator with Hugging Face models (the wrapped instances created above)

generator_llm = langchain_llm
critic_llm = langchain_llm
embeddings = langchain_embeddings

generator = TestsetGenerator.from_langchain(llm=generator_llm, embedding_model=embeddings)

# generate testset

testset = generator.generate_with_langchain_docs(documents[:2], testset_size=3)

Running this code produces the following errors:

AttributeError: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'
ERROR:ragas.testset.transforms.engine:unable to apply transformation: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'
ERROR:ragas.testset.transforms.engine:unable to apply transformation: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'
ERROR:ragas.testset.transforms.engine:unable to apply transformation: 'headlines' property not found in this node
ERROR:ragas.testset.transforms.engine:unable to apply transformation: 'headlines' property not found in this node
ERROR:ragas.testset.transforms.engine:unable to apply transformation: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'
ERROR:ragas.testset.transforms.engine:unable to apply transformation: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'
ERROR:ragas.testset.transforms.engine:unable to apply transformation: node.property('summary') must be a string, found '<class 'NoneType'>'
ERROR:ragas.testset.transforms.engine:unable to apply transformation: node.property('summary') must be a string, found '<class 'NoneType'>'
ERROR:ragas.testset.transforms.engine:unable to apply transformation: Node 3da710f8-fc20-40ef-97bf-7f11fb7be538 has no summary_embedding
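
One possible reading of the first error, offered as an assumption rather than a confirmed diagnosis: if `TestsetGenerator.from_langchain` wraps the objects it receives in `LangchainLLMWrapper` itself, then passing an already wrapped LLM would wrap it twice, and the outer wrapper would look for `agenerate_prompt` on the inner wrapper instead of on the LangChain model. The sketch below reproduces that mechanism with stand-in classes only. `StubLangchainLLM`, `StubWrapper`, and the `from_langchain` function are hypothetical names written for this illustration and are not the ragas or LangChain implementations.

```python
import asyncio


class StubLangchainLLM:
    """Hypothetical stand-in for a LangChain LLM (only the method from the traceback)."""

    async def agenerate_prompt(self, prompts):
        return [f"echo: {p}" for p in prompts]


class StubWrapper:
    """Hypothetical stand-in for a wrapper that forwards calls to the wrapped LLM."""

    def __init__(self, langchain_llm):
        self.langchain_llm = langchain_llm

    async def generate(self, prompt):
        # Forwards to the inner object, which must provide agenerate_prompt.
        results = await self.langchain_llm.agenerate_prompt([prompt])
        return results[0]


def from_langchain(llm):
    """Hypothetical factory that wraps its argument (the assumed behavior)."""
    return StubWrapper(llm)


raw_llm = StubLangchainLLM()

# Wrapped once: the call reaches the raw LLM.
single = from_langchain(raw_llm)
single_result = asyncio.run(single.generate("hi"))

# Wrapped twice: the inner object is a StubWrapper, which has no agenerate_prompt.
double = from_langchain(StubWrapper(raw_llm))
try:
    asyncio.run(double.generate("hi"))
    double_error = None
except AttributeError as exc:
    double_error = str(exc)

print(single_result)
print(double_error)
```

If that assumption holds for the installed version, passing the unwrapped `llm` and `embedding` objects to `from_langchain` might avoid the double wrapping. The later 'headlines', 'summary', and 'summary_embedding' messages look like follow-on failures of transformations that depend on output from the steps that failed first, though that is also unverified here.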

Additional context

Labels: bug, enhancement, module-testsetgen
