598 - Fix pydantic validation error #664

LorenzoPaleari
Contributor

A pydantic validation error occurs when using the IterativeGraph module.
The problem arises because IterativeGraph reuses the same instance of a class for multiple executions, so the class does not correctly pass the llm into its functions.

  • Modified the Search graphs to move instantiation inside IterativeGraph
  • IterativeGraph now creates multiple instances of the class we want to iterate over. This can be improved.
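
The difference between reusing one instance and creating one per execution can be sketched as follows. This is only an illustration of the pattern: `ToyGraph`, `iterate_shared`, and `iterate_fresh` are hypothetical names, not ScrapeGraphAI classes, and the sketch does not reproduce the actual llm-passing failure.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph class (not ScrapeGraphAI's API)."""

    def __init__(self, prompt, config):
        self.prompt = prompt
        self.config = config
        self.runs = 0  # per-instance state that survives between runs

    def run(self, source):
        self.runs += 1
        return {"source": source, "llm": self.config["llm"], "run_number": self.runs}


def iterate_shared(graph, sources):
    # One instance reused for every source: state leaks across executions.
    return [graph.run(source) for source in sources]


def iterate_fresh(graph_class, prompt, config, sources):
    # A new instance per source: each execution starts from a clean state.
    results = []
    for source in sources:
        graph = graph_class(prompt, dict(config))
        results.append(graph.run(source))
    return results


config = {"llm": "placeholder-model"}  # placeholder value, assumption
sources = ["https://example.com/a", "https://example.com/b"]

shared = iterate_shared(ToyGraph("find schools", config), sources)
fresh = iterate_fresh(ToyGraph, "find schools", config, sources)

print([r["run_number"] for r in shared])
print([r["run_number"] for r in fresh])
```

In the shared case the second execution sees state left behind by the first; in the fresh case every execution starts at the same point.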

Changed instatiation location of iterated graph classes

rjbks commented Sep 13, 2024

@LorenzoPaleari

OpenAI models (4o and 4o-mini) do not throw errors, but they also do not respect the pydantic models:
Models:

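# Imports assumed; the original comment did not show them.
from typing import List, Literal, Optional
from pydantic import BaseModel, Field

class MatchedSchool(BaseModel):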
    name: str = Field(description="The name of the input candidate medical school.")
    alternate_names: List[str] = Field(description="A list of alternate names referencing this school. Could be abbreviations, or fully spelled out names, as well as names of individual departments responsible for the Medical Curriculum within the school.")
    is_med_school: Literal["true", "false"] = Field(description="Whether or not the input school is actually a medical school or has a medical program.")
    city: Optional[str] = Field(description="The city where the matched medical school campus/program facility is located, if available.")
    state: Optional[str] = Field(description="The state (or if international, the geographic/political region within the country) where the matched medical school campus/program facility is located, if available.")
    country: Optional[str] = Field(description="The country where the matched medical school campus/program facility is located, if available.")
    source: str = Field(description="Source URL where this match was found.")

class Matches(BaseModel):
    matches: List[MatchedSchool] = Field(description="A list of matched medical schools.")

Output:

{
 "name": "Centro Universit\u00e1rio Franciscano (UNIFRA)",
 "alternate_names": [
   "UNIFRA",
   "Centro Universit\u00e1rio Franciscano"
 ],
 "is_med_school": "true",
 "city": "Santa Maria",
 "state": "Rio Grande do Sul",
 "country": "Brazil",
 "sources": [
   "https://caper.ca/sites/default/files/pdf/CAPER_MedicalSchools_September_2022.xlsx",
   "https://www.facebook.com/engmatUNIFRA/",
   "https://unifra.academia.edu/ClariceMachado"
 ]
}
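
To make the mismatch concrete, here is a minimal sketch that validates the output above against the two models. It assumes plain pydantic v2 (not `langchain_core.pydantic_v1`) and omits the field descriptions for brevity; `missing_fields` is a hypothetical helper written for this illustration.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, ValidationError


class MatchedSchool(BaseModel):
    name: str
    alternate_names: List[str]
    is_med_school: Literal["true", "false"]
    city: Optional[str]
    state: Optional[str]
    country: Optional[str]
    source: str


class Matches(BaseModel):
    matches: List[MatchedSchool]


# The output reported above, copied as a Python dict.
llm_output = {
    "name": "Centro Universit\u00e1rio Franciscano (UNIFRA)",
    "alternate_names": ["UNIFRA", "Centro Universit\u00e1rio Franciscano"],
    "is_med_school": "true",
    "city": "Santa Maria",
    "state": "Rio Grande do Sul",
    "country": "Brazil",
    "sources": [
        "https://caper.ca/sites/default/files/pdf/CAPER_MedicalSchools_September_2022.xlsx",
        "https://www.facebook.com/engmatUNIFRA/",
        "https://unifra.academia.edu/ClariceMachado",
    ],
}


def missing_fields(model, data):
    """Return the names of required fields that validation reports as missing."""
    try:
        model(**data)
    except ValidationError as exc:
        return sorted(
            str(err["loc"][0]) for err in exc.errors() if "missing" in err["type"]
        )
    return []


matches_missing = missing_fields(Matches, llm_output)
school_missing = missing_fields(MatchedSchool, llm_output)
print(matches_missing, school_missing)
```

Under these assumptions, the output lacks the top-level `matches` wrapper and provides a `sources` list where the schema asks for a single `source` string.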

Gemini 1.5 Flash works well with proper formatting, but Gemini 1.5 Pro still has the formatting issue:

File "/opt/anaconda3/envs/med_device/lib/python3.12/site-packages/langchain_core/output_parsers/json.py", line 87, in parse_result
    raise OutputParserException(msg, llm_output=text) from e
langchain_core.exceptions.OutputParserException: Invalid json output: ```json
{'matches': []}
```

As detailed here #598
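
One detail in the log is that the payload uses single quotes. A minimal sketch with only the standard library's `json` module shows how such a payload behaves once the surrounding fence is stripped. Whether this is what triggers the `OutputParserException` here is an assumption, and `strip_fence` and `is_valid_json` are hypothetical helpers, not langchain functions.

```python
import json

FENCE = "`" * 3  # built programmatically to avoid a literal fence in this example

# The raw model output as it appears in the log above.
raw = f"{FENCE}json\n{{'matches': []}}\n{FENCE}"


def strip_fence(text):
    """Drop a leading and trailing fence line, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].startswith(FENCE):
        lines = lines[:-1]
    return "\n".join(lines)


def is_valid_json(text):
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


payload = strip_fence(raw)
single_quoted_ok = is_valid_json(payload)
double_quoted_ok = is_valid_json(payload.replace("'", '"'))
print(payload, single_quoted_ok, double_quoted_ok)
```

The JSON grammar requires double-quoted strings, so the single-quoted form is rejected by `json.loads` while the double-quoted form parses.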

Contributor Author

LorenzoPaleari commented Sep 13, 2024

OpenAI

The schema is not correctly passed to the final node, MergeAnswerNode.
Fixing!

Gemini

The error is not related to a coding problem. As your log shows, the LLM returned an empty matches list, {'matches': []}.
This cannot be parsed with the provided schema, since the schema does not allow empty lists.

See these:

https://stackoverflow.com/questions/61468548/check-if-list-is-not-empty-with-pydantic-in-an-elegant-way

pydantic/pydantic#367

Consider switching to pydantic if you are using langchain_core.pydantic_v1. That is an old pydantic version, and the links above will not apply to it.
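
For reference, a minimal sketch of how empty lists are handled after switching to plain pydantic. It assumes pydantic v2, where a list field can be constrained with `Field(min_length=1)`; the model names and the `accepts` helper are hypothetical, and behavior under `langchain_core.pydantic_v1` may differ.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class AllowsEmpty(BaseModel):
    # An unconstrained list field.
    matches: List[str] = Field(description="A list of matches; may be empty.")


class RequiresOne(BaseModel):
    # min_length=1 rejects an empty list (pydantic v2).
    matches: List[str] = Field(min_length=1, description="At least one match.")


def accepts(model, data):
    """Return True if the data validates against the model."""
    try:
        model(**data)
    except ValidationError:
        return False
    return True


empty = {"matches": []}
print(accepts(AllowsEmpty, empty))
print(accepts(RequiresOne, empty))
print(accepts(RequiresOne, {"matches": ["UNIFRA"]}))
```

Under these assumptions, the unconstrained field accepts `[]`, and only the constrained field rejects it, so the constraint has to be declared explicitly when an empty result should count as invalid.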

@VinciGit00
Collaborator

thank you

VinciGit00 merged commit 2ae26e9 into ScrapeGraphAI:pre/beta Sep 13, 2024

🎉 This PR is included in version 1.19.0-beta.10 🎉

The release is available on:

Your semantic-release bot 📦🚀

🎉 This PR is included in version 1.20.0-beta.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

🎉 This PR is included in version 1.21.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
