Parsing a dictionary of lists problem #27453

Open
snassimr opened this issue Oct 18, 2024 · 7 comments

Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@snassimr

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

I want to get "a" as a key in ppp, but the code below (using Dict) fails:

import os
from typing import Dict, List
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.0)

class A(BaseModel):
    a_1: str
    a_2: str
    r: str

class B(BaseModel):
    b_1: str
    b_2: str
    r: str

class C(BaseModel):
    ccc: List[A]
    ppp: Dict[str, List[B]]

structured_llm = model.with_structured_output(C)

response = structured_llm.invoke(prompt)  # prompt defined elsewhere

Error Message and Stack Trace (if applicable)

ValidationError: 1 validation error for C
ppp
Field required [type=missing, input_value={'ccc': [{'a_1': 'Price',...tant to Battery Life'}]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.9/v/missing

Description

I have code that works:

import os
from typing import List
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.0)

class A(BaseModel):
    a_1: str
    a_2: str
    r: str

class B(BaseModel):
    a: str
    b_1: str
    b_2: str
    r: str

class C(BaseModel):
    ccc: List[A]
    ppp: List[B]

structured_llm = model.with_structured_output(C)

response = structured_llm.invoke(prompt)  # prompt defined elsewhere

System Info

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
Python Version: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]

Package Information

langchain_core: 0.2.41
langchain: 0.2.16
langchain_community: 0.2.17
langsmith: 0.1.136
langchain_openai: 0.1.21
langchain_text_splitters: 0.2.4

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.10
async-timeout: 4.0.3
dataclasses-json: 0.6.7
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
openai: 1.52.0
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.2
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.35
tenacity: 8.5.0
tiktoken: 0.8.0
typing-extensions: 4.12.2

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Oct 18, 2024
@eyurtsev
Collaborator

Just a quick glance -- but this does not appear to be a bug in LangChain; it's an issue with the chat model failing to produce the correct output.

I'd suggest adding reference examples to the prompt to help the model output the correct thing.

@ethanglide

ethanglide commented Oct 18, 2024

For some reason, it seems that whenever the LLM is prompted to generate some type of dictionary, that field is not included in the response.

Consider this simple code and some variations:

from typing import Dict, List
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.0)

class TestModel(BaseModel):
    ...  # variations here

structured_llm = model.with_structured_output(TestModel)

response = structured_llm.invoke('prompt')
print(response)

When we define TestModel as follows:

output: int

Here is the output (I am also printing the OpenAI tool definition that gets bound to the model):

{'type': 'function', 'function': {'name': 'TestModel', 'description': '', 'parameters': {'properties': {'output': {'type': 'integer'}}, 'required': ['output'], 'type': 'object'}}}
output=5

Even if we define TestModel like this:

output: List

We get this:

{'type': 'function', 'function': {'name': 'TestModel', 'description': '', 'parameters': {'properties': {'output': {'items': {}, 'type': 'array'}}, 'required': ['output'], 'type': 'object'}}}
output=['What is your favorite book and why?', 'If you could travel anywhere in the world, where would you go and what would you do there?', 'What is a skill you would like to learn and why?', 'Describe a memorable experience you had in the past year.', 'If you could have dinner with any historical figure, who would it be and what would you ask them?']

But as soon as TestModel is defined like so:

output: Dict

Then all of a sudden the model does not respond with anything!

{'type': 'function', 'function': {'name': 'TestModel', 'description': '', 'parameters': {'properties': {'output': {'type': 'object'}}, 'required': ['output'], 'type': 'object'}}}
pydantic_core._pydantic_core.ValidationError: 1 validation error for TestModel
output
  Field required [type=missing, input_value={}, input_type=dict]

If there is a Dict somewhere alongside other keys, then those other keys will be included in the output but the dictionary will not:

output: Dict
output_2: int

Gives:

{'type': 'function', 'function': {'name': 'TestModel', 'description': '', 'parameters': {'properties': {'output': {'additionalProperties': {'type': 'integer'}, 'type': 'object'}, 'output_2': {'type': 'integer'}}, 'required': ['output', 'output_2'], 'type': 'object'}}}
pydantic_core._pydantic_core.ValidationError: 1 validation error for TestModel
output
  Field required [type=missing, input_value={'output_2': 5}, input_type=dict]

Why are these fields getting ignored? Is this an issue with the model or what?

@snassimr
Author

snassimr commented Oct 18, 2024

Actually, I found a format that suits me better. In any case, I can convert from one format to a Dict with one line of Python code. @ethanglide does it work with examples and not just a 'prompt' string? I don't have much experience providing examples for this case.
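
For reference, a minimal sketch of that conversion (assuming the working schema above, where each B carries the key in its a field):

from collections import defaultdict

# Group the flat List[B] into the Dict[str, List[B]] shape originally wanted,
# keyed by each item's `a` field.
grouped = defaultdict(list)
for item in response.ppp:
    grouped[item.a].append(item)
ppp = dict(grouped)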

@ethanglide

Unfortunately, I am not able to get it to work with examples either, assuming I wrote them correctly.

from typing import Dict, List
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.0)

class TestModel(BaseModel):
    output: Dict[str, str]

structured_llm = model.with_structured_output(TestModel)

examples = [
    {
        "input": "What is the capital of France?",
        "output": '{"output": "Paris"}'
    },
    {
        "input": "What is the capital of Germany?",
        "output": '{"output": "Berlin"}'
    },
    {
        "input": "What is the capital of Italy?",
        "output": '{"output": "Rome"}'
    }
]

example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a geography expert."),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)

chain = final_prompt | structured_llm

response = chain.invoke({"input": "What is the capital of Lithuania?"})
print(response)

This gives:

{'type': 'function', 'function': {'name': 'TestModel', 'description': '', 'parameters': {'properties': {'output': {'additionalProperties': {'type': 'string'}, 'type': 'object'}}, 'required': ['output'], 'type': 'object'}}}
pydantic_core._pydantic_core.ValidationError: 1 validation error for TestModel
output
  Field required [type=missing, input_value={}, input_type=dict]

This is consistent with the issues I had above.

Of course, there are ways around this, and we should ask ourselves whether having the model respond with arbitrarily structured dictionaries with arbitrary numbers of keys is something that should be done at all. But it really is strange that it works with Lists (the model will even respond with lists of arbitrary size containing arbitrary objects if you don't specify what kind of list it is) and not with Dicts.
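
For what it's worth, here is one workaround sketch (not something tested in this thread): model the mapping as a list of key/value pairs, which the model does fill reliably, then fold it back into a dict afterwards. The Entry name is made up.

from typing import List
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.0)

# Hypothetical pair type: keeps Dict out of the tool schema entirely.
class Entry(BaseModel):
    key: str
    value: str

class TestModel(BaseModel):
    output: List[Entry]

structured_llm = model.with_structured_output(TestModel)
response = structured_llm.invoke("Name the capitals of France and Germany.")

# Fold the pairs back into a plain dict after the call.
output_dict = {e.key: e.value for e in response.output}
print(output_dict)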

@ethanglide

@eyurtsev what do you think about the above?

@eyurtsev
Collaborator

The examples should be AI messages with tool calls, not just content, since you're using the tool-calling API. Check the how-to guides for tool calling (apologies, on 📱 right now).

Should look like

System, human, ai, tool, human, ai, tool

Or else squeeze the examples into the system prompt
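
For the second option, something like this might work (an untested sketch; note the doubled braces, since ChatPromptTemplate treats single { } as template variables):

from typing import Dict
from pydantic import BaseModel
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.0)

class TestModel(BaseModel):
    output: Dict[str, str]

# Few-shot examples squeezed into the system prompt; {{ }} escapes a literal brace.
system = (
    "You are a geography expert. Always fill the `output` field with a JSON object.\n"
    'Example: "What is the capital of France?" -> {{"output": {{"capital": "Paris"}}}}\n'
    'Example: "What is the capital of Germany?" -> {{"output": {{"capital": "Berlin"}}}}'
)

prompt = ChatPromptTemplate.from_messages([("system", system), ("human", "{input}")])
chain = prompt | model.with_structured_output(TestModel)
print(chain.invoke({"input": "What is the capital of Lithuania?"}))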

@ethanglide

Thank you for the guidance. I haven't quite been able to make it work, but I'm sure it is possible.

Program:

from typing import Dict
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage
from langchain_core.runnables import RunnablePassthrough

model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.0)

class TestModel(BaseModel):
    output: Dict

structured_llm = model.with_structured_output(TestModel)

examples = [
    HumanMessage("What is the capital of France?"),
    AIMessage(
        '',
        name='geography_assistant',
        tool_calls=[
            {
                'name': 'TestModel',
                'args': {"capital": "Paris"},
                'id': '1'
            },
        ],
    ),
    ToolMessage({"output": {"capital": "Paris"}}, tool_call_id='1'),
    HumanMessage("What is the capital of Germany?"),
    AIMessage(
        '',
        name='geography_assistant',
        tool_calls=[
            {
                'name': 'TestModel',
                'args': {"capital": "Berlin"},
                'id': '2'
            },
        ],
    ),
    ToolMessage({"output": {"capital": "Berlin"}}, tool_call_id='2'),
    HumanMessage("What is the capital of Italy?"),
    AIMessage(
        '',
        name='geography_assistant',
        tool_calls=[
            {
                'name': 'TestModel',
                'args': {"capital": "Rome"},
                'id': '3'
            },
        ],
    ),
    ToolMessage({"output": {"capital": "Rome"}}, tool_call_id='3'),
]

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a geography expert."),
        *examples,
        ("human", "{input}"),
    ]
)

chain = {'input': RunnablePassthrough()} | final_prompt | structured_llm

response = chain.invoke('What is the capital of Lithuania?')
print(response)

Output:

pydantic_core._pydantic_core.ValidationError: 1 validation error for TestModel
output
  Field required [type=missing, input_value={'capital': 'Vilnius'}, input_type=dict]

As you can see, dicts are now being passed to the tool, which is great. It's not what I need, but I am sure that with enough toying around and a more real-world example I would be able to make this work. The whole problem is just that you cannot simply put Dict as the type and call it a day; the model will not respond with arbitrary objects, it will just try to pass strings. Up to you to determine whether or not that is a real issue; I doubt it would get in the way of anyone's development.
