
[Bug]: LLMRerank Default Prompt will fail and cause an error if an unrelated line containing a comma is included in the answer #17352

Open
okirmis opened this issue Dec 22, 2024 · 1 comment · May be fixed by #17353
Labels
bug (Something isn't working) · triage (Issue needs to be triaged/prioritized)

Comments


okirmis commented Dec 22, 2024

Bug Description

Similar to #11092, an IndexError is still thrown when the LLM's answer in LLMRerank contains at least one unrelated line that also contains a comma. In my case, the answer was:

Doc: 5, Relevance: 9
Doc: 1, Relevance: 8

The document with the highest relevance score is Doc: 5, as it directly answers the question about the contract. 

While the first two lines and the third (empty) line are processed correctly, the fourth line throws an error in default_parse_choice_select_answer_fn in the following code, as line_tokens[1] does not contain a colon:

        _answer_relevance = re.findall(r"\d+", line_tokens[1].split(":")[1].strip())[0]
        answer_relevances.append(float(_answer_relevance))
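
To illustrate, here is a minimal standalone snippet (not the LlamaIndex source itself) showing why that last line slips past the len(line_tokens) != 2 guard and then fails at the colon split:

import re

line = "The document with the highest relevance score is Doc: 5, as it directly answers the question about the contract."
line_tokens = line.split(",")  # the line contains exactly one comma, so the len(line_tokens) != 2 guard does not filter it out
# line_tokens[1] is " as it directly answers the question about the contract." and contains no colon,
# so .split(":") returns a single-element list and indexing [1] raises IndexError:
re.findall(r"\d+", line_tokens[1].split(":")[1].strip())[0]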

Version

0.12.8

Steps to Reproduce

This can be reproduced using the example code from https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/LLMReranker-Gatsby/, but I cannot share the exact input data.
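
For context, the setup roughly looks like the following (a sketch along the lines of the linked notebook; the data directory and the query are placeholders, not my actual input):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor import LLMRerank

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# LLMRerank runs as a node postprocessor; its default answer parser is what fails below
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[LLMRerank(choice_batch_size=5, top_n=2)],
)
response = query_engine.query("What does the contract say?")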

However, manually calling default_parse_choice_select_answer_fn with the LLM answer mentioned above will throw an IndexError, e.g.:

answer = """Doc: 5, Relevance: 9
Doc: 1, Relevance: 8

The document with the highest relevance score is Doc: 5, as it directly answers the question about the contract."""
default_parse_choice_select_answer_fn(answer, 5)

Relevant Logs/Tracebacks

Traceback (most recent call last):
  File "<REDACTED>/app.py", line 67, in <module>
    response = query_engine.query(query)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/base/base_query_engine.py", line 52, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 178, in _query
    nodes = self.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 134, in retrieve
    return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 127, in _apply_node_postprocessors
    nodes = node_postprocessor.postprocess_nodes(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/postprocessor/types.py", line 54, in postprocess_nodes
    return self._postprocess_nodes(nodes, query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/postprocessor/llm_rerank.py", line 96, in _postprocess_nodes
    raw_choices, relevances = self._parse_choice_select_answer_fn(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/indices/utils.py", line 119, in default_parse_choice_select_answer_fn
    _answer_relevance = re.findall(r"\d+", line_tokens[1].split(":")[1].strip())[0]
                                           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
okirmis added the bug and triage labels on Dec 22, 2024

dosubot bot commented Dec 22, 2024

Hey there, @okirmis! I'm here to help you with any bugs, questions, or contributions you have in mind. Let's work together to solve the issue you're facing.

To resolve the IndexError in the default_parse_choice_select_answer_fn function when the LLM's answer contains an unrelated line with a comma, you can modify the function to handle cases where the line does not contain the expected format. Specifically, ensure that the code checks for the presence of a colon (:) before attempting to split and access elements. Here's a possible solution:

import re
from typing import List, Tuple


def default_parse_choice_select_answer_fn(
    answer: str, num_choices: int, raise_error: bool = False
) -> Tuple[List[int], List[float]]:
    """Default parse choice select answer function."""
    answer_lines = answer.split("\n")
    answer_nums = []
    answer_relevances = []
    for answer_line in answer_lines:
        line_tokens = answer_line.split(",")
        if len(line_tokens) != 2:
            if not raise_error:
                continue
            else:
                raise ValueError(
                    f"Invalid answer line: {answer_line}. "
                    "Answer line must be of the form: "
                    "answer_num: <int>, answer_relevance: <float>"
                )
        try:
            # Check if the token contains a colon before splitting
            if ":" not in line_tokens[0]:
                if raise_error:
                    raise ValueError(
                        f"Invalid answer line: {answer_line}. "
                        "Answer line must be of the form: "
                        "answer_num: <int>, answer_relevance: <float>"
                    )
                continue
            answer_num = int(line_tokens[0].split(":")[1].strip())
        except (IndexError, ValueError):
            if not raise_error:
                continue
            else:
                raise ValueError(
                    f"Invalid answer line: {answer_line}. "
                    "Answer line must be of the form: "
                    "answer_num: <int>, answer_relevance: <float>"
                )
        if answer_num > num_choices:
            continue
        # Check if the token contains a colon before splitting
        if ":" not in line_tokens[1]:
            if raise_error:
                raise ValueError(
                    f"Invalid answer line: {answer_line}. "
                    "Answer line must be of the form: "
                    "answer_num: <int>, answer_relevance: <float>"
                )
            continue
        # Record the document number only once its relevance can also be parsed,
        # so the two returned lists stay aligned
        answer_nums.append(answer_num)
        _answer_relevance = re.findall(r"\d+", line_tokens[1].split(":")[1].strip())[0]
        answer_relevances.append(float(_answer_relevance))
    return answer_nums, answer_relevances

This modification ensures that the function checks for the presence of a colon before attempting to split the string, which helps prevent IndexError when the line format is incorrect [1].
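
For example, calling the patched function with the answer from the bug report (assuming the function defined above is in scope) should now skip the trailing prose line instead of raising:

answer = """Doc: 5, Relevance: 9
Doc: 1, Relevance: 8

The document with the highest relevance score is Doc: 5, as it directly answers the question about the contract."""

print(default_parse_choice_select_answer_fn(answer, 5))
# Expected: ([5, 1], [9.0, 8.0]) -- the empty line and the unrelated prose line are skipped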


okirmis added a commit to okirmis/llama_index that referenced this issue Dec 22, 2024