Commit
Dev v0.0.3 (#25)
* debug serper query up to 100 for each call

* Update v0.0.3 doc

* add todo
haonan-li authored May 31, 2024
1 parent a48dfd3 commit 7b72242
Showing 6 changed files with 49 additions and 33 deletions.
7 changes: 2 additions & 5 deletions docs/README.md
@@ -17,11 +17,8 @@ We welcome contributions and feedback from the community and recommend a few bes
* PRs should be titled descriptively, and be opened with a brief description of the scope and intent of the new contribution.
* New features should have appropriate documentation added alongside them.
* Aim for code maintainability, and minimize code copying.
* Minimal tests are required before submitting a PR; run `script/minimal_test.py` and ensure all test cases pass.
* Please make sure the code style is checked and aligned:
```bash
pre-commit run --all-files
```
<!-- * Minimal tests are required before submitting a PR; run `script/minimal_test.py` and ensure all test cases pass. -->
* Please make sure the code style is checked and aligned; see [Code Style](#code-style) for more details.

### For Feature Requests

15 changes: 15 additions & 0 deletions docs/RELEASE_LOG.md
@@ -1,5 +1,20 @@
# Release Log

## v0.0.3

### New Features
1. **Keep Original Text:** Add a mapping from each claim to its position in the original text, and add a `restore_claims` function to the **decomposer** that restores the decomposed claims to the original user input.
2. **Data Structure:** Define the data structures for several intermediate processing functions and the final output in `utils/data_class.py`.
3. **Speed Up:** Parallelize the `restore_claims`, `identify_checkworthiness`, and `query_generation` functions to speed up the pipeline.
4. **Token Count:** Add token counts for all components.
5. **Evidence-wise Verification:** Change the verification logic from passing all evidence together in a single LLM call to verifying the claim against each piece of evidence in a separate LLM call.
6. **Factuality Value:** Remove the deterministic output; factuality is now a number in the range [0,1], calculated from the judgement on each single piece of evidence (see the sketch after this list).
7. **Webpage:** Redesign the webpage.
8. **Default LLM:** Change the default to GPT-4o.
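
To make evidence-wise verification concrete, here is a minimal sketch of how a factuality value in [0,1] can be aggregated from per-evidence judgements. The verdict labels and the function itself are illustrative assumptions, not Loki's exact implementation.

```python
# Illustrative sketch only: verdict labels and the aggregation rule are
# assumptions, not Loki's exact implementation.

def factuality_score(verdicts: list[str]) -> float:
    """Aggregate per-evidence verdicts into a factuality value in [0, 1].

    Each verdict is one LLM judgement of the claim against a single
    piece of evidence: "support", "refute", or "irrelevant".
    """
    relevant = [v for v in verdicts if v != "irrelevant"]
    if not relevant:  # no usable evidence -> undecided midpoint
        return 0.5
    return sum(v == "support" for v in relevant) / len(relevant)

print(factuality_score(["support", "refute", "support", "irrelevant"]))  # 0.666...
```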

### Bug Fixes
1. **Serper Max Queries:** The Serper API allows at most 100 queries per request, so we now split the queries into multiple requests when their number exceeds 100 (a standalone sketch of the batching pattern follows this list).
2. **Evidence and URL:** Link each piece of evidence to its corresponding URL.
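
The actual fix appears in the diff to `factcheck/core/Retriever/serper_retriever.py` below; as a standalone sketch, the batching pattern looks like this (here `request_serper_api` stands in for the retriever's request method, and a `requests`-style response object is assumed):

```python
BATCH_SIZE = 100  # Serper accepts at most 100 queries per request

def batched_serper_search(query_list: list[str], request_serper_api) -> list[dict]:
    """Split queries into chunks of at most 100 and concatenate the JSON results."""
    responses: list[dict] = []
    for i in range(0, len(query_list), BATCH_SIZE):
        batch = query_list[i : i + BATCH_SIZE]
        batch_response = request_serper_api(batch)  # one HTTP request per batch
        if batch_response is None:
            raise RuntimeError("Serper API request error!")
        responses += batch_response.json()  # the batch endpoint returns a list
    return responses
```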

## v0.0.2

14 changes: 7 additions & 7 deletions docs/development_guide.md
@@ -1,6 +1,6 @@
# Development Guide

This documentation page provides a guide for developers who want to contribute to the Loki project, for versions v0.0.2 and later.
This documentation page provides a guide for developers who want to contribute to the Loki project, for versions v0.0.3 and later.

- [Development Guide](#development-guide)
- [Framework Introduction](#framework-introduction)
@@ -11,11 +11,11 @@ This documentation page provides a guide for developers who want to contribute to

Loki leverages state-of-the-art language models to verify the veracity of textual claims. The pipeline is designed to be modular in `factcheck/core/`, which includes the following components:

- **Decomposer:** Breaks down extensive texts into digestible, independent claims, setting the stage for detailed analysis.
- **Checkworthy:** Assesses each claim's potential significance, filtering out vague or ambiguous statements to focus on those that truly matter. For example, vague claims like "MBZUAI has a vast campus" are considered unworthy because of the ambiguous nature of "vast."
- **Query Generator:** Transforms check-worthy claims into precise queries, ready to navigate the vast expanse of the internet in search of truth.
- **Evidence Retriever:** Ventures into the digital realm, retrieving relevant evidence that forms the foundation of informed verification.
- **ClaimVerify:** Examines the gathered evidence, determining the veracity of each claim to uphold the integrity of information.
- **Decomposer:** Breaks down extensive texts into digestible, independent claims, setting the stage for detailed analysis. It also provides the mapping between the original text and the decomposed claims.
- **Checkworthy:** Assesses each claim's checkworthiness, filtering out vague or ambiguous statements as well as statements of opinion. For example, vague claims like "MBZUAI has a vast campus" are considered unworthy because of the ambiguous nature of "vast."
- **Query Generator:** Transforms check-worthy claims into precise queries, ready to navigate the vast expanse of the internet in search of evidence.
- **Evidence Retriever:** Retrieves relevant evidence that forms the foundation of informed verification. Currently, for open-domain questions, we use Google search via the Serper API.
- **ClaimVerify:** Judges each piece of evidence against the claim, determining whether it supports, refutes, or is irrelevant to it (a pipeline sketch follows this list).
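
A minimal sketch of how the five components compose into a pipeline is shown below. The method names are illustrative assumptions (only `identify_checkworthiness` appears in the codebase), not the exact `factcheck/core/` API:

```python
# Hypothetical wiring of the five components; method names are illustrative
# assumptions, except identify_checkworthiness, which the codebase does use.
def fact_check_pipeline(text, decomposer, checkworthy, query_generator, retriever, verifier):
    claims = decomposer.decompose(text)                    # text -> independent claims
    claims = checkworthy.identify_checkworthiness(claims)  # drop vague / opinion claims
    results = {}
    for claim in claims:
        queries = query_generator.generate(claim)          # claim -> search queries
        evidences = retriever.retrieve(queries)            # queries -> evidence snippets
        # judge each piece of evidence: supporting, refuting, or irrelevant
        results[claim] = [verifier.verify(claim, evidence) for evidence in evidences]
    return results
```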

To support each component's functionality, Loki relies on the following utils:
- **Language Model:** Currently, 4 out of 5 components (Decomposer, Checkworthy, Query Generator, and ClaimVerify) use language models (LLMs) to perform their tasks. The supported LLMs are defined in `factcheck/core/utils/llmclient/` and can be easily extended to support more LLMs (see the sketch below).
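
As a hypothetical sketch of that extension point, a new client only needs to expose a chat-style call; the class and method names below are assumptions, not the actual `llmclient` interface:

```python
# Hypothetical sketch: the real base class in factcheck/core/utils/llmclient/
# may differ; all names below are assumptions.
class MyLLMClient:
    """A minimal chat-style client the four LLM-backed components could call."""

    def __init__(self, model: str, api_key: str):
        self.model = model
        self.api_key = api_key

    def chat(self, prompt: str) -> str:
        # Call your provider's completion endpoint here and return the text.
        raise NotImplementedError
```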
@@ -71,7 +71,7 @@ As Loki continues to evolve, our development plan focuses on broadening capabili
- **Dockerization:**
- Packaging Loki into Docker containers to simplify deployment and scale-up operations, ensuring Loki can be easily set up and maintained across different environments.

### 5. Multi-language Support
### 5. Multi-lingual Support
- **Language Expansion:**
- Support for additional languages beyond English, including Chinese, Arabic, etc., to cater to a global user base.

3 changes: 2 additions & 1 deletion factcheck/__init__.py
@@ -30,10 +30,11 @@ def __init__(
checkworthy_model: str = None,
query_generator_model: str = None,
evidence_retrieval_model: str = None,
claim_verify_model: str = None, # "gpt-3.5-turbo",
claim_verify_model: str = "gpt-3.5-turbo",
api_config: dict = None,
num_seed_retries: int = 3,
):
# TODO: better handle raw token count
self.encoding = tiktoken.get_encoding("cl100k_base")

self.prompt = prompt_mapper(prompt_name=prompt)
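
The `cl100k_base` encoding pinned in the constructor above backs the new per-component token counts (release note item 4). Here is a minimal sketch of counting tokens with `tiktoken`; the helper name is an assumption, not Loki's API:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # same encoding the constructor pins

def count_tokens(text: str) -> int:
    """Return the number of cl100k_base tokens in `text` (an illustrative helper)."""
    return len(encoding.encode(text))

print(count_tokens("Loki verifies textual claims."))  # prints the token count
```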
1 change: 0 additions & 1 deletion factcheck/core/CheckWorthy.py
@@ -25,7 +25,6 @@ def identify_checkworthiness(self, texts: list[str], num_retries: int = 3, promp
list[str]: a list of checkworthy claims, pairwise outputs
"""
checkworthy_claims = texts
# TODO: better handle checkworthiness
joint_texts = "\n".join([str(i + 1) + ". " + j for i, j in enumerate(texts)])

if prompt is None:
42 changes: 23 additions & 19 deletions factcheck/core/Retriever/serper_retriever.py
@@ -62,50 +62,54 @@ def _retrieve_evidence_4_all_claim(
evidences = [[] for _ in query_list]

# get the response from serper
# TODO: Can send up to 100 queries once
serper_response = self._request_serper_api(query_list)

if serper_response is None:
logger.error("Serper API request error!")
return evidences
serper_responses = []
for i in range(0, len(query_list), 100):
batch_query_list = query_list[i : i + 100]
batch_response = self._request_serper_api(batch_query_list)
if batch_response is None:
logger.error("Serper API request error!")
return evidences
else:
serper_responses += batch_response.json()

# get the results for queries with an answer box
# get the responses for queries with an answer box
query_url_dict = {}
url_to_date = {} # TODO: decide whether to use date
_snippet_to_check = []
for i, (query, result) in enumerate(zip(query_list, serper_response.json())):
if query != result.get("searchParameters").get("q"):
logger.error("Serper change query from {} TO {}".format(query, result.get("searchParameters").get("q")))
for i, (query, response) in enumerate(zip(query_list, serper_responses)):
if query != response.get("searchParameters").get("q"):
logger.error("Serper change query from {} TO {}".format(query, response.get("searchParameters").get("q")))

if "answerBox" in result:
if "answer" in result["answerBox"]:
# TODO: provide the link for the answer box
if "answerBox" in response:
if "answer" in response["answerBox"]:
evidences[i] = [
{
"text": f"{query}\nAnswer: {result['answerBox']['answer']}",
"text": f"{query}\nAnswer: {response['answerBox']['answer']}",
"url": "Google Answer Box",
}
]
else:
evidences[i] = [
{
"text": f"{query}\nAnswer: {result['answerBox']['snippet']}",
"text": f"{query}\nAnswer: {response['answerBox']['snippet']}",
"url": "Google Answer Box",
}
]
# TODO: currently, if there is a Google answer box we only get 1 piece of evidence; otherwise we get multiple, which diminishes the value of the Google answer.
else:
results = result.get("organic", [])[:top_k] # Choose top 5 result
topk_results = response.get("organic", [])[:top_k]  # Choose top-k organic results

if (len(_snippet_to_check) == 0) or (not snippet_extend_flag):
evidences[i] += [
{"text": re.sub(r"\n+", "\n", _result["snippet"]), "url": _result["link"]} for _result in results
{"text": re.sub(r"\n+", "\n", _result["snippet"]), "url": _result["link"]} for _result in topk_results
]

# Save date for each url
url_to_date.update({result.get("link"): result.get("date") for result in results})
url_to_date.update({_result.get("link"): _result.get("date") for _result in topk_results})
# Save query-url pair, 1 query may have multiple urls
query_url_dict.update({query: [result.get("link") for result in results]})
_snippet_to_check += [result["snippet"] if "snippet" in result else "" for result in results]
query_url_dict.update({query: [_result.get("link") for _result in topk_results]})
_snippet_to_check += [_result["snippet"] if "snippet" in _result else "" for _result in topk_results]

# return if there is no snippet to check or snippet_extend_flag is False
if (len(_snippet_to_check) == 0) or (not snippet_extend_flag):
