feat: add ingest and delete tools

In addition to adding these extra tools, this change copies the gateway tool to the main tool, changing ${KNOWLEDGE_BIN} to ${GPTSCRIPT_TOOL_DIR}/bin/gptscript-tool so it can be used with the gptscript packaging. Signed-off-by: Donnie Adams <[email protected]>
gptscript-ai · Sep 11, 2024 · 14bd9a2 · 14bd9a2
1 parent 9134b45
commit 14bd9a2
Show file tree

Hide file tree

Showing 3 changed files with 159 additions and 1 deletion.
diff --git a/delete.gpt b/delete.gpt
@@ -0,0 +1,5 @@
+Name: Knowledge Deletion
+Description: Delete a dataset.
+Credential: github.com/gptscript-ai/gateway-creds as github.com/gptscript-ai/gateway
+
+#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool delete-dataset "${GPTSCRIPT_INPUT}"
diff --git a/ingest.gpt b/ingest.gpt
@@ -0,0 +1,5 @@
+Name: Knowledge Ingestion
+Description: Ingest content into a dataset.
+Credential: github.com/gptscript-ai/gateway-creds as github.com/gptscript-ai/gateway
+
+#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool ingest --prune --dataset ${GPTSCRIPT_DATASET} "${GPTSCRIPT_INPUT}"
diff --git a/tool.gpt b/tool.gpt
@@ -2,6 +2,7 @@ Name: Knowledge
 Description: Retrieve information from files uploaded by the user.
 Type: context
 Share Tools: Knowledge Retrieval
+Share Context: Knowledge Retrieval Context with *
 
 #!sys.echo
 
@@ -19,4 +20,151 @@ Credential: github.com/gptscript-ai/gateway-creds as github.com/gptscript-ai/gat
 Param: query: The query to search for in the knowledge base. It will be used for semantic similarity search, so enhance it accordingly to yield good results.
 Param: debug: (OPTIONAL) Set to "true" to enable debug mode - only use if you are explicitly asked to do so.
 
-#!${KNOWLEDGE_BIN} retrieve --dataset ${GPTSCRIPT_THREAD_ID} --dataset ${GPTSCRIPT_SCRIPT_ID} "${GPTSCRIPT_INPUT}"
+#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool retrieve --dataset ${GPTSCRIPT_THREAD_ID} --dataset ${GPTSCRIPT_SCRIPT_ID} "${GPTSCRIPT_INPUT}"
+
+---
+Name: Knowledge Retrieval Context
+Description: Add knowledge retrieved from uploaded files to the context.
+Credential: github.com/gptscript-ai/gateway-creds as github.com/gptscript-ai/gateway
+Type: context
+Input Filter: QueryRelevancy
+Output Filter: KnowledgeInstructions
+
+#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool retrieve --dataset ${GPTSCRIPT_THREAD_ID} --dataset ${GPTSCRIPT_SCRIPT_ID} "${GPTSCRIPT_INPUT}"
+
+---
+Name: QueryRelevancy
+Param: input: the message
+Context: LastUserInputOverview
+
+The user has put in the following message: "${INPUT}".
+Your task is to ensure that the input is good enough to be used as a query for the knowledge tool, i.e. good for a semantic similarity search.
+Use the following criteria to determine if the input is good enough:
+1. It's good if it contains keywords, names, entities, etc.
+2. It's good if it's written as a natural language question with proper semantic meaning.
+3. It's not good if it's too generic or vague or if it only contains stopwords and no real content.
+
+If you decide based on the above criteria that the input is good enough, return it as is without further considerations.
+If you decide that it's not good enough, check if it may be a follow-up question or in any way referring to an earlier message.
+Consider the chat history enclosed in the following <HISTORY></HISTORY> tags.
+<HISTORY>
+${GPTSCRIPT_CONTEXT}
+</HISTORY>
+If you think the input is connected to parts of the history, enrich it with details from there.
+If the chat history is irrelevant (i.e. there is no connection the current input) and you're certain that the input won't make for a good query, return a single - with no quotes or highlighting or anything else.
+Only reply with the final query and nothing else. Do not use any syntax highlighting.
+Here's are some examples:
+
+## Example 1
+Input: "Who's the employer?"
+History: "<USER_MESSAGES>[User Message #1] How much does John Doe earn?</USER_MESSAGES>"
+Output: "Who's the employer of John Doe?"
+Reasoning: The input looks like a follow-up question related to a previous query. The previous query is about the entity John Doe, so the current query is likely asking about the same entity.
+
+## Example 2
+Input: "What are its attributes?"
+History: "<USER_MESSAGES>[User Message #1] What's Oystersteel?</USER_MESSAGES>"
+Output: "What are the attributes of Oystersteel?"
+Reasoning: The input is a follow-up question related to a previous query. The previous query is about the entity Oystersteel, so the current query is likely asking about the attributes of the same entity.
+
+## Example 3
+Input: "What's the weather like?"
+History: "<USER_MESSAGES></USER_MESSAGES>"
+Output: -
+Reasoning: The input is too generic and doesn't contain any specific keywords or entities. There is no relevant history to enrich the input with, so it's not good enough to be used as a query.
+
+---
+Name: KnowledgeInstructions
+Params: output: the message
+Params: continuation: if the the conversation is still in progress
+Params: chat: if this is a chat conversation
+
+#!/usr/bin/env python3
+
+import os
+import json
+import asyncio
+
+async def main():
+    output = os.getenv('OUTPUT', '')
+    continuation = os.getenv('CONTINUATION', '') == 'true'
+    is_chat = os.getenv('CHAT', '') == 'true'
+
+    # only use the part of the output starting with "Retrieved the following"
+    if "Retrieved the following" in output:
+        output = "Retrieved the following"+output.split("Retrieved the following")[1]
+
+        msg = f"""
+Use the content within the following <KNOWLEDGE></KNOWLEDGE> tags as your learned knowledge.
+<KNOWLEDGE>
+{output}
+</KNOWLEDGE>
+If this knowledge seems irrelevant to the user query, ignore it.
+Avoid mentioning that you retrieved the information from the context or the knowledge tool.
+Only provide citations if explicitly asked for it and if the source references are available in the knowledge.
+Answer in the language that the user asked the question in.
+"""
+    else:
+        msg = "No data retrieved from knowledge base."
+
+    print(msg)
+
+
+asyncio.run(main())
+
+---
+Name: LastUserInputOverview
+Description: Get an overview over the last user inputs in the current chat
+Context: sys.chat.current
+Param: Limit: How many entries should be returned
+
+#!/usr/bin/env python3.12
+
+import json
+import asyncio
+import os
+
+
+async def main():
+    histories_str = os.getenv("GPTSCRIPT_CONTEXT", "")
+
+    if not histories_str:
+        print("<USER_MESSAGES></USER_MESSAGES>")
+        return
+
+    histories = json.loads(histories_str)
+
+    limit = int(os.getenv("LIMIT", "50"))
+
+    chat = ["<USER_MESSAGES>"]
+    msgs = []
+
+    completion = histories.get("completion", {})
+
+    i = 1
+
+    for message in completion.get("messages", []):
+        role = message.get("role", "")
+        text = " ".join(
+            [part["text"] for part in message.get("content", []) if "text" in part]
+        )
+        if role == "user" and len(text) > 0 and not text.startswith("Call "):
+            msgs.append(f"[User Message #{i}] {text}")
+            i += 1
+
+    if limit > len(msgs):
+        limit = len(msgs)
+
+    if limit == 0:
+        for msg in msgs:
+            chat.append(msg)
+    else:
+        for msg in msgs[-limit:]:
+            chat.append(msg)
+
+    chat.append("</USER_MESSAGES>")
+    print("\n".join(chat))
+
+
+asyncio.run(main())
+