Skip to content
This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

Commit

Permalink
feat: add ingest and delete tools
Browse files Browse the repository at this point in the history
In addition to adding these extra tools, this change copies the gateway
tool to the main tool, changing ${KNOWLEDGE_BIN} to
${GPTSCRIPT_TOOL_DIR}/bin/gptscript-tool so it can be used with the
gptscript packaging.

Signed-off-by: Donnie Adams <[email protected]>
  • Loading branch information
thedadams authored and iwilltry42 committed Sep 11, 2024
1 parent 9134b45 commit 14bd9a2
Show file tree
Hide file tree
Showing 3 changed files with 159 additions and 1 deletion.
5 changes: 5 additions & 0 deletions delete.gpt
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Name: Knowledge Deletion
Description: Delete a dataset.
Credential: github.com/gptscript-ai/gateway-creds as github.com/gptscript-ai/gateway

#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool delete-dataset "${GPTSCRIPT_INPUT}"
5 changes: 5 additions & 0 deletions ingest.gpt
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Name: Knowledge Ingestion
Description: Ingest content into a dataset.
Credential: github.com/gptscript-ai/gateway-creds as github.com/gptscript-ai/gateway

#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool ingest --prune --dataset ${GPTSCRIPT_DATASET} "${GPTSCRIPT_INPUT}"
150 changes: 149 additions & 1 deletion tool.gpt
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ Name: Knowledge
Description: Retrieve information from files uploaded by the user.
Type: context
Share Tools: Knowledge Retrieval
Share Context: Knowledge Retrieval Context with *

#!sys.echo

Expand All @@ -19,4 +20,151 @@ Credential: github.com/gptscript-ai/gateway-creds as github.com/gptscript-ai/gat
Param: query: The query to search for in the knowledge base. It will be used for semantic similarity search, so enhance it accordingly to yield good results.
Param: debug: (OPTIONAL) Set to "true" to enable debug mode - only use if you are explicitly asked to do so.

#!${KNOWLEDGE_BIN} retrieve --dataset ${GPTSCRIPT_THREAD_ID} --dataset ${GPTSCRIPT_SCRIPT_ID} "${GPTSCRIPT_INPUT}"
#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool retrieve --dataset ${GPTSCRIPT_THREAD_ID} --dataset ${GPTSCRIPT_SCRIPT_ID} "${GPTSCRIPT_INPUT}"

---
Name: Knowledge Retrieval Context
Description: Add knowledge retrieved from uploaded files to the context.
Credential: github.com/gptscript-ai/gateway-creds as github.com/gptscript-ai/gateway
Type: context
Input Filter: QueryRelevancy
Output Filter: KnowledgeInstructions

#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool retrieve --dataset ${GPTSCRIPT_THREAD_ID} --dataset ${GPTSCRIPT_SCRIPT_ID} "${GPTSCRIPT_INPUT}"

---
Name: QueryRelevancy
Param: input: the message
Context: LastUserInputOverview

The user has put in the following message: "${INPUT}".
Your task is to ensure that the input is good enough to be used as a query for the knowledge tool, i.e. good for a semantic similarity search.
Use the following criteria to determine if the input is good enough:
1. It's good if it contains keywords, names, entities, etc.
2. It's good if it's written as a natural language question with proper semantic meaning.
3. It's not good if it's too generic or vague or if it only contains stopwords and no real content.

If you decide based on the above criteria that the input is good enough, return it as is without further considerations.
If you decide that it's not good enough, check if it may be a follow-up question or in any way referring to an earlier message.
Consider the chat history enclosed in the following <HISTORY></HISTORY> tags.
<HISTORY>
${GPTSCRIPT_CONTEXT}
</HISTORY>
If you think the input is connected to parts of the history, enrich it with details from there.
If the chat history is irrelevant (i.e. there is no connection the current input) and you're certain that the input won't make for a good query, return a single - with no quotes or highlighting or anything else.
Only reply with the final query and nothing else. Do not use any syntax highlighting.
Here's are some examples:

## Example 1
Input: "Who's the employer?"
History: "<USER_MESSAGES>[User Message #1] How much does John Doe earn?</USER_MESSAGES>"
Output: "Who's the employer of John Doe?"
Reasoning: The input looks like a follow-up question related to a previous query. The previous query is about the entity John Doe, so the current query is likely asking about the same entity.

## Example 2
Input: "What are its attributes?"
History: "<USER_MESSAGES>[User Message #1] What's Oystersteel?</USER_MESSAGES>"
Output: "What are the attributes of Oystersteel?"
Reasoning: The input is a follow-up question related to a previous query. The previous query is about the entity Oystersteel, so the current query is likely asking about the attributes of the same entity.

## Example 3
Input: "What's the weather like?"
History: "<USER_MESSAGES></USER_MESSAGES>"
Output: -
Reasoning: The input is too generic and doesn't contain any specific keywords or entities. There is no relevant history to enrich the input with, so it's not good enough to be used as a query.

---
Name: KnowledgeInstructions
Params: output: the message
Params: continuation: if the the conversation is still in progress
Params: chat: if this is a chat conversation

#!/usr/bin/env python3

import os
import json
import asyncio

async def main():
output = os.getenv('OUTPUT', '')
continuation = os.getenv('CONTINUATION', '') == 'true'
is_chat = os.getenv('CHAT', '') == 'true'

# only use the part of the output starting with "Retrieved the following"
if "Retrieved the following" in output:
output = "Retrieved the following"+output.split("Retrieved the following")[1]

msg = f"""
Use the content within the following <KNOWLEDGE></KNOWLEDGE> tags as your learned knowledge.
<KNOWLEDGE>
{output}
</KNOWLEDGE>
If this knowledge seems irrelevant to the user query, ignore it.
Avoid mentioning that you retrieved the information from the context or the knowledge tool.
Only provide citations if explicitly asked for it and if the source references are available in the knowledge.
Answer in the language that the user asked the question in.
"""
else:
msg = "No data retrieved from knowledge base."

print(msg)


asyncio.run(main())

---
Name: LastUserInputOverview
Description: Get an overview over the last user inputs in the current chat
Context: sys.chat.current
Param: Limit: How many entries should be returned

#!/usr/bin/env python3.12

import json
import asyncio
import os


async def main():
histories_str = os.getenv("GPTSCRIPT_CONTEXT", "")

if not histories_str:
print("<USER_MESSAGES></USER_MESSAGES>")
return

histories = json.loads(histories_str)

limit = int(os.getenv("LIMIT", "50"))

chat = ["<USER_MESSAGES>"]
msgs = []

completion = histories.get("completion", {})

i = 1

for message in completion.get("messages", []):
role = message.get("role", "")
text = " ".join(
[part["text"] for part in message.get("content", []) if "text" in part]
)
if role == "user" and len(text) > 0 and not text.startswith("Call "):
msgs.append(f"[User Message #{i}] {text}")
i += 1

if limit > len(msgs):
limit = len(msgs)

if limit == 0:
for msg in msgs:
chat.append(msg)
else:
for msg in msgs[-limit:]:
chat.append(msg)

chat.append("</USER_MESSAGES>")
print("\n".join(chat))


asyncio.run(main())

0 comments on commit 14bd9a2

Please sign in to comment.