Skip to content
This repository has been archived by the owner on Oct 30, 2024. It is now read-only.

Commit

Permalink
chore: new basic query tool used by otto
Browse files Browse the repository at this point in the history
  • Loading branch information
ibuildthecloud committed Oct 14, 2024
1 parent 82145e5 commit a5fe669
Showing 1 changed file with 3 additions and 184 deletions.
187 changes: 3 additions & 184 deletions tool.gpt
Original file line number Diff line number Diff line change
@@ -1,188 +1,7 @@
Name: Knowledge
Description: Retrieve information from files uploaded by the user.
Type: context
Share Tools: Knowledge Retrieval
Share Context: Knowledge Retrieval Context with *

#!sys.echo

You have access to a RAG tool named "Knowledge Retrieval".
It will work with files previously uploaded by the user.
Use it to answer questions from the user.
Only consider this tool if the knowledge from your context doesn't already hold the answer to the user query.
Give citations or source references only if asked for it and that doesn't conflict with any other instructions you've received in your system prompt.
If the answers that the knowledge tool returns seem irrelevant, you may use another tool.

---
Name: Knowledge Retrieval
Description: Retrieve information from files uploaded by the user.
Credential: github.com/gptscript-ai/credentials/model-provider
Param: query: The query to search for in the knowledge base. It will be used for semantic similarity search, so enhance it accordingly to yield good results.
Param: debug: (OPTIONAL) Set to "true" to enable debug mode - only use if you are explicitly asked to do so.

#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool retrieve ${GPTSCRIPT_THREAD_ID:+--dataset=${GPTSCRIPT_THREAD_ID}} ${GPTSCRIPT_SCRIPT_ID:+--dataset=${GPTSCRIPT_SCRIPT_ID}} "${GPTSCRIPT_INPUT}"

---
Name: Knowledge Retrieval Context
Description: Add knowledge retrieved from uploaded files to the context.
Credential: github.com/gptscript-ai/credentials/model-provider
Type: context
Input Filter: QueryRelevancy
Output Filter: KnowledgeInstructions

#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool retrieve ${GPTSCRIPT_THREAD_ID:+--dataset=${GPTSCRIPT_THREAD_ID}} ${GPTSCRIPT_SCRIPT_ID:+--dataset=${GPTSCRIPT_SCRIPT_ID}} "${GPTSCRIPT_INPUT}"

---
Name: QueryRelevancy
Param: input: the message
Context: LastUserInputOverview

The user has put in the following message: "${INPUT}".
Your task is to ensure that the input is good enough to be used as a query for the knowledge tool, i.e. good for a semantic similarity search.
Use the following criteria to determine if the input is good enough:
1. It's good if it contains keywords, names, entities, etc.
2. It's good if it's written as a natural language question with proper semantic meaning.
3. It's not good if it's too generic or vague or if it only contains stopwords and no real content.

If you decide based on the above criteria that the input is good enough, return it as is without further considerations.
If you decide that it's not good enough, check if it may be a follow-up question or in any way referring to an earlier message.
Consider the chat history enclosed in the following <HISTORY></HISTORY> tags.
<HISTORY>
${GPTSCRIPT_CONTEXT}
</HISTORY>
If you think the input is connected to parts of the history, enrich it with details from there.
If the chat history is irrelevant (i.e. there is no connection the current input) and you're certain that the input won't make for a good query, return a single - with no quotes or highlighting or anything else.
Only reply with the final query and nothing else. Do not use any syntax highlighting.
Here's are some examples:

## Example 1
Input: "Who's the employer?"
History: "<USER_MESSAGES>[User Message #1] How much does John Doe earn?</USER_MESSAGES>"
Output: "Who's the employer of John Doe?"
Reasoning: The input looks like a follow-up question related to a previous query. The previous query is about the entity John Doe, so the current query is likely asking about the same entity.

## Example 2
Input: "What are its attributes?"
History: "<USER_MESSAGES>[User Message #1] What's Oystersteel?</USER_MESSAGES>"
Output: "What are the attributes of Oystersteel?"
Reasoning: The input is a follow-up question related to a previous query. The previous query is about the entity Oystersteel, so the current query is likely asking about the attributes of the same entity.

## Example 3
Input: "What's the weather like?"
History: "<USER_MESSAGES></USER_MESSAGES>"
Output: -
Reasoning: The input is too generic and doesn't contain any specific keywords or entities. There is no relevant history to enrich the input with, so it's not good enough to be used as a query.

## Example 4
Input: "What's my Day 6 Travel Plan?"
History: "<USER_MESSAGES></USER_MESSAGES>"
Output: "What's my Day 6 Travel Plan?"
Reasoning: The query is about something that belongs to the user so it's unlikely to be present in the training data. It must be in some uploaded file. The query is also good enough, since it contains keywords and a specific day.

---
Name: KnowledgeInstructions
Params: output: the message
Params: continuation: if the the conversation is still in progress
Params: chat: if this is a chat conversation

#!/usr/bin/env python3

import os
import json
import asyncio

async def main():
output = os.getenv('OUTPUT', '')
continuation = os.getenv('CONTINUATION', '') == 'true'
is_chat = os.getenv('CHAT', '') == 'true'

# We're looping here instead of just search + split, since there are some edge cases, where
# the actual result output was followed by more logs that shouldn't be in the output.
result = ""
for part in output.split("\n"):
if part.startswith('{"originalQuery"'):
result = part
break

if result != "":
msg = f"""
Use the content within the following <KNOWLEDGE></KNOWLEDGE> tags as your learned knowledge.
<KNOWLEDGE>
{result}
</KNOWLEDGE>
If this knowledge seems irrelevant to the user query, ignore it.
Avoid mentioning that you retrieved the information from the context or the knowledge tool.
Only provide citations if explicitly asked for it and if the source references are available in the knowledge.
Answer in the language that the user asked the question in.
"""
else:
msg = "No data retrieved from knowledge base."

print(msg)

asyncio.run(main())

---
Name: LastUserInputOverview
Description: Get an overview over the last user inputs in the current chat
Context: sys.chat.current
Param: Limit: How many entries should be returned

#!/usr/bin/env python3

import json
import asyncio
import os


async def main():
histories_str = os.getenv("GPTSCRIPT_CONTEXT", "")

if not histories_str:
print("<USER_MESSAGES></USER_MESSAGES>")
return

histories = json.loads(histories_str)

limit = int(os.getenv("LIMIT", "50"))

chat = ["<USER_MESSAGES>"]
msgs = []

completion = histories.get("completion", {})

i = 1

for message in completion.get("messages", []):
role = message.get("role", "")
text = " ".join(
[part["text"] for part in message.get("content", []) if "text" in part]
)
if role == "user" and len(text) > 0 and not text.startswith("Call "):
msgs.append(f"[User Message #{i}] {text}")
i += 1

if limit > len(msgs):
limit = len(msgs)

if limit == 0:
for msg in msgs:
chat.append(msg)
else:
for msg in msgs[-limit:]:
chat.append(msg)

chat.append("</USER_MESSAGES>")
print("\n".join(chat))


asyncio.run(main())

---
Name: Knowledge Load
Description: Load a file and transform it to markdown
Params: input: Path to input file
Params: output: Path to output file where the extracted markdown content should be written to. Should end with .md
Params: loader: Optionally choose a specific loader for the file to not rely on the default. Only use if explicitly specified.
Params: Datasets: The datasets to query
Params: Query: The query to search for

#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool load ${INPUT} ${OUTPUT} ${LOADER:+--loader=${LOADER}}
#!${GPTSCRIPT_TOOL_DIR}/bin/gptscript-go-tool retrieve --dataset="${DATASETS}" "${QUERY}"

0 comments on commit a5fe669

Please sign in to comment.