
Check overhead for comment evaluation #174

Closed
gentlementlegen opened this issue Oct 28, 2024 · 19 comments · Fixed by #225

Comments

@gentlementlegen
Member

> ```diff
> ! Failed to run comment evaluation. Error: 400 This model's maximum context length is 128000 tokens. However, your messages resulted in 148540 tokens. Please reduce the length of the messages.
> ```

<!--
https://github.com/ubiquity-os-marketplace/text-conversation-rewards/actions/runs/11463496789
{
  "status": 400,
  "headers": {
    "access-control-expose-headers": "X-Request-ID",
    "alt-svc": "h3=\":443\"; ma=86400",
    "cf-cache-status": "DYNAMIC",
    "cf-ray": "8d6a8635dd992009-IAD",
    "connection": "keep-alive",
    "content-length": "284",
    "content-type": "application/json",
    "date": "Tue, 22 Oct 2024 15:29:41 GMT",
    "openai-organization": "ubiquity-dao-8veapj",
    "openai-processing-ms": "375",
    "openai-version": "2020-10-01",
    "server": "cloudflare",
    "set-cookie": "__cf_bm=urRioyrKlQBCiRkxcgeZKjDpvmvjEQsjfq1o9zASCxs-1729610981-1.0.1.1-u3eEr.AKdcx2EGJuW2nauw6LA5zK0ZDXyOKJiCI01E_pfZOpnzWIJoxgLq_OlO8BDT_WFfSD_jFjjW6Fnmx_Mw; path=/; expires=Tue, 22-Oct-24 15:59:41 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None, _cfuvid=qIG5Ao6fOQ9MAWT6hlX2fjC8G.yTYmXl4vzXjH7Qqsg-1729610981415-0.0.1.1-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None",
    "strict-transport-security": "max-age=31536000; includeSubDomains; preload",
    "x-content-type-options": "nosniff",
    "x-ratelimit-limit-requests": "5000",
    "x-ratelimit-limit-tokens": "450000",
    "x-ratelimit-remaining-requests": "4999",
    "x-ratelimit-remaining-tokens": "83951",
    "x-ratelimit-reset-requests": "12ms",
    "x-ratelimit-reset-tokens": "48.806s",
    "x-request-id": "req_bb581eb70b2276ea9a9c563b12f6343b"
  },
  "request_id": "req_bb581eb70b2276ea9a9c563b12f6343b",
  "error": {
    "message": "This model's maximum context length is 128000 tokens. However, your messages resulted in 148540 tokens. Please reduce the length of the messages.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  },
  "code": "context_length_exceeded",
  "param": "messages",
  "type": "invalid_request_error",
  "caller": "/home/runner/work/text-conversation-rewards/text-conversation-rewards/dist/index.js:291:6136492"
}
-->

@gentlementlegen perhaps we have too much overhead with each pull? By that I mean headers and such, not the main content, because I don't imagine that each pull actually has that much "body" content. This can easily be optimized, as I see some have barely any comments.

Originally posted by @0x4007 in ubiquity-os/ubiquity-os-kernel#80 (comment)
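
To see where those 148k tokens actually come from, one could measure the token cost of each comment before building the prompt. A minimal sketch, assuming the `tiktoken` npm package (the `CommentLike` shape is a stand-in, not the plugin's real type):

```typescript
// Measure how many tokens each comment contributes to the prompt, so the
// biggest offenders (quoted bodies, embedded metadata, etc.) stand out.
import { encoding_for_model } from "tiktoken";

interface CommentLike {
  id: number;
  body: string;
}

export function tokenBreakdown(comments: CommentLike[]): { id: number; tokens: number }[] {
  const enc = encoding_for_model("gpt-4o"); // assumes tiktoken recognizes this model name
  try {
    return comments
      .map((c) => ({ id: c.id, tokens: enc.encode(c.body).length }))
      .sort((a, b) => b.tokens - a.tokens); // biggest contributors first
  } finally {
    enc.free(); // the encoder holds WASM memory and must be freed explicitly
  }
}
```

Sorting the breakdown makes it immediately visible whether a few oversized comments or uniform per-comment overhead is pushing the payload past the limit.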


@gentlementlegen
Member Author

/start


Warning! This task was created over 52 days ago. Please confirm that this issue specification is accurate before starting.
Deadline Thu, Dec 19, 7:35 AM UTC
Beneficiary 0x0fC1b909ba9265A846b82CF4CE352fc3e7EeB2ED

Tip

  • Use /wallet 0x0000...0000 if you want to update your registered payment wallet address.
  • Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
  • Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.

@gentlementlegen
Member Author

Was thinking about this, and maybe there would be a few available approaches:

  • summarization of comments: probably the most accurate, but also very expensive because each comment would need to be summarized
  • truncation: just truncate the comments, but this implies losing precision and context
  • filtering highly relevant comments first, with something like TF-IDF: not ideal, but most likely the best compromise that doesn't require API credits (see the sketch after this comment)
  • taking only a sample of the comments: easy to implement, but loses precision and context as well

Any ideas? @sshivaditya2019 RFC
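
For illustration, a TF-IDF pre-filter could score each comment against the issue specification and keep only the top N before calling the model. A minimal sketch (illustrative only, not the plugin's actual code):

```typescript
// Score each comment against the issue specification with TF-IDF and keep
// the most relevant ones, so the prompt stays under the context limit.

function termFrequencies(text: string): Map<string, number> {
  const tf = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z0-9']+/g) ?? []) {
    tf.set(word, (tf.get(word) ?? 0) + 1);
  }
  return tf;
}

export function topRelevantComments(spec: string, comments: string[], keep: number): string[] {
  const docs = comments.map(termFrequencies);
  // document frequency: in how many comments does each term appear
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of doc.keys()) df.set(term, (df.get(term) ?? 0) + 1);
  }
  const specTerms = [...termFrequencies(spec).keys()];
  const scored = comments.map((comment, i) => {
    let score = 0;
    for (const term of specTerms) {
      const tf = docs[i].get(term) ?? 0;
      // log-dampened IDF, kept non-negative for this sketch
      const idf = Math.log(1 + comments.length / (1 + (df.get(term) ?? 0)));
      score += tf * idf;
    }
    return { comment, score };
  });
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, keep)
    .map((s) => s.comment);
}
```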

@0x4007
Member

0x4007 commented Dec 20, 2024

I think high accuracy is the best choice from your selection. I think costs continue to decline with these LLMs as well.

@gentlementlegen
Member Author

Let me test results with TF-IDF first and see how accurate it gets, because it would most likely also be much simpler to implement than a summary of all the comments. I will run some tests and post them here.


ubiquity-os-beta bot commented Jan 20, 2025

 [ 115.94 WXDAI ] 

@gentlementlegen
Contributions Overview
| View  | Contribution  | Count | Reward |
|-------|---------------|-------|--------|
| Issue | Task          | 1     | 100    |
| Issue | Specification | 1     | 15.94  |
Conversation Incentives
Each entry below lists the comment excerpt, its formatting score (with a YAML breakdown), and then Relevance · Priority · Reward; the reward works out to formatting × relevance × priority.
> ```diff@gentlementlegen perhaps we have too m…
Formatting: 7.97
content:
  content:
    p:
      score: 0
      elementCount: 2
    em:
      score: 0
      elementCount: 1
    a:
      score: 5
      elementCount: 1
  result: 5
regex:
  wordCount: 54
  wordValue: 0.1
  result: 2.97
Relevance: 1 · Priority: 2 · Reward: 15.94

 [ 47.998 WXDAI ] 

@0x4007
Contributions Overview
| View   | Contribution | Count | Reward |
|--------|--------------|-------|--------|
| Issue  | Comment      | 1     | 0      |
| Review | Comment      | 22    | 47.998 |
Conversation Incentives
Each entry below lists the comment excerpt, its formatting score (with a YAML breakdown), and then Relevance · Priority · Reward.
I think high accuracy is the best choice from your selection. I …
Formatting: 1.38
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 22
  wordValue: 0.1
  result: 1.38
Relevance: 0 · Priority: 2 · Reward: 0
Very skeptical of tfidf approach. We should go simpler and filte…
Formatting: 0.94
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 14
  wordValue: 0.1
  result: 0.94
Relevance: 0.7 · Priority: 2 · Reward: 1.316
This depends on the model and possibly should be an environment …
Formatting: 1.11
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 17
  wordValue: 0.1
  result: 1.11
Relevance: 0.8 · Priority: 2 · Reward: 1.776
We should also filter out slash commands? And minimized comments?
Formatting: 0.71
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 10
  wordValue: 0.1
  result: 0.71
Relevance: 0.6 · Priority: 2 · Reward: 0.852
I'm skeptical about this whole TFIDF approach1. The tokenizer a…
Formatting: 7.64
content:
  content:
    p:
      score: 0
      elementCount: 1
    ol:
      score: 0
      elementCount: 1
    li:
      score: 0.5
      elementCount: 3
  result: 1.5
regex:
  wordCount: 127
  wordValue: 0.1
  result: 6.14
Relevance: 0.9 · Priority: 2 · Reward: 13.752
Can you articulate the weaknesses or concerns
Formatting: 0.52
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 7
  wordValue: 0.1
  result: 0.52
Relevance: 0.5 · Priority: 2 · Reward: 0.52
Hard coding the 12400 doesn't seem like a solution there either
Formatting: 0.83
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 12
  wordValue: 0.1
  result: 0.83
Relevance: 0.7 · Priority: 2 · Reward: 1.162
Line 179 is hard coded
Formatting: 0.39
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 5
  wordValue: 0.1
  result: 0.39
Relevance: 0.6 · Priority: 2 · Reward: 0.468
Yes if we don't have it saved in our library or collection of kn…
Formatting: 1.28
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 20
  wordValue: 0.1
  result: 1.28
Relevance: 0.6 · Priority: 2 · Reward: 1.536
It shouldn't affect it at all. I would proceed with implicit app…
Formatting: 1.44
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 23
  wordValue: 0.1
  result: 1.44
Relevance: 0.4 · Priority: 2 · Reward: 1.152
Manually get the numbers from their docs then
Formatting: 0.59
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 8
  wordValue: 0.1
  result: 0.59
Relevance: 0.6 · Priority: 2 · Reward: 0.708
Why is this a constant? Makes more sense to use let and directly…
Formatting: 1.28
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 20
  wordValue: 0.1
  result: 1.28
Relevance: 0.7 · Priority: 2 · Reward: 1.792
```suggestion```
Formatting: 0
content:
  content: {}
  result: 0
regex:
  wordCount: 0
  wordValue: 0.1
  result: 0
Relevance: 0.2 · Priority: 2 · Reward: 0
Add more chunks if the request to OpenAI fails for being too lon…
Formatting: 2.1
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 36
  wordValue: 0.1
  result: 2.1
Relevance: 0.8 · Priority: 2 · Reward: 3.36
@shiv810 rfc
Formatting: 0.18
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 2
  wordValue: 0.1
  result: 0.18
Relevance: 0.3 · Priority: 2 · Reward: 0.108
Separate is fine then just as long as the current code is stable.
Formatting: 0.88
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 13
  wordValue: 0.1
  result: 0.88
Relevance: 0.5 · Priority: 2 · Reward: 0.88
More careful filtering of comments like removal of bot commands …
Formatting: 2.05
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 35
  wordValue: 0.1
  result: 2.05
Relevance: 0.8 · Priority: 2 · Reward: 3.28
Doing multiple calls to score everything and then concatenate re…
Formatting: 1.17
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 18
  wordValue: 0.1
  result: 1.17
Relevance: 0.7 · Priority: 2 · Reward: 1.638
Divide into two and do 150 each call. Receive the results array …
Formatting: 1.06
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 16
  wordValue: 0.1
  result: 1.06
Relevance: 0.6 · Priority: 2 · Reward: 1.272
Surely it's a bit of a trade off without all of the comments in …
Formatting: 5.58
content:
  content:
    p:
      score: 0
      elementCount: 1
    ol:
      score: 0
      elementCount: 1
    li:
      score: 0.5
      elementCount: 2
  result: 1
regex:
  wordCount: 90
  wordValue: 0.1
  result: 4.58
Relevance: 0.75 · Priority: 2 · Reward: 8.37
It's hard for me to tell from the QA example but if it works it …
Formatting: 2.2
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 38
  wordValue: 0.1
  result: 2.2
Relevance: 0.5 · Priority: 2 · Reward: 2.2
For your low token limit example, I think your config was wrong …
Formatting: 1.49
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 24
  wordValue: 0.1
  result: 1.49
Relevance: 0.4 · Priority: 2 · Reward: 1.192
Relevance 1 is not expected of course unless it's the spec.
Formatting: 0.83
content:
  content:
    p:
      score: 0
      elementCount: 1
  result: 0
regex:
  wordCount: 12
  wordValue: 0.1
  result: 0.83
Relevance: 0.4 · Priority: 2 · Reward: 0.664

 [ 20.936 WXDAI ] 

@whilefoo
Contributions Overview
| View   | Contribution | Count | Reward |
|--------|--------------|-------|--------|
| Review | Comment      | 3     | 20.936 |
Conversation Incentives
Each entry below lists the comment excerpt, its formatting score (with a YAML breakdown), and then Relevance · Priority · Reward.
Are we sending user comments twice? Maybe we could decrease toke…
Formatting: 2.25
content:
  content:
    p:
      score: 0
      elementCount: 2
  result: 0
regex:
  wordCount: 39
  wordValue: 0.1
  result: 2.25
Relevance: 0.9 · Priority: 2 · Reward: 4.05
Yeah that's why we could leave `allComments` so it makes…
Formatting: 3.02
content:
  content:
    p:
      score: 0
      elementCount: 2
  result: 0
regex:
  wordCount: 55
  wordValue: 0.1
  result: 3.02
Relevance: 0.8 · Priority: 2 · Reward: 4.832
I didn't know [prompt caching](https://platform.openai.com/docs/…
Formatting: 8.61
content:
  content:
    p:
      score: 0
      elementCount: 3
    a:
      score: 5
      elementCount: 1
  result: 5
regex:
  wordCount: 68
  wordValue: 0.1
  result: 3.61
Relevance: 0.7 · Priority: 2 · Reward: 12.054

@shiv810

shiv810 commented Jan 20, 2025

@gentlementlegen is this expected? I don't see a permit generated for my ID.

@gentlementlegen
Member Author

gentlementlegen commented Jan 21, 2025

@shiv810 Yes: because your profile is private, you do not appear as an ubiquity-os collaborator, and pull-request rewards are set to zero for non-collaborator contributors, so you got no reward and were stripped from the result table.

@0x4007
Member

0x4007 commented Jan 21, 2025

I thought we fixed that by checking collaborators on the repository level.

@gentlementlegen
Member Author

Yes that feature is live, but look at the collaborators inside this repository:

[Image: collaborators list of this repository]

@0x4007
Member

0x4007 commented Jan 21, 2025

I suppose the right thing to do is to add every core team member to every organization.

I wanted to experiment with not having to do this in order to operate like a "real DAO", but I realize that there is a need for a distinction between "trusted" and "not trusted" contributors, especially for:

  1. merging pulls
  2. setting labels
  3. generating permits

In the future, an XP system should be able to handle this dynamically.

@0x4007
Member

0x4007 commented Jan 21, 2025

@shiv810 I can regenerate once you accept your invitation https://github.com/ubiquity-os-marketplace

@gentlementlegen I'm assuming that the status is inherited from the organization level.

@gentlementlegen
Member Author

There are currently two ways to be considered a collaborator:

  • be part of the organization (membership has to be public on the profile)
  • be added as a collaborator in the repository (like you did in ubiquity/business, for example)
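
For illustration, both checks could be combined like this with `@octokit/rest`; a sketch only, with error handling reduced to the 404 cases:

```typescript
// Check both collaborator paths described above: public org membership first,
// then repository-level collaborator status (which works even when the
// user's profile is private).
import { Octokit } from "@octokit/rest";

export async function isCollaborator(
  octokit: Octokit,
  org: string,
  repo: string,
  username: string
): Promise<boolean> {
  try {
    // Only succeeds if the membership is public on the user's profile.
    await octokit.rest.orgs.checkPublicMembershipForUser({ org, username });
    return true;
  } catch {
    // 404: not a public member; fall through to the repository check.
  }
  try {
    await octokit.rest.repos.checkCollaborator({ owner: org, repo, username });
    return true;
  } catch {
    return false;
  }
}
```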

@0x4007
Member

0x4007 commented Jan 21, 2025

> be added as a collaborator in the repository (like you did in ubiquity/business for example)

So then this is the only solution if the collaborator has a private profile. I wonder if there is a solution for them to be added to the org.

@gentlementlegen
Member Author

Well, they can be added to the organization, but the API won't be able to retrieve that information for the user, since "private" mode hides it.

@0x4007
Member

0x4007 commented Jan 21, 2025

I wonder if we can build a shim for this problem in the form of some type of persistent JSON storage. Synchronizing would be difficult to do in realtime though, which matters more if they were removed from the team.

@gentlementlegen
Member Author

This would mean that we would have to keep that list updated manually, and third parties most likely wouldn't enjoy that either. I am not sure I understand why people don't create burner accounts that are public and used only in our organization, for example.

@0x4007
Member

0x4007 commented Jan 23, 2025

Well, to be more specific, my vision was to append to this cache when another module detects that they are part of the organization's collaborators, or when they set a label or perform some privileged action.
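
A sketch of what that cache shim could look like (hypothetical file name and entry shape, not an existing module; the realtime-removal concern raised above would still need a separate cleanup pass):

```typescript
// Append-only "trusted users" cache: another module appends an entry whenever
// it observes a privileged action, and consumers treat presence here as trusted.
import { promises as fs } from "fs";

interface TrustedEntry {
  username: string;
  reason: "org-collaborator" | "set-label" | "privileged-action";
  observedAt: string; // ISO timestamp, lets a cleanup job expire stale entries
}

const CACHE_PATH = "trusted-users.json"; // hypothetical storage location

export async function appendTrusted(entry: TrustedEntry): Promise<void> {
  let entries: TrustedEntry[] = [];
  try {
    entries = JSON.parse(await fs.readFile(CACHE_PATH, "utf8"));
  } catch {
    // first run: the file does not exist yet
  }
  // de-duplicate by username, keeping the most recent observation
  entries = entries.filter((e) => e.username !== entry.username).concat(entry);
  await fs.writeFile(CACHE_PATH, JSON.stringify(entries, null, 2));
}
```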
