Hey,
great initiative to track local LLMs!
Would you be open to talking about how the scores are created?
I created some GPT-4 scores in a past project and found them not reliable enough: they fluctuated between input sentences with the same meaning, the scores felt somewhat arbitrary, and the same input could get different scores on different days. At the very least, you should pin the GPT-4 version so you have better control when updates to GPT-4 are rolled out.
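To make the pinning suggestion concrete, here is a minimal sketch of building a scoring request against a dated model snapshot instead of the floating `gpt-4` alias. The snapshot name, prompt, and helper are illustrative assumptions, not taken from this project:

```python
# Sketch: pin a dated model snapshot so scores don't shift when the
# floating "gpt-4" alias is silently updated by the provider.
# "gpt-4-0613" is an illustrative snapshot name; check the provider's
# model list for the snapshots actually available.
PINNED_MODEL = "gpt-4-0613"

def build_scoring_request(text: str) -> dict:
    """Build the payload for a scoring call against the pinned model."""
    return {
        "model": PINNED_MODEL,          # dated snapshot, not "gpt-4"
        "temperature": 0,               # reduce run-to-run fluctuation
        "messages": [
            {"role": "system", "content": "Score the following answer from 1 to 10."},
            {"role": "user", "content": text},
        ],
    }

request = build_scoring_request("Example answer to score.")
print(request["model"])  # the pinned snapshot, never the bare alias
```

Pinning alone does not remove the arbitrariness of LLM scoring, but it at least makes runs comparable over time.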
For code, one could add unit tests to check the generated functions.
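A sketch of what unit-testing a generated function could look like: execute the model's code in a scratch namespace and check it against fixed cases, instead of having GPT-4 grade it. The `fizzbuzz` task and its checks below are hypothetical stand-ins for whatever the benchmark actually asks:

```python
# Sketch: deterministic pass/fail for generated code via fixed unit tests.
# The generated source below is a stand-in for a real model completion.
generated_source = """
def fizzbuzz(n):
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)
"""

def passes_unit_tests(source: str) -> bool:
    """Run the generated code in a scratch namespace and check fixed cases."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        fn = namespace["fizzbuzz"]
        return (
            fn(3) == "Fizz"
            and fn(5) == "Buzz"
            and fn(15) == "FizzBuzz"
            and fn(7) == "7"
        )
    except Exception:
        return False  # broken code simply fails; no LLM judgment needed

print(passes_unit_tests(generated_source))  # → True for this sample
```

Unlike a GPT-4 score, this check gives the same answer every day for the same input.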