Enabling new OpenAI models for MAGE #1587
Conversation
BTW, I've thought about writing some unit tests for the feature, but I was unsure how best to do this. I noticed that there aren't really any unit tests for most of MAGE's functionality. Furthermore, the type of test I'm going for seems kinda complicated to write, since it would require the OpenAI API key to be set during testing, and would have to send HTTP requests. Also, I've empirically verified (through inspecting the

Considering all this, does it make more sense to skip writing unit tests for something like this?
Right, so there are multiple steps:
For (1) we use GPT_4 and that is fixed. You expanded our codebase with knowledge about more GPT models, which is great. One more thing we want to do is to start using the newest GPT-4 for step (1) by default. I believe you can set this up in Plan.hs, and maybe some more changes are needed, not sure -> the logic we have somewhere for falling back from GPT4 to 3.5 if 4 is not available might also need updating; I am not sure without checking it more thoroughly. Actually, I imagined this task to consist of two things:
What you did is a step in the right direction, as it somewhat satisfies (2). What would be great now is to also do (1) and maybe make (2) even more flexible as I described.
When you say the "newest GPT-4 model", I assume you're referring to GPT-4-1106. From my experience with using the OpenAI API for various other things, the newest GPT-4 is:

To clarify the last point: in general, this is not detrimental when you're summarizing books of 100k tokens, where the output doesn't have a predefined structure. To verify this, I'll run some qualitative tests using GPT-4-1106.

Does that sound okay?
Regarding your second point, I looked more into the option of dynamically fetching (with HTTP) the list of currently available models, and then offering only those models to the user. The problem, however, is that there's currently no way to get a list of exclusively chat-completion models from OpenAI: the endpoint that lists the models returns all of them, while we only use chat-completion models. We would therefore have to resort to some heuristic for filtering this list of models (e.g. some regex matching). Any update to the list of available models would also necessitate an update to this heuristic.

When compared to explicitly keeping a list of chat-completion models, the dynamic way of doing it would only give us an over-engineered way to offer some obscure legacy version of GPT models to the 0.01% of users who actually have a valid reason to try using it. It's more difficult to implement and to maintain. This is quite an opinionated comment, but I'm open to further discussion about it. Let me know what you think!
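To make the heuristic concern concrete, here is a rough sketch of the kind of filtering we would end up maintaining -- the prefixes below are my assumption for illustration, not an exhaustive rule:

```haskell
import Data.List (isPrefixOf)

-- Rough sketch: keep only model ids that look like chat-completion models.
-- Every new model family OpenAI ships would require revisiting these prefixes.
isChatCompletionModel :: String -> Bool
isChatCompletionModel modelId = any (`isPrefixOf` modelId) ["gpt-4", "gpt-3.5-turbo"]
```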
GPT4 vs GPT4-1106
Ok @jankovidakovic, makes sense -> don't spend too much time on the financial comparison, that is quite clear to me, but do some tests to see if there is any performance loss. If there is a significant impact, we will not adopt the new one for now. If there is not, we will adopt it.

Allow users to dynamically choose the GPT model
Sorry, I should have explained it better: I didn't mean that we dynamically fetch the list of chat models from OpenAI (I agree we shouldn't bother with that for now); what I meant is that the user can provide whatever string as the name of the chat model and we will try to use it. That might sound weird, because how would they provide it via the web app, but I wasn't referring to the web app, but to usage via the CLI, where they can actually specify the name of the model via a string. However, right now we check if that string is a model we know of, and reject it if we don't know about it. We could instead allow any model. This is how they can specify the model via the CLI currently:
Whoops, I misclicked and closed the PR, reopening it.
Let me just clarify what you mean here. Currently, the user of our CLI can only specify a model string that has a valid representation in our enum of models. Instead of this, the request for change here is to allow the user to specify any string as the name of the OpenAI model? So, for example, if the user writes an arbitrary model name, we would just try to use it?
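For context, here is a minimal sketch of the current, enum-restricted behaviour (the type and parser names are made up for illustration, not Wasp's actual definitions):

```haskell
-- Illustrative enum of the chat models we know about.
data Model = GPT_4 | GPT_4_1106_Preview | GPT_3_5_Turbo
  deriving (Eq, Show)

-- Only strings that map onto a known constructor are accepted; anything else is rejected.
parseKnownModel :: String -> Maybe Model
parseKnownModel "gpt-4" = Just GPT_4
parseKnownModel "gpt-4-1106-preview" = Just GPT_4_1106_Preview
parseKnownModel "gpt-3.5-turbo" = Just GPT_3_5_Turbo
parseKnownModel _ = Nothing
```

Allowing arbitrary strings would essentially mean skipping this check and passing the raw string straight to the API.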
Yup, that would be it. We had users asking for this, and I think it makes sense. It gives them some freedom in using the models they want to use. Although, can it really help them? I mean, we know which model works best, right? So why would they pick some other model? It makes sense only in the case when a new OpenAI model comes out and we don't yet have it in our list of models (our enum). Or if they have their own pretrained model up there on OpenAI, but I don't think anybody will have one for Wasp. Ok, so you know what, let's prioritize and let's do the following:
I think that the users probably asked for this because they wanted to use the newer OpenAI chat models. Once this PR is merged, all of these models will be enabled. Therefore, I agree with your point -- there is no need to support other strings as models for now. The additional note regarding the fine-tuned OpenAI models is interesting though; we might want to look into that in the future.
In the meantime, I've added the option to specify the GPT model used for the "Plan" part. The model can be specified via the CLI. I don't necessarily think that we should expose this to the web app, but at least it enables us (the developers) to change the model versions freely without needing to run
Nice, we are very close, but I left some more comments!
One more thought: there is a piece of our code where we check if users have GPT4 available, and if not, we fall back to using GPT3.5 for planning, if I am correct. That made sense when GPT4 was just coming out, but now everybody has access, so it doesn't make much sense anymore. Therefore, I would suggest we also remove that logic; it is redundant. You can find it in src/Wasp/AI/CodeAgent.hs. We can do this as a separate PR, or as part of this PR if you want, whatever you like better.
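For reference, a hedged sketch of the general shape of that fallback (illustrative names, not the actual CodeAgent.hs definitions); removing it would mean always using the planning default:

```haskell
-- Illustrative stand-in for Wasp's model enum.
data Model = GPT_4 | GPT_3_5_Turbo deriving (Eq, Show)

-- Prefer GPT-4 for planning; fall back to GPT-3.5 if the user's key has no access.
-- The availability flag would come from probing the OpenAI API.
pickPlanningModel :: Bool -> Model
pickPlanningModel gpt4Available
  | gpt4Available = GPT_4
  | otherwise = GPT_3_5_Turbo
```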
I found two things confusing regarding naming:
- `planGptModel` vs `defaultGptModel` sounds a bit confusing; it is not clear what `default` means -> what is this one used for? `plan` is pretty clear, but `default` is a bit ambiguous.
- In the functions below that give us `defaultChatGptParams` and `defaultChatGptParmsForPlan`, the word `default` is starting to lose its meaning, and it was a bit hard for me to figure out what the relationship of this and `defaultGptModel` is, and so on.
We had some of these problems already before, but now they are getting emphasized with these new changes, and I think we should look to resolve them.
The fact is, we have two GPT models: one for generating the plan, another for generating / fixing code. We want each one to be specifiable via the CLI. We also want to have defaults for each of them in case they are not specified.
Name ideas for the GPT model that produces the plan: `planningGptModel`, `planGptModel`. Both seem quite ok.
Name ideas for the GPT model that does everything else:
- `defaultGptModel` -> bad, confusing.
- `baseGptModel` -> not so bad, not super clear either, but general enough.
- `generationGptModel` -> hm, confusing I think; what is `generation`?
- `gptModel` -> ha, maybe not even so bad, at least it doesn't push you in the wrong direction, but then maybe `baseGptModel` is a bit better.
- `codegenGptModel` -> hmm, not so bad. Although the planning one also does some codegen. But this is pretty precise, I like that. Maybe too precise? What about fixing? I guess that falls under codegen, we shouldn't think about fixing.
I think going for `planGptModel` and `baseGptModel` could work well, with `codegenGptModel` being an alternative, but I think `baseGptModel` wins by a small bit.

Now that we got rid of `default`, we can have two variables, `defaultPlanGptModel = GPT_4` and `defaultBaseGptModel = GPT_3_5_Turbo`, and those will make the code clearer.
Then, we can have functions:
```haskell
baseChatGptParams :: NewProjectDetails -> ChatGPTParams
baseChatGptParams = undefined
  where
    defaultGptModel = GPT_3_5_Turbo

planChatGptParams :: NewProjectDetails -> ChatGPTParams
planChatGptParams = undefined
  where
    defaultGptModel = GPT_4

fixingChatGptParams :: ChatGPTParams -> ChatGPTParams
fixingChatGptParams = undefined
```
I did a bit of a trick here -> I think we can just get by with this one function for fixing, `fixingChatGptParams`, that alters other `ChatGPTParams` by subtracting 0.2 from their temp.
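A minimal sketch of that trick, using an illustrative stand-in for `ChatGPTParams` (the real record and field names may differ):

```haskell
-- Illustrative stand-in for Wasp's ChatGPTParams record.
data ChatGPTParams = ChatGPTParams
  { _modelName :: String
  , _temperature :: Maybe Float
  }

-- Reuse whatever params the caller built, but make the model a bit more
-- conservative by lowering the temperature by 0.2.
fixingChatGptParams :: ChatGPTParams -> ChatGPTParams
fixingChatGptParams params =
  params {_temperature = subtract 0.2 <$> _temperature params}
```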
Oh, and let's set `defaultGptModel` for the `base` case to be `GPT_3_5_turbo_0613`, right? That way we make sure we don't experience a drop in performance once they make `gpt-3.5` point to a newer one.
Maybe we should also pin down the version for GPT 4?
How about `plannerGptModel` and `coderGptModel`? Since we can presume that we have two "separate" GPT-based agents -- one that plans, and one that codes (and fixes code)? If not, I'm gonna go with `planGptModel` and `baseGptModel`.
Enabling new OpenAI models for the MAGE Code Agent
This PR enables MAGE to be used with all non-legacy OpenAI chat models.
Models
The list of models is taken from the OpenAI website, available here: https://platform.openai.com/docs/models. The list is likely to change (e.g. models become deprecated, new models are added), so updates to the list of models that MAGE supports will probably be needed in the future.
One additional idea would be to dynamically generate the list of supported models by querying the API. Although we already do something (kinda) similar when checking whether the GPT-4 model is available, this suggestion seems more complicated to implement and it's unclear if it's actually needed at this time.
Plan generation vs the fixing step
Note that the model configuration is still enabled only for the later steps of ChatGPT prompting (fixing the generated Wasp/JS files). The initial step (plan generation) is fixed to use GPT-4. This makes sense (for now) as this is the most powerful model. In the future, though, we might want to expose this to the CLI configuration as well.