Looking for an “AI” that can tackle my day-to-day, non-programming issues: for example, acting as a personal assistant / life coach, creating lesson plans for the classes I teach at school, explaining how things work and teaching me new skills effectively, etc.
I need it to be able to use web search when it helps, for more comprehensive answers.
Doesn’t have to be free, as I’d be happy to pay if it’s truly worth it.
So far I’ve tried:
- Most common options on Poe, including Claude Sonnet 3.5, GPT-4o and others. The issue here is that I can’t tell which one is actually smarter and which one hallucinates more.
- Perplexity
- Phind
- Gemini
- Bing AI
I have never had a GPT-4 subscription, so I might consider that if it’s objectively the best option.
What can you recommend? 🙂
I’ve been using Sonnet 3.5 a lot recently. It does seem better and more creative than the others for a lot of tasks. I also think its training data goes up to April 2024, which is nice.
I’ve also found that GPT-4o is worse than GPT-4 in my experience; it seems to hallucinate more.
Ollama (+ web-ui, but ollama serve is all you need), then compare and contrast the various models. I’ve had luck with Mistral, for example.
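If it helps, here’s a rough sketch of what comparing a couple of local models side by side could look like with the ollama Python package (the package, model names, and prompt are my own assumptions; swap in whatever you’ve actually pulled):

```python
# Minimal comparison sketch: same prompt, several local models.
# Assumes `pip install ollama`, that `ollama serve` is running, and that each
# model has already been pulled (e.g. `ollama pull mistral`, `ollama pull llama3`).
import ollama

prompt = "Draft a 45-minute lesson plan on photosynthesis for 9th graders."

for model in ["mistral", "llama3"]:
    reply = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(reply["message"]["content"])
```

Running the same prompt across models makes it easier to judge for yourself which one hallucinates less on your kind of questions.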
GPT-4 is apparently the model to beat. I haven’t seen all that much difference in practice between GPT-4 and 4o. I’ve heard various claims about various other models outperforming it (notably including Claude) but I haven’t seen the claims materialize over the long haul as yet.
I have, however, heard that Mistral can get quite close to GPT-4 and run for free locally with the right hardware, if you build up a hand-curated set of around 100 query/response pairs from GPT-4 that capture what you want it to do, and then fine-tune Mistral on that training set. I haven’t tried it, but that’s what I’ve heard.
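For what it’s worth, that hand-curated set could just be a JSONL file of chat-style pairs. The exact layout below is an assumption on my part (different fine-tuning tools expect slightly different field names), so treat it as a sketch rather than a recipe:

```python
# Hypothetical sketch of assembling the ~100 hand-curated query/response pairs
# as JSONL in a chat-style format that many fine-tuning tools accept.
# The field names are an assumption, not a fixed standard; check whichever
# trainer you end up using.
import json

pairs = [
    {
        "query": "Explain spaced repetition to a complete beginner.",
        "response": "Spaced repetition means reviewing material at increasing intervals...",
    },
    # ... roughly 100 of these, each reviewed by hand ...
]

with open("mistral_finetune.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        record = {
            "messages": [
                {"role": "user", "content": pair["query"]},
                {"role": "assistant", "content": pair["response"]},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The fine-tuning step itself depends on the tool and hardware you pick, so I won’t guess at that part.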
I’m a total layman when it comes to setting up a language model locally. Any step-by-step guide on how to do it? And I mostly use AIs on my Android phone, not a PC. Is it possible to synchronize it between two devices?
GPT4All can do it pretty easily on a desktop with a good GPU. I think it’s unlikely that anything can run locally on your phone (LLMs are notorious resource hogs even on pretty capable desktop PCs; there’s just no cheap way to run them). You could use Colab or something via your phone, and there’s probably a little how-to guide somewhere that shows how to set up Mistral on Colab. It’ll take some technical skill, though.
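To give a flavour of the desktop route, a minimal GPT4All sketch in Python could look something like this (the model filename is just an example from their catalogue; pick whatever fits your GPU and RAM):

```python
# Rough sketch: run a local model through the gpt4all Python bindings.
# Assumes `pip install gpt4all`; the library downloads the model file on first
# use if it's in its catalogue, so the exact filename may need adjusting.
from gpt4all import GPT4All

model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")

with model.chat_session():
    answer = model.generate(
        "Give me a simple weekly plan for learning basic Spanish.",
        max_tokens=300,
    )
    print(answer)
```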
You might also just bite the bullet and do $20/mo for the GPT-4 subscription. It can do web searches too, I think, although in practice it’s been pretty clunky the times it’s tried to do things like that for me. I’m not aware of any model that does the “search the web for answers and get back to me” thing all that perfectly or smoothly, I’m sad to say.
Why do the $20 subscription when the API pricing is much cheaper, especially if you’re trying different models out? I’m currently playing about with Gemini, and that’s free (albeit rate limited).
100% right; unless you’re using it a ton, the API pricing is likely to be cheaper.
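For reference, pay-as-you-go use of the API is just a short script. This assumes the openai Python package and an OPENAI_API_KEY environment variable; the model name is whichever one you want to try:

```python
# Hedged sketch of pay-as-you-go API usage instead of the $20/mo subscription.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Act as a life coach and help me plan a productive week."},
    ],
)
print(response.choices[0].message.content)
```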
And also, any recommendations for a specific GPT-4 add-on, or is the base model pretty much perfect as is?
GPT-4 generally doesn’t need fine-tuning or anything, no.
Most models I’ve played with are only about as good as what you put into them. If you ask the right questions in the right way, you can get pretty good results.
GPT-3.5 has worked well for me. I’ve also run AI locally on my PC using Ollama and lots of different models. Most do well with simple questions or requests.
Llama 3 instruct is what I’ve liked the most so far.
Hence the job title ‘prompt engineer’ I guess. If you know about Soylent Green, AI is people!
Lol, prompts are important for sure. My boss and I often talk about what you can do with ChatGPT when we use it at work and what kinds of prompts we use.
ChatGPT 4o is the top dog right now, by a lot.
It’s GPT-4, to tell the truth.
Not sure it’ll do the tasks you list at the start, but it’s the front runner.
Humans. For the best experience, get some third world contractor. Costs more tho.
Reducing people from third-world countries to “language models” as an attempt to critique AI ain’t it.
Not sure about paid models, but Claude Sonnet 3.5 is so good it’s not even funny. I’ve had arguments with it where it turned out to be right in the end, and it never once conceded that I was right (because I wasn’t; I ended up looking it up afterwards). I’ve never seen that with any other model.