Researchers figured out how to run a 120-billion parameter model across four regular desktop PCs

noumenon@lemmy.world · 3 months ago

Researchers figured out how to run a 120-billion parameter model across four regular desktop PCs

PM_ME_VINTAGE_30S [he/him]@anarchist.nexus · edit-2 3 months ago

I like how there’s no fucking code repo or even a white paper or any evidence that this system ever actually existed 🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️🤦‍♂️

afk_strats@lemmy.world · 3 months ago

This is basically meaningless. You can already run gpt-OSS 120 across consumer grade machines. In fact, I’ve done it with open source software with a proper open source licence, offline, at my house. It’s called llama.cpp and it is one of the most popular projects on GitHub. It’s the basis of ollama which Facebook coopted and is the engine for LMStudio, a popular LLM app.

The only thing you need is around 64 gigs of free RAM and you can serve gpt-oss120 as an OpenAI-like api endpoint. VRAM is preferred but llama.cpp can run in system RAM or on top of multiple different GPU addressing technologies. It has a built-in server which allows it to pool resources from multiple machines…

I bet you could even do it over a series of high-ram phones in a network.

So I ask is this novel or is it an advertisement packaged as a press release?

madcaesar@lemmy.world · 3 months ago

So what do you get with a home run LLM? How capable is it what can you use it for?

afk_strats@lemmy.world · 3 months ago

I still think AI is mostly a toy and a corporate inflation device. There are valid use cases but I don’t think that’s the majority of the bubble

For my personal use, I used it to learn how models work from a compute perspective. I’ve been interested and involved with natural language processing and sentiment analysis since before LLMs became a thing. Modern models are an evolution of that.
A small, consumer grade model like GPT-oss-20 is around 13GB and can run on a single mid-grade consumer GPU and maybe some RAM. It’s capable of parsing text and summarizing, troubleshooting computer issues, and some basic coding or code review for personal use. I built some bash and home assistant automatons for myself using these models as crutches. Also, there is software that can index text locally to help you have conversations with large documents. I use this with documentation for my music keyboard which is a nightmare to program and with complex APIs.
A mid-size model like Nemotron3 30B is around 20GB can run on a larger consumer card (like my 7900xtx with 24 gb of VRAM, or 2 5060tis with 16gb of vRAM each) and will have vaguely the same usability as the small commercial models, like Gemini Flash, or Claude Haiku. These can write better, more complex code. I also use these to help me organize personal notes. I dump everything in my brain to text and have the model give it structure.
A large model like GLM4.7 is around 150GB can do all the things ChatGPT or Gemini Pro can do, given web access and a pretty wrapper. This requires big RAM and some patience or a lot of VRAM. There is software designed to run these larger models in RAM faster, namely ik_llama but, at this scale, you’re throwing money at AI.

I played around with image creation and there isn’t anything there other than a toy for me. I take pictures with a camera.

Special Wall@midwest.social · edit-2 3 months ago

Now, EPFL researchers… have released new software that allows users to download open-source AI models and use them locally, with no need for the cloud to answer questions or complete tasks.

It’s cool that they got LLMs running on local clusters of computers, but with the way it’s written, they make it sound like people have not already been using local LLMs for a long time (including GPT-OSS 120B).

WereCat@lemmy.world · 3 months ago

OFC you can… I can run the 70B DeepSeek on my 16GB RX 6800 XT with 64GB RAM already…

SharkAttak@kbin.melroy.org · 3 months ago

That last amount you mentioned could be a little problematic, at the moment…

Researchers figured out how to run a 120-billion parameter model across four regular desktop PCs

Researchers figured out how to run a 120-billion parameter model across four regular desktop PCs

Do we really need big data centers for AI?