[Resource] Llama3 70B Successfully Deployed on a Single 4GB GPU

pavnilschanda@lemmy.world · 8 months ago

voracitude@lemmy.world · 8 months ago

That’s very cool. Any idea about tokens/sec performance, and on what hardware? For reference, my 3070 gets ~19-25 tokens/sec with Llama3 7B.

Mechaguana@programming.dev · edit-2 8 months ago

I tried running Ollama with the Mistral model. You need a good graphics card to run your own LLM; I had to wait 20 minutes for one full response.

Granted, the laptop I was running it on was garbage, but it really put into perspective how expensive running an LLM can be.

This shit won't be free forever.

Aquila@sh.itjust.works · 8 months ago

Only works on apple silicon. Am I reading that right?

hyperhypervisor@programming.dev · 8 months ago

No, they just mention that only Apple silicon is supported if you’re using macOS.

Run the strongest open-source LLM, Llama3 70B, with just a single 4GB GPU!
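
For a sense of scale, here is a rough back-of-the-envelope sketch in Python. The post does not say how the model is made to fit, so this assumes (purely for illustration) that weights are loaded one transformer layer at a time, meaning only one layer has to sit in GPU memory at once. The figures used (70 billion parameters, 80 layers, 2 bytes per parameter for fp16) and the function names are assumptions, not numbers or APIs from the post.

```python
# Back-of-the-envelope memory estimate (illustrative assumptions only).
# Ignores embeddings, activations, and the KV cache, and treats all
# layers as equal in size, so real numbers will differ somewhat.

def full_model_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of all weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def per_layer_gb(n_params: float, n_layers: int, bytes_per_param: int = 2) -> float:
    """Approximate size of one layer's weights if split evenly across layers."""
    return full_model_gb(n_params, bytes_per_param) / n_layers


# Assumed figures: 70e9 parameters, 80 layers, fp16 (2 bytes per parameter).
total_gb = full_model_gb(70e9)      # about 140 GB for the whole model
layer_gb = per_layer_gb(70e9, 80)   # about 1.75 GB for a single layer

fits_in_4gb = layer_gb < 4.0        # a single layer fits; the full model does not
```

Under these assumptions the whole model is far larger than 4GB, but a single layer is not. The likely cost is speed, since weights would have to be re-read from disk for every layer on every pass, which is why the tokens/sec question above matters.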