LLM in a flash: Efficient Large Language Model Inference with Limited Memory (huggingface.co)
Posted by kenna@lemmy.dbzer0.com to human centered computing@lemmy.dbzer0.com · 11 months ago