CPU-only i7-1355U koboldcpp works surprisingly well

raffa@lemmynsfw.com · 8 days ago

NSFW

magn418@lemmynsfw.com · edit-2 8 days ago

Nice! KoboldCpp is also my software of choice. It’s easy to install, all-in-one and has a good amount of features.

What model size are you using to get 1 token/s? I’m in the same ballpark, though my old desktop PC is a bit faster than my laptop, probably because it has dual-channel memory and doesn’t throttle.

I think that’s the point where it gets usable, at least for back-and-forth chat. If I feed in longer text, or KoboldCpp decides to recalculate large portions of the context, it takes several minutes until I get a reply. And that’s less fun for use cases like dialogue.

raffa@lemmynsfw.com · 8 days ago

My first test was with Starcannon-Unleashed-12B-v1.0-f16, a 23 GB model. I did not expect that laptop to be usable at all.

magn418@lemmynsfw.com · edit-2 8 days ago

I think doing the calculations at full precision (FP16) is a waste. You should try something between Q4_K_M and Q6_K (or at least Q8_0, which is supposed to be the same quality as FP16). That way it should be considerably faster, at least twice as fast.

(The GGUF page of that model has a list of recommended quantization levels.)
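To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes 12 billion parameters (taken from the "12B" in the model name), approximate bits-per-weight figures for each quantization type, and that CPU generation speed is limited mainly by how many bytes of weights have to be read from memory per token. The function names are made up for illustration, and real file sizes and speeds will differ somewhat.

```python
# Approximate bits per weight for common GGUF quantization types.
# These are rough figures assumed for illustration, not exact file sizes.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.56,
    "Q4_K_M": 4.85,
}

N_PARAMS = 12e9  # assumption: "12B" in the model name means ~12 billion weights


def estimate_size_gb(n_params, bits_per_weight):
    """Estimated size of the weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def estimate_speedup(bits_per_weight, baseline_bits=16.0):
    """Speedup over FP16, assuming speed scales inversely with bytes read per token."""
    return baseline_bits / bits_per_weight


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        size = estimate_size_gb(N_PARAMS, bpw)
        speedup = estimate_speedup(bpw)
        print(f"{name:7s} ~{size:5.1f} GB  ~{speedup:.1f}x vs F16")
```

Under those assumptions, Q8_0 comes out a little under twice as fast as FP16, and Q4_K_M to Q6_K land somewhere between roughly 2.4x and 3.3x. The smaller files also leave more RAM free for context.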

raffa@lemmynsfw.com · 8 days ago

thanks for the tips!