CPU-only i7-1355U koboldcpp works surprisingly well

raffa@lemmynsfw.com · 8 days ago



magn418@lemmynsfw.com · 8 days ago

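I think running the model unquantized (FP16) is a waste. You should try something between Q4_K_M and Q6_K (or at the very least Q8_0, which is supposed to be about the same quality as FP16). Token generation on a CPU is mostly limited by memory bandwidth, and these quantizations cut the amount of weight data read per token to roughly half or less. That way it should be considerably faster… probably at least twice as fast.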

(The GGUF page of that model has a list of recommended quantization levels.)
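A rough back-of-the-envelope sketch of why quantizing helps, assuming generation is purely memory-bandwidth bound (every weight is read once per token). The 7B parameter count and the 60 GB/s bandwidth are hypothetical placeholders, not numbers for the model or laptop in this thread. The bits-per-weight values are approximations for the GGUF formats (Q4_K_M in particular is a mix of quantization types, so its value is only an estimate).

```python
# Bandwidth-bound estimate of CPU token generation speed.
# Assumption: each generated token reads every weight once, so
# tokens/s is at most memory_bandwidth / model_size.

# Approximate bits per weight for common GGUF formats.
# Q4_K_M mixes quantization types, so 4.8 is only an estimate.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q4_K_M": 4.8,
}


def model_size_gb(n_params: float, quant: str) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


def max_tokens_per_s(n_params: float, quant: str, bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed when memory bandwidth is the bottleneck."""
    return bandwidth_gb_s / model_size_gb(n_params, quant)


if __name__ == "__main__":
    N_PARAMS = 7e9    # hypothetical 7B-parameter model
    BANDWIDTH = 60.0  # hypothetical effective memory bandwidth in GB/s
    fp16 = max_tokens_per_s(N_PARAMS, "FP16", BANDWIDTH)
    for quant in BITS_PER_WEIGHT:
        size = model_size_gb(N_PARAMS, quant)
        speed = max_tokens_per_s(N_PARAMS, quant, BANDWIDTH)
        print(f"{quant:7s} {size:5.1f} GB  <= {speed:4.1f} tok/s  ({speed / fp16:.1f}x FP16)")
```

Under these assumptions Q8_0 comes out at roughly 1.9x the FP16 speed, Q6_K at about 2.4x and Q4_K_M at about 3.3x. Real numbers will be lower, since prompt processing is compute bound and caches, threads and the KV cache all add overhead.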

raffa@lemmynsfw.com · 8 days ago

thanks for the tips!