CPU-only i7-1355U koboldcpp works surprisingly well

raffa@lemmynsfw.com · 3 months ago

NSFW

KinkyThoughts@lemmynsfw.com · 3 months ago

15 seconds per reply at just 1 token/s?! How short are the replies? And how much context has to be processed? I get about 5 tokens per second on my GPU and need 1-2 minutes per reply with a 4k context.

raffa@lemmynsfw.com · 3 months ago

Context size is the default of 4096, and replies are around 16 tokens.

KinkyThoughts@lemmynsfw.com · 3 months ago

I mean the actual context size to be processed for the message, based on chat history, character cards, world info, etc. And which model?
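
For a rough sanity check of the numbers in this thread, here is a back-of-the-envelope sketch. It assumes reply time is roughly prompt-processing time plus generation time. The function name `reply_seconds` and the prompt-processing speed of 100 tokens/s are made-up placeholders, since nobody here has posted a prompt-processing rate; only the 16 tokens, 1 token/s, and 4096 figures come from the thread.

```python
def reply_seconds(prompt_tokens, reply_tokens, prompt_tps, gen_tps):
    """Rough estimate: time to process the prompt plus time to generate the reply.

    prompt_tps and gen_tps are tokens per second for each phase.
    This ignores model load time, sampling overhead, and context caching.
    """
    return prompt_tokens / prompt_tps + reply_tokens / gen_tps


# Generation only: ~16 tokens at 1 token/s (figures from the thread).
gen_only = reply_seconds(prompt_tokens=0, reply_tokens=16,
                         prompt_tps=1.0, gen_tps=1.0)
print(f"generation only: {gen_only:.0f} s")

# Same reply, but with a full 4096-token context reprocessed first.
# ASSUMPTION: 100 tokens/s prompt processing is a placeholder, not a measurement.
full_context = reply_seconds(prompt_tokens=4096, reply_tokens=16,
                             prompt_tps=100.0, gen_tps=1.0)
print(f"full context reprocessed: {full_context:.0f} s")
```

The generation-only case lands near the 15 seconds mentioned, which would only match the total if very little prompt had to be processed for each message. That is why the actual processed context matters more than the configured maximum.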