Ollama Support for Custom num_ctx
planned
Adam Ram
MindMac's model settings already include a "Maximum Tokens" field. It might be model information only, since the maximum token count is already set by the endpoints.
For Ollama, sadly, it falls back to the default 2048-token context window if num_ctx is not explicitly set in the Modelfile OR in the inference request. This is why models don't follow Occupations (system prompts): a long system prompt alone already consumes most of those tokens.
I think you should send num_ctx with every message sent to Ollama, using the value from the Maximum Tokens setting in MindMac. Or maybe you could add it to the Parameter settings on the main window?
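For illustration, a minimal sketch of what such a per-request override could look like against Ollama's /api/chat endpoint (the model name, prompt, and token values are placeholder assumptions, not MindMac's actual implementation):

```python
import requests

# Sketch: pass num_ctx (and num_predict) per request via Ollama's
# "options" field, instead of baking them into a custom Modelfile.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [
            {"role": "system", "content": "A long Occupation/system prompt..."},
            {"role": "user", "content": "Hello"},
        ],
        "options": {
            "num_ctx": 8192,      # context window size for this request
            "num_predict": 1024,  # cap on tokens generated in the reply
        },
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```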
For now I have to make my own Modelfile that explicitly sets num_ctx: for Mistral it should be 32768, for Llama3 it should be 8192. This makes Ollama models a bit messy to run with MindMac, since I have to "compile" my own model, as in the sketch below.
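A minimal Modelfile for the Mistral case (the base model tag and the alias are assumptions for illustration):

```
# Modelfile: extend the stock Mistral model with a larger context window
FROM mistral
# 32768 matches the value mentioned above for Mistral
PARAMETER num_ctx 32768
```

Then build it with something like: ollama create mistral-32k -f Modelfile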
MindMac
Adam Ram Excellent observations. In fact, we should focus on two key aspects:
- Max Tokens: the maximum number of tokens to generate in the completion (referred to as num_predict in Ollama).
- Window Context Length: the size of the context window used to generate the next token (num_ctx in Ollama).
Presently, in Ollama models, when you specify a Max Token value under the Model Detail View, the Max Token slider becomes available in the Parameter Config View. This value is forwarded as the num_predict value. I am considering introducing an option to adjust the Window Context Length (num_ctx) for Ollama models. Please stay tuned.

Adam Ram
MindMac Great, thanks. I previously posted feedback blaming this on a MindMac bug, but I've since been testing on the Ollama side. All Ollama models from the original repo produce garbage, especially when my Occupations/system prompts are lengthy, around ±4k tokens. The default Ollama 2k num_ctx is simply insufficient, so it always produces garbage.
I don't know about num_predict, but the garbage output makes Ollama very limited, little more than a plain chatbot, which doesn't feel like a native MindMac experience. I believe you can build a workaround so I don't have to create a custom Modelfile.
MindMac
planned