Chat Configuration
While the BoltAI chat UI is simple to use, power users can always tweak advanced configurations for better AI responses.
How to customize the chat configuration?
You can then customize basic information such as the chat title, AI service & model, and custom system instruction, as well as advanced parameters such as Context Limit, Temperature, Max Tokens, and more.
Advanced Parameters
Context Limit
By default, most large language models (LLMs) do not maintain the conversation context between requests. It's up to the chat client to decide which content to include in each request.
BoltAI supports multiple strategies to control the chat context:
All Previous Messages: when selected, BoltAI includes all previous messages in the conversation in the request to the LLM.
No Context: each request contains only the System Instruction and your prompt.
First n Messages: BoltAI would pick the first n messages in the conversation (excluding your prompt) as the chat context.
Last n Messages: BoltAI would pick the last n messages in the conversation (excluding your prompt) as the chat context.
By default, BoltAI uses the last 10 messages as the chat context; the sketch after the note below illustrates this strategy.
Note:
Depending on your AI service/model, the actual number of messages in the chat context might be slightly different. This is because some models, such as Anthropic Claude, require a strict alternating order of user/assistant messages.
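For illustration, here is a minimal sketch of how a "Last n Messages" strategy can assemble a request (the function name, message format, and trimming rule are assumptions for the example, not BoltAI's actual implementation):

```python
# Illustrative sketch of a "Last n Messages" context strategy.
# The message format follows the common OpenAI-style schema; this is
# not BoltAI's actual implementation.

def build_context(system_instruction, history, prompt, last_n=10):
    # Take the last n messages from the conversation (excluding the new prompt).
    context = history[-last_n:] if last_n > 0 else []

    # Some providers (e.g. Anthropic Claude) require user/assistant messages
    # to strictly alternate, so a real client may trim one extra message
    # to satisfy that rule.
    if context and context[0]["role"] == "assistant":
        context = context[1:]

    return (
        [{"role": "system", "content": system_instruction}]
        + context
        + [{"role": "user", "content": prompt}]
    )

history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Summarize my notes"},
    {"role": "assistant", "content": "Sure, here is a summary..."},
]
messages = build_context("You are a helpful assistant.", history, "Now translate it", last_n=3)
```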
Temperature
This setting influences the variety in the model's responses. Lower values lead to more predictable and typical responses, while higher values encourage more diverse and less common responses.
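For intuition, here is a simplified sketch of the underlying mechanism (illustrative only, independent of any specific provider): the model's raw scores are divided by the temperature before being turned into probabilities, so low values concentrate probability on the top token and high values spread it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution (more predictable),
    # higher temperature flattens it (more diverse).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # almost all probability on the first token
print(softmax_with_temperature(logits, 1.5))  # probabilities are much closer together
```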
Max Tokens
This sets the upper limit for the number of tokens the model can generate in response. It won't produce more than this limit. The maximum value is the context length minus the prompt length.
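A quick worked example with made-up numbers (not tied to any particular model):

```python
# Illustrative arithmetic only: the numbers here are hypothetical.
context_length = 8192   # the model's total context window, in tokens
prompt_tokens = 3000    # tokens already used by the system instruction, chat context, and prompt
print(context_length - prompt_tokens)  # 5192 -> the largest valid Max Tokens value in this case
```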
Presence Penalty
Adjusts how likely the model is to repeat tokens that already appear in the input. Higher values make such repetition less likely, while negative values encourage token reuse. The penalty does not scale with the number of occurrences.
Frequency Penalty
This setting controls repetition based on how often tokens appear in the input: the more frequently a token occurs, the more it is penalized, so the penalty scales with the number of occurrences. Negative values will encourage token reuse.
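To make the difference between the two penalties concrete, here is a simplified sketch of how they are commonly applied to a token's score (actual provider implementations may differ): the presence penalty is applied once to any token that has already appeared, while the frequency penalty grows with the number of occurrences.

```python
def penalized_logit(logit, count, presence_penalty=0.0, frequency_penalty=0.0):
    # count = how many times this token already appears in the context.
    # Presence penalty: applied once if the token has appeared at all.
    if count > 0:
        logit -= presence_penalty
    # Frequency penalty: scales with the number of occurrences.
    logit -= frequency_penalty * count
    return logit

# A token seen 4 times is penalized more under frequency penalty,
# but the presence penalty is the same whether it appeared once or 4 times.
print(penalized_logit(1.0, count=1, presence_penalty=0.5))   # 0.5
print(penalized_logit(1.0, count=4, presence_penalty=0.5))   # 0.5
print(penalized_logit(1.0, count=1, frequency_penalty=0.5))  # 0.5
print(penalized_logit(1.0, count=4, frequency_penalty=0.5))  # -1.0
```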
Top P
This setting limits the model's choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. A lower value makes the model's responses more predictable, while the default setting allows for a full range of token choices. Think of it like a dynamic Top-K.
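A simplified sketch of the idea (illustrative only, not any provider's exact implementation): tokens are sorted by probability, and only the smallest set whose cumulative probability reaches P is kept for sampling.

```python
def top_p_filter(probs, p):
    # probs: mapping of token -> probability. Keep the most likely tokens
    # whose cumulative probability just reaches p; the model then samples
    # only from this reduced set.
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(top_p_filter(probs, 0.8))  # {'the': 0.5, 'a': 0.3}
```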
Top K
This limits the model's choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model will always pick the most likely next token, leading to predictable results. By default this setting is disabled, allowing the model to consider all choices.
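And a matching sketch for Top K (again illustrative only): only the k most likely tokens are kept at each step, so a value of 1 reduces to always picking the single most likely token.

```python
def top_k_filter(probs, k):
    # Keep only the k most likely tokens; the model samples from this subset.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(top_k_filter(probs, 1))  # {'the': 0.5} -> greedy, fully predictable
print(top_k_filter(probs, 3))  # three candidates remain for sampling
```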
Reference: https://openrouter.ai/docs/parameters