LLM for Unity
v2.2.5
Create characters in Unity with LLMs!
Class implementing the LLM characters.
Public Member Functions
void | Awake () |
The Unity Awake function that initializes the state before the application starts. | |
virtual string | GetSavePath (string filename) |
virtual string | GetJsonSavePath (string filename) |
virtual string | GetCacheSavePath (string filename) |
void | SetPrompt (string newPrompt, bool clearChat=true) |
Set the system prompt for the LLMCharacter. | |
async Task | LoadTemplate () |
Load the chat template of the LLMCharacter. | |
async void | SetGrammar (string path) |
Set the grammar file of the LLMCharacter. | |
void | AddMessage (string role, string content) |
void | AddPlayerMessage (string content) |
void | AddAIMessage (string content) |
async Task< string > | Chat (string query, Callback< string > callback=null, EmptyCallback completionCallback=null, bool addToHistory=true) |
Chat functionality of the LLM. It calls the LLM completion based on the provided query including the previous chat history. The function allows callbacks when the response is partially or fully received. The question is added to the history if specified. | |
async Task< string > | Complete (string prompt, Callback< string > callback=null, EmptyCallback completionCallback=null) |
Pure completion functionality of the LLM. It calls the LLM completion based solely on the provided prompt (no formatting by the chat template). The function allows callbacks when the response is partially or fully received. | |
async Task | Warmup (EmptyCallback completionCallback=null) |
Warms up the model by processing the prompt. The prompt processing will be cached (if cachePrompt=true), allowing for faster initialisation. The function allows a callback for when the prompt is processed and the response is received. | |
async Task< string > | AskTemplate () |
Asks the LLM for the chat template to use. | |
async Task< List< int > > | Tokenize (string query, Callback< List< int > > callback=null) |
Tokenises the provided query. | |
async Task< string > | Detokenize (List< int > tokens, Callback< string > callback=null) |
Detokenises the provided tokens to a string. | |
async Task< List< float > > | Embeddings (string query, Callback< List< float > > callback=null) |
Computes the embeddings of the provided input. | |
virtual async Task< string > | Save (string filename) |
Saves the chat history and cache to the provided filename / relative path. | |
virtual async Task< string > | Load (string filename) |
Load the chat history and cache from the provided filename / relative path. | |
void | CancelRequests () |
Cancel the ongoing requests e.g. Chat, Complete. | |
Public Attributes
bool | advancedOptions = false |
toggle to show/hide advanced options in the GameObject | |
bool | remote = false |
toggle to use remote LLM server or local LLM | |
LLM | llm |
the LLM object to use | |
string | host = "localhost" |
host to use for the LLM server | |
int | port = 13333 |
port to use for the LLM server | |
int | numRetries = 10 |
number of retries to use for the LLM server requests (-1 = infinite) | |
string | APIKey |
allows the use of a server with an API key | |
string | save = "" |
file to save the chat history. The file is saved only for Chat calls with addToHistory set to true. The file will be saved within the persistentDataPath directory (see https://docs.unity3d.com/ScriptReference/Application-persistentDataPath.html). | |
bool | saveCache = false |
toggle to save the LLM cache. This speeds up the prompt calculation but also requires ~100MB of space per character. | |
bool | debugPrompt = false |
select to log the constructed prompt in the Unity Editor. | |
bool | stream = true |
option to receive the reply from the model as it is produced (recommended!). If it is not selected, the full reply from the model is received in one go | |
string | grammar = null |
grammar file used for the LLM in .gbnf format (relative to the Assets/StreamingAssets folder) | |
bool | cachePrompt = true |
option to cache the prompt as it is being created by the chat to avoid reprocessing the entire prompt every time (default: true) | |
int | slot = -1 |
specify which slot of the server to use for computation (affects caching) | |
int | seed = 0 |
seed for reproducibility. For random results every time set to -1. | |
int | numPredict = 256 |
number of tokens to predict (-1 = infinity, -2 = until context filled). This is the maximum number of tokens the model will predict. When numPredict is reached the model stops generating, which means words / sentences may be cut off if the value is too low. | |
float | temperature = 0.2f |
LLM temperature, lower values give more deterministic answers. The temperature setting adjusts how random the generated responses are. Turning it up makes the generated choices more varied and unpredictable. Turning it down makes the generated responses more predictable and focused on the most likely options. | |
int | topK = 40 |
top-k sampling (0 = disabled). The top-k value limits generation to the k most probable tokens at each step. This value can help fine-tune the output and make it adhere to specific patterns or constraints. | |
float | topP = 0.9f |
top-p sampling (1.0 = disabled). The top-p value controls the cumulative probability of the candidate tokens: the model considers tokens until this threshold (p) is reached. Lowering this value narrows the token choice and makes the output less diverse. | |
float | minP = 0.05f |
minimum probability for a token to be used. The probability is defined relative to the probability of the most likely token. | |
float | repeatPenalty = 1.1f |
control the repetition of token sequences in the generated text. The penalty is applied to repeated tokens. | |
float | presencePenalty = 0f |
repeated token presence penalty (0.0 = disabled). Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. | |
float | frequencyPenalty = 0f |
repeated token frequency penalty (0.0 = disabled). Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. | |
float | tfsZ = 1f |
enable tail free sampling with parameter z (1.0 = disabled). | |
float | typicalP = 1f |
enable locally typical sampling with parameter p (1.0 = disabled). | |
int | repeatLastN = 64 |
last n tokens to consider for penalizing repetition (0 = disabled, -1 = ctx-size). | |
bool | penalizeNl = true |
penalize newline tokens when applying the repeat penalty. | |
string | penaltyPrompt |
prompt for the purpose of the penalty evaluation. Can be either null, a string or an array of numbers representing tokens (null/"" = use original prompt) | |
int | mirostat = 0 |
enable Mirostat sampling, controlling perplexity during text generation (0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0). | |
float | mirostatTau = 5f |
set the Mirostat target entropy, parameter tau. | |
float | mirostatEta = 0.1f |
set the Mirostat learning rate, parameter eta. | |
int | nProbs = 0 |
if greater than 0, the response also contains the probabilities of top N tokens for each generated token. | |
bool | ignoreEos = false |
ignore end of stream token and continue generating. | |
int | nKeep = -1 |
number of tokens to retain from the prompt when the model runs out of context (-1 = LLMCharacter prompt tokens if setNKeepToPrompt is set to true). | |
List< string > | stop = new List<string>() |
stopwords to stop the LLM in addition to the default stopwords from the chat template. | |
Dictionary< int, string > | logitBias = null |
the logit bias option allows you to manually adjust the likelihood of specific tokens appearing in the generated text. By providing a token ID and a positive or negative bias value, you can increase or decrease the probability of that token being generated. | |
string | playerName = "user" |
the name of the player | |
string | AIName = "assistant" |
the name of the AI | |
string | prompt = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions." |
a description of the AI role. This defines the LLMCharacter system prompt | |
bool | setNKeepToPrompt = true |
option to set the number of tokens to retain from the prompt (nKeep) based on the LLMCharacter system prompt | |
Class implementing the LLM characters.
Definition at line 19 of file LLMCharacter.cs.
void LLMUnity.LLMCharacter.AddAIMessage (string content) |
Definition at line 414 of file LLMCharacter.cs.
void LLMUnity.LLMCharacter.AddMessage (string role, string content) |
Definition at line 403 of file LLMCharacter.cs.
void LLMUnity.LLMCharacter.AddPlayerMessage (string content) |
Definition at line 409 of file LLMCharacter.cs.
Asks the LLM for the chat template to use.
Definition at line 585 of file LLMCharacter.cs.
The Unity Awake function that initializes the state before the application starts.
Definition at line 143 of file LLMCharacter.cs.
Cancel the ongoing requests e.g. Chat, Complete.
Definition at line 726 of file LLMCharacter.cs.
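A minimal usage sketch (the llmCharacter field, the Escape key binding and the legacy Input Manager call are illustrative assumptions):

```csharp
using LLMUnity;
using UnityEngine;

public class CancelExample : MonoBehaviour
{
    // assign the LLMCharacter component in the Inspector (illustrative setup)
    public LLMCharacter llmCharacter;

    void Update()
    {
        // stop any in-flight Chat / Complete requests, e.g. when the player interrupts
        if (Input.GetKeyDown(KeyCode.Escape)) llmCharacter.CancelRequests();
    }
}
```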
Chat functionality of the LLM. It calls the LLM completion based on the provided query including the previous chat history. The function allows callbacks when the response is partially or fully received. The question is added to the history if specified.
query | user query |
callback | callback function that receives the response as string |
completionCallback | callback function called when the full response has been received |
addToHistory | whether to add the user query to the chat history |
Definition at line 491 of file LLMCharacter.cs.
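For illustration, a minimal sketch of calling Chat with a streaming callback (the llmCharacter field and the example query are assumptions, not part of the API):

```csharp
using LLMUnity;
using UnityEngine;

public class ChatExample : MonoBehaviour
{
    // assign the LLMCharacter component in the Inspector
    public LLMCharacter llmCharacter;

    async void Start()
    {
        // stream the reply as it is produced and get notified on completion
        string reply = await llmCharacter.Chat(
            "Hello, who are you?",
            partial => Debug.Log(partial),       // callback with the response so far
            () => Debug.Log("reply completed"),  // completionCallback
            addToHistory: true);
        Debug.Log(reply);
    }
}
```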
Pure completion functionality of the LLM. It calls the LLM completion based solely on the provided prompt (no formatting by the chat template). The function allows callbacks when the response is partially or fully received.
prompt | user query |
callback | callback function that receives the response as string |
completionCallback | callback function called when the full response has been received |
Definition at line 544 of file LLMCharacter.cs.
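A sketch of a raw completion call (the field name and prompt are illustrative):

```csharp
using LLMUnity;
using UnityEngine;

public class CompleteExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    async void Start()
    {
        // the prompt is sent as-is, without the chat template or history
        string completion = await llmCharacter.Complete(
            "The capital of France is",
            partial => Debug.Log(partial));
        Debug.Log(completion);
    }
}
```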
Detokenises the provided tokens to a string.
tokens | tokens to detokenise |
callback | callback function called with the result string |
Definition at line 611 of file LLMCharacter.cs.
Computes the embeddings of the provided input.
query | input to compute the embeddings for |
callback | callback function called with the resulting embeddings |
Definition at line 626 of file LLMCharacter.cs.
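A sketch of computing embeddings (the input string and field name are illustrative; an embedding-capable model is assumed):

```csharp
using System.Collections.Generic;
using LLMUnity;
using UnityEngine;

public class EmbeddingsExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    async void Start()
    {
        // compute the embedding vector of an input string
        List<float> embedding = await llmCharacter.Embeddings("a red apple");
        Debug.Log($"embedding dimension: {embedding.Count}");
    }
}
```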
virtual string LLMUnity.LLMCharacter.GetCacheSavePath (string filename) |
Definition at line 248 of file LLMCharacter.cs.
virtual string LLMUnity.LLMCharacter.GetJsonSavePath (string filename) |
Definition at line 243 of file LLMCharacter.cs.
virtual string LLMUnity.LLMCharacter.GetSavePath (string filename) |
Definition at line 238 of file LLMCharacter.cs.
Load the chat history and cache from the provided filename / relative path.
filename | filename / relative path to load the chat history from |
Definition at line 669 of file LLMCharacter.cs.
Saves the chat history and cache to the provided filename / relative path.
filename | filename / relative path to save the chat history |
Definition at line 650 of file LLMCharacter.cs.
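A sketch covering Save and Load together (the filename "player1_chat" is a placeholder):

```csharp
using LLMUnity;
using UnityEngine;

public class SaveLoadExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    public async void SaveChat()
    {
        // stored under Application.persistentDataPath (plus the cache if saveCache is enabled)
        await llmCharacter.Save("player1_chat");
    }

    public async void LoadChat()
    {
        await llmCharacter.Load("player1_chat");
    }
}
```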
Set the grammar file of the LLMCharacter.
path | path to the grammar file |
Definition at line 350 of file LLMCharacter.cs.
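A sketch of assigning a grammar file (the file name "json.gbnf" is a placeholder for a grammar placed under Assets/StreamingAssets):

```csharp
using LLMUnity;
using UnityEngine;

public class GrammarExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    void Start()
    {
        // "json.gbnf" is a placeholder grammar file in Assets/StreamingAssets
        llmCharacter.SetGrammar("json.gbnf");
    }
}
```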
Set the system prompt for the LLMCharacter.
newPrompt | the system prompt |
clearChat | whether to clear (true) or keep (false) the current chat history on top of the system prompt. |
Definition at line 279 of file LLMCharacter.cs.
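A sketch of changing the system prompt at runtime (the role description is illustrative):

```csharp
using LLMUnity;
using UnityEngine;

public class PromptExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    void Start()
    {
        // replace the system prompt and start a fresh conversation
        llmCharacter.SetPrompt(
            "You are a gruff blacksmith in a medieval village.",
            clearChat: true);
    }
}
```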
Tokenises the provided query.
query | query to tokenise |
callback | callback function called with the result tokens |
Definition at line 596 of file LLMCharacter.cs.
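A sketch of a Tokenize / Detokenize round trip (the field name and input text are illustrative):

```csharp
using System.Collections.Generic;
using LLMUnity;
using UnityEngine;

public class TokenizeExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    async void Start()
    {
        // text -> token ids -> text
        List<int> tokens = await llmCharacter.Tokenize("Hello world");
        Debug.Log($"token count: {tokens.Count}");
        string text = await llmCharacter.Detokenize(tokens);
        Debug.Log(text);
    }
}
```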
Warms up the model by processing the prompt. The prompt processing will be cached (if cachePrompt=true), allowing for faster initialisation. The function allows a callback for when the prompt is processed and the response is received.
The function calls the Chat function with a predefined query without adding it to history.
completionCallback | callback function called when the full response has been received |
query | user prompt used during the initialisation (not added to history) |
Definition at line 567 of file LLMCharacter.cs.
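A sketch of warming up the model during scene startup (the field name is illustrative):

```csharp
using LLMUnity;
using UnityEngine;

public class WarmupExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    async void Start()
    {
        // process the system prompt up front so the first Chat call responds faster
        await llmCharacter.Warmup(() => Debug.Log("warmup finished"));
    }
}
```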
toggle to show/hide advanced options in the GameObject
Definition at line 22 of file LLMCharacter.cs.
string LLMUnity.LLMCharacter.AIName = "assistant" |
the name of the AI
Definition at line 118 of file LLMCharacter.cs.
string LLMUnity.LLMCharacter.APIKey |
allows the use of a server with an API key
Definition at line 34 of file LLMCharacter.cs.
option to cache the prompt as it is being created by the chat to avoid reprocessing the entire prompt every time (default: true)
Definition at line 49 of file LLMCharacter.cs.
select to log the constructed prompt in the Unity Editor.
Definition at line 42 of file LLMCharacter.cs.
float LLMUnity.LLMCharacter.frequencyPenalty = 0f |
repeated token frequency penalty (0.0 = disabled). Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Definition at line 83 of file LLMCharacter.cs.
grammar file used for the LLM in .gbnf format (relative to the Assets/StreamingAssets folder)
Definition at line 47 of file LLMCharacter.cs.
string LLMUnity.LLMCharacter.host = "localhost" |
host to use for the LLM server
Definition at line 28 of file LLMCharacter.cs.
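The connection attributes are normally set in the Inspector; a sketch of configuring them from code (the address, port and retry count are placeholder values):

```csharp
using LLMUnity;
using UnityEngine;

public class RemoteSetupExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    void Awake()
    {
        // placeholder connection settings for a remote LLM server
        llmCharacter.remote = true;
        llmCharacter.host = "192.168.1.10";
        llmCharacter.port = 13333;
        llmCharacter.numRetries = 5;
        // llmCharacter.APIKey = "...";  // only if the server expects an API key
    }
}
```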
ignore end of stream token and continue generating.
Definition at line 105 of file LLMCharacter.cs.
LLM LLMUnity.LLMCharacter.llm |
the LLM object to use
Definition at line 26 of file LLMCharacter.cs.
Dictionary<int, string> LLMUnity.LLMCharacter.logitBias = null |
the logit bias option allows you to manually adjust the likelihood of specific tokens appearing in the generated text. By providing a token ID and a positive or negative bias value, you can increase or decrease the probability of that token being generated.
Definition at line 113 of file LLMCharacter.cs.
minimum probability for a token to be used. The probability is defined relative to the probability of the most likely token.
Definition at line 74 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.mirostat = 0 |
enable Mirostat sampling, controlling perplexity during text generation (0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0).
Definition at line 97 of file LLMCharacter.cs.
set the Mirostat learning rate, parameter eta.
Definition at line 101 of file LLMCharacter.cs.
set the Mirostat target entropy, parameter tau.
Definition at line 99 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.nKeep = -1 |
number of tokens to retain from the prompt when the model runs out of context (-1 = LLMCharacter prompt tokens if setNKeepToPrompt is set to true).
Definition at line 108 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.nProbs = 0 |
if greater than 0, the response also contains the probabilities of top N tokens for each generated token.
Definition at line 103 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.numPredict = 256 |
number of tokens to predict (-1 = infinity, -2 = until context filled). This is the maximum number of tokens the model will predict. When numPredict is reached the model stops generating, which means words / sentences may be cut off if the value is too low.
Definition at line 58 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.numRetries = 10 |
number of retries to use for the LLM server requests (-1 = infinite)
Definition at line 32 of file LLMCharacter.cs.
penalize newline tokens when applying the repeat penalty.
Definition at line 92 of file LLMCharacter.cs.
string LLMUnity.LLMCharacter.penaltyPrompt |
prompt for the purpose of the penalty evaluation. Can be either null, a string or an array of numbers representing tokens (null/"" = use original prompt)
Definition at line 95 of file LLMCharacter.cs.
string LLMUnity.LLMCharacter.playerName = "user" |
the name of the player
Definition at line 116 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.port = 13333 |
port to use for the LLM server
Definition at line 30 of file LLMCharacter.cs.
float LLMUnity.LLMCharacter.presencePenalty = 0f |
repeated token presence penalty (0.0 = disabled). Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Definition at line 80 of file LLMCharacter.cs.
string LLMUnity.LLMCharacter.prompt = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions." |
a description of the AI role. This defines the LLMCharacter system prompt
Definition at line 120 of file LLMCharacter.cs.
toggle to use remote LLM server or local LLM
Definition at line 24 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.repeatLastN = 64 |
last n tokens to consider for penalizing repetition (0 = disabled, -1 = ctx-size).
Definition at line 90 of file LLMCharacter.cs.
control the repetition of token sequences in the generated text. The penalty is applied to repeated tokens.
Definition at line 77 of file LLMCharacter.cs.
string LLMUnity.LLMCharacter.save = "" |
file to save the chat history. The file is saved only for Chat calls with addToHistory set to true. The file will be saved within the persistentDataPath directory (see https://docs.unity3d.com/ScriptReference/Application-persistentDataPath.html).
Definition at line 38 of file LLMCharacter.cs.
toggle to save the LLM cache. This speeds up the prompt calculation but also requires ~100MB of space per character.
Definition at line 40 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.seed = 0 |
seed for reproducibility. For random results every time set to -1.
Definition at line 53 of file LLMCharacter.cs.
option to set the number of tokens to retain from the prompt (nKeep) based on the LLMCharacter system prompt
Definition at line 122 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.slot = -1 |
specify which slot of the server to use for computation (affects caching)
Definition at line 51 of file LLMCharacter.cs.
stopwords to stop the LLM in addition to the default stopwords from the chat template.
Definition at line 110 of file LLMCharacter.cs.
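A sketch of adding a stopword from code (the "User:" string is illustrative):

```csharp
using LLMUnity;
using UnityEngine;

public class StopwordsExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    void Awake()
    {
        // stop generation when the model starts writing the player's turn
        // ("User:" is an illustrative stopword)
        llmCharacter.stop.Add("User:");
    }
}
```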
option to receive the reply from the model as it is produced (recommended!). If it is not selected, the full reply from the model is received in one go
Definition at line 45 of file LLMCharacter.cs.
LLM temperature, lower values give more deterministic answers. The temperature setting adjusts how random the generated responses are. Turning it up makes the generated choices more varied and unpredictable. Turning it down makes the generated responses more predictable and focused on the most likely options.
Definition at line 63 of file LLMCharacter.cs.
enable tail free sampling with parameter z (1.0 = disabled).
Definition at line 86 of file LLMCharacter.cs.
int LLMUnity.LLMCharacter.topK = 40 |
top-k sampling (0 = disabled). The top-k value limits generation to the k most probable tokens at each step. This value can help fine-tune the output and make it adhere to specific patterns or constraints.
Definition at line 66 of file LLMCharacter.cs.
top-p sampling (1.0 = disabled). The top-p value controls the cumulative probability of the candidate tokens: the model considers tokens until this threshold (p) is reached. Lowering this value narrows the token choice and makes the output less diverse.
Definition at line 71 of file LLMCharacter.cs.
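The sampling attributes (e.g. temperature, topK, topP, numPredict, seed) can also be adjusted from code; a sketch with illustrative values:

```csharp
using LLMUnity;
using UnityEngine;

public class SamplingExample : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the Inspector

    void Awake()
    {
        // illustrative values for a more creative configuration
        llmCharacter.temperature = 0.8f;
        llmCharacter.topK = 40;
        llmCharacter.topP = 0.95f;
        llmCharacter.numPredict = 512;
        llmCharacter.seed = -1;  // random results on every run
    }
}
```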
enable locally typical sampling with parameter p (1.0 = disabled).
Definition at line 88 of file LLMCharacter.cs.