LLM for Unity v2.2.5
Create characters in Unity with LLMs!
LLMUnity.LLMClient Class Reference
Inheritance diagram for LLMUnity.LLMClient:

Additional Inherited Members

- Public Member Functions inherited from LLMUnity.LLMCharacter
void Awake ()
 The Unity Awake function that initializes the state before the application starts.
 
virtual string GetSavePath (string filename)
 
virtual string GetJsonSavePath (string filename)
 
virtual string GetCacheSavePath (string filename)
 
void SetPrompt (string newPrompt, bool clearChat=true)
 Set the system prompt for the LLMCharacter.
 
async Task LoadTemplate ()
 Load the chat template of the LLMCharacter.
 
async void SetGrammar (string path)
 Set the grammar file of the LLMCharacter.
 
void AddMessage (string role, string content)
 
void AddPlayerMessage (string content)
 
void AddAIMessage (string content)
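
 A minimal sketch of setting the system prompt and seeding the chat history by hand; the component name, Inspector wiring and message strings are illustrative assumptions, not part of the API.

    using UnityEngine;
    using LLMUnity;

    // Illustrative component: llmCharacter is assumed to be assigned in the Inspector.
    public class HistorySetupExample : MonoBehaviour
    {
        public LLMCharacter llmCharacter;

        void Start()
        {
            // Replace the system prompt and clear any existing history.
            llmCharacter.SetPrompt("You are a grumpy blacksmith in a fantasy village.", clearChat: true);
            // Seed the conversation; the next Chat call will see these messages as history.
            llmCharacter.AddPlayerMessage("Do you sell swords?");
            llmCharacter.AddAIMessage("Aye, but only to those who can pay.");
        }
    }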
 
async Task< string > Chat (string query, Callback< string > callback=null, EmptyCallback completionCallback=null, bool addToHistory=true)
 Chat functionality of the LLM. It calls the LLM completion based on the provided query including the previous chat history. The function allows callbacks when the response is partially or fully received. The question is added to the history if specified.
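
 A minimal usage sketch of Chat with a streaming callback, based on the signature above; the class name and Inspector wiring are assumptions.

    using UnityEngine;
    using LLMUnity;

    public class ChatExample : MonoBehaviour
    {
        public LLMCharacter llmCharacter;  // assumed to be assigned in the Inspector

        async void Start()
        {
            // With stream=true the callback receives the partial reply as it is produced;
            // the awaited result is the full reply, and the exchange is added to the history.
            string reply = await llmCharacter.Chat(
                "Hello, who are you?",
                partial => Debug.Log(partial),
                () => Debug.Log("Reply completed"));
            Debug.Log(reply);
        }
    }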
 
async Task< string > Complete (string prompt, Callback< string > callback=null, EmptyCallback completionCallback=null)
 Pure completion functionality of the LLM. It calls the LLM completion based solely on the provided prompt (no formatting by the chat template). The function allows callbacks when the response is partially or fully received.
 
async Task Warmup (EmptyCallback completionCallback=null)
 Warms up the model by processing the prompt. The prompt processing will be cached (if cachePrompt=true), allowing for faster initialisation. The function allows a callback for when the prompt has been processed and the response received.
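
 A short sketch combining Warmup and Complete; the class name, Inspector wiring and prompt text are assumptions.

    using UnityEngine;
    using LLMUnity;

    public class CompletionExample : MonoBehaviour
    {
        public LLMCharacter llmCharacter;  // assumed to be assigned in the Inspector

        async void Start()
        {
            // Process the system prompt up front so that the first reply arrives faster.
            await llmCharacter.Warmup(() => Debug.Log("Warmup finished"));

            // Pure completion: the prompt is sent as-is, without the chat template or history.
            string continuation = await llmCharacter.Complete(
                "Once upon a time",
                partial => Debug.Log(partial));
            Debug.Log(continuation);
        }
    }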
 
async Task< string > AskTemplate ()
 Asks the LLM for the chat template to use.
 
async Task< List< int > > Tokenize (string query, Callback< List< int > > callback=null)
 Tokenises the provided query.
 
async Task< string > Detokenize (List< int > tokens, Callback< string > callback=null)
 Detokenises the provided tokens to a string.
 
async Task< List< float > > Embeddings (string query, Callback< List< float > > callback=null)
 Computes the embeddings of the provided input.
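
 A sketch of Tokenize, Detokenize and Embeddings used together; the class name and Inspector wiring are assumptions.

    using System.Collections.Generic;
    using UnityEngine;
    using LLMUnity;

    public class TokenExample : MonoBehaviour
    {
        public LLMCharacter llmCharacter;  // assumed to be assigned in the Inspector

        async void Start()
        {
            // Tokenise a string and map the tokens back to text.
            List<int> tokens = await llmCharacter.Tokenize("Hello world");
            string roundTrip = await llmCharacter.Detokenize(tokens);
            Debug.Log($"{tokens.Count} tokens -> \"{roundTrip}\"");

            // Compute an embedding vector for the same input.
            List<float> embedding = await llmCharacter.Embeddings("Hello world");
            Debug.Log($"embedding dimension: {embedding.Count}");
        }
    }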
 
virtual async Task< string > Save (string filename)
 Saves the chat history and cache to the provided filename / relative path.
 
virtual async Task< string > Load (string filename)
 Loads the chat history and cache from the provided filename / relative path.
 
void CancelRequests ()
 Cancel the ongoing requests, e.g. Chat, Complete.
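
 A sketch of saving, loading and cancelling; the class name, Inspector wiring and filename are assumptions.

    using UnityEngine;
    using LLMUnity;

    public class PersistenceExample : MonoBehaviour
    {
        public LLMCharacter llmCharacter;  // assumed to be assigned in the Inspector

        // Save the chat history (and cache, if saveCache is enabled) under persistentDataPath.
        public async void SaveChat() => await llmCharacter.Save("my_character_chat");

        // Restore a previously saved history.
        public async void LoadChat() => await llmCharacter.Load("my_character_chat");

        // Abort any in-flight Chat / Complete requests, e.g. when the player skips a dialogue.
        public void Skip() => llmCharacter.CancelRequests();
    }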
 
- Public Attributes inherited from LLMUnity.LLMCharacter
bool advancedOptions = false
 toggle to show/hide advanced options in the GameObject
 
bool remote = false
 toggle to use remote LLM server or local LLM
 
LLM llm
 the LLM object to use
 
string host = "localhost"
 host to use for the LLM server
 
int port = 13333
 port to use for the LLM server
 
int numRetries = 10
 number of retries to use for the LLM server requests (-1 = infinite)
 
string APIKey
 allows using a server that requires an API key
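
 A sketch of assigning the connection fields above from code; these are typically set in the Inspector, and the host, port and key values here are placeholders.

    using UnityEngine;
    using LLMUnity;

    public class RemoteSetupExample : MonoBehaviour
    {
        public LLMCharacter llmCharacter;  // assumed to be assigned in the Inspector

        void Awake()
        {
            llmCharacter.remote = true;             // use a remote LLM server instead of a local LLM
            llmCharacter.host = "192.168.1.10";     // placeholder server address
            llmCharacter.port = 13333;              // default port shown above
            llmCharacter.numRetries = 5;            // retry failed requests a few times
            llmCharacter.APIKey = "my-secret-key";  // placeholder key, only if the server requires one
        }
    }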
 
string save = ""
 file to save the chat history. The file is saved only for Chat calls with addToHistory set to true. The file will be saved within the persistentDataPath directory (see https://docs.unity3d.com/ScriptReference/Application-persistentDataPath.html).
 
bool saveCache = false
 toggle to save the LLM cache. This speeds up the prompt calculation but also requires ~100MB of space per character.
 
bool debugPrompt = false
 select to log the constructed prompt in the Unity Editor.
 
bool stream = true
 option to receive the reply from the model as it is produced (recommended!). If it is not selected, the full reply from the model is received in one go
 
string grammar = null
 grammar file used for the LLM in .gbnf format (relative to the Assets/StreamingAssets folder)
 
bool cachePrompt = true
 option to cache the prompt as it is being created by the chat to avoid reprocessing the entire prompt every time (default: true)
 
int slot = -1
 specify which slot of the server to use for computation (affects caching)
 
int seed = 0
 seed for reproducibility. For random results every time set to -1.
 
int numPredict = 256
 number of tokens to predict (-1 = infinity, -2 = until context filled). This is the maximum number of tokens the model will predict. When numPredict is reached the model stops generating, so words / sentences might not be finished if the value is too low.
 
float temperature = 0.2f
 LLM temperature, lower values give more deterministic answers. The temperature setting adjusts how random the generated responses are. Turning it up makes the generated choices more varied and unpredictable. Turning it down makes the generated responses more predictable and focused on the most likely options.
 
int topK = 40
 top-k sampling (0 = disabled). The top k value restricts sampling to the k most probable tokens at each step of generation. This value can help fine-tune the output and make it adhere to specific patterns or constraints.
 
float topP = 0.9f
 top-p sampling (1.0 = disabled). The top p value controls the cumulative probability of generated tokens. The model samples from the most probable tokens until this threshold (p) is reached. Lowering this value gives more focused output; raising it allows more diverse output.
 
float minP = 0.05f
 minimum probability for a token to be used. The probability is defined relative to the probability of the most likely token.
 
float repeatPenalty = 1.1f
 control the repetition of token sequences in the generated text. The penalty is applied to repeated tokens.
 
float presencePenalty = 0f
 repeated token presence penalty (0.0 = disabled). Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
 
float frequencyPenalty = 0f
 repeated token frequency penalty (0.0 = disabled). Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
 
float tfsZ = 1f
 enable tail free sampling with parameter z (1.0 = disabled).
 
float typicalP = 1f
 enable locally typical sampling with parameter p (1.0 = disabled).
 
int repeatLastN = 64
 last n tokens to consider for penalizing repetition (0 = disabled, -1 = ctx-size).
 
bool penalizeNl = true
 penalize newline tokens when applying the repeat penalty.
 
string penaltyPrompt
 prompt for the purpose of the penalty evaluation. Can be either null, a string or an array of numbers representing tokens (null/"" = use original prompt)
 
int mirostat = 0
 enable Mirostat sampling, controlling perplexity during text generation (0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0).
 
float mirostatTau = 5f
 set the Mirostat target entropy, parameter tau.
 
float mirostatEta = 0.1f
 set the Mirostat learning rate, parameter eta.
 
int nProbs = 0
 if greater than 0, the response also contains the probabilities of top N tokens for each generated token.
 
bool ignoreEos = false
 ignore end of stream token and continue generating.
 
int nKeep = -1
 number of tokens to retain from the prompt when the model runs out of context (-1 = LLMCharacter prompt tokens if setNKeepToPrompt is set to true).
 
List< string > stop = new List<string>()
 stopwords to stop the LLM in addition to the default stopwords from the chat template.
 
Dictionary< int, string > logitBias = null
 the logit bias option allows you to manually adjust the likelihood of specific tokens appearing in the generated text. By providing a token ID and a positive or negative bias value, you can increase or decrease the probability of that token being generated.
 
string playerName = "user"
 the name of the player
 
string AIName = "assistant"
 the name of the AI
 
string prompt = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions."
 a description of the AI role. This defines the LLMCharacter system prompt
 
bool setNKeepToPrompt = true
 option to set the number of tokens to retain from the prompt (nKeep) based on the LLMCharacter system prompt
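
 A sketch showing how several of the attributes above might be set from code before chatting; the values are illustrative, not recommendations, and the component wiring is an assumption.

    using System.Collections.Generic;
    using UnityEngine;
    using LLMUnity;

    public class CharacterConfigExample : MonoBehaviour
    {
        public LLMCharacter llmCharacter;  // assumed to be assigned in the Inspector

        void Start()
        {
            // Role and naming
            llmCharacter.playerName = "Adventurer";
            llmCharacter.AIName = "Blacksmith";
            llmCharacter.SetPrompt("A chat between an adventurer and a village blacksmith.");

            // Sampling: lower temperature for more deterministic replies, cap the reply length.
            llmCharacter.temperature = 0.5f;
            llmCharacter.topK = 40;
            llmCharacter.topP = 0.9f;
            llmCharacter.numPredict = 128;
            llmCharacter.seed = 42;  // fixed seed for reproducible output

            // Extra stopwords on top of the chat template defaults.
            llmCharacter.stop = new List<string> { "\nAdventurer:" };
        }
    }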
 

Detailed Description

Definition at line 5 of file LLMClient.cs.

Constructor & Destructor Documentation

◆ LLMClient()

LLMUnity.LLMClient.LLMClient ( )
inline

Definition at line 7 of file LLMClient.cs.


The documentation for this class was generated from the following file: LLMClient.cs