|
async void | Awake () |
| The Unity Awake function that starts the LLM server.
|
|
async Task | WaitUntilReady () |
| Waits until the LLM is ready.
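For example, a minimal sketch (the LLMUnity namespace and an LLM component assigned in the Inspector are assumptions):

  using UnityEngine;
  using LLMUnity;

  public class ReadyCheck : MonoBehaviour
  {
      public LLM llm;  // assigned in the Inspector

      async void Start()
      {
          await llm.WaitUntilReady();  // suspends until the server has started
          Debug.Log("LLM is ready");
      }
  }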
|
|
void | SetModel (string path) |
| Sets the model used by the LLM. The provided model is copied to the Assets/StreamingAssets folder so that it also works in the build. Models in .gguf format are supported.
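A sketch of a typical call, given an LLM reference llm as above (the model path is a hypothetical placeholder):

  llm.SetModel("path/to/model.gguf");  // copied under Assets/StreamingAssets, so it ships with the build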
|
|
void | SetLora (string path, float weight=1) |
| Sets a LoRA model to use in the LLM. The provided model is copied to the Assets/StreamingAssets folder so that it also works in the build. Models in .gguf format are supported.
|
|
void | AddLora (string path, float weight=1) |
| Adds a LoRA model to use in the LLM. The provided model is copied to the Assets/StreamingAssets folder so that it also works in the build. Models in .gguf format are supported.
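A sketch contrasting the two calls, given an LLM reference llm as above (the adapter filenames are hypothetical):

  llm.SetLora("style.gguf");          // set a LoRA to use
  llm.AddLora("persona.gguf", 0.8f);  // add a further LoRA with weight 0.8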
|
|
void | RemoveLora (string path) |
| Removes a LoRA model from the LLM. Models in .gguf format are supported.
|
|
void | RemoveLoras () |
| Removes all LoRA models from the LLM.
|
|
void | SetLoraWeight (string path, float weight) |
| Changes the weight (scale) of a LoRA model in the LLM.
|
|
void | SetLoraWeights (Dictionary< string, float > loraToWeight) |
| Changes the weights (scales) of the LoRA models in the LLM.
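A sketch of adjusting the scales, given an LLM reference llm as above and System.Collections.Generic in scope (the filenames are hypothetical):

  llm.SetLoraWeight("style.gguf", 0.5f);
  llm.SetLoraWeights(new Dictionary<string, float>
  {
      { "style.gguf", 0.5f },
      { "persona.gguf", 1f }
  });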
|
|
void | UpdateLoras () |
|
void | SetTemplate (string templateName, bool setDirty=true) |
| Sets the chat template for the LLM.
|
|
void | SetEmbeddings (int embeddingLength, bool embeddingsOnly) |
| Sets the LLM embedding parameters.
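A sketch, given an LLM reference llm as above (the embedding length must match the model; 384 is illustrative):

  llm.SetEmbeddings(384, true);  // 384-dimensional embeddings, embeddings-only mode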
|
|
void | SetSSLCert (string path) |
| Uses an SSL certificate for the LLM server.
|
|
void | SetSSLKey (string path) |
| Uses an SSL key for the LLM server.
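A sketch covering both calls, given an LLM reference llm as above (the certificate and key paths are hypothetical):

  llm.SetSSLCert("certs/server.crt");
  llm.SetSSLKey("certs/server.key");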
|
|
string | GetTemplate () |
| Returns the chat template of the LLM.
|
|
int | Register (LLMCaller llmCaller) |
| Registers a local LLMCaller object. This binds the LLMCaller "client" to a specific slot of the LLM.
|
|
void | Update () |
| The Unity Update function. It is used to retrieve the LLM replies.
|
|
async Task< string > | Tokenize (string json) |
| Tokenises the provided query.
|
|
async Task< string > | Detokenize (string json) |
| Detokenises the provided query.
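A sketch of a round trip, given an LLM reference llm as above. The JSON payload shape is an assumption based on the llama.cpp server that the package builds on, where tokenize requests carry a "content" field; higher-level classes normally construct this JSON:

  string tokensJson = await llm.Tokenize("{\"content\": \"Hello world\"}");
  string textJson = await llm.Detokenize(tokensJson);  // back to text, as JSON
  Debug.Log(textJson);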
|
|
async Task< string > | Embeddings (string json) |
| Computes the embeddings of the provided query.
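A similarly hedged sketch, assuming the same payload convention as above:

  string embJson = await llm.Embeddings("{\"content\": \"Hello world\"}");
  Debug.Log(embJson);  // JSON reply containing the embedding vector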
|
|
void | ApplyLoras () |
| Sets the LoRA scales. This only works after the LLM service has started.
|
|
async Task< List< LoraWeightResult > > | ListLoras () |
| Gets a list of the LoRA adapters.
|
|
async Task< string > | Slot (string json) |
| Saves or restores the state of a slot.
|
|
async Task< string > | Completion (string json, Callback< string > streamCallback=null) |
| Provides the chat and completion functionality of the LLM.
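A sketch of a streamed completion. The payload shape is an assumption following the llama.cpp server completion schema; in practice higher-level classes such as LLMCharacter build this JSON:

  async void Ask(LLM llm)
  {
      string json = "{\"prompt\": \"Hello\", \"n_predict\": 32}";  // assumed schema
      string reply = await llm.Completion(json, partial => Debug.Log(partial));  // streamCallback receives partial replies
      Debug.Log("final: " + reply);
  }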
|
|
async Task | SetBasePrompt (string base_prompt) |
| Sets the base prompt used for all LLMCaller objects.
|
void | CancelRequest (int id_slot) |
| Cancels the requests in a specific slot of the LLM.
|
|
void | Destroy () |
| Stops and destroys the LLM.
|
|
void | OnDestroy () |
| The Unity OnDestroy function called when the object is destroyed. The function StopProcess is called to stop the LLM server.
|
|
|
bool | advancedOptions = false |
| toggle to show/hide advanced options in the GameObject
|
|
bool | remote = false |
| toggle to enable remote server functionality
|
|
int | port = 13333 |
| port to use for the LLM server
|
|
int | numThreads = -1 |
| number of threads to use (-1 = all)
|
|
int | numGPULayers = 0 |
| number of model layers to offload to the GPU (0 = GPU not used). Use a large number, e.g. >30, to utilise the GPU as much as possible. If the user's GPU is not supported, the LLM will fall back to the CPU
|
|
bool | debug = false |
| select to log the output of the LLM in the Unity Editor.
|
|
int | parallelPrompts = -1 |
| number of prompts that can happen in parallel (-1 = number of LLMCaller objects)
|
|
bool | dontDestroyOnLoad = true |
| select to not destroy the LLM GameObject when loading a new Scene.
|
|
int | contextSize = 8192 |
| Size of the prompt context (0 = context size of the model). This is the number of tokens the model can take as input when generating responses.
|
|
int | batchSize = 512 |
| Batch size for prompt processing.
|
|
string | basePrompt = "" |
| a base prompt shared by all LLMCaller objects
|
|
string | model = "" |
| the LLM model to use. Models in .gguf format are supported.
|
|
string | chatTemplate = ChatTemplate.DefaultTemplate |
| Chat template used for the model.
|
|
string | lora = "" |
| the paths of the LoRA models being used (relative to the Assets/StreamingAssets folder). Models in .gguf format are supported.
|
|
string | loraWeights = "" |
| the weights of the LoRA models being used.
|
|
bool | flashAttention = false |
| enable use of flash attention
|
|
string | APIKey |
| API key to use for the server (optional)
|
|
string | SSLCertPath = "" |
|
string | SSLKeyPath = "" |
|
Class implementing the LLM server.
Definition at line 18 of file LLM.cs.
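The public attributes mirror the component's Inspector settings. A minimal sketch reading them at runtime (an LLM component assigned in the Inspector is assumed; the fields are usually configured in the Inspector before play mode, since Awake starts the server):

  using UnityEngine;
  using LLMUnity;

  public class LLMInfo : MonoBehaviour
  {
      public LLM llm;  // assigned in the Inspector

      void Start()
      {
          Debug.Log($"port {llm.port}, threads {llm.numThreads}, context {llm.contextSize}");
          if (llm.numGPULayers == 0) Debug.Log("running on CPU only");
      }
  }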