LlamaLib v2.0.2
Cross-platform library for local LLMs

LLM Class Reference

Abstract base class for Large Language Model operations.

#include <LLM.h>
Public Member Functions

| Return type | Member function | Description |
| --- | --- | --- |
| virtual | ~LLM ()=default | Virtual destructor. |
| virtual std::vector< int > | tokenize (const std::string &query) | Tokenize text. |
| virtual std::string | tokenize_json (const json &data)=0 | Tokenize input (override). |
| virtual std::string | detokenize (const std::vector< int32_t > &tokens) | Convert tokens to text. |
| virtual std::string | detokenize_json (const json &data)=0 | Convert tokens back to text. |
| virtual std::vector< float > | embeddings (const std::string &query) | Generate embeddings. |
| virtual std::string | embeddings_json (const json &data)=0 | Generate embeddings with HTTP response support. |
| virtual void | set_completion_params (json completion_params_) | Set completion parameters. |
| virtual std::string | get_completion_params () | Get current completion parameters. |
| virtual std::string | completion (const std::string &prompt, CharArrayFn callback=nullptr, int id_slot=-1, bool return_response_json=false) | Generate completion. |
| virtual std::string | completion_json (const json &data, CharArrayFn callback, bool callbackWithJSON)=0 | Generate text completion. |
| virtual void | set_grammar (std::string grammar_) | Set grammar for constrained generation. |
| virtual std::string | get_grammar () | Get current grammar specification. |
| virtual std::string | apply_template (const json &messages) | Apply template to messages. |
| virtual std::string | apply_template_json (const json &data)=0 | Apply a chat template to message data. |
Static Public Member Functions

| Return type | Member function | Description |
| --- | --- | --- |
| static bool | has_gpu_layers (const std::string &command) | Check whether command-line arguments specify GPU layers. |
| static std::string | LLM_args_to_command (const std::string &model_path, int num_slots=1, int num_threads=-1, int num_GPU_layers=0, bool flash_attention=false, int context_size=4096, int batch_size=2048, bool embedding_only=false, const std::vector< std::string > &lora_paths={}) | Convert LLM parameters to command-line arguments. |
Public Attributes

| Type | Attribute | Description |
| --- | --- | --- |
| int32_t | n_keep = 0 | Number of tokens to keep from the beginning of the context. |
| std::string | grammar = "" | Grammar specification in GBNF format or JSON schema. |
| json | completion_params | JSON object containing completion parameters. |
Protected Member Functions

| Return type | Member function | Description |
| --- | --- | --- |
| virtual json | build_apply_template_json (const json &messages) | Build JSON for template application. |
| virtual std::string | parse_apply_template_json (const json &result) | Parse template application result. |
| virtual json | build_tokenize_json (const std::string &query) | Build JSON for tokenization. |
| virtual std::vector< int > | parse_tokenize_json (const json &result) | Parse tokenization result. |
| virtual json | build_detokenize_json (const std::vector< int32_t > &tokens) | Build JSON for detokenization. |
| virtual std::string | parse_detokenize_json (const json &result) | Parse detokenization result. |
| virtual json | build_embeddings_json (const std::string &query) | Build JSON for embeddings generation. |
| virtual std::vector< float > | parse_embeddings_json (const json &result) | Parse embeddings result. |
| virtual json | build_completion_json (const std::string &prompt, int id_slot=-1) | Build JSON for completion generation. |
| virtual std::string | parse_completion_json (const json &result) | Parse completion result. |
Detailed Description

Abstract base class for Large Language Model operations.

Provides the core interface for LLM functionality, including text completion, tokenization, embeddings, and chat template application. All LLM implementations must inherit from this base class.
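A minimal end-to-end sketch of the interface. `obtain_llm()` is a hypothetical stand-in for however a concrete implementation (LLMAgent, LLMClient, or LLMService) is constructed in your application:

```cpp
#include <LLM.h>

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical factory: replace with however your application
// constructs a concrete LLM implementation.
extern LLM *obtain_llm();

int main() {
    LLM *llm = obtain_llm();

    // Round-trip text through the tokenizer.
    std::vector<int> tokens = llm->tokenize("Hello, world!");
    std::string text = llm->detokenize(
        std::vector<int32_t>(tokens.begin(), tokens.end()));

    // Non-streaming completion with the default arguments
    // (no callback, automatic slot, plain-text response).
    std::string reply = llm->completion("What is a llama?");
    std::cout << reply << "\n";
    return 0;
}
```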
Member Function Documentation

apply_template()

virtual std::string LLM::apply_template (const json &messages)  [virtual]

Apply template to messages.

apply_template_json()

virtual std::string LLM::apply_template_json (const json &data) = 0  [pure virtual]
Apply a chat template to message data.

Parameters
| data | JSON object containing messages to format |

Pure virtual method for applying chat templates to conversation data.

Implemented in LLMAgent, LLMClient, and LLMService.
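A formatting sketch, assuming json is the JSON type used throughout LLM.h and that messages follow the conventional role/content chat layout (an assumption, not a documented schema):

```cpp
#include <LLM.h>

#include <string>

// Build a conversation and render it through the model's chat template.
std::string format_chat(LLM &llm) {
    json messages = json::array({
        {{"role", "system"}, {"content", "You are a helpful assistant."}},
        {{"role", "user"},   {"content", "Explain GBNF grammars briefly."}}
    });
    // The public apply_template() presumably forwards to this method via
    // build_apply_template_json()/parse_apply_template_json().
    return llm.apply_template(messages);
}
```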
build_apply_template_json()

virtual json LLM::build_apply_template_json (const json &messages)  [protected, virtual]

Build JSON for template application.

build_completion_json()

virtual json LLM::build_completion_json (const std::string &prompt, int id_slot = -1)  [protected, virtual]

Build JSON for completion generation.

build_detokenize_json()

virtual json LLM::build_detokenize_json (const std::vector< int32_t > &tokens)  [protected, virtual]

Build JSON for detokenization.

build_embeddings_json()

virtual json LLM::build_embeddings_json (const std::string &query)  [protected, virtual]

Build JSON for embeddings generation.

build_tokenize_json()

virtual json LLM::build_tokenize_json (const std::string &query)  [protected, virtual]

Build JSON for tokenization.
completion()

virtual std::string LLM::completion (const std::string &prompt, CharArrayFn callback = nullptr, int id_slot = -1, bool return_response_json = false)  [virtual]

Generate completion.

Parameters
| prompt | Input text prompt |
| callback | Optional callback for streaming |
| id_slot | Slot ID for the request (-1 for auto) |
| return_response_json | Whether to return the full JSON response |

Definition at line 283 of file LLM.cpp.
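A streaming sketch, assuming CharArrayFn is compatible with a free function that receives each generated chunk as a C string (check LLM.h for the exact typedef):

```cpp
#include <LLM.h>

#include <iostream>
#include <string>

// Called once per streamed chunk; prints text as it arrives.
static void on_chunk(const char *chunk) {
    std::cout << chunk << std::flush;
}

std::string stream_story(LLM &llm) {
    // id_slot = -1 lets the library pick a slot automatically;
    // return_response_json = false yields plain text.
    return llm.completion("Tell me a short story.", on_chunk, -1, false);
}
```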
completion_json()

virtual std::string LLM::completion_json (const json &data, CharArrayFn callback, bool callbackWithJSON) = 0  [pure virtual]

Generate text completion.

Parameters
| data | JSON object containing prompt and parameters |
| callback | Optional callback function for streaming responses |
| callbackWithJSON | Whether the callback receives JSON or plain text |

Pure virtual method for text generation with optional streaming.

Implemented in LLMAgent, LLMClient, and LLMService.
detokenize()

virtual std::string LLM::detokenize (const std::vector< int32_t > &tokens)  [virtual]

Convert tokens to text.

detokenize_json()

virtual std::string LLM::detokenize_json (const json &data) = 0  [pure virtual]

Convert tokens back to text.

Parameters
| data | JSON object containing token IDs |

Pure virtual method for converting token sequences back to text.

Implemented in LLMAgent, LLMClient, and LLMService.
embeddings()

virtual std::vector< float > LLM::embeddings (const std::string &query)  [virtual]

Generate embeddings.

embeddings_json()

virtual std::string LLM::embeddings_json (const json &data) = 0  [pure virtual]

Generate embeddings with HTTP response support.

Parameters
| data | JSON object containing the embedding request |

Method used internally for server-based embedding generation.

Implemented in LLMAgent, LLMClient, and LLMService.
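As an illustration of the public embeddings() wrapper, the sketch below computes a cosine similarity between two strings. It assumes the backing model supports embeddings (see the embedding_only flag of LLM_args_to_command()):

```cpp
#include <LLM.h>

#include <cmath>
#include <string>
#include <vector>

// Cosine similarity between the embeddings of two strings.
float cosine_similarity(LLM &llm, const std::string &a, const std::string &b) {
    std::vector<float> va = llm.embeddings(a);
    std::vector<float> vb = llm.embeddings(b);
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (std::size_t i = 0; i < va.size() && i < vb.size(); ++i) {
        dot += va[i] * vb[i];
        na  += va[i] * va[i];
        nb  += vb[i] * vb[i];
    }
    // Small epsilon guards against division by zero.
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12f);
}
```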
get_completion_params()

virtual std::string LLM::get_completion_params ()  [inline, virtual]

Get current completion parameters.

get_grammar()

virtual std::string LLM::get_grammar ()  [inline, virtual]

Get current grammar specification.

has_gpu_layers()

static bool LLM::has_gpu_layers (const std::string &command)  [static]

Check whether command-line arguments specify GPU layers.

LLM_args_to_command()

static std::string LLM::LLM_args_to_command (const std::string &model_path, int num_slots = 1, int num_threads = -1, int num_GPU_layers = 0, bool flash_attention = false, int context_size = 4096, int batch_size = 2048, bool embedding_only = false, const std::vector< std::string > &lora_paths = {})  [static]
Convert LLM parameters to command-line arguments.

Parameters
| model_path | Path to the model file |
| num_slots | Number of parallel slots to use |
| num_threads | Number of CPU threads to use (-1 for auto) |
| num_GPU_layers | Number of layers to offload to the GPU |
| flash_attention | Whether to use the flash-attention optimization |
| context_size | Maximum context length in tokens (default 4096; 0 = loaded from the model) |
| batch_size | Batch size for processing |
| embedding_only | Whether to run in embedding-only mode |
| lora_paths | Vector of paths to LoRA adapter files |

Definition at line 46 of file LLM.cpp.
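A sketch of building a server command line from the documented defaults; the model path is hypothetical:

```cpp
#include <LLM.h>

#include <iostream>
#include <string>

int main() {
    std::string cmd = LLM::LLM_args_to_command(
        "models/my-model.gguf",  // model_path (illustrative)
        1,      // num_slots
        -1,     // num_threads: auto
        32,     // num_GPU_layers
        true,   // flash_attention
        8192,   // context_size
        2048,   // batch_size
        false,  // embedding_only
        {});    // lora_paths: none

    std::cout << cmd << "\n";

    // has_gpu_layers() reports whether GPU offload was requested.
    if (LLM::has_gpu_layers(cmd)) {
        std::cout << "GPU layers enabled\n";
    }
    return 0;
}
```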
parse_apply_template_json()

virtual std::string LLM::parse_apply_template_json (const json &result)  [protected, virtual]

Parse template application result.

parse_completion_json()

virtual std::string LLM::parse_completion_json (const json &result)  [protected, virtual]

Parse completion result.

parse_detokenize_json()

virtual std::string LLM::parse_detokenize_json (const json &result)  [protected, virtual]

Parse detokenization result.

parse_embeddings_json()

virtual std::vector< float > LLM::parse_embeddings_json (const json &result)  [protected, virtual]

Parse embeddings result.

parse_tokenize_json()

virtual std::vector< int > LLM::parse_tokenize_json (const json &result)  [protected, virtual]

Parse tokenization result.

set_completion_params()

virtual void LLM::set_completion_params (json completion_params_)  [inline, virtual]

Set completion parameters.
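A configuration sketch. The keys shown ("temperature", "n_predict") follow llama.cpp server conventions and are assumptions here, not a documented schema:

```cpp
#include <LLM.h>

#include <string>

// Adjust sampling settings before generating; key names are assumed.
void configure_sampling(LLM &llm) {
    json params;
    params["temperature"] = 0.7;   // assumption: llama.cpp-style key
    params["n_predict"]   = 256;   // assumption: llama.cpp-style key
    llm.set_completion_params(params);

    // The current settings can be read back with get_completion_params().
    std::string current = llm.get_completion_params();
}
```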
set_grammar()

virtual void LLM::set_grammar (std::string grammar_)  [inline, virtual]

Set grammar for constrained generation.

Parameters
| grammar_ | Grammar specification in GBNF format or JSON schema |

See https://github.com/ggml-org/llama.cpp/tree/master/grammars for format details.

Definition at line 130 of file LLM.h.
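A minimal GBNF sketch that restricts generation to "yes" or "no"; see the llama.cpp grammars page linked above for the full syntax:

```cpp
#include <LLM.h>

#include <string>

void ask_yes_no(LLM &llm) {
    // Constrain all subsequent completions to the grammar.
    llm.set_grammar("root ::= \"yes\" | \"no\"");
    std::string answer = llm.completion("Is water wet? Answer yes or no.");

    // Reset to the default (empty grammar = unconstrained).
    llm.set_grammar("");
}
```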
tokenize()

virtual std::vector< int > LLM::tokenize (const std::string &query)  [virtual]

Tokenize text.

tokenize_json()

virtual std::string LLM::tokenize_json (const json &data) = 0  [pure virtual]

Tokenize input (override).

Parameters
| data | JSON object containing text to tokenize |

Implemented in LLMAgent, LLMClient, and LLMService.
Member Data Documentation

completion_params

json LLM::completion_params

JSON object containing completion parameters.

grammar

std::string LLM::grammar = ""

Grammar specification in GBNF format or JSON schema.

n_keep

int32_t LLM::n_keep = 0

Number of tokens to keep from the beginning of the context.