LlamaLib v2.0.2
Cross-platform library for local LLMs
LLM Class Reference [abstract]

Abstract base class for Large Language Model operations. More...

#include <LLM.h>

Inheritance diagram for LLM

Public Member Functions

virtual ~LLM ()=default
 Virtual destructor.
 
virtual std::vector< int > tokenize (const std::string &query)
 Tokenize text.
 
virtual std::string tokenize_json (const json &data)=0
 Tokenize input (JSON interface).
 
virtual std::string detokenize (const std::vector< int32_t > &tokens)
 Convert tokens to text.
 
virtual std::string detokenize_json (const json &data)=0
 Convert tokens back to text.
 
virtual std::vector< float > embeddings (const std::string &query)
 Generate embeddings.
 
virtual std::string embeddings_json (const json &data)=0
 Generate embeddings with HTTP response support.
 
virtual void set_completion_params (json completion_params_)
 Set completion parameters.
 
virtual std::string get_completion_params ()
 Get current completion parameters.
 
virtual std::string completion (const std::string &prompt, CharArrayFn callback=nullptr, int id_slot=-1, bool return_response_json=false)
 Generate completion.
 
virtual std::string completion_json (const json &data, CharArrayFn callback, bool callbackWithJSON)=0
 Generate text completion.
 
virtual void set_grammar (std::string grammar_)
 Set grammar for constrained generation.
 
virtual std::string get_grammar ()
 Get current grammar specification.
 
virtual std::string apply_template (const json &messages)
 Apply template to messages.
 
virtual std::string apply_template_json (const json &data)=0
 Apply a chat template to message data.
 

Static Public Member Functions

static bool has_gpu_layers (const std::string &command)
 Check if command line arguments specify GPU layers.
 
static std::string LLM_args_to_command (const std::string &model_path, int num_slots=1, int num_threads=-1, int num_GPU_layers=0, bool flash_attention=false, int context_size=4096, int batch_size=2048, bool embedding_only=false, const std::vector< std::string > &lora_paths={})
 Convert LLM parameters to command line arguments.
 

Public Attributes

int32_t n_keep = 0
 Number of tokens to keep from the beginning of the context.
 
std::string grammar = ""
 Grammar specification in GBNF format or JSON schema.
 
json completion_params
 JSON object containing completion parameters.
 

Protected Member Functions

virtual json build_apply_template_json (const json &messages)
 Build JSON for template application.
 
virtual std::string parse_apply_template_json (const json &result)
 Parse template application result.
 
virtual json build_tokenize_json (const std::string &query)
 Build JSON for tokenization.
 
virtual std::vector< int > parse_tokenize_json (const json &result)
 Parse tokenization result.
 
virtual json build_detokenize_json (const std::vector< int32_t > &tokens)
 Build JSON for detokenization.
 
virtual std::string parse_detokenize_json (const json &result)
 Parse detokenization result.
 
virtual json build_embeddings_json (const std::string &query)
 Build JSON for embeddings generation.
 
virtual std::vector< float > parse_embeddings_json (const json &result)
 Parse embeddings result.
 
virtual json build_completion_json (const std::string &prompt, int id_slot=-1)
 Build JSON for completion generation.
 
virtual std::string parse_completion_json (const json &result)
 Parse completion result.
 

Detailed Description

Abstract base class for Large Language Model operations.

Provides the core interface for LLM functionality including text completion, tokenization, embeddings, and template application. This is the base class that all LLM implementations must inherit from.

Definition at line 59 of file LLM.h.
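
A minimal usage sketch (assuming json is nlohmann::json, consistent with the interface above, and that a concrete implementation such as LLMService is obtained elsewhere; make_llm below is a hypothetical factory, not part of the API):

    #include <LLM.h>
    #include <iostream>
    #include <memory>

    // Hypothetical factory: constructing a concrete backend (e.g. LLMService)
    // is outside the abstract interface documented here.
    std::unique_ptr<LLM> make_llm();

    int main() {
        std::unique_ptr<LLM> llm = make_llm();

        // All concrete backends expose the same core operations.
        std::string reply = llm->completion("Write a haiku about the sea.");
        std::vector<int> ids = llm->tokenize(reply);
        std::cout << reply << "\n(" << ids.size() << " tokens)\n";
    }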

Member Function Documentation

◆ apply_template()

std::string LLM::apply_template ( const json & messages)
virtual

Apply template to messages.

Parameters
messages: JSON array of chat messages
Returns
Formatted chat string

Definition at line 144 of file LLM.cpp.
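
For example, a chat conversation can be flattened into a single prompt string (a sketch, assuming json is nlohmann::json and llm points to a concrete implementation):

    json messages = json::array({
        { {"role", "system"}, {"content", "You are a helpful assistant."} },
        { {"role", "user"},   {"content", "Explain tokenization briefly."} }
    });
    std::string prompt = llm->apply_template(messages);
    std::string reply  = llm->completion(prompt);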

◆ apply_template_json()

virtual std::string LLM::apply_template_json ( const json & data)
pure virtual

Apply a chat template to message data.

Parameters
data: JSON object containing messages to format
Returns
Formatted string with template applied

Pure virtual method for applying chat templates to conversation data

Implemented in LLMAgent, LLMClient, and LLMService.

◆ build_apply_template_json()

json LLM::build_apply_template_json ( const json & messages)
protected virtual

Build JSON for template application.

Parameters
messages: JSON array of chat messages
Returns
JSON object ready for apply_template_json

Definition at line 125 of file LLM.cpp.

◆ build_completion_json()

json LLM::build_completion_json ( const std::string & prompt,
int id_slot = -1 )
protected virtual

Build JSON for completion generation.

Parameters
prompt: Input text prompt
id_slot: Slot ID for the request (-1 for auto)
Returns
JSON object ready for completion_json

Definition at line 235 of file LLM.cpp.
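
The build_*/parse_* helpers are designed to bracket the transport-specific *_json calls. A sketch of how a derived class might combine them (an assumption about intended usage; error handling omitted):

    // Inside a hypothetical LLM subclass:
    std::string simple_completion(const std::string& prompt) {
        json request = build_completion_json(prompt);               // wraps prompt and slot ID
        std::string raw = completion_json(request, nullptr, false); // transport-specific, pure virtual
        return parse_completion_json(json::parse(raw));             // extract the generated text
    }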

◆ build_detokenize_json()

json LLM::build_detokenize_json ( const std::vector< int32_t > & tokens)
protected virtual

Build JSON for detokenization.

Parameters
tokens: Vector of token IDs to convert
Returns
JSON object ready for detokenize_json

Definition at line 178 of file LLM.cpp.

◆ build_embeddings_json()

json LLM::build_embeddings_json ( const std::string & query)
protected virtual

Build JSON for embeddings generation.

Parameters
query: Text string to embed
Returns
JSON object ready for embeddings_json

Definition at line 204 of file LLM.cpp.

◆ build_tokenize_json()

json LLM::build_tokenize_json ( const std::string & query)
protected virtual

Build JSON for tokenization.

Parameters
query: Text string to tokenize
Returns
JSON object ready for tokenize_json

Definition at line 151 of file LLM.cpp.

◆ completion()

std::string LLM::completion ( const std::string & prompt,
CharArrayFn callback = nullptr,
int id_slot = -1,
bool return_response_json = false )
virtual

Generate completion.

Parameters
prompt: Input text prompt
callback: Optional callback for streaming
id_slot: Slot ID for the request (-1 for auto)
return_response_json: Whether to return full JSON response
Returns
Generated completion text or JSON response

Definition at line 283 of file LLM.cpp.
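
A streaming sketch, assuming CharArrayFn is a plain function pointer receiving each partial response as a C string (its exact typedef is defined elsewhere in LlamaLib):

    static void on_partial(const char* text) {
        std::cout << text << std::flush;   // print streamed output as it arrives
    }

    std::string full = llm->completion("Tell me a short story.", on_partial);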

◆ completion_json()

virtual std::string LLM::completion_json ( const json & data,
CharArrayFn callback,
bool callbackWithJSON )
pure virtual

Generate text completion.

Parameters
data: JSON object containing prompt and parameters
callback: Optional callback function for streaming responses
callbackWithJSON: Whether callback receives JSON or plain text
Returns
JSON string containing generated completion text or JSON response

Pure virtual method for text generation with optional streaming

Implemented in LLMAgent, LLMClient, and LLMService.

◆ detokenize()

std::string LLM::detokenize ( const std::vector< int32_t > & tokens)
virtual

Convert tokens to text.

Parameters
tokens: Vector of token IDs to convert
Returns
Detokenized text string

Definition at line 197 of file LLM.cpp.

◆ detokenize_json()

virtual std::string LLM::detokenize_json ( const json & data)
pure virtual

Convert tokens back to text.

Parameters
data: JSON object containing token IDs
Returns
JSON string containing detokenized text

Pure virtual method for converting token sequences back to text

Implemented in LLMAgent, LLMClient, and LLMService.

◆ embeddings()

std::vector< float > LLM::embeddings ( const std::string & query)
virtual

Generate embeddings.

Parameters
query: Text string to embed
Returns
Vector of embedding values

Definition at line 228 of file LLM.cpp.
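
Embeddings can be compared directly, for example with cosine similarity (a sketch; the vector length depends on the loaded model):

    #include <cmath>
    #include <numeric>

    std::vector<float> a = llm->embeddings("cat");
    std::vector<float> b = llm->embeddings("kitten");

    float dot = std::inner_product(a.begin(), a.end(), b.begin(), 0.0f);
    float na  = std::sqrt(std::inner_product(a.begin(), a.end(), a.begin(), 0.0f));
    float nb  = std::sqrt(std::inner_product(b.begin(), b.end(), b.begin(), 0.0f));
    float similarity = dot / (na * nb);    // closer to 1 = more similar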

◆ embeddings_json()

virtual std::string LLM::embeddings_json ( const json & data)
pure virtual

Generate embeddings with HTTP response support.

Parameters
data: JSON object containing embedding request
Returns
JSON string with embedding data

Pure virtual method used for server-based embedding generation

Implemented in LLMAgent, LLMClient, and LLMService.

◆ get_completion_params()

virtual std::string LLM::get_completion_params ( )
inline virtual

Get current completion parameters.

Returns
JSON string of current completion parameters

Definition at line 109 of file LLM.h.

◆ get_grammar()

virtual std::string LLM::get_grammar ( )
inline virtual

Get current grammar specification.

Returns
Current grammar string

Definition at line 134 of file LLM.h.

◆ has_gpu_layers()

bool LLM::has_gpu_layers ( const std::string & command)
static

Check if command line arguments specify GPU layers.

Parameters
command: Command line string to analyze
Returns
true if GPU layers are specified, false otherwise

Definition at line 65 of file LLM.cpp.
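
For example, paired with LLM_args_to_command below (a sketch):

    std::string cmd = LLM::LLM_args_to_command("model.gguf", 1, -1, /*num_GPU_layers=*/35);
    bool gpu = LLM::has_gpu_layers(cmd);   // true: 35 GPU layers were requested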

◆ LLM_args_to_command()

std::string LLM::LLM_args_to_command ( const std::string & model_path,
int num_slots = 1,
int num_threads = -1,
int num_GPU_layers = 0,
bool flash_attention = false,
int context_size = 4096,
int batch_size = 2048,
bool embedding_only = false,
const std::vector< std::string > & lora_paths = {} )
static

Convert LLM parameters to command line arguments.

Parameters
model_path: Path to the model file
num_slots: Number of parallel slots to use
num_threads: Number of CPU threads to use (-1 for auto)
num_GPU_layers: Number of layers to offload to GPU
flash_attention: Whether to use flash attention optimization
context_size: Maximum context length in tokens (default: 4096, 0 = loaded from model)
batch_size: Batch size for processing
embedding_only: Whether to run in embedding-only mode
lora_paths: Vector of paths to LoRA adapter files
Returns
Command line string with all parameters

Definition at line 46 of file LLM.cpp.
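
A sketch of building a command line for a multi-slot, GPU-offloaded setup (the flag names in the returned string follow the underlying llama.cpp server and are not reproduced here):

    std::string cmd = LLM::LLM_args_to_command(
        "models/llama-3-8b.gguf",   // model_path
        /*num_slots=*/4,
        /*num_threads=*/-1,         // auto-detect
        /*num_GPU_layers=*/99,      // offload everything that fits
        /*flash_attention=*/true,
        /*context_size=*/8192);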

◆ parse_apply_template_json()

std::string LLM::parse_apply_template_json ( const json & result)
protected virtual

Parse template application result.

Parameters
result: JSON response from apply_template_json
Returns
Formatted chat string

Definition at line 132 of file LLM.cpp.

◆ parse_completion_json()

std::string LLM::parse_completion_json ( const json & result)
protected virtual

Parse completion result.

Parameters
result: JSON response from completion_json
Returns
Generated completion text

Definition at line 264 of file LLM.cpp.

◆ parse_detokenize_json()

std::string LLM::parse_detokenize_json ( const json & result)
protected virtual

Parse detokenization result.

Parameters
result: JSON response from detokenize_json
Returns
Detokenized text string

Definition at line 185 of file LLM.cpp.

◆ parse_embeddings_json()

std::vector< float > LLM::parse_embeddings_json ( const json & result)
protected virtual

Parse embeddings result.

Parameters
result: JSON response from embeddings_json
Returns
Vector of embedding values

Definition at line 211 of file LLM.cpp.

◆ parse_tokenize_json()

std::vector< int > LLM::parse_tokenize_json ( const json & result)
protected virtual

Parse tokenization result.

Parameters
result: JSON response from tokenize_json
Returns
Vector of token IDs

Definition at line 158 of file LLM.cpp.

◆ set_completion_params()

virtual void LLM::set_completion_params ( json completion_params_)
inline virtual

Set completion parameters.

Parameters
completion_params_: JSON object containing completion parameters

Parameters may include temperature, n_predict, etc.

Definition at line 105 of file LLM.h.
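
For example (temperature and n_predict are the parameters named above; other keys follow the backend's completion options):

    json params;
    params["temperature"] = 0.7;
    params["n_predict"]   = 256;   // cap on generated tokens
    llm->set_completion_params(params);

    std::string active = llm->get_completion_params();  // JSON string of current settings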

◆ set_grammar()

virtual void LLM::set_grammar ( std::string grammar_)
inline virtual

Set grammar for constrained generation.

Parameters
grammar_: Grammar specification in GBNF format or JSON schema

See https://github.com/ggml-org/llama.cpp/tree/master/grammars for format details

Definition at line 130 of file LLM.h.
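
For example, output can be constrained to a fixed choice with a one-line GBNF grammar (a sketch):

    llm->set_grammar("root ::= \"yes\" | \"no\"");
    std::string answer = llm->completion("Is the sky blue? Answer yes or no.");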

◆ tokenize()

std::vector< int > LLM::tokenize ( const std::string & query)
virtual

Tokenize text.

Parameters
query: Text string to tokenize
Returns
Vector of token IDs

Definition at line 170 of file LLM.cpp.
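
A round-trip sketch (note that tokenize returns std::vector<int> while detokenize takes std::vector<int32_t>, so a copy may be needed on platforms where the types differ):

    std::vector<int> ids = llm->tokenize("The quick brown fox");
    std::vector<int32_t> ids32(ids.begin(), ids.end());
    std::string text = llm->detokenize(ids32);   // back to the original string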

◆ tokenize_json()

virtual std::string LLM::tokenize_json ( const json & data)
pure virtual

Tokenize input (JSON interface)

Parameters
data: JSON object containing text to tokenize
Returns
JSON string with token data

Implemented in LLMAgent, LLMClient, and LLMService.

Member Data Documentation

◆ completion_params

json LLM::completion_params

JSON object containing completion parameters.

Definition at line 64 of file LLM.h.

◆ grammar

std::string LLM::grammar = ""

Grammar specification in GBNF format or JSON schema.

Definition at line 63 of file LLM.h.

◆ n_keep

int32_t LLM::n_keep = 0

Number of tokens to keep from the beginning of the context.

Definition at line 62 of file LLM.h.


The documentation for this class was generated from the following files:

LLM.h
LLM.cpp