LlamaLib v2.0.2
Cross-platform library for local LLMs

LLM Class Reference

Abstract base class for Large Language Model operations.

#include <LLM.h>
Public Member Functions

| Return type | Member function | Description |
| --- | --- | --- |
| virtual | ~LLM ()=default | Virtual destructor. |
| virtual std::vector< int > | tokenize (const std::string &query) | Tokenize text. |
| virtual std::string | tokenize_json (const json &data)=0 | Tokenize input (override). |
| virtual std::string | detokenize (const std::vector< int32_t > &tokens) | Convert tokens to text. |
| virtual std::string | detokenize_json (const json &data)=0 | Convert tokens back to text. |
| virtual std::vector< float > | embeddings (const std::string &query) | Generate embeddings. |
| virtual std::string | embeddings_json (const json &data)=0 | Generate embeddings with HTTP response support. |
| virtual void | set_completion_params (json completion_params_) | Set completion parameters. |
| virtual std::string | get_completion_params () | Get current completion parameters. |
| virtual std::string | completion (const std::string &prompt, CharArrayFn callback=nullptr, int id_slot=-1, bool return_response_json=false) | Generate completion. |
| virtual std::string | completion_json (const json &data, CharArrayFn callback, bool callbackWithJSON)=0 | Generate text completion. |
| virtual void | set_grammar (std::string grammar_) | Set grammar for constrained generation. |
| virtual std::string | get_grammar () | Get current grammar specification. |
| virtual std::string | apply_template (const json &messages) | Apply template to messages. |
| virtual std::string | apply_template_json (const json &data)=0 | Apply a chat template to message data. |
Static Public Member Functions

| Return type | Member function | Description |
| --- | --- | --- |
| static bool | has_gpu_layers (const std::string &command) | Check whether command-line arguments specify GPU layers. |
| static std::string | LLM_args_to_command (const std::string &model_path, int num_slots=1, int num_threads=-1, int num_GPU_layers=0, bool flash_attention=false, int context_size=4096, int batch_size=2048, bool embedding_only=false, const std::vector< std::string > &lora_paths={}) | Convert LLM parameters to command-line arguments. |
Public Attributes

| Type | Attribute | Description |
| --- | --- | --- |
| int32_t | n_keep = 0 | Number of tokens to keep from the beginning of the context. |
| std::string | grammar = "" | Grammar specification in GBNF format or JSON schema. |
| json | completion_params | JSON object containing completion parameters. |
Protected Member Functions

| Return type | Member function | Description |
| --- | --- | --- |
| virtual json | build_apply_template_json (const json &messages) | Build JSON for template application. |
| virtual std::string | parse_apply_template_json (const json &result) | Parse template application result. |
| virtual json | build_tokenize_json (const std::string &query) | Build JSON for tokenization. |
| virtual std::vector< int > | parse_tokenize_json (const json &result) | Parse tokenization result. |
| virtual json | build_detokenize_json (const std::vector< int32_t > &tokens) | Build JSON for detokenization. |
| virtual std::string | parse_detokenize_json (const json &result) | Parse detokenization result. |
| virtual json | build_embeddings_json (const std::string &query) | Build JSON for embeddings generation. |
| virtual std::vector< float > | parse_embeddings_json (const json &result) | Parse embeddings result. |
| virtual json | build_completion_json (const std::string &prompt, int id_slot=-1) | Build JSON for completion generation. |
| virtual std::string | parse_completion_json (const json &result) | Parse completion result. |
Detailed Description

Abstract base class for Large Language Model operations.

Provides the core interface for LLM functionality, including text completion, tokenization, embeddings, and chat template application. All LLM implementations must inherit from this base class.
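A minimal end-to-end sketch of the interface. `obtain_llm()` is a hypothetical stand-in for however a concrete implementation (LLMAgent, LLMClient, or LLMService) is constructed in your application:

```cpp
#include <LLM.h>

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical factory: replace with however your application
// constructs a concrete LLM implementation.
extern LLM *obtain_llm();

int main() {
    LLM *llm = obtain_llm();

    // Round-trip text through the tokenizer.
    std::vector<int> tokens = llm->tokenize("Hello, world!");
    std::string text = llm->detokenize(
        std::vector<int32_t>(tokens.begin(), tokens.end()));

    // Non-streaming completion with the default arguments
    // (no callback, automatic slot, plain-text response).
    std::string reply = llm->completion("What is a llama?");
    std::cout << reply << "\n";
    return 0;
}
```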
Member Function Documentation

apply_template()

virtual std::string LLM::apply_template (const json &messages)  [virtual]

Apply template to messages.

apply_template_json()

virtual std::string LLM::apply_template_json (const json &data) = 0  [pure virtual]
Apply a chat template to message data.

Parameters
| data | JSON object containing messages to format |

Pure virtual method for applying chat templates to conversation data.

Implemented in LLMAgent, LLMClient, and LLMService.
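A formatting sketch, assuming json is the JSON type used throughout LLM.h and that messages follow the conventional role/content chat layout (an assumption, not a documented schema):

```cpp
#include <LLM.h>

#include <string>

// Build a conversation and render it through the model's chat template.
std::string format_chat(LLM &llm) {
    json messages = json::array({
        {{"role", "system"}, {"content", "You are a helpful assistant."}},
        {{"role", "user"},   {"content", "Explain GBNF grammars briefly."}}
    });
    // The public apply_template() presumably forwards to this method via
    // build_apply_template_json()/parse_apply_template_json().
    return llm.apply_template(messages);
}
```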
build_apply_template_json()

virtual json LLM::build_apply_template_json (const json &messages)  [protected, virtual]

Build JSON for template application.

build_completion_json()

virtual json LLM::build_completion_json (const std::string &prompt, int id_slot = -1)  [protected, virtual]

Build JSON for completion generation.

build_detokenize_json()

virtual json LLM::build_detokenize_json (const std::vector< int32_t > &tokens)  [protected, virtual]

Build JSON for detokenization.

build_embeddings_json()

virtual json LLM::build_embeddings_json (const std::string &query)  [protected, virtual]

Build JSON for embeddings generation.

build_tokenize_json()

virtual json LLM::build_tokenize_json (const std::string &query)  [protected, virtual]

Build JSON for tokenization.
completion()

virtual std::string LLM::completion (const std::string &prompt, CharArrayFn callback = nullptr, int id_slot = -1, bool return_response_json = false)  [virtual]

Generate completion.

Parameters
| prompt | Input text prompt |
| callback | Optional callback for streaming |
| id_slot | Slot ID for the request (-1 for auto) |
| return_response_json | Whether to return the full JSON response |

Definition at line 283 of file LLM.cpp.
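A streaming sketch, assuming CharArrayFn is compatible with a free function that receives each generated chunk as a C string (check LLM.h for the exact typedef):

```cpp
#include <LLM.h>

#include <iostream>
#include <string>

// Called once per streamed chunk; prints text as it arrives.
static void on_chunk(const char *chunk) {
    std::cout << chunk << std::flush;
}

std::string stream_story(LLM &llm) {
    // id_slot = -1 lets the library pick a slot automatically;
    // return_response_json = false yields plain text.
    return llm.completion("Tell me a short story.", on_chunk, -1, false);
}
```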
completion_json()

virtual std::string LLM::completion_json (const json &data, CharArrayFn callback, bool callbackWithJSON) = 0  [pure virtual]

Generate text completion.

Parameters
| data | JSON object containing prompt and parameters |
| callback | Optional callback function for streaming responses |
| callbackWithJSON | Whether the callback receives JSON or plain text |

Pure virtual method for text generation with optional streaming.

Implemented in LLMAgent, LLMClient, and LLMService.
detokenize()

virtual std::string LLM::detokenize (const std::vector< int32_t > &tokens)  [virtual]

Convert tokens to text.

detokenize_json()

virtual std::string LLM::detokenize_json (const json &data) = 0  [pure virtual]

Convert tokens back to text.

Parameters
| data | JSON object containing token IDs |

Pure virtual method for converting token sequences back to text.

Implemented in LLMAgent, LLMClient, and LLMService.
embeddings()

virtual std::vector< float > LLM::embeddings (const std::string &query)  [virtual]

Generate embeddings.

embeddings_json()

virtual std::string LLM::embeddings_json (const json &data) = 0  [pure virtual]

Generate embeddings with HTTP response support.

Parameters
| data | JSON object containing the embedding request |

Method used internally for server-based embedding generation.

Implemented in LLMAgent, LLMClient, and LLMService.
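As an illustration of the public embeddings() wrapper, the sketch below computes a cosine similarity between two strings. It assumes the backing model supports embeddings (see the embedding_only flag of LLM_args_to_command()):

```cpp
#include <LLM.h>

#include <cmath>
#include <string>
#include <vector>

// Cosine similarity between the embeddings of two strings.
float cosine_similarity(LLM &llm, const std::string &a, const std::string &b) {
    std::vector<float> va = llm.embeddings(a);
    std::vector<float> vb = llm.embeddings(b);
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (std::size_t i = 0; i < va.size() && i < vb.size(); ++i) {
        dot += va[i] * vb[i];
        na  += va[i] * va[i];
        nb  += vb[i] * vb[i];
    }
    // Small epsilon guards against division by zero.
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12f);
}
```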
get_completion_params()

virtual std::string LLM::get_completion_params ()  [inline, virtual]

Get current completion parameters.

get_grammar()

virtual std::string LLM::get_grammar ()  [inline, virtual]

Get current grammar specification.

has_gpu_layers()

static bool LLM::has_gpu_layers (const std::string &command)  [static]

Check whether command-line arguments specify GPU layers.

LLM_args_to_command()

static std::string LLM::LLM_args_to_command (const std::string &model_path, int num_slots = 1, int num_threads = -1, int num_GPU_layers = 0, bool flash_attention = false, int context_size = 4096, int batch_size = 2048, bool embedding_only = false, const std::vector< std::string > &lora_paths = {})  [static]
Convert LLM parameters to command-line arguments.

Parameters
| model_path | Path to the model file |
| num_slots | Number of parallel slots to use |
| num_threads | Number of CPU threads to use (-1 for auto) |
| num_GPU_layers | Number of layers to offload to the GPU |
| flash_attention | Whether to use the flash-attention optimization |
| context_size | Maximum context length in tokens (default 4096; 0 = loaded from the model) |
| batch_size | Batch size for processing |
| embedding_only | Whether to run in embedding-only mode |
| lora_paths | Vector of paths to LoRA adapter files |

Definition at line 46 of file LLM.cpp.
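A sketch of building a server command line from the documented defaults; the model path is hypothetical:

```cpp
#include <LLM.h>

#include <iostream>
#include <string>

int main() {
    std::string cmd = LLM::LLM_args_to_command(
        "models/my-model.gguf",  // model_path (illustrative)
        1,      // num_slots
        -1,     // num_threads: auto
        32,     // num_GPU_layers
        true,   // flash_attention
        8192,   // context_size
        2048,   // batch_size
        false,  // embedding_only
        {});    // lora_paths: none

    std::cout << cmd << "\n";

    // has_gpu_layers() reports whether GPU offload was requested.
    if (LLM::has_gpu_layers(cmd)) {
        std::cout << "GPU layers enabled\n";
    }
    return 0;
}
```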
parse_apply_template_json()

virtual std::string LLM::parse_apply_template_json (const json &result)  [protected, virtual]

Parse template application result.

parse_completion_json()

virtual std::string LLM::parse_completion_json (const json &result)  [protected, virtual]

Parse completion result.

parse_detokenize_json()

virtual std::string LLM::parse_detokenize_json (const json &result)  [protected, virtual]

Parse detokenization result.

parse_embeddings_json()

virtual std::vector< float > LLM::parse_embeddings_json (const json &result)  [protected, virtual]

Parse embeddings result.

parse_tokenize_json()

virtual std::vector< int > LLM::parse_tokenize_json (const json &result)  [protected, virtual]

Parse tokenization result.

set_completion_params()

virtual void LLM::set_completion_params (json completion_params_)  [inline, virtual]

Set completion parameters.
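A configuration sketch. The keys shown ("temperature", "n_predict") follow llama.cpp server conventions and are assumptions here, not a documented schema:

```cpp
#include <LLM.h>

#include <string>

// Adjust sampling settings before generating; key names are assumed.
void configure_sampling(LLM &llm) {
    json params;
    params["temperature"] = 0.7;   // assumption: llama.cpp-style key
    params["n_predict"]   = 256;   // assumption: llama.cpp-style key
    llm.set_completion_params(params);

    // The current settings can be read back with get_completion_params().
    std::string current = llm.get_completion_params();
}
```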
set_grammar()

virtual void LLM::set_grammar (std::string grammar_)  [inline, virtual]

Set grammar for constrained generation.

Parameters
| grammar_ | Grammar specification in GBNF format or JSON schema |

See https://github.com/ggml-org/llama.cpp/tree/master/grammars for format details.

Definition at line 130 of file LLM.h.
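A minimal GBNF sketch that restricts generation to "yes" or "no"; see the llama.cpp grammars page linked above for the full syntax:

```cpp
#include <LLM.h>

#include <string>

void ask_yes_no(LLM &llm) {
    // Constrain all subsequent completions to the grammar.
    llm.set_grammar("root ::= \"yes\" | \"no\"");
    std::string answer = llm.completion("Is water wet? Answer yes or no.");

    // Reset to the default (empty grammar = unconstrained).
    llm.set_grammar("");
}
```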
tokenize()

virtual std::vector< int > LLM::tokenize (const std::string &query)  [virtual]

Tokenize text.

tokenize_json()

virtual std::string LLM::tokenize_json (const json &data) = 0  [pure virtual]

Tokenize input (override).

Parameters
| data | JSON object containing text to tokenize |

Implemented in LLMAgent, LLMClient, and LLMService.
Member Data Documentation

completion_params

json LLM::completion_params

JSON object containing completion parameters.

grammar

std::string LLM::grammar = ""

Grammar specification in GBNF format or JSON schema.

n_keep

int32_t LLM::n_keep = 0

Number of tokens to keep from the beginning of the context.