LlamaLib v2.0.2
Cross-platform library for local LLMs
Overview

Cross-Platform High-Level LLM Library


LlamaLib is a high-level C++ and C# library for running Large Language Models (LLMs) anywhere - from PCs to mobile devices and VR headsets.


At a glance

  • High-Level API
    C++ and C# implementations with intuitive object-oriented design.
  • 📦 Self-Contained and Embedded
    Runs embedded within your application.
    No need for a separate server, open ports or external processes.
    Zero external dependencies.
  • 🌍 Runs Anywhere
    Cross-platform and cross-device.
    Works on all major platforms:

    • Desktop: Windows, macOS, Linux
    • Mobile: Android, iOS
    • VR/AR: Meta Quest, Apple Vision, Magic Leap

    and hardware architectures:

    • CPU: Intel, AMD, Apple Silicon
    • GPU: NVIDIA, AMD, Metal
  • 🔍 Architecture detection at runtime
    Automatically selects the optimal backend for the detected hardware, supporting all major GPU and CPU architectures.
  • 💾 Small footprint
    Integration requires around 100 MB for CPU-only builds; GPU support comes in at 70 MB (Vulkan), 370 MB (tinyBLAS), or 1.3 GB (cuBLAS).
  • 🛠️ Production ready
    Designed for easy integration into C++ and C# applications.
    Supports both local and client-server deployment.

Why LlamaLib?

Developer API

  • Direct implementation of LLM operations (completion, tokenization, embeddings)
  • Clean architecture for services, clients, and agents
  • Simple server-client setup with built-in SSL and authentication support (sketched below)
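
As a concrete illustration of the server-client flow, here is a minimal C++ sketch. LLMService and LLMClient are the library's classes (see Core classes below), but the networking arguments used here (host, port) are assumptions for illustration, not the documented API; see the C++ guide for the actual signatures.

#include "LlamaLib.h"
#include <iostream>
#include <string>

int main() {
    // Server side: host a model that remote clients can reach.
    LLMService service("path/to/model.gguf");
    service.start();

    // Client side (same machine or a remote one): connect and query.
    // NOTE: the host/port constructor is an assumption for illustration;
    // tokenization and embedding calls are also exposed (see the guides).
    LLMClient client("localhost", 8080);
    std::string reply = client.completion("Summarize this document.");
    std::cout << reply << std::endl;
    return 0;
}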

Universal Deployment

  • The only library that lets you build for any hardware, thanks to runtime detection, unlike alternatives limited to specific GPU vendors or CPU-only execution
  • GPU backend auto-selection: Automatically chooses NVIDIA, AMD, or Metal, or falls back to CPU
  • CPU optimization: Identifies and uses the optimal CPU instruction sets (see the sketch below)
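
For a sense of how these hardware knobs are exposed, here is a hedged sketch using the optional parameters shown in the Quick Start examples below (threads, gpu_layers, num_slots); passing them positionally to the constructor is an assumption for illustration.

#include "LlamaLib.h"

int main() {
    // NOTE: positional parameter passing is an assumption; the parameters
    // themselves come from the Quick Start examples below.
    LLMService llm("path/to/model.gguf",
                   /*threads=*/-1,    // -1 = auto-detect CPU threads
                   /*gpu_layers=*/99, // offload layers to GPU if one is found
                   /*num_slots=*/1);  // parallel slots/clients
    llm.start(); // backend (NVIDIA, AMD, Metal, or CPU) is picked at runtime
    return 0;
}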

Production Ready

  • Embedded deployment: No need for open ports or external processes
  • Small footprint: Compact builds ideal for PC or mobile deployment
  • Battle-tested: Powers LLM for Unity, the most widely used LLM integration for games

How to help

  • Star the repo and spread the word!
  • ❤️ Sponsor development or support with a Ko-fi
  • 💬 Join our Discord community
  • 🐛 Contribute with feature requests, bug reports, or pull requests

Projects using LlamaLib

  • LLM for Unity: The most widely used solution to integrate LLMs in games

Quick Start

Documentation

Language Guides:

  • C++ guide: installation, building, and complete API reference
  • C# guide: installation, NuGet setup, and complete API reference

Core classes

LlamaLib provides three main classes for different use cases:

Class      | Purpose                       | Best For
-----------|-------------------------------|-------------------------------------
LLMService | LLM backend engine            | Building standalone apps or servers
LLMClient  | Local or remote LLM access    | Connecting to existing LLM services
LLMAgent   | Conversational AI with memory | Building chatbots or interactive AI
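
A hedged sketch of the LLMAgent row above: a multi-turn chat where the agent keeps conversation memory. The constructor arguments and the chat method name are assumptions for illustration; see the language guides for the real API.

#include "LlamaLib.h"
#include <iostream>

int main() {
    // Back the agent with a local service.
    LLMService llm("path/to/model.gguf");
    llm.start();

    // NOTE: the constructor signature and chat() are assumptions.
    LLMAgent agent(llm, "You are a helpful assistant.");
    std::cout << agent.chat("Hi! Who are you?") << std::endl;
    // The agent remembers earlier turns, so follow-ups work:
    std::cout << agent.chat("What did I just ask you?") << std::endl;
    return 0;
}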

C++ Example

#include "LlamaLib.h"
#include <iostream>
#include <string>

int main() {
    // LlamaLib automatically detects your hardware and selects the optimal backend
    LLMService llm("path/to/model.gguf");
    /* Optional parameters:
       threads=-1,   // CPU threads (-1 = auto)
       gpu_layers=0, // GPU layers (0 = CPU only)
       num_slots=1   // parallel slots/clients
    */

    // Start the service
    llm.start();

    // Generate a completion
    std::string response = llm.completion("Hello, how are you?");
    std::cout << response << std::endl;

    // Streaming to your own function is also supported:
    // llm.completion(prompt, streaming_callback);
    return 0;
}
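
The streaming comment in the example can be fleshed out with a callback. Only the fact that completion() accepts a streaming callback comes from the example above; the callback signature used here (one string argument per generated token) is an assumption for illustration.

#include "LlamaLib.h"
#include <iostream>
#include <string>

int main() {
    LLMService llm("path/to/model.gguf");
    llm.start();

    // NOTE: the callback signature is an assumption for illustration.
    auto on_token = [](const std::string& token) {
        std::cout << token << std::flush; // print each token as it arrives
    };
    llm.completion("Tell me a short story.", on_token);
    return 0;
}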

📖 See the C++ guide for installation, building, and complete API reference.

C# Example

using System;
using LlamaLib;

class Program {
    static void Main() {
        // Same API, different language
        LLMService llm = new LLMService("path/to/model.gguf");
        /* Optional parameters:
           threads=-1,   // CPU threads (-1 = auto)
           gpu_layers=0, // GPU layers (0 = CPU only)
           num_slots=1   // parallel slots/clients
        */
        llm.Start();

        string response = llm.Completion("Hello, how are you?");
        Console.WriteLine(response);

        // Streaming to your own function is also supported:
        // llm.Completion(prompt, streamingCallback);
    }
}

📖 See the C# guide for installation, NuGet setup, and complete API reference.


License

LlamaLib is licensed under the Apache 2.0 license.