The llmedge-examples repository demonstrates real-world uses of the library through a set of self-contained demo activities.

Example Activities

Each activity in the llmedge-examples repository is a complete, working demonstration:

Activity                  Purpose
LocalAssetDemoActivity    Load bundled GGUF model, blocking & streaming inference
HuggingFaceDemoActivity   Download from HF Hub, reasoning controls
ImageToTextActivity       Camera capture, OCR text extraction
LlavaVisionActivity       Vision model integration (prepared architecture)
StableDiffusionActivity   On-device image generation
VideoGenerationActivity   On-device text-to-video generation (Wan 2.1)
RagActivity               PDF indexing, vector search, Q&A
TTSActivity               Text-to-speech synthesis (Bark)
STTActivity               Speech-to-text transcription (Whisper)

LocalAssetDemoActivity

Demonstrates loading a bundled model asset and running inference using LLMEdgeManager.

Key patterns:

// Copy the bundled model out of assets so it can be read from disk
val modelPath = copyAssetIfNeeded("YourModel.gguf")

// Run inference (Manager handles loading & caching)
val response = LLMEdgeManager.generateText(
    context = context,
    params = LLMEdgeManager.TextGenerationParams(
        prompt = "Say 'hello from llmedge'.",
        modelPath = modelPath,
        modelId = "local-model" // Unique ID for caching
    )
)
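
The snippet relies on a copyAssetIfNeeded helper defined in the demo activity. A minimal sketch of such a helper, using standard Android asset APIs (the repo's actual implementation may differ):

import android.content.Context
import java.io.File

// Copy a bundled asset into app-private storage once and return its absolute path,
// so native code can open the model file directly.
private fun Context.copyAssetIfNeeded(assetName: String): String {
    val outFile = File(filesDir, assetName)
    if (!outFile.exists()) {
        assets.open(assetName).use { input ->
            outFile.outputStream().use { output -> input.copyTo(output) }
        }
    }
    return outFile.absolutePath
}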

ImageToTextActivity

Demonstrates camera integration and OCR text extraction using LLMEdgeManager.

Key features:

  • Camera permission handling
  • High-resolution image capture
  • ML Kit OCR integration via Manager

Code snippet:

// Process image
val bitmap = ImageUtils.fileToBitmap(file)
ivPreview.setImageBitmap(bitmap)

// Extract text (handles engine lifecycle automatically)
val text = LLMEdgeManager.extractText(context, bitmap)
tvResult.text = text.ifEmpty { "(no text detected)" }
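
The feature list above also mentions camera permission handling, which the snippet does not show. A minimal sketch using the standard AndroidX Activity Result API; launchCamera and showPermissionRationale are hypothetical placeholders for the demo's own capture and UI code:

import android.Manifest
import androidx.activity.result.contract.ActivityResultContracts

// Registered as a property of the Activity (must be created before onStart)
private val cameraPermissionLauncher =
    registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
        if (granted) launchCamera() else showPermissionRationale()
    }

// Request the camera permission before capturing
cameraPermissionLauncher.launch(Manifest.permission.CAMERA)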

HuggingFaceDemoActivity

Shows how to download models from Hugging Face Hub and run inference using LLMEdgeManager.

Key features:

  • Model download with progress callback
  • Heap-aware parameter selection
  • Thinking mode configuration

Code snippet:

// 1. Download with progress
LLMEdgeManager.downloadModel(
    context = context,
    modelId = "unsloth/Qwen3-0.6B-GGUF",
    filename = "Qwen3-0.6B-Q4_K_M.gguf",
    onProgress = { downloaded, total -> /* update UI */ }
)

// 2. Generate text (model is now cached)
val response = LLMEdgeManager.generateText(
    context = context,
    params = LLMEdgeManager.TextGenerationParams(
        prompt = "List two quick facts about running GGUF models on Android.",
        modelId = "unsloth/Qwen3-0.6B-GGUF",
        modelFilename = "Qwen3-0.6B-Q4_K_M.gguf",
        thinkingMode = SmolLM.ThinkingMode.DISABLED // Optional reasoning control
    )
)
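
The feature list also mentions heap-aware parameter selection. A minimal sketch that inspects device memory with ActivityManager before picking generation settings; the 2 GB threshold and the context sizes are illustrative assumptions, not values from the demo:

import android.app.ActivityManager
import android.content.Context

// Query how much memory the device currently has available
val activityManager = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
val memoryInfo = ActivityManager.MemoryInfo().also { activityManager.getMemoryInfo(it) }
val availableMb = memoryInfo.availMem / (1024 * 1024)

// Choose a smaller context window on low-memory devices (illustrative values)
val contextSize = if (availableMb < 2048) 1024 else 4096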

RagActivity

Complete RAG (Retrieval-Augmented Generation) pipeline with PDF indexing and Q&A.

Key features:

  • PDF document picker with persistent permissions
  • Sentence embeddings (ONNX Runtime)
  • Vector store with cosine similarity search
  • Context-aware answer generation

Full workflow:

// 1. Get shared LLM instance from Manager
// (Ensures efficient resource sharing with other app components)
val llm = LLMEdgeManager.getSmolLM(context)

// 2. Initialize RAG engine with embedding config
val rag = RAGEngine(
    context = context,
    smolLM = llm,
    splitter = TextSplitter(chunkSize = 600, chunkOverlap = 120),
    embeddingConfig = EmbeddingConfig(
        modelAssetPath = "embeddings/all-minilm-l6-v2/model.onnx",
        tokenizerAssetPath = "embeddings/all-minilm-l6-v2/tokenizer.json"
    )
)
rag.init()

// 3. Index a PDF document
val count = rag.indexPdf(pdfUri)
Log.d("RAG", "Indexed $count chunks")

// 4. Ask questions
val answer = rag.ask("What are the key points?", topK = 5)
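
The feature list mentions a PDF picker with persistent permissions. A minimal sketch using the Storage Access Framework from inside an Activity (the demo's actual wiring may differ):

import android.content.Intent
import androidx.activity.result.contract.ActivityResultContracts

// Let the user pick a PDF and keep read access across app restarts
private val pickPdf =
    registerForActivityResult(ActivityResultContracts.OpenDocument()) { uri ->
        uri ?: return@registerForActivityResult
        contentResolver.takePersistableUriPermission(
            uri,
            Intent.FLAG_GRANT_READ_URI_PERMISSION
        )
        // Hand the uri to the RAG engine, e.g. rag.indexPdf(uri)
    }

pickPdf.launch(arrayOf("application/pdf"))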

Important notes:

  • Scanned PDFs require OCR before indexing (PDFBox only extracts embedded text, not text in page images)
  • Embedding model must be in assets/ directory
  • Vector store persists to JSON in app files directory
  • Adjust chunk size/overlap based on document type

StableDiffusionActivity

Demonstrates on-device image generation using LLMEdgeManager.

Key patterns:

// Generate image (Manager handles downloading, caching, VAE loading, and memory safety)
val bitmap = LLMEdgeManager.generateImage(
    context = context,
    params = LLMEdgeManager.ImageGenerationParams(
        prompt = "a cute cat",
        width = 256, // Start small on mobile
        height = 256,
        steps = 20,
        cfgScale = 7.0f
    )
)

// Display result
imageView.setImageBitmap(bitmap)
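
To keep a generated image after the activity finishes, the bitmap can be written out with standard Android APIs. A minimal sketch saving a PNG into app-private storage (the file name is arbitrary):

import android.graphics.Bitmap
import java.io.File

// Persist the generated bitmap as a PNG
val outFile = File(context.filesDir, "sd_output.png")
outFile.outputStream().use { stream ->
    bitmap.compress(Bitmap.CompressFormat.PNG, 100, stream)
}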

Important notes:

  • Start with 128×128 or 256×256 on devices with <4 GB RAM
  • LLMEdgeManager automatically enables CPU offloading if memory is tight
  • Generating images is memory-intensive; close other apps for best results

VideoGenerationActivity

Demonstrates on-device text-to-video generation using Wan 2.1.

Key patterns:

// 1. Configure params for mobile-friendly generation
val params = LLMEdgeManager.VideoGenerationParams(
    prompt = "A dog running in the park",
    width = 512,      // Wan models require multiples of 64, between 256 and 960
    height = 512,
    videoFrames = 8,  // Start with fewer frames (8-16)
    steps = 20,
    cfgScale = 7.0f,
    flowShift = 3.0f, // Use Float.POSITIVE_INFINITY to auto-select
    forceSequentialLoad = true // Critical for devices with <12GB RAM
)

// 2. Generate video (returns list of bitmaps)
val frames = LLMEdgeManager.generateVideo(
    context = context,
    params = params
) { phase, current, total ->
    // Update progress UI
    updateProgress("$phase ($current/$total)")
}

// 3. Display or save frames
imageView.setImageBitmap(frames.first())
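
Because generateVideo returns plain bitmaps, the frames can be persisted as a numbered PNG sequence with standard APIs; encoding an actual video file would need MediaCodec/MediaMuxer and is beyond this sketch:

import android.graphics.Bitmap
import java.io.File

// Write each frame as frame_000.png, frame_001.png, ...
val outDir = File(context.filesDir, "wan_frames").apply { mkdirs() }
frames.forEachIndexed { index, frame ->
    File(outDir, "frame_%03d.png".format(index)).outputStream().use { stream ->
        frame.compress(Bitmap.CompressFormat.PNG, 100, stream)
    }
}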

Important notes:

  • Sequential Load: Video models are large (Wan 2.1 is ~5GB). forceSequentialLoad = true is essential for mobile devices; it loads components (T5 encoder, Diffusion model, VAE) one by one to keep peak memory low.
  • Frame Count: Generating 8-16 frames takes significant time. Provide progress feedback.
  • LLMEdgeManager: This activity uses the high-level LLMEdgeManager, which handles the sequential loading logic automatically.

LlavaVisionActivity

Demonstrates vision-capable LLM integration using LLMEdgeManager.

Key patterns:

// 1. Preprocess image
val bmp = ImageUtils.imageToBitmap(context, uri)
val scaledBmp = ImageUtils.preprocessImage(bmp, correctOrientation = true, maxDimension = 1024)

// 2. Run OCR (Optional grounding)
val ocrText = LLMEdgeManager.extractText(context, scaledBmp)

// 3. Build prompt (Phi-3 instruct format shown here; adjust to your model's chat template)
val prompt = """
    <|system|>You are a helpful assistant.<|end|>
    <|user|>
    Context (OCR): $ocrText

    Describe this image.<|end|>
    <|assistant|>
""".trimIndent()

// 4. Run Vision Analysis
val result = LLMEdgeManager.analyzeImage(
    context = context,
    params = LLMEdgeManager.VisionAnalysisParams(
        image = scaledBmp,
        prompt = prompt
    )
)

Status: Uses LLMEdgeManager to orchestrate the experimental vision pipeline (loading projector, encoding image, running inference).


TTSActivity

Demonstrates text-to-speech synthesis using Bark via LLMEdgeManager.

Key features:

  • Text input for speech synthesis
  • Automatic model download from Hugging Face (843MB)
  • Progress tracking during generation (semantic, coarse, fine encoding)
  • Audio playback of generated speech
  • WAV file saving

Key patterns:

import io.aatricks.llmedge.LLMEdgeManager
import io.aatricks.llmedge.BarkTTS

// Generate speech (model auto-downloads on first use)
val audio = LLMEdgeManager.synthesizeSpeech(
    context = context,
    params = LLMEdgeManager.SpeechSynthesisParams(
        text = "Hello, world!",
        nThreads = 8  // Use more threads for better performance
    )
) { step: BarkTTS.EncodingStep, progress: Int ->
    // Update progress UI
    updateProgress("${step.name}: $progress%")
}

// Play the generated audio
val audioTrack = AudioTrack.Builder()
    .setAudioAttributes(AudioAttributes.Builder()
        .setUsage(AudioAttributes.USAGE_MEDIA)
        .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
        .build())
    .setAudioFormat(AudioFormat.Builder()
        .setEncoding(AudioFormat.ENCODING_PCM_FLOAT)
        .setSampleRate(audio.sampleRate)
        .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
        .build())
    .setBufferSizeInBytes(audio.samples.size * 4)
    .build()

audioTrack.write(audio.samples, 0, audio.samples.size, AudioTrack.WRITE_BLOCKING)
audioTrack.play()

// Save to WAV file
LLMEdgeManager.synthesizeSpeechToFile(
    context = context,
    text = "Hello, world!",
    outputFile = File(cacheDir, "output.wav")
)
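
The WAV written by synthesizeSpeechToFile can also be played back with the platform MediaPlayer instead of managing an AudioTrack manually; a minimal sketch:

import android.media.MediaPlayer
import java.io.File

// Play the WAV produced above and release the player when playback completes
val wavFile = File(cacheDir, "output.wav")
MediaPlayer().apply {
    setDataSource(wavFile.absolutePath)
    setOnCompletionListener { it.release() }
    prepare()
    start()
}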

Additional Resources

  • See Architecture for flow diagrams (RAG, OCR, JNI loading)
  • See Usage for the API reference and concepts
  • See Quirks for troubleshooting specific issues