## Example Activities

The llmedge-examples repository contains complete, working demonstrations of multiple real-world uses:

| Activity | Purpose |
|---|---|
| LocalAssetDemoActivity | Load bundled GGUF model, blocking & streaming inference |
| HuggingFaceDemoActivity | Download from HF Hub, reasoning controls |
| ImageToTextActivity | Camera capture, OCR text extraction |
| LlavaVisionActivity | Vision model integration (prepared architecture) |
| StableDiffusionActivity | On-device image generation |
| VideoGenerationActivity | On-device text-to-video generation (Wan 2.1) |
| RagActivity | PDF indexing, vector search, Q&A |
| TTSActivity | Text-to-speech synthesis (Bark) |
| STTActivity | Speech-to-text transcription (Whisper) |

### LocalAssetDemoActivity

Demonstrates loading a bundled model asset and running inference using LLMEdgeManager.

Key patterns:

```kotlin
// Copy the bundled asset into app storage (helper defined in the demo activity)
val modelPath = copyAssetIfNeeded("YourModel.gguf")

// Run inference (Manager handles loading & caching)
val response = LLMEdgeManager.generateText(
    context = context,
    params = LLMEdgeManager.TextGenerationParams(
        prompt = "Say 'hello from llmedge'.",
        modelPath = modelPath,
        modelId = "local-model" // Unique ID for caching
    )
)
```

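The table above also lists streaming inference for this activity. The callback shape below is an assumption for illustration (a hypothetical `onToken` parameter and `tvOutput` view), not the confirmed llmedge signature; check the activity source for the real API:

```kotlin
// Hypothetical streaming variant: the onToken parameter is assumed, not confirmed API
LLMEdgeManager.generateText(
    context = context,
    params = LLMEdgeManager.TextGenerationParams(
        prompt = "Say 'hello from llmedge'.",
        modelPath = modelPath,
        modelId = "local-model"
    ),
    onToken = { token -> runOnUiThread { tvOutput.append(token) } } // assumed callback
)
```
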
### ImageToTextActivity

Demonstrates camera integration and OCR text extraction using LLMEdgeManager.

Key features:
- Camera permission handling (sketched below)
- High-resolution image capture
- ML Kit OCR integration via Manager

Code snippet:

```kotlin
// Process image
val bitmap = ImageUtils.fileToBitmap(file)
ivPreview.setImageBitmap(bitmap)

// Extract text (handles engine lifecycle automatically)
val text = LLMEdgeManager.extractText(context, bitmap)
tvResult.text = text.ifEmpty { "(no text detected)" }
```

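Camera permission handling follows the standard AndroidX Activity Result pattern; a minimal sketch using stock APIs, where `launchCamera()` stands in for the activity's actual capture flow:

```kotlin
import androidx.activity.result.contract.ActivityResultContracts

// Registered once as an Activity property (before the Activity reaches STARTED)
private val requestCamera =
    registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
        if (granted) launchCamera() // launchCamera() is a hypothetical capture helper
        else tvResult.text = "Camera permission denied"
    }

// Called from e.g. a button click handler
fun onCaptureClicked() {
    requestCamera.launch(android.Manifest.permission.CAMERA)
}
```
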
### HuggingFaceDemoActivity

Shows how to download models from Hugging Face Hub and run inference using LLMEdgeManager.

Key features:
- Model download with progress callback
- Heap-aware parameter selection (sketched below)
- Thinking mode configuration

Code snippet:

```kotlin
// 1. Download with progress
LLMEdgeManager.downloadModel(
    context = context,
    modelId = "unsloth/Qwen3-0.6B-GGUF",
    filename = "Qwen3-0.6B-Q4_K_M.gguf",
    onProgress = { downloaded, total -> /* update UI */ }
)

// 2. Generate text (model is now cached)
val response = LLMEdgeManager.generateText(
    context = context,
    params = LLMEdgeManager.TextGenerationParams(
        prompt = "List two quick facts about running GGUF models on Android.",
        modelId = "unsloth/Qwen3-0.6B-GGUF",
        modelFilename = "Qwen3-0.6B-Q4_K_M.gguf",
        thinkingMode = SmolLM.ThinkingMode.DISABLED // Optional reasoning control
    )
)
```

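"Heap-aware parameter selection" means sizing generation parameters (such as context length) to the device's available memory. A minimal sketch using the standard `ActivityManager` API; the thresholds and the memory-to-context mapping are illustrative assumptions, not llmedge's actual heuristic:

```kotlin
import android.app.ActivityManager
import android.content.Context

// Pick a context size based on currently available system memory
fun pickContextSize(context: Context): Int {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val info = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
    val availMb = info.availMem / (1024 * 1024)
    return when { // thresholds are illustrative only; tune for your model and devices
        availMb > 4096 -> 4096
        availMb > 2048 -> 2048
        else -> 1024
    }
}
```
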
### RagActivity

Complete RAG (Retrieval-Augmented Generation) pipeline with PDF indexing and Q&A.

Key features:
- PDF document picker with persistent permissions
- Sentence embeddings (ONNX Runtime)
- Vector store with cosine similarity search (sketched after the notes below)
- Context-aware answer generation

Full workflow:

```kotlin
// 1. Get shared LLM instance from Manager
//    (ensures efficient resource sharing with other app components)
val llm = LLMEdgeManager.getSmolLM(context)

// 2. Initialize RAG engine with embedding config
val rag = RAGEngine(
    context = context,
    smolLM = llm,
    splitter = TextSplitter(chunkSize = 600, chunkOverlap = 120),
    embeddingConfig = EmbeddingConfig(
        modelAssetPath = "embeddings/all-minilm-l6-v2/model.onnx",
        tokenizerAssetPath = "embeddings/all-minilm-l6-v2/tokenizer.json"
    )
)
rag.init()

// 3. Index a PDF document
val count = rag.indexPdf(pdfUri)
Log.d("RAG", "Indexed $count chunks")

// 4. Ask questions
val answer = rag.ask("What are the key points?", topK = 5)
```

Important notes:
- Scanned PDFs require OCR before indexing (PDFBox extracts text-based content only)
- The embedding model must live in the `assets/` directory
- The vector store persists to JSON in the app's files directory
- Adjust chunk size/overlap based on document type

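For intuition about the vector store, cosine-similarity top-K retrieval fits in a few lines. This is a standalone illustrative sketch, not llmedge's actual vector store implementation:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]
    }
    return dot / (sqrt(na) * sqrt(nb) + 1e-8f)
}

// Rank stored (chunkText, embedding) pairs against the query, keep the best k
fun topK(query: FloatArray, chunks: List<Pair<String, FloatArray>>, k: Int): List<String> =
    chunks.sortedByDescending { (_, emb) -> cosine(query, emb) }
        .take(k)
        .map { (text, _) -> text }
```
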
### StableDiffusionActivity

Demonstrates on-device image generation using LLMEdgeManager.

Key patterns:

```kotlin
// Generate image (Manager handles downloading, caching, VAE loading, and memory safety)
val bitmap = LLMEdgeManager.generateImage(
    context = context,
    params = LLMEdgeManager.ImageGenerationParams(
        prompt = "a cute cat",
        width = 256, // Start small on mobile
        height = 256,
        steps = 20,
        cfgScale = 7.0f
    )
)

// Display result
imageView.setImageBitmap(bitmap)
```

Important notes:
- Start with 128×128 or 256×256 on devices with <4GB RAM
- `LLMEdgeManager` automatically enables CPU offloading if memory is tight
- Generating images is memory-intensive; close other apps for best results

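To keep a generated image, the standard Bitmap API suffices; a minimal sketch (file name and location are arbitrary choices):

```kotlin
import android.graphics.Bitmap
import java.io.File

// Persist the generated bitmap as a PNG in the app's files directory
val outFile = File(context.filesDir, "sd_output.png")
outFile.outputStream().use { stream ->
    bitmap.compress(Bitmap.CompressFormat.PNG, 100, stream)
}
```
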
### VideoGenerationActivity

Demonstrates on-device text-to-video generation using Wan 2.1.

Key patterns:

```kotlin
// 1. Configure params for mobile-friendly generation
val params = LLMEdgeManager.VideoGenerationParams(
    prompt = "A dog running in the park",
    width = 512, // Wan models require multiples of 64 between 256-960
    height = 512,
    videoFrames = 8, // Start with fewer frames (8-16)
    steps = 20,
    cfgScale = 7.0f,
    flowShift = 3.0f, // Use Float.POSITIVE_INFINITY to auto-select
    forceSequentialLoad = true // Critical for devices with <12GB RAM
)

// 2. Generate video (returns a list of bitmaps)
val frames = LLMEdgeManager.generateVideo(
    context = context,
    params = params
) { phase, current, total ->
    // Update progress UI
    updateProgress("$phase ($current/$total)")
}

// 3. Display or save frames (saving is sketched after the notes below)
imageView.setImageBitmap(frames.first())
```

Important notes:
- Sequential Load: video models are large (Wan 2.1 is ~5GB). `forceSequentialLoad = true` is essential for mobile devices; it loads components (T5 encoder, diffusion model, VAE) one by one to keep peak memory low.
- Frame Count: generating 8-16 frames takes significant time. Provide progress feedback.
- LLMEdgeManager: this activity uses the high-level `LLMEdgeManager`, which handles the sequential loading logic automatically.

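Saving the returned frames is plain Bitmap I/O; a minimal sketch that writes each frame as a numbered PNG (paths are illustrative). Muxing frames into an actual video file would additionally require MediaCodec or a similar encoder:

```kotlin
import android.graphics.Bitmap
import java.io.File

// Write each generated frame as frame_000.png, frame_001.png, ...
frames.forEachIndexed { index, frame ->
    File(context.filesDir, "frame_%03d.png".format(index)).outputStream().use { out ->
        frame.compress(Bitmap.CompressFormat.PNG, 100, out)
    }
}
```
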
### LlavaVisionActivity

Demonstrates vision-capable LLM integration using LLMEdgeManager.

Key patterns:

```kotlin
// 1. Preprocess image
val bmp = ImageUtils.imageToBitmap(context, uri)
val scaledBmp = ImageUtils.preprocessImage(bmp, correctOrientation = true, maxDimension = 1024)

// 2. Run OCR (Optional grounding)
val ocrText = LLMEdgeManager.extractText(context, scaledBmp)

// 3. Build Prompt (e.g. ChatML format for Phi-3)
val prompt = """
    <|system|>You are a helpful assistant.<|end|>
    <|user|>
    Context (OCR): $ocrText
    Describe this image.<|end|>
    <|assistant|>
""".trimIndent()

// 4. Run Vision Analysis
val result = LLMEdgeManager.analyzeImage(
    context = context,
    params = LLMEdgeManager.VisionAnalysisParams(
        image = scaledBmp,
        prompt = prompt
    )
)
```

Status: Uses LLMEdgeManager to orchestrate the experimental vision pipeline (loading projector, encoding image, running inference).

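The `maxDimension = 1024` preprocessing step bounds memory use before encoding. As a rough illustration of what such a helper does (this is not llmedge's actual `preprocessImage` implementation):

```kotlin
import android.graphics.Bitmap

// Downscale so the longest side is at most maxDimension (illustrative only)
fun downscale(src: Bitmap, maxDimension: Int): Bitmap {
    val scale = maxDimension.toFloat() / maxOf(src.width, src.height)
    if (scale >= 1f) return src // already small enough
    return Bitmap.createScaledBitmap(
        src,
        (src.width * scale).toInt(),
        (src.height * scale).toInt(),
        true // bilinear filtering for smoother results
    )
}
```
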
### TTSActivity

Demonstrates text-to-speech synthesis using Bark via LLMEdgeManager.

Key features:
- Text input for speech synthesis
- Automatic model download from Hugging Face (843MB)
- Progress tracking during generation (semantic, coarse, fine encoding)
- Audio playback of generated speech
- WAV file saving

Key patterns:

```kotlin
import io.aatricks.llmedge.LLMEdgeManager
import io.aatricks.llmedge.BarkTTS

// Generate speech (model auto-downloads on first use)
val audio = LLMEdgeManager.synthesizeSpeech(
    context = context,
    params = LLMEdgeManager.SpeechSynthesisParams(
        text = "Hello, world!",
        nThreads = 8 // Use more threads for better performance
    )
) { step: BarkTTS.EncodingStep, progress: Int ->
    // Update progress UI
    updateProgress("${step.name}: $progress%")
}

// Play the generated audio
val audioTrack = AudioTrack.Builder()
    .setAudioAttributes(
        AudioAttributes.Builder()
            .setUsage(AudioAttributes.USAGE_MEDIA)
            .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
            .build()
    )
    .setAudioFormat(
        AudioFormat.Builder()
            .setEncoding(AudioFormat.ENCODING_PCM_FLOAT)
            .setSampleRate(audio.sampleRate)
            .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
            .build()
    )
    .setBufferSizeInBytes(audio.samples.size * 4) // 4 bytes per float sample
    .build()
audioTrack.write(audio.samples, 0, audio.samples.size, AudioTrack.WRITE_BLOCKING)
audioTrack.play()

// Save to WAV file
LLMEdgeManager.synthesizeSpeechToFile(
    context = context,
    text = "Hello, world!",
    outputFile = File(cacheDir, "output.wav")
)
```

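One housekeeping detail the snippet omits: `AudioTrack` holds native resources and should be stopped and released once playback finishes. A minimal sketch that schedules cleanup from the clip length, assuming mono float PCM as above:

```kotlin
import android.os.Handler
import android.os.Looper

// Clip duration in ms for mono float PCM
val durationMs = audio.samples.size * 1000L / audio.sampleRate

Handler(Looper.getMainLooper()).postDelayed({
    audioTrack.stop()
    audioTrack.release() // Free the native audio resources
}, durationMs)
```
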
## Additional Resources

- Architecture for flow diagrams (RAG, OCR, JNI loading)
- Usage for API reference and concepts
- Quirks for troubleshooting specific issues