The llmedge-examples repo demonstrates multiple real-world uses of LLMEdge.

## Example Activities

The llmedge-examples repository contains complete, working demonstrations:
| Activity | Purpose |
|---|---|
| `LocalAssetDemoActivity` | Load bundled GGUF model, blocking & streaming inference |
| `HuggingFaceDemoActivity` | Download from HF Hub, reasoning controls |
| `ImageToTextActivity` | Camera capture, OCR text extraction |
| `LlavaVisionActivity` | Vision model integration (prepared architecture) |
| `StableDiffusionActivity` | On-device image generation |
| `VideoGenerationActivity` | On-device text-to-video generation (Wan 2.1) |
| `RagActivity` | PDF indexing, vector search, Q&A |
| `TTSActivity` | Text-to-speech synthesis (Bark) |
| `STTActivity` | Speech-to-text transcription (Whisper) |

Also see the `ChatSession` pattern below for bounded multi-turn chat UIs backed by LLMEdge.
### LocalAssetDemoActivity

Demonstrates loading a bundled model asset and running inference using LLMEdge.

Key patterns:

```kotlin
val modelPath = copyAssetIfNeeded("YourModel.gguf")
val edge = LLMEdge.create(context, lifecycleScope)

// Run inference through the text client
val response = edge.text.generate(
    prompt = "Say 'hello from llmedge'.",
    model = ModelSpec.localFile(modelPath),
)
```
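
The table above also lists streaming inference for this activity. A minimal sketch, assuming `edge.text` exposes a `stream(...)` counterpart that emits the same `TextStreamEvent`s as the `ChatSession` pattern below (the exact signature is an assumption; check the demo source):

```kotlin
// Assumed API: edge.text.stream(...) returning a Flow<TextStreamEvent>,
// mirroring session.stream(...) from the ChatSession pattern below.
edge.text.stream(
    prompt = "Say 'hello from llmedge'.",
    model = ModelSpec.localFile(modelPath),
).collect { event ->
    if (event is TextStreamEvent.Chunk) {
        print(event.value) // tokens arrive incrementally
    }
}
```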
### ChatSession pattern

Use this pattern when your app owns a chat UI and you want bounded, Kotlin-managed transcript replay instead of relying on unbounded native chat history:

```kotlin
val edge = LLMEdge.create(context, lifecycleScope)
val session =
    edge.text.session(
        model = ModelSpec.localFile(modelPath),
        memory = ConversationWindow(maxTurns = 6, maxTokens = 4096, stripThinkTags = true),
        systemPrompt = "You are a concise assistant.",
    )
session.prepare()

// Blocking reply
val reply = session.reply("Explain why context windows fill up.")

// Streaming reply
session.stream("Now summarize that in 3 bullets.").collect { event ->
    if (event is TextStreamEvent.Chunk) {
        appendToChatUi(event.value)
    }
}
```
Why use it:

- Keeps the full transcript in Kotlin memory
- Replays only the active sliding window back into the model (sketched below)
- Strips older `<think>...</think>` traces before replay when enabled
- Works well for reasoning-enabled models that would otherwise exhaust the native context quickly
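
To make the windowing concrete, here is a minimal sketch of what a `ConversationWindow`-style trim does; the `Turn` type and functions below are illustrative, not LLMEdge's implementation:

```kotlin
// Illustrative only: a hypothetical sliding-window trim, not LLMEdge's actual
// ConversationWindow implementation.
data class Turn(val role: String, val text: String, val tokenCount: Int)

fun slidingWindow(transcript: List<Turn>, maxTurns: Int, maxTokens: Int): List<Turn> {
    val kept = mutableListOf<Turn>()
    var tokens = 0
    // Walk backwards from the newest turn; stop once either budget is exceeded.
    for (turn in transcript.asReversed()) {
        if (kept.size >= maxTurns || tokens + turn.tokenCount > maxTokens) break
        kept += turn
        tokens += turn.tokenCount
    }
    return kept.asReversed() // restore chronological order for replay
}

// stripThinkTags = true would additionally remove reasoning traces, e.g.:
fun stripThink(text: String) = text.replace(Regex("(?s)<think>.*?</think>"), "")
```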
### ImageToTextActivity

Demonstrates camera integration and OCR text extraction using LLMEdge.

Key features:

- Camera permission handling
- High-resolution image capture
- ML Kit OCR integration via `edge.vision`

Code snippet:

```kotlin
// Process image
val bitmap = ImageUtils.fileToBitmap(file)
ivPreview.setImageBitmap(bitmap)

val edge = LLMEdge.create(context, lifecycleScope)

// Extract text
val text = edge.vision.extractText(bitmap)
tvResult.text = text.ifEmpty { "(no text detected)" }
```
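
The permission-handling step is not shown in the snippet; a minimal sketch using the standard AndroidX `ActivityResultContracts` API (`launchCamera()` and the denial handler are hypothetical helpers):

```kotlin
import android.Manifest
import androidx.activity.result.contract.ActivityResultContracts

// Standard AndroidX permission flow, declared as an Activity property.
private val cameraPermission =
    registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
        if (granted) launchCamera() else showPermissionDeniedMessage()
    }

// Request the permission before capturing:
cameraPermission.launch(Manifest.permission.CAMERA)
```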
### HuggingFaceDemoActivity

Shows how to download models from Hugging Face Hub and run inference using LLMEdge.

Key features:

- Model download with progress callback
- Heap-aware parameter selection
- Thinking mode configuration

Code snippet:

```kotlin
val edge = LLMEdge.create(context, lifecycleScope)

// 1. Download with progress
edge.models.prefetch(
    ModelSpec.huggingFace(
        repoId = "unsloth/Qwen3-0.6B-GGUF",
        filename = "Qwen3-0.6B-Q4_K_M.gguf",
    )
)

// 2. Generate text (model is now cached)
val response = edge.text.generate(
    prompt = "List two quick facts about running GGUF models on Android.",
    model = ModelSpec.huggingFace(
        repoId = "unsloth/Qwen3-0.6B-GGUF",
        filename = "Qwen3-0.6B-Q4_K_M.gguf",
    ),
    params = SmolLM.InferenceParams(thinkingMode = SmolLM.ThinkingMode.DISABLED),
)
```
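
Heap-aware parameter selection is listed above but not shown in the snippet. A rough sketch of the idea using the standard `ActivityManager` API; the thresholds, and how the chosen value maps onto LLMEdge's inference parameters, are assumptions for illustration:

```kotlin
import android.app.ActivityManager
import android.content.Context

// Pick a context size based on the device's available memory.
// The thresholds are illustrative; check the demo activity for the actual
// parameter names LLMEdge expects.
fun chooseContextSize(context: Context): Int {
    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val memInfo = ActivityManager.MemoryInfo().also { am.getMemoryInfo(it) }
    val availMb = memInfo.availMem / (1024 * 1024)
    return when {
        availMb > 6_000 -> 8192
        availMb > 3_000 -> 4096
        else -> 2048
    }
}
```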
### RagActivity

Complete RAG (Retrieval-Augmented Generation) pipeline with PDF indexing and Q&A.

Key features:

- PDF document picker with persistent permissions
- Sentence embeddings (ONNX Runtime)
- Vector store with cosine similarity search
- Context-aware answer generation

Full workflow:

```kotlin
val edge = LLMEdge.create(context, lifecycleScope)
val rag = edge.rag.createSession()
rag.init()

// 1. Index a PDF document
val count = rag.indexPdf(pdfUri)
Log.d("RAG", "Indexed $count chunks")

// 2. Ask questions
val answer = rag.ask("What are the key points?", topK = 5)
```

Important notes:

- Scanned PDFs require OCR before indexing (PDFBox extracts text from text-based PDFs only)
- Embedding model must be in the `assets/` directory
- Vector store persists to JSON in the app files directory
- Adjust chunk size/overlap based on document type (see the chunking sketch below)
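
As a reference point for that tuning, here is a minimal, self-contained sketch of overlapping character-based chunking; LLMEdge's actual chunker may work differently:

```kotlin
// Illustrative character-based chunking with overlap; not LLMEdge's chunker.
fun chunkText(text: String, chunkSize: Int = 1000, overlap: Int = 200): List<String> {
    require(overlap < chunkSize) { "overlap must be smaller than chunkSize" }
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + chunkSize, text.length)
        chunks += text.substring(start, end)
        if (end == text.length) break
        start = end - overlap // step back to create overlap between chunks
    }
    return chunks
}
```

Dense prose usually tolerates larger chunks, while tables and code-heavy documents tend to retrieve better with smaller chunks and more overlap.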
### StableDiffusionActivity

Demonstrates on-device image generation using LLMEdge.

Key patterns:

```kotlin
val edge = LLMEdge.create(context, lifecycleScope)

// Generate image
val bitmap = edge.image.generate(
    ImageGenerationRequest(
        prompt = "a cute cat",
        width = 256, // Start small on mobile
        height = 256,
        steps = 20,
        cfgScale = 7.0f
    ),
)

// Display result
imageView.setImageBitmap(bitmap)
```
Important notes:

- Start with 128×128 or 256×256 on devices with <4GB RAM
- `edge.image` automatically enables CPU offloading if memory is tight
- Generating images is memory-intensive; close other apps for best results
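
To persist the result, a standard-Android sketch using `Bitmap.compress` (the file name is arbitrary):

```kotlin
import android.graphics.Bitmap
import java.io.File

// Save the generated bitmap as a PNG in app-private storage.
val outFile = File(context.filesDir, "generated.png")
outFile.outputStream().use { stream ->
    bitmap.compress(Bitmap.CompressFormat.PNG, 100, stream)
}
```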
### VideoGenerationActivity

Demonstrates on-device text-to-video generation using Wan 2.1.

Key patterns:

```kotlin
val edge = LLMEdge.create(context, lifecycleScope)

// 1. Configure params for mobile-friendly generation
val params = VideoGenerationRequest(
    prompt = "A dog running in the park",
    width = 512, // Wan models require multiples of 64 between 256-960
    height = 512,
    videoFrames = 8, // Start with fewer frames (8-16)
    steps = 20,
    cfgScale = 7.0f,
    flowShift = 3.0f, // Use Float.POSITIVE_INFINITY to auto-select
    forceSequentialLoad = true // Critical for devices with <12GB RAM
)

// 2. Generate video
edge.image.generateVideo(params).collect { event ->
    when (event) {
        is GenerationStreamEvent.Progress ->
            updateProgress(event.update.message)
        is GenerationStreamEvent.Completed ->
            imageView.setImageBitmap(event.frames.first())
    }
}
```
Important notes:

- Sequential load: Video models are large (Wan 2.1 is ~5GB). `forceSequentialLoad = true` is essential for mobile devices; it loads components (T5 encoder, diffusion model, VAE) one by one to keep peak memory low.
- Frame count: Generating 8-16 frames takes significant time. Provide progress feedback.
- LLMEdge: This activity uses the high-level `LLMEdge` facade, which handles the sequential loading logic automatically.
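
The snippet above displays only the first frame. A simple, illustrative way to preview the full clip, assuming `event.frames` is a `List<Bitmap>` captured into a `frames` variable:

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

// Loop the generated frames in an ImageView at roughly 8 fps.
// The loop is cancelled automatically when lifecycleScope ends.
lifecycleScope.launch {
    while (true) {
        for (frame in frames) {
            imageView.setImageBitmap(frame)
            delay(125) // 1000 ms / 8 fps
        }
    }
}
```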
### LlavaVisionActivity

Demonstrates vision-capable LLM integration using LLMEdge.

Key patterns:

```kotlin
val edge = LLMEdge.create(context, lifecycleScope)

// 1. Preprocess image
val bmp = ImageUtils.imageToBitmap(context, uri)
val scaledBmp = ImageUtils.preprocessImage(bmp, correctOrientation = true, maxDimension = 1024)

// 2. Run OCR (optional grounding)
val ocrText = edge.vision.extractText(scaledBmp)

// 3. Build prompt (e.g. ChatML format for Phi-3)
val prompt = """
<|system|>You are a helpful assistant.<|end|>
<|user|>
Context (OCR): $ocrText
Describe this image.<|end|>
<|assistant|>
""".trimIndent()

// 4. Run vision analysis
val result = edge.vision.analyze(scaledBmp, prompt)
```

Status: Uses LLMEdge to orchestrate the experimental vision pipeline (loading the projector, encoding the image, running inference).
### TTSActivity

Demonstrates text-to-speech synthesis using Bark via LLMEdge.

Key features:

- Text input for speech synthesis
- Automatic model download from Hugging Face (843 MB)
- Progress tracking during generation (semantic, coarse, fine encoding)
- Audio playback of generated speech
- WAV file saving

Key patterns:

```kotlin
val edge = LLMEdge.create(context, lifecycleScope)

// Generate speech (model auto-downloads on first use)
val audio = edge.speech.synthesize("Hello, world!")

// Play the generated audio
val audioTrack = AudioTrack.Builder()
    .setAudioAttributes(AudioAttributes.Builder()
        .setUsage(AudioAttributes.USAGE_MEDIA)
        .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
        .build())
    .setAudioFormat(AudioFormat.Builder()
        .setEncoding(AudioFormat.ENCODING_PCM_FLOAT)
        .setSampleRate(audio.sampleRate)
        .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
        .build())
    .setBufferSizeInBytes(audio.samples.size * 4)
    .build()

audioTrack.write(audio.samples, 0, audio.samples.size, AudioTrack.WRITE_BLOCKING)
audioTrack.play()
```
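
WAV file saving is listed among the features but not shown. A minimal, self-contained sketch that writes the mono float samples from the playback snippet as 16-bit PCM WAV (standard WAV layout, not an LLMEdge API):

```kotlin
import java.io.DataOutputStream
import java.io.File
import java.io.FileOutputStream

// Write mono float samples as a 16-bit PCM WAV file.
// Assumes `audio.samples: FloatArray` and `audio.sampleRate: Int` as above.
fun saveWav(file: File, samples: FloatArray, sampleRate: Int) {
    val dataSize = samples.size * 2 // 16-bit = 2 bytes per sample
    DataOutputStream(FileOutputStream(file).buffered()).use { out ->
        fun writeIntLE(v: Int) {
            out.write(v); out.write(v shr 8); out.write(v shr 16); out.write(v shr 24)
        }
        fun writeShortLE(v: Int) { out.write(v); out.write(v shr 8) }

        out.writeBytes("RIFF"); writeIntLE(36 + dataSize); out.writeBytes("WAVE")
        out.writeBytes("fmt "); writeIntLE(16)
        writeShortLE(1)            // PCM format
        writeShortLE(1)            // mono
        writeIntLE(sampleRate)
        writeIntLE(sampleRate * 2) // byte rate = sampleRate * channels * 2
        writeShortLE(2)            // block align
        writeShortLE(16)           // bits per sample
        out.writeBytes("data"); writeIntLE(dataSize)
        for (s in samples) {
            writeShortLE((s.coerceIn(-1f, 1f) * 32767).toInt())
        }
    }
}
```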
## Additional Resources

- Architecture for flow diagrams (RAG, OCR, JNI loading)
- Usage for API reference and concepts
- Quirks for troubleshooting specific issues