This section documents known quirks, limitations, and troubleshooting steps for llmedge.
## Model loading and memory
Common issues:
- Large models may fail to load or cause OOM (OutOfMemoryError) on devices with limited RAM
- The library automatically caps context size based on heap size, but you may need to reduce it further
- If model loading hangs, check file permissions and storage location
- Android may restrict reading assets directly via native code; use the `copyAssetIfNeeded()` pattern (see examples)
Solutions:
- Prefer small, quantized GGUF models (Q4_K_M or Q5_K_M)
- Explicitly set a lower `contextSize` in `InferenceParams` (e.g., 2048 for <512 MB devices)
- Monitor memory with `MemoryMetrics.snapshot()` before and after loading
- Call `SmolLM.close()` to free resources when switching models
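The copy-once asset pattern mentioned above can be sketched as follows. This is an illustrative helper, not the library's actual implementation: the stream source and target directory are parameters so the logic stays self-contained (in an app they would come from `context.assets.open(name)` and `context.filesDir`).

```kotlin
import java.io.File
import java.io.InputStream

// Sketch of the copy-once pattern: native code cannot read compressed APK
// assets directly, so the asset is copied to a plain file on first use.
// `open` and `targetDir` are stand-ins for context.assets.open(name) and
// context.filesDir; this is not llmedge's actual implementation.
fun copyAssetIfNeeded(open: () -> InputStream, targetDir: File, name: String): File {
    val target = File(targetDir, name)
    if (!target.exists() || target.length() == 0L) {
        target.parentFile?.mkdirs()
        open().use { input ->
            target.outputStream().use { output -> input.copyTo(output) }
        }
    }
    return target
}
```

Subsequent calls see the existing non-empty file and skip the copy, so the asset stream is only opened on first use.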
## Native compatibility
Library selection:
- The library automatically selects the best native `.so` based on CPU features (FP16, dotprod, SVE, i8mm)
- Logs show which library was loaded (e.g., `libsmollm_v8_4_fp16_dotprod.so`)
- Platform ABI mismatches can cause `UnsatisfiedLinkError`
Common errors:
- `UnsatisfiedLinkError`: ABI mismatch (check that `Build.SUPPORTED_ABIS[0]` matches your build)
- `dlopen failed`: missing dependencies or an incompatible NDK version
- Build for arm64-v8a on modern devices; armeabi-v7a for older 32-bit devices
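To make the variant naming concrete, here is an illustrative sketch of how a library filename like `libsmollm_v8_4_fp16_dotprod.so` is assembled from detected CPU features. The real selection logic in llmedge may differ; this only shows the naming scheme visible in the logs.

```kotlin
// Illustrative only: builds a candidate .so name from CPU feature flags,
// mirroring names like "libsmollm_v8_4_fp16_dotprod.so" seen in the logs.
// llmedge's actual selection logic may differ.
fun selectNativeLib(hasFp16: Boolean, hasDotprod: Boolean, hasSve: Boolean, hasI8mm: Boolean): String {
    val features = mutableListOf<String>()
    if (hasFp16) features += "fp16"
    if (hasDotprod) features += "dotprod"
    if (hasSve) features += "sve"
    if (hasI8mm) features += "i8mm"
    val suffix = if (features.isEmpty()) "" else "_" + features.joinToString("_")
    return "libsmollm_v8_4$suffix.so"
}
```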
## Hugging Face downloads
Download issues:
- HF rate limits: downloads may fail if you exceed them; retry later or provide a token
- For private repositories, pass the `token` parameter to `loadFromHuggingFace()`
- Large files: always use `preferSystemDownloader = true` to avoid heap pressure
Troubleshooting:
- Check network connectivity and HF Hub status
- Verify the model ID format: `owner/repo-name` (e.g., `unsloth/Qwen3-0.6B-GGUF`)
- Use `forceDownload = true` to redownload corrupted files
- Files are cached in the app's files directory under `models/hub/`
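A cheap way to catch malformed model IDs before hitting the network is a format check like the one below. The regex is an approximation of Hugging Face's `owner/repo-name` convention, not an exhaustive rule.

```kotlin
// Rough sanity check for the "owner/repo-name" model ID format before
// handing it to the downloader. The character set is an approximation of
// Hugging Face repo naming rules, not the authoritative definition.
private val MODEL_ID = Regex("""^[\w.-]+/[\w.-]+$""")

fun isValidModelId(id: String): Boolean = MODEL_ID.matches(id)
```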
Caching details:
- If the Hugging Face model metadata contains a file size, the library verifies the file length before using a cached copy.
- If the size is not available, the library validates the file against a SHA-256 checksum when the API provides one; otherwise it treats any existing non-empty file as valid to avoid unnecessary re-downloads.
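The validation precedence described above (size, then checksum, then non-empty fallback) can be sketched as a single function. This is a minimal sketch of the described behavior, not the library's actual code; reading the whole file into memory for hashing is fine for a sketch but a real implementation would stream it.

```kotlin
import java.io.File
import java.security.MessageDigest

// Sketch of the cache-validation precedence described above: verify by
// expected size when HF metadata provides one, else by SHA-256 when a
// checksum is available, else accept any non-empty file.
fun isCachedFileValid(file: File, expectedSize: Long?, expectedSha256: String?): Boolean {
    if (!file.exists() || file.length() == 0L) return false
    if (expectedSize != null) return file.length() == expectedSize
    if (expectedSha256 != null) {
        val digest = MessageDigest.getInstance("SHA-256").digest(file.readBytes())
        val hex = digest.joinToString("") { "%02x".format(it) }
        return hex.equals(expectedSha256, ignoreCase = true)
    }
    return true
}
```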
## Image/Camera quirks
Orientation issues:
- Different devices rotate images differently based on EXIF data
- `ImageUtils` handles basic orientation correction
- Always check and normalize orientation before processing
Memory considerations:
- Camera images can be very large (4K+ on modern phones)
- Scale down before OCR or vision processing
- Use `BitmapFactory.Options.inSampleSize` for efficient downscaling
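The standard `inSampleSize` calculation from the Android documentation pattern looks like this: find the largest power-of-two sample size that keeps both dimensions at or above the requested size, then pass it to `BitmapFactory.decode*` via `BitmapFactory.Options`.

```kotlin
// Power-of-two downscaling calculation (the Android docs pattern for
// BitmapFactory.Options.inSampleSize): find the largest sample size that
// keeps both decoded dimensions at or above the requested size.
fun calculateInSampleSize(width: Int, height: Int, reqWidth: Int, reqHeight: Int): Int {
    var inSampleSize = 1
    if (height > reqHeight || width > reqWidth) {
        val halfHeight = height / 2
        val halfWidth = width / 2
        while (halfHeight / inSampleSize >= reqHeight && halfWidth / inSampleSize >= reqWidth) {
            inSampleSize *= 2
        }
    }
    return inSampleSize
}
```

For a 4000x3000 camera frame targeted at roughly 1024x768, this yields a sample size of 2, decoding a 2000x1500 bitmap at a quarter of the memory.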
OCR-specific:
- ML Kit works offline but requires Google Play Services
- Scanned PDFs need OCR before RAG indexing (PDFBox extracts text only from text-based PDFs)
- Low-quality images may produce poor OCR results
## RAG performance
Slow retrieval:
- Large vector stores slow down cosine similarity search
- Consider limiting indexed chunks or implementing approximate search
- Current implementation uses in-memory brute-force search
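A minimal version of the in-memory brute-force search described above is shown below (illustrative, not the library's code): score every stored chunk embedding by cosine similarity against the query and take the top-k. The O(chunks × dims) cost per query is exactly why large stores get slow.

```kotlin
import kotlin.math.sqrt

// Brute-force retrieval sketch: cosine-score every stored embedding and
// return the indices of the top-k matches. O(chunks * dims) per query,
// which is why large vector stores slow down.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return if (na == 0f || nb == 0f) 0f else dot / (sqrt(na) * sqrt(nb))
}

fun topK(query: FloatArray, store: List<FloatArray>, k: Int): List<Int> =
    store.indices.sortedByDescending { cosine(query, store[it]) }.take(k)
```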
No results:
- Check whether PDF text extraction succeeded (`indexPdf()` returns the chunk count)
- Scanned PDFs return 0 chunks (`PDFReader` does no OCR)
- Try `retrievalPreview()` to see what is actually being retrieved
- Adjust `TextSplitter` parameters (a smaller `chunkSize` gives more granular retrieval)
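To show how `chunkSize` and overlap interact, here is a hypothetical character-based splitter. llmedge's `TextSplitter` may use a different strategy (e.g., word or sentence boundaries); the point is that overlapping chunks preserve context across chunk edges.

```kotlin
// Hypothetical character-based splitter with overlap, to illustrate how
// chunkSize and chunkOverlap interact. llmedge's TextSplitter may split
// on different boundaries.
fun splitText(text: String, chunkSize: Int, chunkOverlap: Int): List<String> {
    require(chunkSize > chunkOverlap) { "chunkSize must exceed chunkOverlap" }
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + chunkSize, text.length)
        chunks += text.substring(start, end)
        if (end == text.length) break
        start = end - chunkOverlap  // step back so adjacent chunks share context
    }
    return chunks
}
```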
## Stable Diffusion
OutOfMemoryError:
- SD models are memory-intensive
- Reduce image dimensions (start with 128x128 or 256x256)
- Enable all CPU offload flags: `offloadToCpu`, `keepClipOnCpu`, `keepVaeOnCpu`
- Use `preferSystemDownloader = true` for model downloads
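Putting the memory-conservative settings in one place can look like the following. `SdParams` is a hypothetical holder, not the library's actual API surface; only the flag names match those mentioned above.

```kotlin
// Hypothetical parameter holder for a memory-conservative SD setup.
// The class and field layout are illustrative; only the flag names come
// from the documentation above.
data class SdParams(
    val width: Int = 256,
    val height: Int = 256,
    val steps: Int = 20,          // keep low; 50+ is slow on mobile CPUs
    val offloadToCpu: Boolean = true,
    val keepClipOnCpu: Boolean = true,
    val keepVaeOnCpu: Boolean = true,
)

// Starting point for very constrained devices.
val lowMemoryDefaults = SdParams(width = 128, height = 128)
```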
Slow generation:
- Lower the `steps` parameter (20 is reasonable; 50+ is slow)
- Reduce image resolution
- Generation speed depends heavily on device CPU
## Performance tips
Model loading:
- First load is slower (native memory allocation)
- Subsequent loads reuse memory pools
- Pre-load models at app start if needed immediately
Inference:
- Use smaller context windows when memory is constrained
- A lower `temperature` reduces randomness and can be faster
- Streaming (`getResponseAsFlow`) shows results sooner but takes the same total time
- Monitor tokens/sec with `getLastGenerationMetrics()`
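For sanity-checking reported throughput, tokens-per-second reduces to generated tokens over wall-clock time. The field names below are illustrative, not the actual shape of `getLastGenerationMetrics()`.

```kotlin
// Illustrative metrics holder: tokens/sec is just generated tokens over
// elapsed wall-clock time. Field names are assumptions, not the real API.
data class GenerationMetrics(val generatedTokens: Int, val elapsedMillis: Long) {
    val tokensPerSecond: Double
        get() = if (elapsedMillis == 0L) 0.0 else generatedTokens * 1000.0 / elapsedMillis
}
```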
Memory management:
- Always call `.close()` on `SmolLM`, `StableDiffusion`, and `OcrEngine` instances
- Use `MemoryMetrics` to track native heap growth; `nativePssKb` shows native memory (model + KV cache)
- Consider using a single global `SmolLM` instance instead of creating/destroying one frequently
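If the handle types implement `java.io.Closeable`, Kotlin's `use` guarantees `close()` runs even when inference throws. `FakeModel` below is a stand-in for a model handle like `SmolLM`; whether llmedge's classes actually implement `Closeable` is an assumption here.

```kotlin
import java.io.Closeable

// Scoped-resource sketch: Closeable + `use` guarantees close() even on
// exceptions. FakeModel stands in for SmolLM / StableDiffusion / OcrEngine;
// whether those implement Closeable is an assumption.
class FakeModel : Closeable {
    var closed = false
        private set
    fun generate(prompt: String): String = "echo: $prompt"
    override fun close() { closed = true }
}

fun runOnce(prompt: String): Pair<String, FakeModel> {
    val model = FakeModel()
    val out = model.use { it.generate(prompt) }  // close() runs here, success or failure
    return out to model
}
```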
## Debugging JNI
Getting native logs:
```shell
adb logcat -s SmolLM:* SmolSD:* llama:*
```
Common native errors:
- "Failed to load model": Check file path and permissions
- "ggml_init_cublas: failed": Vulkan/GPU initialization failed (falls back to CPU)
- Crashes without logs: use `ndk-stack` to symbolicate stack traces
Debugging steps:
- Check logcat for native messages
- Verify model file exists and is readable
- Test with a known-good tiny model first
- Check available memory before loading
- Try disabling Vulkan: `SmolLM(useVulkan = false)`
Stack traces:
```shell
adb logcat | ndk-stack -sym path/to/obj/local/arm64-v8a/
```
If something isn't covered here, please open an issue with:
- Device model and Android version
- Logcat output (especially native logs)
- Model name and size
- Minimal reproducible code