Complete reference for on-device video generation using Wan models in llmedge.
Table of Contents
- Overview
- Supported Models
- API Reference
- Parameter Guide
- Advanced Usage
- Troubleshooting
- Performance Optimization
Overview
llmedge provides on-device video generation through the StableDiffusion class, using Wan models. Generate short video clips (4-64 frames) entirely on Android devices.
⚠️ Hardware Requirements:
- Minimum RAM: 12GB recommended for Wan 2.1 T2V-1.3B
- Supported Devices: Galaxy S23 Ultra (12GB), Pixel 8 Pro (12GB), OnePlus 12 (16GB+)
- Not Supported: 8GB RAM devices (Galaxy S22, Pixel 7)
Why 12GB? Wan models require loading three components simultaneously:
- Main diffusion model (fp16): ~2.7GB RAM
- T5XXL text encoder (Q3_K_S GGUF): ~5.9GB RAM
- VAE decoder (fp16): ~0.14GB RAM
- Working memory: ~1GB RAM
Total: ~9.7GB minimum, 12GB recommended
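The total is simply the sum of the listed component estimates. A trivial sketch using the approximate figures from this document (not runtime measurements) makes the arithmetic explicit:
// Sum of the approximate per-component RAM estimates listed above (document figures, not measured).
val estimatedTotalGb = 2.7 +   // main diffusion model (fp16)
    5.9 +                      // T5XXL text encoder (Q3_K_S GGUF)
    0.14 +                     // VAE decoder (fp16)
    1.0                        // working memory
// ≈ 9.74GB, i.e. the ~9.7GB minimum quoted above; 12GB leaves headroom for the OS and the app itself.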
Key Features:
- Text-to-video (T2V) generation
- Multi-file model loading (main + VAE + T5XXL)
- Memory-aware device compatibility checks
- Progress monitoring and cancellation
- Multiple scheduler options
Supported Models
Official Model Source (Recommended)
Wan 2.1 T2V 1.3B from Comfy-Org/Wan_2.1_ComfyUI_repackaged:
All three components are required and must be explicitly downloaded:
- Main Model: wan2.1_t2v_1.3B_fp16.safetensors (~2.6GB file, 2.7GB RAM)
- VAE: wan_2.1_vae.safetensors (~160MB file, 0.14GB RAM)
- T5XXL Encoder: umt5-xxl-encoder-Q3_K_S.gguf from city96/umt5-xxl-encoder-gguf (~2.86GB file, 5.9GB RAM)
Device Requirements:
- RAM: 12GB+ (9.7GB minimum + overhead)
- Storage: 6GB free space for downloads
- OS: Android 11+ recommended (Vulkan acceleration)
Known Limitations:
- GGUF quantization of main model blocked by metadata issues
- Sequential loading: edge.image.generateVideo(...) supports a sequential load flow to reduce peak memory usage on low-memory devices. When forceSequentialLoad=true is used, the client precomputes text conditioning with the T5 encoder, then loads the diffusion model without reloading the T5 encoder, avoiding duplicated memory usage. Note: for best results with sequential loading, avoid GPU-heavy settings that increase peak memory usage on constrained devices. See the hedged sketch after this list.
- No disk streaming: models must fit in RAM
- 8GB RAM devices cannot run Wan models (architectural constraint)
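The exact signature of edge.image.generateVideo is not documented in this section, so apart from forceSequentialLoad the parameter names below (prompt, model paths, size and frame settings) are assumptions chosen for illustration. Treat this as a rough sketch of how the sequential flow might be requested, not a definitive call:
// Hypothetical sketch of the sequential load flow on a memory-constrained device.
// Only forceSequentialLoad is documented above; the other parameter names are assumed.
val result = edge.image.generateVideo(
    prompt = "a cat walking in a garden, high quality",
    modelPath = modelFile.file.absolutePath,   // main Wan diffusion model
    vaePath = vaeFile.file.absolutePath,       // VAE decoder
    t5xxlPath = t5xxlFile.file.absolutePath,   // T5XXL text encoder
    videoFrames = 8,                           // keep small on constrained devices
    width = 256,
    height = 256,
    forceSequentialLoad = true                 // precompute T5 conditioning, then load the diffusion model
)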
API Reference
Loading Models
StableDiffusion.load()
Load a video generation model with explicit paths to all three required components.
⚠️ Important: The simplified modelId + filename approach does not work for Wan models. You must explicitly download and provide paths to all three files.
suspend fun load(
context: Context,
modelPath: String,
vaePath: String?,
t5xxlPath: String?,
nThreads: Int = Runtime.getRuntime().availableProcessors(),
offloadToCpu: Boolean = true,
keepClipOnCpu: Boolean = true,
keepVaeOnCpu: Boolean = true,
loraModelDir: String? = null, // Path to directory containing LoRA .safetensors
loraApplyMode: StableDiffusion.LoraApplyMode = StableDiffusion.LoraApplyMode.AUTO, // LoRA application strategy
): StableDiffusion
Parameters:
- context: Android application context
- modelPath: Absolute path to main model file (safetensors)
- vaePath: Absolute path to VAE file (safetensors)
- t5xxlPath: Absolute path to T5XXL encoder (GGUF)
- nThreads: Number of CPU threads (default: all cores)
- offloadToCpu: Enable CPU offloading (default: true, recommended)
- keepClipOnCpu: Keep CLIP model on CPU (default: true, recommended)
- keepVaeOnCpu: Keep VAE on CPU (default: true, recommended)
- loraModelDir: Optional path to a directory containing LoRA .safetensors files (default: null)
- loraApplyMode: LoRA application strategy (default: LoraApplyMode.AUTO)
Returns: StableDiffusion instance ready for generation
Throws:
- ModelFileNotFoundException: A required model asset was not found
- InvalidModelFileException: A required model asset was unreadable or empty
- ModelLoadException: Native model initialization failed (for example, due to insufficient RAM)
- UnsupportedOperationException: 14B model rejected (not supported on mobile)
Example:
// Download all three model files explicitly
val modelFile = HuggingFaceHub.ensureRepoFileOnDisk(
context = this,
modelId = "Comfy-Org/Wan_2.1_ComfyUI_repackaged",
revision = "main",
filename = "wan2.1_t2v_1.3B_fp16.safetensors",
allowedExtensions = listOf(".safetensors"),
preferSystemDownloader = true
)
val vaeFile = HuggingFaceHub.ensureRepoFileOnDisk(
context = this,
modelId = "Comfy-Org/Wan_2.1_ComfyUI_repackaged",
revision = "main",
filename = "wan_2.1_vae.safetensors",
allowedExtensions = listOf(".safetensors"),
preferSystemDownloader = true
)
val t5xxlFile = HuggingFaceHub.ensureRepoFileOnDisk(
context = this,
modelId = "city96/umt5-xxl-encoder-gguf",
revision = "main",
filename = "umt5-xxl-encoder-Q3_K_S.gguf",
allowedExtensions = listOf(".gguf"),
preferSystemDownloader = true
)
// Load all three models together
val sd = StableDiffusion.load(
context = this,
modelPath = modelFile.file.absolutePath,
vaePath = vaeFile.file.absolutePath,
t5xxlPath = t5xxlFile.file.absolutePath,
nThreads = Runtime.getRuntime().availableProcessors(),
offloadToCpu = true,
keepClipOnCpu = true,
keepVaeOnCpu = true,
loraModelDir = null, // Or provide path to LoRA directory if applicable
loraApplyMode = StableDiffusion.LoraApplyMode.AUTO
)
Video Generation
txt2vid()
Generate video from text prompt and optional initial image.
suspend fun txt2vid(params: VideoGenerateParams): List<Bitmap>
Parameters:
- params: VideoGenerateParams object (see Parameter Guide)
Returns: List<Bitmap> - Generated video frames
Throws:
- IllegalStateException: Model not loaded or not a video model
- IllegalArgumentException: Invalid parameters (dimensions, frame count, etc.)
- CancellationException: Generation cancelled via cancelGeneration()
Example:
val params = StableDiffusion.VideoGenerateParams(
prompt = "a cat walking in a garden, high quality",
videoFrames = 16,
width = 512,
height = 512,
steps = 20,
cfgScale = 7.0,
seed = 42L
)
val frames = sd.txt2vid(params)
VideoGenerateParams
Data class for video generation parameters.
data class VideoGenerateParams(
val prompt: String,
val videoFrames: Int = 16,
val width: Int = 512,
val height: Int = 512,
val steps: Int = 20,
val cfgScale: Double = 7.0,
val seed: Long = -1L,
val scheduler: Scheduler = Scheduler.EULER_A,
val strength: Float = 0.8f,
val initImage: Bitmap? = null,
val easyCacheParams: EasyCacheParams = EasyCacheParams(), // Parameters for EasyCache optimization
val loraModelDir: String? = null, // Directory containing LoRA .safetensors (only applicable if also loading base model with LoRA support)
val loraApplyMode: StableDiffusion.LoraApplyMode = StableDiffusion.LoraApplyMode.AUTO // LoRA application strategy
)
Field Validation:
- prompt: Non-empty string
- videoFrames: 4-64 (capped to 32 for 5B models)
- width: 256-960 (multiple of 64)
- height: 256-960 (multiple of 64)
- steps: 10-50
- cfgScale: 1.0-15.0
- seed: Any long (-1 for random)
- scheduler: EULER_A, DDIM, DDPM, or LCM
- strength: 0.0-1.0 (for I2V/TI2V)
- initImage: Optional bitmap for I2V/TI2V
See Parameter Guide for detailed explanations.
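If you want to fail fast before invoking native code, a pre-flight check mirroring the ranges above can be written in plain Kotlin. This is a minimal sketch based on the documented ranges, not a helper provided by llmedge; the library performs its own validation and throws IllegalArgumentException.
// Sketch: validate VideoGenerateParams against the documented ranges before calling txt2vid().
// Not part of llmedge; the library performs its own checks and throws IllegalArgumentException.
fun validateVideoParams(p: StableDiffusion.VideoGenerateParams) {
    require(p.prompt.isNotBlank()) { "prompt must be a non-empty string" }
    require(p.videoFrames in 4..64) { "videoFrames must be 4-64" }
    require(p.width in 256..960 && p.width % 64 == 0) { "width must be 256-960 and a multiple of 64" }
    require(p.height in 256..960 && p.height % 64 == 0) { "height must be 256-960 and a multiple of 64" }
    require(p.steps in 10..50) { "steps must be 10-50" }
    require(p.cfgScale in 1.0..15.0) { "cfgScale must be 1.0-15.0" }
    require(p.strength in 0.0f..1.0f) { "strength must be 0.0-1.0" }
}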
Model Introspection
isVideoModel()
Check if loaded model is a video generation model.
fun isVideoModel(): Boolean
Returns: true if model supports video generation, false otherwise
getVideoModelMetadata()
Get metadata about the loaded video model.
fun getVideoModelMetadata(): VideoModelMetadata?
Returns: VideoModelMetadata object or null if not a video model
VideoModelMetadata fields:
- architecture: Model architecture (e.g., "wan")
- modelType: "t2v", "i2v", or "ti2v"
- parameterCount: "1.3B", "5B", or "14B"
- mobileSupported: Boolean (false for 14B models)
- tags: Set of model tags
- filename: GGUF filename
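A short usage sketch combining the two introspection calls, assuming sd is a StableDiffusion instance loaded as shown in the loading example above:
// Verify the loaded model before attempting video generation.
if (!sd.isVideoModel()) {
    Log.w("VideoGen", "Loaded model does not support video generation")
} else {
    val meta = sd.getVideoModelMetadata()
    Log.d("VideoGen", "arch=${meta?.architecture}, type=${meta?.modelType}, params=${meta?.parameterCount}")
    if (meta?.mobileSupported == false) {
        Log.w("VideoGen", "This model is not supported on mobile devices")
    }
}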
Progress Monitoring
setProgressCallback()
Set callback for generation progress updates.
fun setProgressCallback(callback: VideoProgressCallback?)
VideoProgressCallback:
fun interface VideoProgressCallback {
fun onProgress(step: Int, totalSteps: Int)
}
Example:
sd.setProgressCallback { step, totalSteps ->
val progress = (step.toFloat() / totalSteps * 100).toInt()
runOnUiThread {
progressBar.progress = progress
statusText.text = "Step $step / $totalSteps"
}
}
cancelGeneration()
Cancel ongoing video generation.
fun cancelGeneration()
Cancellation is cooperative - the native layer checks the flag periodically. Generation will stop within 1-2 seconds.
Example:
cancelButton.setOnClickListener {
sd.cancelGeneration()
}
Resource Management
close()
Free native resources and reset model state.
fun close()
Important: Always call close() when done with the model to prevent memory leaks. Use Kotlin's use block for automatic cleanup:
StableDiffusion.load(context, modelPath = modelPath, vaePath = vaePath, t5xxlPath = t5xxlPath).use { sd ->
val frames = sd.txt2vid(params)
// sd.close() called automatically
}
Parameter Guide
Core Parameters
prompt
Text description of the video to generate.
Best Practices:
- Be specific and descriptive
- Include quality modifiers: "high quality", "detailed", "cinematic"
- Avoid negations (use positive descriptions)
- Keep under 200 characters for best results
Examples:
// Good
"a serene ocean sunset, waves gently rolling, golden hour lighting, cinematic"
// Better
"a professional chef preparing pasta, kitchen environment, natural lighting, detailed hands"
// Avoid
"a cat, not blurry" // Negations don't work well
videoFrames
Number of frames to generate (4-64).
Guidelines:
- 4-8 frames: Quick tests, ~5-15 seconds generation
- 16 frames: Standard short clips, ~20-45 seconds generation
- 32 frames: Longer animations, ~40-90 seconds generation
- 64 frames: Maximum quality (1.3B models only), ~80-180 seconds
Memory Impact:
- 1.3B models: Up to 64 frames
- 5B models: Automatically capped at 32 frames
// Quick test
videoFrames = 8
// Standard production
videoFrames = 16
// High quality (1.3B only)
videoFrames = 64
width and height
Output resolution (256-960, must be multiples of 64).
Common Resolutions:
- 256x256: Fastest, lowest quality
- 512x512: Balanced (recommended)
- 768x768: High quality, slower
- 960x960: Maximum quality, very slow
Performance vs Quality:
// Fast generation (~2 sec/frame on mid-range)
width = 256, height = 256
// Balanced (recommended)
width = 512, height = 512
// High quality (~8 sec/frame on mid-range)
width = 768, height = 768
steps
Number of diffusion steps (10-50).
Guidelines:
- 10-15 steps: Fast, lower quality
- 20 steps: Recommended default
- 25-30 steps: Higher quality
- 40-50 steps: Maximum quality, diminishing returns
// Fast generation
steps = 15
// Production quality
steps = 20
// Maximum quality
steps = 30
cfgScale
Classifier-free guidance scale (1.0-15.0). Controls adherence to prompt.
Guidelines:
- 1.0-3.0: Very creative, less prompt adherence
- 7.0: Default, balanced
- 10.0-12.0: Strong prompt adherence
- 13.0-15.0: Very strict, may over-saturate
// Creative freedom
cfgScale = 3.0
// Standard (recommended)
cfgScale = 7.0
// Strict prompt following
cfgScale = 10.0
seed
Random seed for reproducibility.
Guidelines:
- -1: Random seed (different output each time)
- 0+: Fixed seed (reproducible outputs)
// Random generation
seed = -1
// Reproducible generation
seed = 42
// Generate variations
val seeds = listOf(42L, 43L, 44L, 45L)
seeds.forEach { seed ->
val frames = sd.txt2vid(params.copy(seed = seed))
}
Advanced Parameters
scheduler
Diffusion scheduler algorithm.
Options:
- Scheduler.EULER_A: Default, good quality and speed
- Scheduler.DDIM: More deterministic, slightly slower
- Scheduler.DDPM: Higher quality, slower
- Scheduler.LCM: Fast inference (requires LCM-fine-tuned model)
// Default
scheduler = Scheduler.EULER_A
// Deterministic
scheduler = Scheduler.DDIM
// Quality-focused
scheduler = Scheduler.DDPM
strength
Denoising strength for image-to-video (0.0-1.0).
Only used with initImage for I2V/TI2V models.
Guidelines:
- 0.0-0.3: Subtle animation, preserves image
- 0.5-0.7: Moderate animation
- 0.8-1.0: Strong transformation
val params = VideoGenerateParams(
prompt = "animate this scene, add motion",
initImage = initialFrame,
strength = 0.7f, // Moderate animation
videoFrames = 16
)
initImage
Initial frame for image-to-video generation (I2V/TI2V models only).
val initialFrame = BitmapFactory.decodeResource(resources, R.drawable.scene)
val params = VideoGenerateParams(
prompt = "animate this image, add wind and motion",
initImage = initialFrame,
strength = 0.8f,
videoFrames = 16,
width = initialFrame.width,
height = initialFrame.height
)
Note: Image will be resized to match width and height if needed.
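Because width and height must be multiples of 64 within 256-960, passing raw bitmap dimensions (as in the example above) can fail validation for arbitrary images. A small helper like the following, which is an illustration rather than part of llmedge, rounds the dimensions down to the nearest valid size:
// Sketch: clamp a dimension to 256-960 and round down to a multiple of 64,
// so initImage-derived sizes pass VideoGenerateParams validation. Not a llmedge API.
fun toValidVideoDimension(pixels: Int): Int =
    (pixels.coerceIn(256, 960) / 64) * 64

val params = VideoGenerateParams(
    prompt = "animate this image, add wind and motion",
    initImage = initialFrame,
    strength = 0.8f,
    videoFrames = 16,
    width = toValidVideoDimension(initialFrame.width),
    height = toValidVideoDimension(initialFrame.height)
)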
Advanced Usage
Device-Aware Model Selection
Check device RAM before attempting to load:
fun checkDeviceCompatibility(context: Context): Boolean {
val activityManager = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
val memInfo = ActivityManager.MemoryInfo()
activityManager.getMemoryInfo(memInfo)
val totalRamGB = memInfo.totalMem / (1024.0 * 1024.0 * 1024.0)
if (totalRamGB < 12.0) {
Log.w("VideoGen", "Insufficient RAM: ${String.format("%.1f", totalRamGB)}GB (12GB required)")
return false
}
return true
}
// Usage
if (!checkDeviceCompatibility(this)) {
showErrorDialog(
"Video generation requires 12GB+ RAM, but this device has less. " +
"Consider using cloud inference APIs instead."
)
return
}
// Proceed with model loading
Batch Generation
Generate multiple variations efficiently:
val baseParams = StableDiffusion.VideoGenerateParams(
prompt = "a cat walking",
videoFrames = 16,
width = 512,
height = 512,
steps = 20
)
// Generate variations sequentially on the same instance to keep peak memory bounded
val variations = withContext(Dispatchers.IO) {
(0..4).map { i ->
sd.txt2vid(baseParams.copy(seed = 42L + i))
}
}
// variations now contains 5 different video sequences
Streaming to Video File
Save frames directly to MP4 using MediaCodec:
fun saveFramesToVideo(frames: List<Bitmap>, outputPath: String, fps: Int = 8) {
val mediaCodec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC)
val mediaFormat = MediaFormat.createVideoFormat(
MediaFormat.MIMETYPE_VIDEO_AVC,
frames.first().width,
frames.first().height
).apply {
setInteger(MediaFormat.KEY_BIT_RATE, 2000000)
setInteger(MediaFormat.KEY_FRAME_RATE, fps)
setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
}
// Configure MediaMuxer and encode frames
// See Android MediaCodec documentation for full implementation
}
Model Switching
Switch between models efficiently:
// Load first model
var sd = StableDiffusion.load(this, modelPath = firstModelPath, vaePath = vaePath, t5xxlPath = t5xxlPath)
val frames1 = sd.txt2vid(params)
sd.close()
// Switch to second model
sd = StableDiffusion.load(this, modelPath = secondModelPath, vaePath = vaePath, t5xxlPath = t5xxlPath)
val frames2 = sd.txt2vid(params)
sd.close()
Note: Metadata caching reduces GGUF parsing overhead on subsequent loads.
Troubleshooting
Common Issues
OutOfMemoryError
Symptoms: App crashes during generation with OOM error
Solutions:
- Reduce resolution: width = 256, height = 256
- Reduce frame count: videoFrames = 8
- Reduce steps: steps = 15
- Use smaller quantization: Q3_K_S instead of Q4_K_M
- Close other apps to free RAM
- Enable CPU offloading: offloadToCpu = true
// Memory-constrained configuration
val params = VideoGenerateParams(
prompt = "...",
videoFrames = 8,
width = 256,
height = 256,
steps = 15,
cfgScale = 7.0
)
Slow Generation
Symptoms: Generation takes >5 seconds per frame
Solutions:
- Use 1.3B model instead of 5B
- Reduce resolution
- Reduce steps (15-20 is usually sufficient)
- Enable Vulkan if on Android 11+
- Close background apps
// Fast generation configuration
val params = VideoGenerateParams(
prompt = "...",
videoFrames = 16,
width = 512,
height = 512,
steps = 15, // Reduced from 20
cfgScale = 7.0
)
Model Not Loading
Symptoms: ModelFileNotFoundException, InvalidModelFileException, or native load failure
Solutions:
- Verify model file exists: File(context.filesDir, "hf-models/$modelId/$filename").exists()
- Check internet connection for downloads
- Verify Hugging Face model ID is correct
- Check storage space (5B models need ~5GB free)
// Debug model loading
try {
val sd = StableDiffusion.load(this, modelPath = modelPath, vaePath = vaePath, t5xxlPath = t5xxlPath)
Log.d("VideoGen", "Model loaded: ${sd.getVideoModelMetadata()}")
} catch (e: Exception) {
Log.e("VideoGen", "Load failed", e)
// Handle error
}
Poor Quality Output
Symptoms: Blurry, artifact-heavy, or incoherent frames
Solutions:
- Increase steps: 20-30
- Increase resolution: 512x512 or higher
- Adjust cfgScale: Try 7.0-10.0
- Use Q4_K_M or higher quantization
- Improve prompt specificity
- Try different schedulers (DDPM for quality)
// Quality-focused configuration
val params = VideoGenerateParams(
prompt = "detailed, high quality, cinematic scene...",
videoFrames = 16,
width = 768,
height = 768,
steps = 25,
cfgScale = 8.0,
scheduler = Scheduler.DDPM
)
Generation Hangs
Symptoms: Progress stops, app becomes unresponsive
Solutions:
- Ensure generation runs on Dispatchers.IO
- Set a progress callback to monitor progress
- Implement a timeout mechanism
- Call cancelGeneration() if needed
val job = CoroutineScope(Dispatchers.IO).launch {
try {
withTimeout(300_000) { // 5 minute timeout
val frames = sd.txt2vid(params)
// Process frames
}
} catch (e: TimeoutCancellationException) {
sd.cancelGeneration()
Log.e("VideoGen", "Generation timed out")
}
}
Performance Optimization
EasyCache (for supported models)
For supported DiT models (for example Flux, SD3, Wan, Qwen Image, and Z-Image), EasyCache can significantly reduce generation time by reusing intermediate diffusion steps. edge.image automatically enables it across image generation, direct video generation, and sequential video generation when the loaded model supports it.
If using the low-level StableDiffusion API, you can enable EasyCache via VideoGenerateParams:
val params = VideoGenerateParams(
// ... other params
easyCacheParams = StableDiffusion.EasyCacheParams(
enabled = true,
reuseThreshold = 0.2f, // Threshold for skipping steps (0.0 - 1.0)
startPercent = 0.15f, // Start skipping after this percentage of steps
endPercent = 0.95f // Stop skipping before this percentage of steps
)
)
Note: EasyCache is not supported for UNet-based models (like Stable Diffusion 1.5).
Memory Management
Monitor memory usage:
val runtime = Runtime.getRuntime()
val usedMemoryMB = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024)
Log.d("VideoGen", "Memory usage: ${usedMemoryMB}MB")
Memory-efficient generation:
// Process frames immediately instead of accumulating
sd.setProgressCallback { step, total ->
// ... update UI
}
val frames = sd.txt2vid(params)
frames.forEachIndexed { index, frame ->
saveFrameToDisk(frame, index)
frame.recycle() // Free bitmap immediately
}
Batch Processing
Generate multiple videos efficiently:
// Reuse model instance
val sd = StableDiffusion.load(this, modelPath = modelPath, vaePath = vaePath, t5xxlPath = t5xxlPath)
prompts.forEach { prompt ->
val frames = sd.txt2vid(params.copy(prompt = prompt))
processFrames(frames)
}
sd.close()
Background Processing
Use WorkManager for long generations:
class VideoGenerationWorker(context: Context, params: WorkerParameters)
: CoroutineWorker(context, params) {
override suspend fun doWork(): Result {
// Resolve model file paths (e.g., from inputData) before loading
val sd = StableDiffusion.load(
applicationContext,
modelPath = modelPath,
vaePath = vaePath,
t5xxlPath = t5xxlPath
)
val frames = sd.txt2vid(params)
saveVideo(frames)
sd.close()
return Result.success()
}
}
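Enqueueing the worker follows standard WorkManager usage. The sketch below assumes you pass identifiers such as the model path and prompt via inputData, since workers cannot receive object parameters directly; the keys shown are illustrative, not defined by llmedge:
// Enqueue a one-off background generation (standard androidx.work APIs).
val request = OneTimeWorkRequestBuilder<VideoGenerationWorker>()
    .setInputData(
        workDataOf(
            "modelPath" to modelFile.file.absolutePath, // illustrative key, read back in doWork()
            "prompt" to "a cat walking in a garden, high quality"
        )
    )
    .build()
WorkManager.getInstance(context).enqueue(request)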
Vulkan Acceleration
Enable Vulkan on Android 11+:
Build library with Vulkan support:
./gradlew :llmedge:assembleRelease -Pandroid.jniCmakeArgs="-DGGML_VULKAN=ON -DSD_VULKAN=ON"
Verify Vulkan at runtime:
// Vulkan status is logged during initialization
// Check logcat for: "Vulkan initialized successfully"
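The library itself only reports Vulkan status in logcat, but you can check whether the device advertises Vulkan hardware support at all before expecting acceleration. This uses the standard Android PackageManager feature flags and is not an llmedge API:
// Check whether the device reports Vulkan 1.1 hardware support (standard Android API, not llmedge).
val hasVulkan = context.packageManager.hasSystemFeature(
    PackageManager.FEATURE_VULKAN_HARDWARE_VERSION,
    0x401000 // encoded Vulkan version 1.1.0
)
Log.d("VideoGen", "Device advertises Vulkan 1.1: $hasVulkan")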
See Also
- architecture.md - System design