Complete reference for on-device video generation using Wan models in llmedge.
Table of Contents
- Overview
- Supported Models
- API Reference
- Parameter Guide
- Advanced Usage
- Troubleshooting
- Performance Optimization
Overview
llmedge provides on-device video generation through the StableDiffusion class, using Wan models. Generate short video clips (4-64 frames) entirely on Android devices.
⚠️ Hardware Requirements:
- RAM: 12GB recommended for Wan 2.1 T2V-1.3B (~9.7GB absolute minimum)
- Supported Devices: Galaxy S23 Ultra (12GB), Pixel 8 Pro (12GB), OnePlus 12 (16GB+)
- Not Supported: 8GB RAM devices (Galaxy S22, Pixel 7)
Why 12GB? Wan models load three model components simultaneously, plus working memory:
- Main diffusion model (fp16): ~2.7GB RAM
- T5XXL text encoder (Q3_K_S GGUF): ~5.9GB RAM
- VAE decoder (fp16): ~0.14GB RAM
- Working memory: ~1GB RAM

Total: ~9.7GB minimum, 12GB recommended
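The budget arithmetic, as a quick sanity check (the figures are the estimates above, not measured values):

// Estimated resident footprint for Wan 2.1 T2V-1.3B components (GB)
val requiredGb = 2.7 + 5.9 + 0.14 + 1.0 // ≈ 9.74 GB minimum
// Compare against device RAM before loading; see Device-Aware Model Selection below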
Key Features:
- Text-to-video (T2V) generation
- Multi-file model loading (main + VAE + T5XXL)
- Memory-aware device compatibility checks
- Progress monitoring and cancellation
- Multiple scheduler options
Supported Models
Official Model Source (Recommended)
Wan 2.1 T2V 1.3B from Comfy-Org/Wan_2.1_ComfyUI_repackaged:
All three components are required and must be explicitly downloaded:
- Main Model: wan2.1_t2v_1.3B_fp16.safetensors (~2.6GB file, 2.7GB RAM)
- VAE: wan_2.1_vae.safetensors (~160MB file, 0.14GB RAM)
- T5XXL Encoder: umt5-xxl-encoder-Q3_K_S.gguf from city96/umt5-xxl-encoder-gguf (~2.86GB file, 5.9GB RAM)
Device Requirements:
- RAM: 12GB+ (9.7GB minimum + overhead)
- Storage: 6GB free space for downloads
- OS: Android 11+ recommended (Vulkan acceleration)
Known Limitations:
- GGUF quantization of the main model is currently blocked by metadata issues
- Sequential loading: LLMEdgeManager supports a sequential load flow to reduce peak memory usage on low-memory devices. When forceSequentialLoad=true is used, the manager precomputes text conditioning with the T5 encoder, then loads the diffusion model without reloading the T5 encoder, avoiding duplicated memory. For best results with sequential loading, avoid preferPerformanceMode=true, as it can cause the manager to favor GPU allocation patterns that increase peak memory usage (see the sketch after this list)
- No disk streaming: models must fit in RAM
- 8GB RAM devices cannot run Wan models (architectural constraint)
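A hedged sketch of opting into sequential loading. Only the forceSequentialLoad and preferPerformanceMode flag names appear in this guide, so the configuration shape below is an assumption, not the actual LLMEdgeManager API:

// Hypothetical configuration shape — the two flags are documented above,
// but how LLMEdgeManager accepts them is illustrative only
val manager = LLMEdgeManager(
    context = applicationContext,
    forceSequentialLoad = true, // precompute T5 conditioning, then load the diffusion model
    preferPerformanceMode = false // performance mode can raise peak memory during sequential loads
)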
API Reference
Loading Models
StableDiffusion.load()
Load a video generation model with explicit paths to all three required components.
⚠️ Important: The simplified modelId + filename approach does not work for Wan models. You must explicitly download and provide paths to all three files.
suspend fun load(
context: Context,
modelPath: String,
vaePath: String?,
t5xxlPath: String?,
nThreads: Int = Runtime.getRuntime().availableProcessors(),
offloadToCpu: Boolean = true,
keepClipOnCpu: Boolean = true,
keepVaeOnCpu: Boolean = true,
loraModelDir: String? = null, // Path to directory containing LoRA .safetensors
loraApplyMode: StableDiffusion.LoraApplyMode = StableDiffusion.LoraApplyMode.AUTO, // LoRA application strategy
): StableDiffusion
Parameters:
- context: Android application context
- modelPath: Absolute path to main model file (safetensors)
- vaePath: Absolute path to VAE file (safetensors)
- t5xxlPath: Absolute path to T5XXL encoder (GGUF)
- nThreads: Number of CPU threads (default: all cores)
- offloadToCpu: Enable CPU offloading (default: true, recommended)
- keepClipOnCpu: Keep CLIP model on CPU (default: true, recommended)
- keepVaeOnCpu: Keep VAE on CPU (default: true, recommended)
- loraModelDir: Path to directory containing LoRA .safetensors files (optional)
- loraApplyMode: LoRA application strategy (default: AUTO)
Returns: StableDiffusion instance ready for generation
Throws:
- FileNotFoundException: Model file not found
- IllegalStateException: Model loading failed (e.g., insufficient RAM)
- UnsupportedOperationException: 14B model rejected (mobile unsupported)
Example:
// Download all three model files explicitly
val modelFile = HuggingFaceHub.ensureRepoFileOnDisk(
context = this,
modelId = "Comfy-Org/Wan_2.1_ComfyUI_repackaged",
revision = "main",
filename = "wan2.1_t2v_1.3B_fp16.safetensors",
allowedExtensions = listOf(".safetensors"),
preferSystemDownloader = true
)
val vaeFile = HuggingFaceHub.ensureRepoFileOnDisk(
context = this,
modelId = "Comfy-Org/Wan_2.1_ComfyUI_repackaged",
revision = "main",
filename = "wan_2.1_vae.safetensors",
allowedExtensions = listOf(".safetensors"),
preferSystemDownloader = true
)
val t5xxlFile = HuggingFaceHub.ensureRepoFileOnDisk(
context = this,
modelId = "city96/umt5-xxl-encoder-gguf",
revision = "main",
filename = "umt5-xxl-encoder-Q3_K_S.gguf",
allowedExtensions = listOf(".gguf"),
preferSystemDownloader = true
)
// Load all three models together
val sd = StableDiffusion.load(
context = this,
modelPath = modelFile.file.absolutePath,
vaePath = vaeFile.file.absolutePath,
t5xxlPath = t5xxlFile.file.absolutePath,
nThreads = Runtime.getRuntime().availableProcessors(),
offloadToCpu = true,
keepClipOnCpu = true,
keepVaeOnCpu = true,
loraModelDir = null, // Or provide path to LoRA directory if applicable
loraApplyMode = StableDiffusion.LoraApplyMode.AUTO
)
Video Generation
txt2vid()
Generate video from text prompt and optional initial image.
suspend fun txt2vid(params: VideoGenerateParams): List<Bitmap>
Parameters:
- params: VideoGenerateParams object (see Parameter Guide)
Returns: List<Bitmap> - Generated video frames
Throws:
- IllegalStateException: Model not loaded or not a video model
- IllegalArgumentException: Invalid parameters (dimensions, frame count, etc.)
- CancellationException: Generation cancelled via cancelGeneration()
Example:
val params = StableDiffusion.VideoGenerateParams(
prompt = "a cat walking in a garden, high quality",
videoFrames = 16,
width = 512,
height = 512,
steps = 20,
cfgScale = 7.0,
seed = 42
)
val frames = sd.txt2vid(params)
VideoGenerateParams
Data class for video generation parameters.
data class VideoGenerateParams(
val prompt: String,
val videoFrames: Int = 16,
val width: Int = 512,
val height: Int = 512,
val steps: Int = 20,
val cfgScale: Double = 7.0,
val seed: Long = -1,
val scheduler: Scheduler = Scheduler.EULER_A,
val strength: Float = 0.8f,
val initImage: Bitmap? = null,
val easyCacheParams: EasyCacheParams = EasyCacheParams(), // Parameters for EasyCache optimization
val loraModelDir: String? = null, // Directory containing LoRA .safetensors (only applicable if also loading base model with LoRA support)
val loraApplyMode: StableDiffusion.LoraApplyMode = StableDiffusion.LoraApplyMode.AUTO // LoRA application strategy
)
Field Validation:
- prompt: Non-empty string
- videoFrames: 4-64 (capped to 32 for 5B models)
- width: 256-960 (multiple of 64)
- height: 256-960 (multiple of 64)
- steps: 10-50
- cfgScale: 1.0-15.0
- seed: Any long (-1 for random)
- scheduler: EULER_A, DDIM, DDPM, or LCM
- strength: 0.0-1.0 (for I2V/TI2V)
- initImage: Optional bitmap for I2V/TI2V
See Parameter Guide for detailed explanations.
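Invalid values throw IllegalArgumentException at generation time. A client-side pre-check that mirrors the documented ranges can fail fast; the validate helper below is a sketch, not part of the library API:

// Hypothetical pre-check mirroring the documented validation rules
fun validate(p: StableDiffusion.VideoGenerateParams) {
    require(p.prompt.isNotBlank()) { "prompt must be non-empty" }
    require(p.videoFrames in 4..64) { "videoFrames must be 4-64 (5B models are capped to 32 automatically)" }
    require(p.width in 256..960 && p.width % 64 == 0) { "width must be 256-960 and a multiple of 64" }
    require(p.height in 256..960 && p.height % 64 == 0) { "height must be 256-960 and a multiple of 64" }
    require(p.steps in 10..50) { "steps must be 10-50" }
    require(p.cfgScale in 1.0..15.0) { "cfgScale must be 1.0-15.0" }
    require(p.strength in 0.0f..1.0f) { "strength must be 0.0-1.0" }
}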
Model Introspection
isVideoModel()
Check if loaded model is a video generation model.
fun isVideoModel(): Boolean
Returns: true if model supports video generation, false otherwise
getVideoModelMetadata()
Get metadata about the loaded video model.
fun getVideoModelMetadata(): VideoModelMetadata?
Returns: VideoModelMetadata object or null if not a video model
VideoModelMetadata fields:
- architecture: Model architecture (e.g., "wan")
- modelType: "t2v", "i2v", or "ti2v"
- parameterCount: "1.3B", "5B", or "14B"
- mobileSupported: Boolean (false for 14B models)
- tags: Set of model tags
- filename: GGUF filename
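Example (a quick guard using the documented accessors):

if (!sd.isVideoModel()) {
    error("Loaded model does not support video generation")
}
sd.getVideoModelMetadata()?.let { meta ->
    Log.d("VideoGen", "arch=${meta.architecture}, type=${meta.modelType}, size=${meta.parameterCount}")
    if (!meta.mobileSupported) {
        Log.w("VideoGen", "${meta.parameterCount} models are not supported on mobile")
    }
}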
Progress Monitoring
setProgressCallback()
Set callback for generation progress updates.
fun setProgressCallback(callback: VideoProgressCallback?)
VideoProgressCallback:
fun interface VideoProgressCallback {
fun onProgress(step: Int, totalSteps: Int)
}
Example:
sd.setProgressCallback { step, totalSteps ->
val progress = (step.toFloat() / totalSteps * 100).toInt()
runOnUiThread {
progressBar.progress = progress
statusText.text = "Step $step / $totalSteps"
}
}
cancelGeneration()
Cancel ongoing video generation.
fun cancelGeneration()
Cancellation is cooperative - the native layer checks the flag periodically. Generation will stop within 1-2 seconds.
Example:
cancelButton.setOnClickListener {
sd.cancelGeneration()
}
Resource Management
close()
Free native resources and reset model state.
fun close()
Important: Always call close() when done with the model to prevent memory leaks. Use Kotlin's use block for automatic cleanup:
StableDiffusion.load(context, modelPath, vaePath, t5xxlPath).use { sd ->
val frames = sd.txt2vid(params)
// sd.close() called automatically
}
Parameter Guide
Core Parameters
prompt
Text description of the video to generate.
Best Practices:
- Be specific and descriptive
- Include quality modifiers: "high quality", "detailed", "cinematic"
- Avoid negations (use positive descriptions)
- Keep under 200 characters for best results
Examples:
// Good
"a serene ocean sunset, waves gently rolling, golden hour lighting, cinematic"
// Better
"a professional chef preparing pasta, kitchen environment, natural lighting, detailed hands"
// Avoid
"a cat, not blurry" // Negations don't work well
videoFrames
Number of frames to generate (4-64).
Guidelines:
- 4-8 frames: Quick tests, ~5-15 seconds generation
- 16 frames: Standard short clips, ~20-45 seconds generation
- 32 frames: Longer animations, ~40-90 seconds generation
- 64 frames: Maximum quality (1.3B models only), ~80-180 seconds
Memory Impact:
- 1.3B models: Up to 64 frames
- 5B models: Automatically capped at 32 frames
// Quick test
videoFrames = 8
// Standard production
videoFrames = 16
// High quality (1.3B only)
videoFrames = 64
width and height
Output resolution (256-960, must be multiples of 64).
Common Resolutions:
- 256x256: Fastest, lowest quality
- 512x512: Balanced (recommended)
- 768x768: High quality, slower
- 960x960: Maximum quality, very slow
Performance vs Quality:
// Fast generation (~2 sec/frame on mid-range)
width = 256, height = 256
// Balanced (recommended)
width = 512, height = 512
// High quality (~8 sec/frame on mid-range)
width = 768, height = 768
steps
Number of diffusion steps (10-50).
Guidelines:
- 10-15 steps: Fast, lower quality
- 20 steps: Recommended default
- 25-30 steps: Higher quality
- 40-50 steps: Maximum quality, diminishing returns
// Fast generation
steps = 15
// Production quality
steps = 20
// Maximum quality
steps = 30
cfgScale
Classifier-free guidance scale (1.0-15.0). Controls adherence to prompt.
Guidelines:
- 1.0-3.0: Very creative, less prompt adherence
- 7.0: Default, balanced
- 10.0-12.0: Strong prompt adherence
- 13.0-15.0: Very strict, may over-saturate
// Creative freedom
cfgScale = 3.0
// Standard (recommended)
cfgScale = 7.0
// Strict prompt following
cfgScale = 10.0
seed
Random seed for reproducibility.
Guidelines:
- -1: Random seed (different output each time)
- 0+: Fixed seed (reproducible outputs)
// Random generation
seed = -1
// Reproducible generation
seed = 42
// Generate variations
val seeds = listOf(42L, 43L, 44L, 45L) // seed is a Long
seeds.forEach { seed ->
val frames = sd.txt2vid(params.copy(seed = seed))
}
Advanced Parameters
scheduler
Diffusion scheduler algorithm.
Options:
Scheduler.EULER_A: Default, good quality and speedScheduler.DDIM: More deterministic, slightly slowerScheduler.DDPM: Higher quality, slowerScheduler.LCM: Fast inference (requires LCM-fine-tuned model)
// Default
scheduler = Scheduler.EULER_A
// Deterministic
scheduler = Scheduler.DDIM
// Quality-focused
scheduler = Scheduler.DDPM
strength
Denoising strength for image-to-video (0.0-1.0).
Only used with initImage for I2V/TI2V models.
Guidelines:
- 0.0-0.3: Subtle animation, preserves image
- 0.5-0.7: Moderate animation
- 0.8-1.0: Strong transformation
val params = VideoGenerateParams(
prompt = "animate this scene, add motion",
initImage = initialFrame,
strength = 0.7f, // Moderate animation
videoFrames = 16
)
initImage
Initial frame for image-to-video generation (I2V/TI2V models only).
val initialFrame = BitmapFactory.decodeResource(resources, R.drawable.scene)
val params = VideoGenerateParams(
prompt = "animate this image, add wind and motion",
initImage = initialFrame,
strength = 0.8f,
videoFrames = 16,
width = initialFrame.width,
height = initialFrame.height
)
Note: Image will be resized to match width and height if needed.
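Because width and height must be multiples of 64 in the 256-960 range, it helps to snap the init image's dimensions before generating. A small sketch; snapTo64 is illustrative, not part of the API:

// Round down to a multiple of 64, clamped to the documented 256-960 range
fun snapTo64(value: Int): Int = (value / 64).coerceIn(4, 15) * 64

val w = snapTo64(initialFrame.width)
val h = snapTo64(initialFrame.height)
val scaled = Bitmap.createScaledBitmap(initialFrame, w, h, true)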
Advanced Usage
Device-Aware Model Selection
Check device RAM before attempting to load:
fun getTotalRamGB(context: Context): Double {
    val activityManager = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val memInfo = ActivityManager.MemoryInfo()
    activityManager.getMemoryInfo(memInfo)
    return memInfo.totalMem / (1024.0 * 1024.0 * 1024.0)
}

// Usage
val totalRamGB = getTotalRamGB(this)
if (totalRamGB < 12.0) {
    Log.w("VideoGen", "Insufficient RAM: ${String.format("%.1f", totalRamGB)}GB (12GB required)")
    showErrorDialog(
        "Video generation requires 12GB+ RAM. " +
        "This device has ${String.format("%.1f", totalRamGB)}GB. " +
        "Consider using cloud inference APIs instead."
    )
    return
}
// Proceed with model loading
Batch Generation
Generate multiple variations efficiently:
val baseParams = StableDiffusion.VideoGenerateParams(
prompt = "a cat walking",
videoFrames = 16,
width = 512,
height = 512,
steps = 20
)
// Generations share one native instance, so run them sequentially;
// reusing the instance avoids reloading ~9.7GB of weights per variation
val variations = withContext(Dispatchers.IO) {
    (0..4).map { i ->
        sd.txt2vid(baseParams.copy(seed = 42L + i))
    }
}
// variations now contains 5 different video sequences
Streaming to Video File
Save frames directly to MP4 using MediaCodec:
fun saveFramesToVideo(frames: List<Bitmap>, outputPath: String, fps: Int = 8) {
val mediaCodec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC)
val mediaFormat = MediaFormat.createVideoFormat(
MediaFormat.MIMETYPE_VIDEO_AVC,
frames.first().width,
frames.first().height
).apply {
setInteger(MediaFormat.KEY_BIT_RATE, 2000000)
setInteger(MediaFormat.KEY_FRAME_RATE, fps)
setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
}
// Configure MediaMuxer and encode frames
// See Android MediaCodec documentation for full implementation
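// Remaining steps, sketched (standard MediaCodec/MediaMuxer calls):
//   mediaCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
//   val inputSurface = mediaCodec.createInputSurface(); mediaCodec.start()
//   for each Bitmap: draw onto inputSurface (lockCanvas / unlockCanvasAndPost)
//   drain encoded output buffers into a MediaMuxer writing to outputPath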
}
Model Switching
Switch between models efficiently:
// Load first model
var sd = StableDiffusion.load(this, modelPath1, vaePath1, t5xxlPath1)
val frames1 = sd.txt2vid(params)
sd.close()
// Switch to second model
sd = StableDiffusion.load(this, modelPath2, vaePath2, t5xxlPath2)
val frames2 = sd.txt2vid(params)
sd.close()
Note: Metadata caching reduces GGUF parsing overhead on subsequent loads.
Troubleshooting
Common Issues
OutOfMemoryError
Symptoms: App crashes during generation with OOM error
Solutions:
- Reduce resolution: width = 256, height = 256
- Reduce frame count: videoFrames = 8
- Reduce steps: steps = 15
- Use smaller quantization: Q3_K_S instead of Q4_K_M
- Close other apps to free RAM
- Enable CPU offloading: offloadToCpu = true
// Memory-constrained configuration
val params = VideoGenerateParams(
prompt = "...",
videoFrames = 8,
width = 256,
height = 256,
steps = 15,
cfgScale = 7.0
)
Slow Generation
Symptoms: Generation takes >5 seconds per frame
Solutions:
- Use 1.3B model instead of 5B
- Reduce resolution
- Reduce steps (15-20 is usually sufficient)
- Enable Vulkan if on Android 11+
- Close background apps
// Fast generation configuration
val params = VideoGenerateParams(
prompt = "...",
videoFrames = 16,
width = 512,
height = 512,
steps = 15, // Reduced from 20
cfgScale = 7.0
)
Model Not Loading
Symptoms: FileNotFoundException or load failure
Solutions:
- Verify model file exists: File(context.filesDir, "hf-models/$modelId/$filename").exists()
- Check internet connection for downloads
- Verify Hugging Face model ID is correct
- Check storage space (5B models need ~5GB free)
// Debug model loading
try {
val sd = StableDiffusion.load(this, modelPath, vaePath, t5xxlPath)
Log.d("VideoGen", "Model loaded: ${sd.getVideoModelMetadata()}")
} catch (e: Exception) {
Log.e("VideoGen", "Load failed", e)
// Handle error
}
Poor Quality Output
Symptoms: Blurry, artifact-heavy, or incoherent frames
Solutions:
- Increase steps: 20-30
- Increase resolution: 512x512 or higher
- Adjust cfgScale: Try 7.0-10.0
- Use Q4_K_M or higher quantization
- Improve prompt specificity
- Try different schedulers (DDPM for quality)
// Quality-focused configuration
val params = VideoGenerateParams(
prompt = "detailed, high quality, cinematic scene...",
videoFrames = 16,
width = 768,
height = 768,
steps = 25,
cfgScale = 8.0,
scheduler = Scheduler.DDPM
)
Generation Hangs
Symptoms: Progress stops, app becomes unresponsive
Solutions:
- Ensure generation runs on Dispatchers.IO
- Set a progress callback to monitor progress
- Implement a timeout mechanism
- Call cancelGeneration() if needed
val job = CoroutineScope(Dispatchers.IO).launch {
try {
withTimeout(300_000) { // 5 minute timeout
val frames = sd.txt2vid(params)
// Process frames
}
} catch (e: TimeoutCancellationException) {
sd.cancelGeneration()
Log.e("VideoGen", "Generation timed out")
}
}
Performance Optimization
EasyCache (for supported models)
For supported models (e.g., DiT architectures like Flux/SD3), EasyCache can significantly reduce generation time by reusing intermediate diffusion steps. LLMEdgeManager automatically detects and enables EasyCache if the loaded model supports it.
If using the low-level StableDiffusion API, you can enable EasyCache via VideoGenerateParams:
val params = VideoGenerateParams(
// ... other params
easyCacheParams = StableDiffusion.EasyCacheParams(
enabled = true,
reuseThreshold = 0.2f, // Threshold for skipping steps (0.0 - 1.0)
startPercent = 0.15f, // Start skipping after this percentage of steps
endPercent = 0.95f // Stop skipping before this percentage of steps
)
)
Note: EasyCache is not supported for UNet-based models (like Stable Diffusion 1.5).
Memory Management
Monitor memory usage:
val runtime = Runtime.getRuntime()
val usedMemoryMB = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024)
Log.d("VideoGen", "Memory usage: ${usedMemoryMB}MB")
Memory-efficient generation:
// txt2vid returns the full frame list; persist and recycle frames promptly
sd.setProgressCallback { step, total ->
// ... update UI
}
val frames = sd.txt2vid(params)
frames.forEachIndexed { index, frame ->
saveFrameToDisk(frame, index)
frame.recycle() // Free bitmap immediately
}
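saveFrameToDisk above isn't defined in this guide; a minimal sketch, assuming outputDir is an app-private directory:

fun saveFrameToDisk(frame: Bitmap, index: Int) {
    File(outputDir, "frame_%04d.png".format(index)).outputStream().use { out ->
        frame.compress(Bitmap.CompressFormat.PNG, 100, out)
    }
}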
Batch Processing
Generate multiple videos efficiently:
// Reuse one model instance across prompts to avoid reloading weights
val sd = StableDiffusion.load(this, modelPath, vaePath, t5xxlPath)
prompts.forEach { prompt ->
val frames = sd.txt2vid(params.copy(prompt = prompt))
processFrames(frames)
}
sd.close()
Background Processing
Use WorkManager for long generations:
class VideoGenerationWorker(context: Context, workerParams: WorkerParameters)
    : CoroutineWorker(context, workerParams) {
    override suspend fun doWork(): Result {
        // Model paths and prompt are passed via inputData (see enqueue example below)
        val modelPath = inputData.getString("modelPath") ?: return Result.failure()
        val sd = StableDiffusion.load(
            applicationContext,
            modelPath = modelPath,
            vaePath = inputData.getString("vaePath"),
            t5xxlPath = inputData.getString("t5xxlPath")
        )
        val genParams = StableDiffusion.VideoGenerateParams(
            prompt = inputData.getString("prompt") ?: return Result.failure()
        )
        val frames = sd.txt2vid(genParams)
        saveVideo(frames)
        sd.close()
        return Result.success()
    }
}
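Enqueue the worker with its inputs (standard WorkManager usage; the input-data keys match the sketch above):

val request = OneTimeWorkRequestBuilder<VideoGenerationWorker>()
    .setInputData(workDataOf(
        "modelPath" to modelFile.file.absolutePath,
        "vaePath" to vaeFile.file.absolutePath,
        "t5xxlPath" to t5xxlFile.file.absolutePath,
        "prompt" to "a cat walking in a garden"
    ))
    .build()
WorkManager.getInstance(context).enqueue(request)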
Vulkan Acceleration
Enable Vulkan on Android 11+:
Build library with Vulkan support:
./gradlew :llmedge:assembleRelease -Pandroid.jniCmakeArgs="-DGGML_VULKAN=ON -DSD_VULKAN=ON"
Verify Vulkan at runtime:
// Vulkan status is logged during initialization
// Check logcat for: "Vulkan initialized successfully"
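To confirm the device itself advertises Vulkan support (independent of the library's log line), query the platform feature flag:

// True if the device reports Vulkan hardware level 1 (flag available since API 24)
val hasVulkan = context.packageManager.hasSystemFeature(
    PackageManager.FEATURE_VULKAN_HARDWARE_LEVEL, 1
)
Log.d("VideoGen", "Vulkan hardware support: $hasVulkan")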
See Also
- architecture.md - System design