Complete reference for on-device video generation using Wan models in llmedge.

Overview

llmedge provides on-device video generation through the StableDiffusion class, using Wan models. Generate short video clips (4-64 frames) entirely on Android devices.

⚠️ Hardware Requirements:

  • RAM: 12GB recommended for Wan 2.1 T2V-1.3B (~9.7GB absolute minimum)
  • Supported Devices: Galaxy S23 Ultra (12GB), Pixel 8 Pro (12GB), OnePlus 12 (16GB+)
  • Not Supported: 8GB RAM devices (Galaxy S22, Pixel 7)

Why 12GB? Wan models require loading three components simultaneously, plus working memory:

  1. Main diffusion model (fp16): ~2.7GB RAM
  2. T5XXL text encoder (Q3_K_S GGUF): ~5.9GB RAM
  3. VAE decoder (fp16): ~0.14GB RAM
  4. Working memory: ~1GB RAM

Total: ~9.7GB minimum, 12GB recommended

Key Features:

  • Text-to-video (T2V) generation
  • Multi-file model loading (main + VAE + T5XXL)
  • Memory-aware device compatibility checks
  • Progress monitoring and cancellation
  • Multiple scheduler options

Supported Models

Wan 2.1 T2V 1.3B from Comfy-Org/Wan_2.1_ComfyUI_repackaged:

All three components are required and must be explicitly downloaded:

  1. Main Model: wan2.1_t2v_1.3B_fp16.safetensors (~2.6GB file, 2.7GB RAM)
  2. VAE: wan_2.1_vae.safetensors (~160MB file, 0.14GB RAM)
  3. T5XXL Encoder: umt5-xxl-encoder-Q3_K_S.gguf from city96/umt5-xxl-encoder-gguf (~2.86GB file, 5.9GB RAM)

Device Requirements:

  • RAM: 12GB+ (9.7GB minimum + overhead)
  • Storage: 6GB free space for downloads
  • OS: Android 11+ recommended (Vulkan acceleration)
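
Before starting the downloads, you can verify that enough free storage is available. A minimal sketch using the standard StatFs API (the helper name is illustrative; the 6GB threshold mirrors the requirement above):

import android.os.StatFs

fun hasEnoughStorageForWan(context: Context, requiredBytes: Long = 6L * 1024 * 1024 * 1024): Boolean {
    // Model files are downloaded to app-internal storage, so check that volume
    val stats = StatFs(context.filesDir.absolutePath)
    return stats.availableBytes >= requiredBytes
}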

Known Limitations:

  • GGUF quantization of main model blocked by metadata issues
  • Sequential loading: LLMEdgeManager supports a sequential load flow to reduce peak memory usage on low-memory devices. With forceSequentialLoad=true, the manager precomputes text conditioning with the T5 encoder and then loads the diffusion model without reloading the T5 encoder, avoiding duplicated memory usage. For best results, avoid combining this with preferPerformanceMode=true, which favors GPU allocation patterns that increase peak memory usage.
  • No disk streaming - models must fit in RAM
  • 8GB RAM devices cannot run Wan models (architectural constraint)

API Reference

Loading Models

StableDiffusion.load()

Load a video generation model with explicit paths to all three required components.

⚠️ Important: The simplified modelId + filename approach does not work for Wan models. You must explicitly download and provide paths to all three files.

suspend fun load(
    context: Context,
    modelPath: String,
    vaePath: String?,
    t5xxlPath: String?,
    nThreads: Int = Runtime.getRuntime().availableProcessors(),
    offloadToCpu: Boolean = true,
    keepClipOnCpu: Boolean = true,
    keepVaeOnCpu: Boolean = true,
    loraModelDir: String? = null, // Path to directory containing LoRA .safetensors
    loraApplyMode: StableDiffusion.LoraApplyMode = StableDiffusion.LoraApplyMode.AUTO, // LoRA application strategy
): StableDiffusion

Parameters:

  • context: Android application context
  • modelPath: Absolute path to main model file (safetensors)
  • vaePath: Absolute path to VAE file (safetensors)
  • t5xxlPath: Absolute path to T5XXL encoder (GGUF)
  • nThreads: Number of CPU threads (default: all cores)
  • offloadToCpu: Enable CPU offloading (default: true, recommended)
  • keepClipOnCpu: Keep CLIP model on CPU (default: true, recommended)
  • keepVaeOnCpu: Keep VAE on CPU (default: true, recommended)

Returns: StableDiffusion instance ready for generation

Throws:

  • FileNotFoundException: Model file not found
  • IllegalStateException: Model loading failed (e.g., insufficient RAM)
  • UnsupportedOperationException: 14B model rejected (mobile unsupported)

Example:

// Download all three model files explicitly
val modelFile = HuggingFaceHub.ensureRepoFileOnDisk(
    context = this,
    modelId = "Comfy-Org/Wan_2.1_ComfyUI_repackaged",
    revision = "main",
    filename = "wan2.1_t2v_1.3B_fp16.safetensors",
    allowedExtensions = listOf(".safetensors"),
    preferSystemDownloader = true
)

val vaeFile = HuggingFaceHub.ensureRepoFileOnDisk(
    context = this,
    modelId = "Comfy-Org/Wan_2.1_ComfyUI_repackaged",
    revision = "main",
    filename = "wan_2.1_vae.safetensors",
    allowedExtensions = listOf(".safetensors"),
    preferSystemDownloader = true
)

val t5xxlFile = HuggingFaceHub.ensureRepoFileOnDisk(
    context = this,
    modelId = "city96/umt5-xxl-encoder-gguf",
    revision = "main",
    filename = "umt5-xxl-encoder-Q3_K_S.gguf",
    allowedExtensions = listOf(".gguf"),
    preferSystemDownloader = true
)

// Load all three models together
val sd = StableDiffusion.load(
    context = this,
    modelPath = modelFile.file.absolutePath,
    vaePath = vaeFile.file.absolutePath,
    t5xxlPath = t5xxlFile.file.absolutePath,
    nThreads = Runtime.getRuntime().availableProcessors(),
    offloadToCpu = true,
    keepClipOnCpu = true,
    keepVaeOnCpu = true,
    loraModelDir = null, // Or provide path to LoRA directory if applicable
    loraApplyMode = StableDiffusion.LoraApplyMode.AUTO
)

Video Generation

txt2vid()

Generate video from text prompt and optional initial image.

suspend fun txt2vid(params: VideoGenerateParams): List<Bitmap>

Parameters:

  • params: VideoGenerateParams describing the prompt, resolution, frame count, and sampling settings (see below)

Returns: List<Bitmap> - Generated video frames

Throws:

  • IllegalStateException: Model not loaded or not a video model
  • IllegalArgumentException: Invalid parameters (dimensions, frame count, etc.)
  • CancellationException: Generation cancelled via cancelGeneration()

Example:

val params = StableDiffusion.VideoGenerateParams(
    prompt = "a cat walking in a garden, high quality",
    videoFrames = 16,
    width = 512,
    height = 512,
    steps = 20,
    cfgScale = 7.0,
    seed = 42
)

val frames = sd.txt2vid(params)

VideoGenerateParams

Data class for video generation parameters.

data class VideoGenerateParams(
    val prompt: String,
    val videoFrames: Int = 16,
    val width: Int = 512,
    val height: Int = 512,
    val steps: Int = 20,
    val cfgScale: Double = 7.0,
    val seed: Long = -1,
    val scheduler: Scheduler = Scheduler.EULER_A,
    val strength: Float = 0.8f,
    val initImage: Bitmap? = null,
    val easyCacheParams: EasyCacheParams = EasyCacheParams(), // Parameters for EasyCache optimization
    val loraModelDir: String? = null, // Directory containing LoRA .safetensors (only applicable if also loading base model with LoRA support)
    val loraApplyMode: StableDiffusion.LoraApplyMode = StableDiffusion.LoraApplyMode.AUTO // LoRA application strategy
)

Field Validation:

  • prompt: Non-empty string
  • videoFrames: 4-64 (capped to 32 for 5B models)
  • width: 256-960 (multiple of 64)
  • height: 256-960 (multiple of 64)
  • steps: 10-50
  • cfgScale: 1.0-15.0
  • seed: Any long (-1 for random)
  • scheduler: EULER_A, DDIM, DDPM, or LCM
  • strength: 0.0-1.0 (for I2V/TI2V)
  • initImage: Optional bitmap for I2V/TI2V

See Parameter Guide for detailed explanations.
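
If parameters come from user input, it can help to normalize them to the documented ranges before calling txt2vid(). A minimal sketch (the helper is illustrative, not part of the library):

fun normalizeVideoParams(params: StableDiffusion.VideoGenerateParams): StableDiffusion.VideoGenerateParams {
    // Round dimensions to the nearest multiple of 64 and clamp everything to the documented ranges
    fun snap(dim: Int) = ((dim + 32) / 64 * 64).coerceIn(256, 960)
    return params.copy(
        width = snap(params.width),
        height = snap(params.height),
        videoFrames = params.videoFrames.coerceIn(4, 64),
        steps = params.steps.coerceIn(10, 50),
        cfgScale = params.cfgScale.coerceIn(1.0, 15.0),
        strength = params.strength.coerceIn(0.0f, 1.0f)
    )
}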


Model Introspection

isVideoModel()

Check if loaded model is a video generation model.

fun isVideoModel(): Boolean

Returns: true if model supports video generation, false otherwise


getVideoModelMetadata()

Get metadata about the loaded video model.

fun getVideoModelMetadata(): VideoModelMetadata?

Returns: VideoModelMetadata object or null if not a video model

VideoModelMetadata fields:

  • architecture: Model architecture (e.g., "wan")
  • modelType: "t2v", "i2v", or "ti2v"
  • parameterCount: "1.3B", "5B", or "14B"
  • mobileSupported: Boolean (false for 14B models)
  • tags: Set of model tags
  • filename: GGUF filename
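
These fields can be used to gate generation before calling txt2vid(). A short sketch based on the fields listed above:

val metadata = sd.getVideoModelMetadata()
if (!sd.isVideoModel() || metadata == null) {
    Log.e("VideoGen", "Loaded model does not support video generation")
} else if (!metadata.mobileSupported) {
    Log.e("VideoGen", "${metadata.parameterCount} model is not supported on mobile")
} else {
    Log.d("VideoGen", "Ready: ${metadata.architecture} ${metadata.modelType} (${metadata.parameterCount})")
}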

Progress Monitoring

setProgressCallback()

Set callback for generation progress updates.

fun setProgressCallback(callback: VideoProgressCallback?)

VideoProgressCallback:

fun interface VideoProgressCallback {
    fun onProgress(step: Int, totalSteps: Int)
}

Example:

sd.setProgressCallback { step, totalSteps ->
    val progress = (step.toFloat() / totalSteps * 100).toInt()
    runOnUiThread {
        progressBar.progress = progress
        statusText.text = "Step $step / $totalSteps"
    }
}

cancelGeneration()

Cancel ongoing video generation.

fun cancelGeneration()

Cancellation is cooperative - the native layer checks the flag periodically. Generation will stop within 1-2 seconds.

Example:

cancelButton.setOnClickListener {
    sd.cancelGeneration()
}

Resource Management

close()

Free native resources and reset model state.

fun close()

Important: Always call close() when done with the model to prevent memory leaks. Use Kotlin's use block for automatic cleanup:

StableDiffusion.load(context, modelPath, vaePath, t5xxlPath).use { sd ->
    val frames = sd.txt2vid(params)
    // sd.close() called automatically
}

Parameter Guide

Core Parameters

prompt

Text description of the video to generate.

Best Practices:

  • Be specific and descriptive
  • Include quality modifiers: "high quality", "detailed", "cinematic"
  • Avoid negations (use positive descriptions)
  • Keep under 200 characters for best results

Examples:

// Good
"a serene ocean sunset, waves gently rolling, golden hour lighting, cinematic"

// Better
"a professional chef preparing pasta, kitchen environment, natural lighting, detailed hands"

// Avoid
"a cat, not blurry" // Negations don't work well

videoFrames

Number of frames to generate (4-64).

Guidelines:

  • 4-8 frames: Quick tests, ~5-15 seconds generation
  • 16 frames: Standard short clips, ~20-45 seconds generation
  • 32 frames: Longer animations, ~40-90 seconds generation
  • 64 frames: Maximum quality (1.3B models only), ~80-180 seconds

Memory Impact:

  • 1.3B models: Up to 64 frames
  • 5B models: Automatically capped at 32 frames
// Quick test
videoFrames = 8

// Standard production
videoFrames = 16

// High quality (1.3B only)
videoFrames = 64

width and height

Output resolution (256-960, must be multiples of 64).

Common Resolutions:

  • 256x256: Fastest, lowest quality
  • 512x512: Balanced (recommended)
  • 768x768: High quality, slower
  • 960x960: Maximum quality, very slow

Performance vs Quality:

// Fast generation (~2 sec/frame on mid-range)
width = 256, height = 256

// Balanced (recommended)
width = 512, height = 512

// High quality (~8 sec/frame on mid-range)
width = 768, height = 768

steps

Number of diffusion steps (10-50).

Guidelines:

  • 10-15 steps: Fast, lower quality
  • 20 steps: Recommended default
  • 25-30 steps: Higher quality
  • 40-50 steps: Maximum quality, diminishing returns
// Fast generation
steps = 15

// Production quality
steps = 20

// Maximum quality
steps = 30

cfgScale

Classifier-free guidance scale (1.0-15.0). Controls adherence to prompt.

Guidelines:

  • 1.0-3.0: Very creative, less prompt adherence
  • 7.0: Default, balanced
  • 10.0-12.0: Strong prompt adherence
  • 13.0-15.0: Very strict, may over-saturate
// Creative freedom
cfgScale = 3.0

// Standard (recommended)
cfgScale = 7.0

// Strict prompt following
cfgScale = 10.0

seed

Random seed for reproducibility.

Guidelines:

  • -1: Random seed (different output each time)
  • 0+: Fixed seed (reproducible outputs)
// Random generation
seed = -1

// Reproducible generation
seed = 42

// Generate variations
val seeds = listOf(42L, 43L, 44L, 45L)
seeds.forEach { seed ->
    val frames = sd.txt2vid(params.copy(seed = seed))
}

Advanced Parameters

scheduler

Diffusion scheduler algorithm.

Options:

  • Scheduler.EULER_A: Default, good quality and speed
  • Scheduler.DDIM: More deterministic, slightly slower
  • Scheduler.DDPM: Higher quality, slower
  • Scheduler.LCM: Fast inference (requires LCM-fine-tuned model)
// Default
scheduler = Scheduler.EULER_A

// Deterministic
scheduler = Scheduler.DDIM

// Quality-focused
scheduler = Scheduler.DDPM

strength

Denoising strength for image-to-video (0.0-1.0).

Only used with initImage for I2V/TI2V models.

Guidelines:

  • 0.0-0.3: Subtle animation, preserves image
  • 0.5-0.7: Moderate animation
  • 0.8-1.0: Strong transformation
val params = VideoGenerateParams(
    prompt = "animate this scene, add motion",
    initImage = initialFrame,
    strength = 0.7f,  // Moderate animation
    videoFrames = 16
)

initImage

Initial frame for image-to-video generation (I2V/TI2V models only).

val initialFrame = BitmapFactory.decodeResource(resources, R.drawable.scene)

val params = VideoGenerateParams(
    prompt = "animate this image, add wind and motion",
    initImage = initialFrame,
    strength = 0.8f,
    videoFrames = 16,
    width = initialFrame.width,
    height = initialFrame.height
)

Note: Image will be resized to match width and height if needed.
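
If you prefer to control the scaling yourself, you can snap the source image to a valid resolution before building the parameters. A small sketch using standard Android APIs:

// Round the source dimensions to multiples of 64 within the supported 256-960 range
val targetWidth = ((initialFrame.width + 32) / 64 * 64).coerceIn(256, 960)
val targetHeight = ((initialFrame.height + 32) / 64 * 64).coerceIn(256, 960)
val scaledFrame = Bitmap.createScaledBitmap(initialFrame, targetWidth, targetHeight, true)

val params = VideoGenerateParams(
    prompt = "animate this image, add wind and motion",
    initImage = scaledFrame,
    strength = 0.8f,
    videoFrames = 16,
    width = targetWidth,
    height = targetHeight
)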


Advanced Usage

Device-Aware Model Selection

Check device RAM before attempting to load:

fun checkDeviceCompatibility(context: Context): Boolean {
    val activityManager = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val memInfo = ActivityManager.MemoryInfo()
    activityManager.getMemoryInfo(memInfo)
    val totalRamGB = memInfo.totalMem / (1024.0 * 1024.0 * 1024.0)

    if (totalRamGB < 12.0) {
        Log.w("VideoGen", "Insufficient RAM: ${String.format("%.1f", totalRamGB)}GB (12GB required)")
        return false
    }

    return true
}

// Usage
if (!checkDeviceCompatibility(this)) {
    showErrorDialog(
        "Video generation requires 12GB+ RAM. " +
        "Consider using cloud inference APIs instead."
    )
    return
}

// Proceed with model loading

Batch Generation

Generate multiple variations efficiently:

val baseParams = StableDiffusion.VideoGenerateParams(
    prompt = "a cat walking",
    videoFrames = 16,
    width = 512,
    height = 512,
    steps = 20
)

// Generate sequentially - a single StableDiffusion instance should not run overlapping generations
val variations = (0..4).map { i ->
    sd.txt2vid(baseParams.copy(seed = 42L + i))
}

// variations now contains 5 different video sequences

Streaming to Video File

Save frames directly to MP4 using MediaCodec:

fun saveFramesToVideo(frames: List<Bitmap>, outputPath: String, fps: Int = 8) {
    val mediaCodec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC)
    val mediaFormat = MediaFormat.createVideoFormat(
        MediaFormat.MIMETYPE_VIDEO_AVC,
        frames.first().width,
        frames.first().height
    ).apply {
        setInteger(MediaFormat.KEY_BIT_RATE, 2000000)
        setInteger(MediaFormat.KEY_FRAME_RATE, fps)
        setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface)
        setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
    }

    // With COLOR_FormatSurface, draw each Bitmap onto the encoder's input surface,
    // then drain the encoded output buffers into a MediaMuxer.
    // See the Android MediaCodec documentation for the full implementation.
}
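
If you do not need an MP4 right away, writing the frames as an image sequence is a simpler alternative that avoids the MediaCodec pipeline entirely; the images can be muxed into a video later:

import java.io.File
import java.io.FileOutputStream

fun saveFramesAsImages(frames: List<Bitmap>, outputDir: File) {
    outputDir.mkdirs()
    frames.forEachIndexed { index, frame ->
        // Zero-padded names keep the frames ordered for later muxing
        val file = File(outputDir, "frame_%04d.png".format(index))
        FileOutputStream(file).use { out ->
            frame.compress(Bitmap.CompressFormat.PNG, 100, out)
        }
    }
}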

Model Switching

Switch between models efficiently:

// Load first model
var sd = StableDiffusion.load(this, modelPath1, vaePath1, t5xxlPath1)
val frames1 = sd.txt2vid(params)
sd.close()

// Switch to second model
sd = StableDiffusion.load(this, modelPath2, vaePath2, t5xxlPath2)
val frames2 = sd.txt2vid(params)
sd.close()

Note: Metadata caching reduces GGUF parsing overhead on subsequent loads.


Troubleshooting

Common Issues

OutOfMemoryError

Symptoms: App crashes during generation with OOM error

Solutions:

  1. Reduce resolution: width = 256, height = 256
  2. Reduce frame count: videoFrames = 8
  3. Reduce steps: steps = 15
  4. Use smaller quantization: Q3_K_S instead of Q4_K_M
  5. Close other apps to free RAM
  6. Enable CPU offloading: offloadToCpu = true
// Memory-constrained configuration
val params = VideoGenerateParams(
    prompt = "...",
    videoFrames = 8,
    width = 256,
    height = 256,
    steps = 15,
    cfgScale = 7.0
)

Slow Generation

Symptoms: Generation takes >5 seconds per frame

Solutions:

  1. Use 1.3B model instead of 5B
  2. Reduce resolution
  3. Reduce steps (15-20 is usually sufficient)
  4. Enable Vulkan if on Android 11+
  5. Close background apps
// Fast generation configuration
val params = VideoGenerateParams(
    prompt = "...",
    videoFrames = 16,
    width = 512,
    height = 512,
    steps = 15,  // Reduced from 20
    cfgScale = 7.0
)

Model Not Loading

Symptoms: FileNotFoundException or load failure

Solutions:

  1. Verify model file exists: File(context.filesDir, "hf-models/$modelId/$filename").exists()
  2. Check internet connection for downloads
  3. Verify Hugging Face model ID is correct
  4. Check storage space (5B models need ~5GB free)
// Debug model loading
try {
    val sd = StableDiffusion.load(this, modelPath, vaePath, t5xxlPath)
    Log.d("VideoGen", "Model loaded: ${sd.getVideoModelMetadata()}")
} catch (e: Exception) {
    Log.e("VideoGen", "Load failed", e)
    // Handle error
}

Poor Quality Output

Symptoms: Blurry, artifact-heavy, or incoherent frames

Solutions:

  1. Increase steps: 20-30
  2. Increase resolution: 512x512 or higher
  3. Adjust cfgScale: Try 7.0-10.0
  4. Use Q4_K_M or higher quantization
  5. Improve prompt specificity
  6. Try different schedulers (DDPM for quality)
// Quality-focused configuration
val params = VideoGenerateParams(
    prompt = "detailed, high quality, cinematic scene...",
    videoFrames = 16,
    width = 768,
    height = 768,
    steps = 25,
    cfgScale = 8.0,
    scheduler = Scheduler.DDPM
)

Generation Hangs

Symptoms: Progress stops, app becomes unresponsive

Solutions:

  1. Ensure generation runs on Dispatchers.IO
  2. Set progress callback to monitor
  3. Implement timeout mechanism
  4. Call cancelGeneration() if needed
val job = CoroutineScope(Dispatchers.IO).launch {
    try {
        withTimeout(300_000) { // 5 minute timeout
            val frames = sd.txt2vid(params)
            // Process frames
        }
    } catch (e: TimeoutCancellationException) {
        sd.cancelGeneration()
        Log.e("VideoGen", "Generation timed out")
    }
}

Performance Optimization

EasyCache (for supported models)

For supported models (e.g., DiT architectures like Flux/SD3), EasyCache can significantly reduce generation time by reusing intermediate diffusion steps. LLMEdgeManager automatically detects and enables EasyCache if the loaded model supports it.

If using the low-level StableDiffusion API, you can enable EasyCache via VideoGenerateParams:

val params = VideoGenerateParams(
    // ... other params
    easyCacheParams = StableDiffusion.EasyCacheParams(
        enabled = true,
        reuseThreshold = 0.2f, // Threshold for skipping steps (0.0 - 1.0)
        startPercent = 0.15f,  // Start skipping after this percentage of steps
        endPercent = 0.95f     // Stop skipping before this percentage of steps
    )
)

Note: EasyCache is not supported for UNet-based models (like Stable Diffusion 1.5).

Memory Management

Monitor memory usage:

val runtime = Runtime.getRuntime()
val usedMemoryMB = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024)
Log.d("VideoGen", "Memory usage: ${usedMemoryMB}MB")

Memory-efficient generation:

// Process frames immediately instead of accumulating
sd.setProgressCallback { step, total ->
    // ... update UI
}

val frames = sd.txt2vid(params)
frames.forEachIndexed { index, frame ->
    saveFrameToDisk(frame, index)
    frame.recycle() // Free bitmap immediately
}

Batch Processing

Generate multiple videos efficiently:

// Reuse model instance
val sd = StableDiffusion.load(this, modelPath, vaePath, t5xxlPath)

prompts.forEach { prompt ->
    val frames = sd.txt2vid(params.copy(prompt = prompt))
    processFrames(frames)
}

sd.close()

Background Processing

Use WorkManager for long generations:

class VideoGenerationWorker(context: Context, params: WorkerParameters) 
    : CoroutineWorker(context, params) {

    override suspend fun doWork(): Result {
        val sd = StableDiffusion.load(applicationContext, modelPath, vaePath, t5xxlPath)
        val frames = sd.txt2vid(params)
        saveVideo(frames)
        sd.close()
        return Result.success()
    }
}
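
Enqueue the worker with the standard WorkManager API; the battery constraint is optional but sensible for long generations:

val request = OneTimeWorkRequestBuilder<VideoGenerationWorker>()
    .setConstraints(
        Constraints.Builder()
            .setRequiresBatteryNotLow(true) // avoid draining the battery mid-generation
            .build()
    )
    .build()

WorkManager.getInstance(context).enqueue(request)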

Vulkan Acceleration

Enable Vulkan on Android 11+:

Build library with Vulkan support:

./gradlew :llmedge:assembleRelease -Pandroid.jniCmakeArgs="-DGGML_VULKAN=ON -DSD_VULKAN=ON"

Verify Vulkan at runtime:

// Vulkan status is logged during initialization
// Check logcat for: "Vulkan initialized successfully"
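
In addition to checking logcat, you can query the device's Vulkan feature flags. Note that this only indicates whether the hardware advertises Vulkan, not whether the library was built with Vulkan enabled:

val hasVulkan = context.packageManager.hasSystemFeature(PackageManager.FEATURE_VULKAN_HARDWARE_LEVEL)
Log.d("VideoGen", "Device advertises Vulkan support: $hasVulkan")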

See Also