
Conversation

@stduhpf (Contributor) commented Nov 3, 2025

https://github.com/madebyollin/taehv

Model weights:

.\bin\Release\sd.exe --diffusion-model ..\..\ComfyUI\models\diffusion_models\qwen-image-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\qwen_image_vae.safetensors --qwen2vl ..\..\ComfyUI\models\text_encoders\Qwen2.5-VL-7B-Instruct-Q8_0.gguf -p '一个穿着"QWEN"标志的T恤的中国美女正拿着黑色的马克笔面相镜头微笑。她身后的玻璃板上手写体写着 “一、Qwen-Image的技术路线: 探索视觉生成基础模型的极限,开创理解与生成一体化的未来。二、Qwen-Image的模型特色:1、复杂文字渲染。支持中英渲染、自动布局; 2、精准图像编辑。支持文字编辑、物体增减、风格变换。三、Qwen-Image的未来愿景:赋能专业内容创作、助力生成式AI发展。”' --cfg-scale 2.5 --sampling-method euler -v --offload-to-cpu -H 1024 -W 1024 --diffusion-fa --flow-shift 3 --tae ..\ComfyUI\models\vae_approx\taew2_1.pth --vae-conv-direct

output

.\bin\Release\sd-cli.exe -M vid_gen --diffusion-model '..\..\ComfyUI\models\unet\Wan2.2-TI2V-5B-Q8_0.gguf' --t5xxl ..\..\ComfyUI\models\clip\t5\umt5-xxl-encoder-Q8_0.gguf --tae ..\..\ComfyUI\models\vae_approx\taew2_2.pth -p "The woman drops the marker, and then she starts laughing a bit" -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" --cfg-scale 5.0 --sampling-method euler -v -W 768 -H 768 --color --video-frames 49 -i .\image.png --vae-conv-direct --scheduler smoothstep --steps 50 --fps 24 --diffusion-fa

output.mp4

The speedup and memory savings aren't that impressive yet; maybe they can be improved further?

@stduhpf (Contributor, Author) commented Nov 3, 2025

Sorry for the unrelated whitespace changes and the debug spam; I'll fix them later.

@stduhpf (Contributor, Author) commented Nov 3, 2025

Oh a new version of the taew2.1 weights just came out, coincidentally.

[side-by-side comparison images: old weights vs. new weights]

@stduhpf (Contributor, Author) commented Nov 3, 2025

Now TAE decoding of the outputs of Wan 2.1 models (and Wan 2.2 A14B) works in txt2img mode.

Video decoding runs as well, but the results are obviously incorrect (flashing-lights warning).

If someone can see what I'm doing wrong when decoding videos, let me know.

@madebyollin commented Dec 11, 2025

After fixing the three bugs mentioned in review, image results look correct (tested on GH200 with -DSD_CUDA=ON). I didn't check video.

diffs
diff --git a/tae.hpp b/tae.hpp
index ad0bd37..6a7951f 100644
--- a/tae.hpp
+++ b/tae.hpp
@@ -224,7 +224,7 @@ public:
         h      = conv1->forward(ctx, h);
         h      = ggml_relu_inplace(ctx->ggml_ctx, h);
         h      = conv2->forward(ctx, h);
-        h      = ggml_relu_inplace(ctx->ggml_ctx, h);
+        // h      = ggml_relu_inplace(ctx->ggml_ctx, h);
 
         auto skip = x;
         if (has_skip_conv) {
@@ -323,7 +323,7 @@ public:
         for (int i = 0; i < num_layers; i++) {
             for (int j = 0; j < num_blocks; j++) {
                 auto block = std::dynamic_pointer_cast<MemBlock>(blocks[std::to_string(index++)]);
-                auto mem   = ggml_pad(ctx->ggml_ctx, h, 0, 0, 0, 1);
+                auto mem   = ggml_pad_ext(ctx->ggml_ctx, h, 0, 0, 0, 0, 0, 0, 1, 0);
                 mem        = ggml_view_4d(ctx->ggml_ctx, mem, h->ne[0], h->ne[1], h->ne[2], h->ne[3], h->nb[1], h->nb[2], h->nb[3], 0);
                 h          = block->forward(ctx, h, mem);
             }
@@ -341,7 +341,7 @@ public:
         h              = last_conv->forward(ctx, h);
 
         // shape(W, H, 3, T+3) => shape(W, H, 3, T)
-        h = ggml_view_4d(ctx->ggml_ctx, h, h->ne[0], h->ne[1], h->ne[2], h->ne[3] - 3, h->nb[1], h->nb[2], h->nb[3], 0);
+        h = ggml_view_4d(ctx->ggml_ctx, h, h->ne[0], h->ne[1], h->ne[2], h->ne[3] - 3, h->nb[1], h->nb[2], h->nb[3], 3*h->nb[3]);
         return h;
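
To explain the last hunk: the causal decoder emits T+3 frames, and the original view kept the first T (so it dropped the trailing 3 real frames); the fix moves the view's byte offset to `3*h->nb[3]`, dropping the 3 leading warm-up frames instead. A plain-C++ sketch of the same indexing idea (a toy helper of mine, not code from tae.hpp):

```cpp
#include <cassert>
#include <vector>

// Trim the warm-up frames a causal video decoder emits before the real output:
// keep frames [warmup, T+warmup) instead of [0, T). This mirrors offsetting the
// ggml view start by warmup * nb[3] bytes rather than truncating the tail.
std::vector<int> trim_warmup_frames(const std::vector<int>& frames, int warmup = 3) {
    assert((int)frames.size() >= warmup);
    return std::vector<int>(frames.begin() + warmup, frames.end());
}
```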

@stduhpf (Contributor, Author) commented Dec 11, 2025

Video is still completely broken, but image decoding works very well now.

@stduhpf stduhpf marked this pull request as ready for review December 13, 2025 21:10
@stduhpf (Contributor, Author) commented Dec 13, 2025

Results for taew2.2 are quite interesting for now.

output.mp4

@madebyollin commented:

Wan 2.2 and Hunyuan 1.5 have 2x2 pixelshuffle on the input/output
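
For reference, 2x2 pixel-unshuffle (space-to-depth) folds each 2x2 spatial patch into 4 channels, so a (C, H, W) tensor becomes (4C, H/2, W/2); the decoder output applies the inverse. A minimal sketch under my own layout assumptions (row-major CHW, not the actual wan.hpp implementation):

```cpp
#include <cassert>
#include <vector>

// 2x2 pixel-unshuffle (space-to-depth): (C, H, W) -> (4C, H/2, W/2),
// row-major CHW layout. Each input channel spawns 4 output channel planes,
// one per position inside the 2x2 patch.
std::vector<float> pixel_unshuffle_2x2(const std::vector<float>& in, int C, int H, int W) {
    assert(H % 2 == 0 && W % 2 == 0 && (int)in.size() == C * H * W);
    int Ho = H / 2, Wo = W / 2;
    std::vector<float> out(in.size());
    for (int c = 0; c < C; c++)
        for (int dy = 0; dy < 2; dy++)
            for (int dx = 0; dx < 2; dx++)
                for (int y = 0; y < Ho; y++)
                    for (int x = 0; x < Wo; x++) {
                        int co = c * 4 + dy * 2 + dx;  // destination channel
                        out[(co * Ho + y) * Wo + x] =
                            in[(c * H + (2 * y + dy)) * W + (2 * x + dx)];
                    }
    return out;
}
```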

@stduhpf (Contributor, Author) commented Dec 13, 2025

@madebyollin Yes, I saw that when looking at the VAE code in wan.hpp; I'm on it.

@stduhpf (Contributor, Author) commented Dec 13, 2025

output.mp4

@stduhpf (Contributor, Author) commented Dec 13, 2025

output.mp4

@madebyollin commented:

The Wan 2.2 TI2V results still look broken. There's a scaling issue on ~L3600 where sd_ctx->sd->process_latent_out(init_latent); and sd_ctx->sd->process_latent_in(init_latent); are incorrectly called even when using TAEW2.2. After fixing that, initial frame results look correctly-scaled but the video deteriorates into gray mush:

output_with_disabled_process_latent.mp4
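
The underlying issue, as I understand it: the full Wan VAE works on latents normalized per channel (via process_latent_in / process_latent_out), while the TAE checkpoints already expect the raw latent range, so also running the normalization on the init latent effectively applies the shift/scale twice. A toy sketch of the double application (shift/scale values are illustrative, not the real Wan latent statistics):

```cpp
#include <cmath>

// Toy per-channel latent normalization, mimicking what process_latent_in
// does conceptually: z_norm = (z - shift) * scale. The shift/scale values
// are illustrative placeholders only.
inline float toy_latent_in(float z, float shift = 0.5f, float scale = 2.0f) {
    return (z - shift) * scale;
}
```

Applying `toy_latent_in` a second time to an already-normalized value produces a different (distorted) number, which is the kind of mis-scaled input the TAE would receive before the fix.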

This gray-mush issue happens with the default VAE on 8f05f5bc6ee9d6aba9d1ff2be7739a5a3cf1586d (before this PR) so fixing it is likely out of scope for this PR.

output_with_official_vae_on_8f05f5bc6ee9d6aba9d1ff2be7739a5a3cf1586d.mp4

@stduhpf (Contributor, Author) commented Dec 15, 2025

@madebyollin Yes, I figured it was probably something like that after noticing how much worse the img2vid results were compared to txt2vid. I get no "gray mush" on my end with this fix, though.

@leejet (Owner) commented Dec 15, 2025

@stduhpf I used taehv and got results very close to those of the Wan VAE. Maybe this PR can be merged now?

@stduhpf (Contributor, Author) commented Dec 15, 2025

I think so too. I haven't tested every possible use case though (for example VACE).
