Feat; Add support for Wan/Qwen TAEHV decoding #937
base: master
|
Sorry for the unrelated whitespace changes and the debug spam, will fix later |
|
TAE decoding now works for the outputs of Wan2.1 models (and Wan2.2 A14B) in txt2img mode. Video decoding runs as well, but the results are obviously incorrect (flashing lights warning). If anyone can see what I'm doing wrong when decoding videos, let me know. |
|
Video is still completely broken, but image decoding works very well now. |
|
Results for taew2.2 are quite interesting for now. output.mp4 |
|
Wan 2.2 and Hunyuan 1.5 have 2x2 pixelshuffle on the input/output |
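For readers unfamiliar with the term, here is a minimal pure-Python sketch of what a 2x2 pixel-unshuffle and its inverse do on a `[C][H][W]` nested list. The function names and the channel ordering (the PyTorch `PixelUnshuffle` convention) are assumptions made for illustration; the ordering actually used by the Wan 2.2 and Hunyuan 1.5 VAEs may differ and should be checked against their reference code.

```python
def pixel_unshuffle(img, r=2):
    """Fold each r x r spatial block into channels.

    [C][H][W] -> [C*r*r][H//r][W//r], using the (assumed) PyTorch channel order:
    out_channel = c * r * r + i * r + j for the pixel at offset (i, j) in a block.
    """
    c, h, w = len(img), len(img[0]), len(img[0][0])
    assert h % r == 0 and w % r == 0, "height and width must be divisible by r"
    out = []
    for ch in range(c):
        for i in range(r):
            for j in range(r):
                out.append([[img[ch][y * r + i][x * r + j] for x in range(w // r)]
                            for y in range(h // r)])
    return out


def pixel_shuffle(img, r=2):
    """Inverse of pixel_unshuffle: [C*r*r][H][W] -> [C][H*r][W*r]."""
    c, h, w = len(img), len(img[0]), len(img[0][0])
    assert c % (r * r) == 0, "channel count must be divisible by r*r"
    out_c = c // (r * r)
    out = [[[None] * (w * r) for _ in range(h * r)] for _ in range(out_c)]
    for ch in range(out_c):
        for i in range(r):
            for j in range(r):
                src = img[ch * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[ch][y * r + i][x * r + j] = src[y][x]
    return out


# One-channel 4x4 image holding the values 0..15 in row-major order.
image = [[[y * 4 + x for x in range(4)] for y in range(4)]]
packed = pixel_unshuffle(image)    # 4 channels of 2x2
restored = pixel_shuffle(packed)   # back to 1 channel of 4x4
```

A decoder for such a model would apply the shuffle step to its output (and the unshuffle step on the encoder input), so a tiny autoencoder that skips it produces frames at the wrong resolution and channel count.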
|
@madebyollin Yes I saw that when looking at the VAE code in |
output.mp4 |
output.mp4 |
|
The Wan 2.2 TI2V results still look broken. There's a scaling issue on ~L3600 where output_with_disabled_process_latent.mp4 This gray-mush issue happens with the default VAE on output_with_official_vae_on_8f05f5bc6ee9d6aba9d1ff2be7739a5a3cf1586d.mp4 |
Co-authored-by: Ollin Boer Bohan <madebyollin@gmail.com>
|
@madebyollin Yes, I figured it was probably something like that after noticing how much worse the img2vid results were compared to txt2vid. I get no "gray-mush" on my end with this fix, though. |
|
@stduhpf I used taehv and got results very close to those of the Wan VAE. Maybe this PR can be merged now? |
|
I think so too. I haven't tested every possible use case though (for example VACE). |



https://github.com/madebyollin/taehv
Model weights:
.\bin\Release\sd.exe --diffusion-model ..\..\ComfyUI\models\diffusion_models\qwen-image-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\qwen_image_vae.safetensors --qwen2vl ..\..\ComfyUI\models\text_encoders\Qwen2.5-VL-7B-Instruct-Q8_0.gguf -p '一个穿着"QWEN"标志的T恤的中国美女正拿着黑色的马克笔面相镜头微笑。她身后的玻璃板上手写体写着 “一、Qwen-Image的技术路线: 探索视觉生成基础模型的极限,开创理解与生成一体化的未来。二、Qwen-Image的模型特色:1、复杂文字渲染。支持中英渲染、自动布局; 2、精准图像编辑。支持文字编辑、物体增减、风格变换。三、Qwen-Image的未来愿景:赋能专业内容创作、助力生成式AI发展。”' --cfg-scale 2.5 --sampling-method euler -v --offload-to-cpu -H 1024 -W 1024 --diffusion-fa --flow-shift 3 --tae ..\ComfyUI\models\vae_approx\taew2_1.pth --vae-conv-direct

.\bin\Release\sd-cli.exe -M vid_gen --diffusion-model '..\..\ComfyUI\models\unet\Wan2.2-TI2V-5B-Q8_0.gguf' --t5xxl ..\..\ComfyUI\models\clip\t5\umt5-xxl-encoder-Q8_0.gguf --tae ..\..\ComfyUI\models\vae_approx\taew2_2.pth -p "The woman drops the marker, and then she starts laughing a bit" -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" --cfg-scale 5.0 --sampling-method euler -v -W 768 -H 768 --color --video-frames 49 -i .\image.png --vae-conv-direct --scheduler smoothstep --steps 50 --fps 24 --diffusion-fa

output.mp4
The speedup and memory savings aren't that impressive yet; maybe they can be improved further?