Qwen-Image-Layered
Model Introduction
This model is trained from Qwen/Qwen-Image-Layered on the artplus/PrismLayersPro dataset, enabling text-controlled extraction of individual image layers.
For more details about training strategies and implementation, feel free to check our technical blog.
Usage Tips
- The model architecture has been modified from multi-image output to single-image output, producing only the layer relevant to the textual description.
- The model was trained exclusively on English text but inherits Chinese language understanding capabilities from the base model.
- The native training resolution is 1024x1024; however, inference at other resolutions is supported.
- The model struggles to separate multiple overlapping entities (e.g., the cartoon skeleton and hat in the examples).
- The model excels at decomposing poster-like images but performs poorly on photographic images, especially those involving complex lighting and shadows.
- Negative prompts are supported—use them to specify content you want excluded from the output.
Demo Examples
Some images contain white text on light backgrounds; users of the ModelScope community should click the "☀︎" icon at the top-right corner to switch to dark mode for better visibility.
Example 1
Example 2
Example 3
Inference Code
Install DiffSynth-Studio:
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
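Optionally, a quick sanity check that the editable install is importable (this is just an illustrative snippet; it uses the same import path as the inference script below):
# Verify that DiffSynth-Studio is importable after installation.
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
print("DiffSynth-Studio import OK:", QwenImagePipeline.__name__, ModelConfig.__name__)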
Model Inference:
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
from PIL import Image
import torch
import requests

# Load the layered transformer together with the Qwen-Image text encoder,
# the layered VAE, and the Qwen-Image-Edit processor.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Layered-Control", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image-Layered", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)

# Describe the layer to extract from the input image.
prompt = "A cartoon skeleton character wearing a purple hat and holding a gift box"

# Download the example input image and convert it to RGBA at the native 1024x1024 resolution.
input_image = requests.get("https://modelscope.oss-cn-beijing.aliyuncs.com/resource/images/trick_or_treat.png", stream=True).raw
input_image = Image.open(input_image).convert("RGBA").resize((1024, 1024))
input_image.save("image_input.png")

# Extract the layer matching the prompt.
images = pipe(
    prompt,
    seed=0,
    num_inference_steps=30, cfg_scale=4,
    height=1024, width=1024,
    layer_input_image=input_image,
    layer_num=0,
)
images[0].save("image.png")
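Because the model outputs one layer per prompt, several layers can be pulled from the same input by calling the pipeline once per description. The sketch below continues from the script above and is a minimal illustration under a few assumptions: the layer descriptions and the negative prompt are hypothetical, the negative_prompt keyword is assumed to be accepted alongside cfg_scale, and the returned layers are assumed to be RGBA. The final step flattens each layer over a dark canvas, which makes white content easier to inspect (see the note under Demo Examples).
# Hypothetical layer descriptions for the same input image.
layer_prompts = [
    "A cartoon skeleton character holding a gift box",
    "A purple hat",
    "Background scenery without any characters",
]
# Assumed keyword; see Usage Tips ("Negative prompts are supported").
negative_prompt = "watermark, text overlay"

layers = []
for i, layer_prompt in enumerate(layer_prompts):
    layer_images = pipe(
        layer_prompt,
        negative_prompt=negative_prompt,
        seed=0,
        num_inference_steps=30, cfg_scale=4,
        height=1024, width=1024,
        layer_input_image=input_image,
        layer_num=0,
    )
    layer = layer_images[0]
    layer.save(f"layer_{i}.png")
    layers.append(layer)

# Flatten each (assumed RGBA) layer over a dark background for easier inspection.
for i, layer in enumerate(layers):
    preview = Image.new("RGBA", layer.size, (32, 32, 32, 255))
    preview.alpha_composite(layer.convert("RGBA"))
    preview.convert("RGB").save(f"layer_{i}_preview.png")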