
Latent Consistency Model

Translator: 疾风兔X

Project page: https://huggingface.apachecn.org/docs/diffusers/using-diffusers/inference_with_lcm

Original page: https://huggingface.co/docs/diffusers/using-diffusers/inference_with_lcm


Latent Consistency Models (LCMs) can typically generate high-quality images in only 2-4 steps, making it possible to use diffusion models in almost real-time settings.

From the official website:

LCMs can be distilled from any pre-trained Stable Diffusion (SD) in only 4,000 training steps (~32 A100 GPU hours) for generating high-quality 768 x 768 resolution images in 2~4 steps or even one step, significantly accelerating text-to-image generation. We employ LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.

For a more technical overview of LCMs, refer to the paper.

LCM distilled models are available for the stable-diffusion-v1-5, stable-diffusion-xl-base-1.0, and SSD-1B models. All the checkpoints can be found in this collection.

This guide will show you how to perform inference with LCMs for:

  • Text-to-image
  • Image-to-image
  • Combining with style LoRAs
  • ControlNet/T2I-Adapter

Text-to-image

You'll use the StableDiffusionXLPipeline with the LCMScheduler and then load the LCM-distilled UNet. Together, the distilled UNet and the scheduler enable a fast inference workflow that overcomes the slow iterative nature of diffusion models.

from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler
import torch

# load the LCM-distilled UNet
unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=8.0
).images[0]

Notice that we used only 4 steps for generation, which is a lot fewer than what's typically required for standard SDXL.

Some details to keep in mind:

  • To perform classifier-free guidance, the batch size is usually doubled inside the pipeline. LCMs, however, apply guidance with guidance embeddings, so the batch size does not have to be doubled in this case. This leads to faster inference, with the drawback that negative prompts don't have any effect on the denoising process.
  • The UNet was trained using the [3., 13.] guidance scale range, so that is the ideal range for guidance_scale. However, disabling guidance_scale with a value of 1.0 is also effective in most cases, as shown in the sketch after this list.
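
For example, reusing pipe and prompt from the snippet above, a minimal sketch of the same generation with guidance disabled:

# Assumes pipe and prompt from the text-to-image example above.
# A guidance_scale of 1.0 disables classifier-free guidance entirely.
generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]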

Image-to-image

LCMs can be applied to image-to-image tasks as well. For this example, we'll use the LCM_Dreamshaper_v7 model, but the same steps can be applied to other LCM models.

import torch
from diffusers import AutoPipelineForImage2Image, UNet2DConditionModel, LCMScheduler
from diffusers.utils import make_image_grid, load_image

# load the LCM-distilled UNet
unet = UNet2DConditionModel.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",
    subfolder="unet",
    torch_dtype=torch.float16,
)

pipe = AutoPipelineForImage2Image.from_pretrained(
    "Lykon/dreamshaper-7",
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronauts in a jungle, cold color palette, muted colors, detailed, 8k"

# pass prompt and image to pipeline
generator = torch.manual_seed(0)
image = pipe(
    prompt,
    image=init_image,
    num_inference_steps=4,
    guidance_scale=7.5,
    strength=0.5,
    generator=generator
).images[0]
make_image_grid([init_image, image], rows=1, cols=2)

You can get different results based on your prompt and the image you provide. To get the best results, we recommend trying different values for the num_inference_steps, strength, and guidance_scale parameters and selecting the best one.
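
One simple way to search is a small grid sweep. The sketch below reuses pipe, init_image, and prompt from the example above; the candidate values are arbitrary illustrations, not recommendations:

# Sweep a few strength/guidance_scale combinations with a fixed seed so the
# results are directly comparable, then view them side by side.
images = []
for strength in [0.4, 0.5, 0.6]:
    for guidance_scale in [5.0, 7.5]:
        generator = torch.manual_seed(0)
        images.append(pipe(
            prompt,
            image=init_image,
            num_inference_steps=4,
            guidance_scale=guidance_scale,
            strength=strength,
            generator=generator,
        ).images[0])
make_image_grid(images, rows=3, cols=2)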

Combine with style LoRAs

LCMs can be used with other styled LoRAs to generate styled images in very few steps (4-8). In the following example, we'll use the papercut LoRA.

from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# load the papercut style LoRA
pipe.load_lora_weights("TheLastBen/Papercut_SDXL", weight_name="papercut.safetensors", adapter_name="papercut")

prompt = "papercut, a cute fox"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=8.0
).images[0]
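
If the papercut style comes through too strongly or too weakly, one knob to try is scaling the LoRA's contribution at inference time through cross_attention_kwargs. A sketch reusing the pipeline above, where the 0.8 scale is an arbitrary example value:

# Scale the LoRA influence to 80% of full strength (1.0 = full strength).
generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    generator=generator,
    guidance_scale=8.0,
    cross_attention_kwargs={"scale": 0.8},
).images[0]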

ControlNet/T2I-Adapter

Let's look at how to perform inference with a ControlNet/T2I-Adapter and an LCM.

ControlNet

For this example, we'll use the LCM_Dreamshaper_v7 model with a canny ControlNet, but the same steps can be applied to other LCM models.

import torch
import cv2
import numpy as np
from PIL import Image

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((512, 512))

image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")

# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

generator = torch.manual_seed(0)
image = pipe(
    "the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    generator=generator,
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)

The inference parameters in this example may not work for all examples, so we recommend trying different values for the num_inference_steps, guidance_scale, controlnet_conditioning_scale, and cross_attention_kwargs parameters and selecting the best one.
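
For instance, controlnet_conditioning_scale (default 1.0) controls how strictly the output follows the canny edges. A sketch reusing pipe and canny_image from above, with 0.5 as an arbitrary example value:

# A lower controlnet_conditioning_scale lets the model deviate more from the
# edge map; 1.0 (the default) follows it most closely.
generator = torch.manual_seed(0)
image = pipe(
    "the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    controlnet_conditioning_scale=0.5,
    generator=generator,
).images[0]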

T2I-Adapter

This example shows how to use lcm-sdxl with the Canny T2I-Adapter.

import torch
import cv2
import numpy as np
from PIL import Image

from diffusers import StableDiffusionXLAdapterPipeline, UNet2DConditionModel, T2IAdapter, LCMScheduler
from diffusers.utils import load_image, make_image_grid

# Prepare image
# Detect the canny map in low resolution to avoid high-frequency details
image = load_image(
    "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg"
).resize((384, 384))

image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image).resize((1024, 1216))

# load adapter
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16").to("cuda")

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    unet=unet,
    adapter=adapter,
    torch_dtype=torch.float16,
    variant="fp16", 
).to("cuda")

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "Mystical fairy in real, magic, 4k picture, high quality"
negative_prompt = "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=5,
    adapter_conditioning_scale=0.8, 
    adapter_conditioning_factor=1,
    generator=generator,
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)

