
Latent Consistency Model

Translator: 疾风兔X

Project page: https://huggingface.apachecn.org/docs/diffusers/using-diffusers/inference_with_lcm

Original page: https://huggingface.co/docs/diffusers/using-diffusers/inference_with_lcm


Latent Consistency Models (LCMs) can typically generate high-quality images in only 2-4 steps, making it possible to use diffusion models in almost real-time settings.

From the official website:

LCMs can be distilled from any pre-trained Stable Diffusion (SD) in only 4,000 training steps (~32 A100 GPU hours) for generating high-quality 768 x 768 resolution images in 2~4 steps or even one step, significantly accelerating text-to-image generation. We employ LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.

For a more technical overview of LCMs, refer to the paper.

LCM distilled models are available for the stable-diffusion-v1-5, stable-diffusion-xl-base-1.0, and SSD-1B models. All the checkpoints can be found in this collection.

This guide will show you how to perform inference with LCMs for:

  • Text-to-image
  • Image-to-image
  • Combining with style LoRAs
  • ControlNet/T2I-Adapter

Text-to-image

You'll use the StableDiffusionXLPipeline with the LCMScheduler and then load the LCM-distilled UNet. Together, the distilled UNet and the scheduler enable a fast inference workflow that overcomes the slow iterative nature of diffusion models.

from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler
import torch

# load the LCM-distilled UNet
unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=8.0
).images[0]

Notice that we used only 4 steps for generation, which is a lot fewer than what's typically required for standard SDXL.

Some details to keep in mind:

  • To perform classifier-free guidance, the batch size is usually doubled inside the pipeline. LCMs, however, apply guidance with guidance embeddings, so the batch size does not have to be doubled in this case. This leads to faster inference, with the drawback that negative prompts don't have any effect on the denoising process.
  • The UNet was trained using the [3., 13.] guidance scale range, so that is the ideal range for guidance_scale. However, disabling guidance_scale with a value of 1.0 is also effective in most cases, as shown in the sketch after this list.
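
For example, reusing pipe and prompt from the snippet above, a minimal sketch of the same generation with guidance disabled:

# Assumes pipe and prompt from the text-to-image example above.
# A guidance_scale of 1.0 disables classifier-free guidance entirely.
generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]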

Image-to-image

LCMs can be applied to image-to-image tasks as well. For this example, we'll use the LCM_Dreamshaper_v7 model, but the same steps can be applied to other LCM models.

import torch
from diffusers import AutoPipelineForImage2Image, UNet2DConditionModel, LCMScheduler
from diffusers.utils import make_image_grid, load_image

# load the LCM-distilled UNet
unet = UNet2DConditionModel.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",
    subfolder="unet",
    torch_dtype=torch.float16,
)

pipe = AutoPipelineForImage2Image.from_pretrained(
    "Lykon/dreamshaper-7",
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png"
init_image = load_image(url)
prompt = "Astronauts in a jungle, cold color palette, muted colors, detailed, 8k"

# pass prompt and image to pipeline
generator = torch.manual_seed(0)
image = pipe(
    prompt,
    image=init_image,
    num_inference_steps=4,
    guidance_scale=7.5,
    strength=0.5,
    generator=generator
).images[0]
make_image_grid([init_image, image], rows=1, cols=2)

You can get different results based on your prompt and the image you provide. To get the best results, we recommend trying different values for the num_inference_steps, strength, and guidance_scale parameters and selecting the best one.
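
One simple way to search is a small grid sweep. The sketch below reuses pipe, init_image, and prompt from the example above; the candidate values are arbitrary illustrations, not recommendations:

# Sweep a few strength/guidance_scale combinations with a fixed seed so the
# results are directly comparable, then view them side by side.
images = []
for strength in [0.4, 0.5, 0.6]:
    for guidance_scale in [5.0, 7.5]:
        generator = torch.manual_seed(0)
        images.append(pipe(
            prompt,
            image=init_image,
            num_inference_steps=4,
            guidance_scale=guidance_scale,
            strength=strength,
            generator=generator,
        ).images[0])
make_image_grid(images, rows=3, cols=2)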

Combine with style LoRAs

LCMs can be used with other styled LoRAs to generate styled images in very few steps (4-8). In the following example, we'll use the papercut LoRA.

from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# load the papercut style LoRA
pipe.load_lora_weights("TheLastBen/Papercut_SDXL", weight_name="papercut.safetensors", adapter_name="papercut")

prompt = "papercut, a cute fox"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=8.0
).images[0]
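
If the papercut style comes through too strongly or too weakly, one knob to try is scaling the LoRA's contribution at inference time through cross_attention_kwargs. A sketch reusing the pipeline above, where the 0.8 scale is an arbitrary example value:

# Scale the LoRA influence to 80% of full strength (1.0 = full strength).
generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    generator=generator,
    guidance_scale=8.0,
    cross_attention_kwargs={"scale": 0.8},
).images[0]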

ControlNet/T2I-Adapter

Let's look at how to perform inference with a ControlNet/T2I-Adapter and an LCM.

ControlNet

For this example, we'll use the LCM_Dreamshaper_v7 model with a canny ControlNet, but the same steps can be applied to other LCM models.

import torch
import cv2
import numpy as np
from PIL import Image

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((512, 512))

image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
).to("cuda")

# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

generator = torch.manual_seed(0)
image = pipe(
    "the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    generator=generator,
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)

The inference parameters in this example may not work for all examples, so we recommend trying different values for the num_inference_steps, guidance_scale, controlnet_conditioning_scale, and cross_attention_kwargs parameters and selecting the best one.
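
For instance, controlnet_conditioning_scale (default 1.0) controls how strictly the output follows the canny edges. A sketch reusing pipe and canny_image from above, with 0.5 as an arbitrary example value:

# A lower controlnet_conditioning_scale lets the model deviate more from the
# edge map; 1.0 (the default) follows it most closely.
generator = torch.manual_seed(0)
image = pipe(
    "the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    controlnet_conditioning_scale=0.5,
    generator=generator,
).images[0]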

T2I-Adapter

This example shows how to use lcm-sdxl with the Canny T2I-Adapter.

import torch
import cv2
import numpy as np
from PIL import Image

from diffusers import StableDiffusionXLAdapterPipeline, UNet2DConditionModel, T2IAdapter, LCMScheduler
from diffusers.utils import load_image, make_image_grid

# Prepare image
# Detect the canny map in low resolution to avoid high-frequency details
image = load_image(
    "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg"
).resize((384, 384))

image = np.array(image)

low_threshold = 100
high_threshold = 200

image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image).resize((1024, 1216))

# load adapter
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16").to("cuda")

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    unet=unet,
    adapter=adapter,
    torch_dtype=torch.float16,
    variant="fp16", 
).to("cuda")

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "Mystical fairy in real, magic, 4k picture, high quality"
negative_prompt = "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=5,
    adapter_conditioning_scale=0.8, 
    adapter_conditioning_factor=1,
    generator=generator,
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)

