Shap-E
Translator: 片刻小哥哥
Project URL: https://huggingface.apachecn.org/docs/diffusers/api/pipelines/shap_e
Original URL: https://huggingface.co/docs/diffusers/api/pipelines/shap_e
The Shap-E model was proposed in Shap-E: Generating Conditional 3D Implicit Functions by Alex Nichol and Heewoo Jun from OpenAI.
The abstract from the paper is:
We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space.
The original codebase can be found at openai/shap-e.
See the reuse components across pipelines section to learn how to efficiently load the same components into multiple pipelines.
ShapEPipeline
class diffusers.ShapEPipeline [source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/pipelines/shap_e/pipeline_shap_e.py#L79)
( prior: PriorTransformer, text_encoder: CLIPTextModelWithProjection, tokenizer: CLIPTokenizer, scheduler: HeunDiscreteScheduler, shap_e_renderer: ShapERenderer )
Parameters
- prior ( PriorTransformer ) — The canonical unCLIP prior to approximate the image embedding from the text embedding.
- text_encoder ( CLIPTextModelWithProjection ) — Frozen text-encoder.
- tokenizer ( CLIPTokenizer ) — A CLIPTokenizer to tokenize text.
- scheduler ( HeunDiscreteScheduler ) — A scheduler to be used in combination with the prior model to generate image embedding.
- shap_e_renderer ( ShapERenderer ) — Shap-E renderer projects the generated latents into parameters of an MLP to create 3D objects with the NeRF rendering method.
Pipeline for generating latent representation of a 3D asset and rendering with the NeRF method.
This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
__call__
[source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/pipelines/shap_e/pipeline_shap_e.py#L182)
( prompt: str, num_images_per_prompt: int = 1, num_inference_steps: int = 25, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, latents: typing.Optional[torch.FloatTensor] = None, guidance_scale: float = 4.0, frame_size: int = 64, output_type: typing.Optional[str] = 'pil', return_dict: bool = True ) → ShapEPipelineOutput or tuple
Parameters
- prompt ( str or List[str] ) — The prompt or prompts to guide the image generation.
- num_images_per_prompt ( int, optional, defaults to 1 ) — The number of images to generate per prompt.
- num_inference_steps ( int, optional, defaults to 25 ) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- generator ( torch.Generator or List[torch.Generator], optional ) — A torch.Generator to make generation deterministic.
- latents ( torch.FloatTensor, optional ) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random generator.
- guidance_scale ( float, optional, defaults to 4.0 ) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
- frame_size ( int, optional, defaults to 64 ) — The width and height of each image frame of the generated 3D output.
- output_type ( str, optional, defaults to "pil" ) — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), "latent" (torch.Tensor), or "mesh" (MeshDecoderOutput).
- return_dict ( bool, optional, defaults to True ) — Whether or not to return a ShapEPipelineOutput instead of a plain tuple.
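To make the effect of guidance_scale concrete, here is a minimal sketch of classifier-free guidance on plain Python lists. It assumes the standard formulation, in which the guided prediction equals the unconditional prediction plus guidance_scale times the difference between the conditional and unconditional predictions. The helper apply_guidance and the sample numbers are hypothetical and are not part of diffusers.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Blend unconditional and conditional predictions element-wise.

    Hypothetical helper: assumes the standard classifier-free guidance rule
    guided = uncond + guidance_scale * (cond - uncond).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up predictions for a two-element latent.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]

# A scale of 1 reproduces the conditional prediction unchanged.
print(apply_guidance(uncond, cond, 1.0))
# Larger scales push the result further along the (cond - uncond) direction.
print(apply_guidance(uncond, cond, 4.0))
```

Under this rule, a scale of exactly 1 adds nothing beyond the conditional prediction, which is consistent with guidance being described as enabled only when guidance_scale > 1.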
Returns
ShapEPipelineOutput or tuple
If return_dict is True, ShapEPipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.
The call function to the pipeline for generation.
Examples:
>>> import torch
>>> from diffusers import DiffusionPipeline
>>> from diffusers.utils import export_to_gif
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> repo = "openai/shap-e"
>>> pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.float16)
>>> pipe = pipe.to(device)
>>> guidance_scale = 15.0
>>> prompt = "a shark"
>>> images = pipe(
...     prompt,
...     guidance_scale=guidance_scale,
...     num_inference_steps=64,
...     frame_size=256,
... ).images
>>> gif_path = export_to_gif(images[0], "shark_3d.gif")
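The output_type parameter also accepts "mesh". The sketch below shows one way to request a mesh and save it to disk with a fixed seed. It assumes that diffusers.utils.export_to_ply is available in your installed version (it is not mentioned on this page), and text_to_ply is a hypothetical helper name. The heavy imports are deferred into the function body so that defining the helper does not require a GPU or a model download.

```python
def text_to_ply(prompt, ply_path, seed=0):
    """Generate a mesh for `prompt` and write it to `ply_path`.

    Hypothetical helper, not part of diffusers.
    """
    # Deferred imports: nothing is loaded until the helper is called.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_ply  # assumed to exist in your version

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Half precision is only used on GPU here; this is a cautious choice for the sketch.
    dtype = torch.float16 if device.type == "cuda" else torch.float32

    pipe = DiffusionPipeline.from_pretrained("openai/shap-e", torch_dtype=dtype)
    pipe = pipe.to(device)

    # A seeded generator makes the sampled latents reproducible.
    generator = torch.Generator(device=device).manual_seed(seed)

    meshes = pipe(
        prompt,
        guidance_scale=15.0,
        num_inference_steps=64,
        generator=generator,
        output_type="mesh",
    ).images

    # One mesh is returned per generated sample; save the first.
    return export_to_ply(meshes[0], ply_path)
```

Calling text_to_ply("a shark", "shark_3d.ply") would download the checkpoint on first use and return the path of the written file.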
ShapEImg2ImgPipeline
class diffusers.ShapEImg2ImgPipeline [source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py#L80)
( prior: PriorTransformer, image_encoder: CLIPVisionModel, image_processor: CLIPImageProcessor, scheduler: HeunDiscreteScheduler, shap_e_renderer: ShapERenderer )
Parameters
- prior ( PriorTransformer ) — The canonical unCLIP prior to approximate the image embedding from the text embedding.
- image_encoder ( CLIPVisionModel ) — Frozen image-encoder.
- image_processor ( CLIPImageProcessor ) — A CLIPImageProcessor to process images.
- scheduler ( HeunDiscreteScheduler ) — A scheduler to be used in combination with the prior model to generate image embedding.
- shap_e_renderer ( ShapERenderer ) — Shap-E renderer projects the generated latents into parameters of an MLP to create 3D objects with the NeRF rendering method.
Pipeline for generating latent representation of a 3D asset and rendering with the NeRF method from an image.
This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).
__call__
[source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py#L164)
( image: typing.Union[PIL.Image.Image, typing.List[PIL.Image.Image]], num_images_per_prompt: int = 1, num_inference_steps: int = 25, generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None, latents: typing.Optional[torch.FloatTensor] = None, guidance_scale: float = 4.0, frame_size: int = 64, output_type: typing.Optional[str] = 'pil', return_dict: bool = True ) → ShapEPipelineOutput or tuple
Parameters
- image ( torch.FloatTensor, PIL.Image.Image, np.ndarray, List[torch.FloatTensor], List[PIL.Image.Image], or List[np.ndarray] ) — Image or tensor representing an image batch to be used as the starting point. Can also accept image latents as image, but if passing latents directly it is not encoded again.
- num_images_per_prompt ( int, optional, defaults to 1 ) — The number of images to generate per prompt.
- num_inference_steps ( int, optional, defaults to 25 ) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- generator ( torch.Generator or List[torch.Generator], optional ) — A torch.Generator to make generation deterministic.
- latents ( torch.FloatTensor, optional ) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied random generator.
- guidance_scale ( float, optional, defaults to 4.0 ) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
- frame_size ( int, optional, defaults to 64 ) — The width and height of each image frame of the generated 3D output.
- output_type ( str, optional, defaults to "pil" ) — The output format of the generated image. Choose between "pil" (PIL.Image.Image), "np" (np.array), "latent" (torch.Tensor), or "mesh" (MeshDecoderOutput).
- return_dict ( bool, optional, defaults to True ) — Whether or not to return a ShapEPipelineOutput instead of a plain tuple.
Returns
ShapEPipelineOutput or tuple
If return_dict is True, ShapEPipelineOutput is returned, otherwise a tuple is returned where the first element is a list with the generated images.
The call function to the pipeline for generation.
Examples:
>>> from PIL import Image
>>> import torch
>>> from diffusers import DiffusionPipeline
>>> from diffusers.utils import export_to_gif, load_image
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> repo = "openai/shap-e-img2img"
>>> pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.float16)
>>> pipe = pipe.to(device)
>>> guidance_scale = 3.0
>>> image_url = "https://hf.co/datasets/diffusers/docs-images/resolve/main/shap-e/corgi.png"
>>> image = load_image(image_url).convert("RGB")
>>> images = pipe(
...     image,
...     guidance_scale=guidance_scale,
...     num_inference_steps=64,
...     frame_size=256,
... ).images
>>> gif_path = export_to_gif(images[0], "corgi_3d.gif")
ShapEPipelineOutput
class diffusers.pipelines.shap_e.pipeline_shap_e.ShapEPipelineOutput [source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/pipelines/shap_e/pipeline_shap_e.py#L67)
( images: typing.Union[typing.List[typing.List[PIL.Image.Image]], typing.List[typing.List[numpy.ndarray]]] )
Parameters
- images ( torch.FloatTensor ) — A list of images for 3D rendering.
Output class for ShapEPipeline and ShapEImg2ImgPipeline.
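Because both pipelines return either a ShapEPipelineOutput or a plain tuple depending on return_dict, downstream code may want to handle both forms. The following is a minimal sketch using stand-in objects rather than real pipeline results. The helper frames_per_sample is hypothetical, and the nested layout (one list of frames per generated sample) is inferred from the images type annotation in the signature above.

```python
from types import SimpleNamespace


def frames_per_sample(result):
    """Return the per-sample frame lists from either return form.

    Hypothetical helper, not part of diffusers.
    """
    # With return_dict=False the frames are the first element of a tuple;
    # otherwise they are stored on the `.images` attribute.
    if isinstance(result, tuple):
        return result[0]
    return result.images


# Stand-ins for real results: two samples with three frames each.
fake_frames = [["f0", "f1", "f2"], ["g0", "g1", "g2"]]
as_output = SimpleNamespace(images=fake_frames)  # mimics return_dict=True
as_tuple = (fake_frames,)                        # mimics return_dict=False

for result in (as_output, as_tuple):
    samples = frames_per_sample(result)
    print(len(samples), "samples,", len(samples[0]), "frames in the first one")
```

In the examples on this page, images[0] is the list of frames for the first sample, which is why it is passed whole to export_to_gif.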
