Transformer Temporal
Translator: 片刻小哥哥
Project page: https://huggingface.apachecn.org/docs/diffusers/api/models/transformer_temporal
Original page: https://huggingface.co/docs/diffusers/api/models/transformer_temporal
A Transformer model for video-like data.
TransformerTemporalModel
class diffusers.models.TransformerTemporalModel
[source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/models/transformer_temporal.py#L39)

( num_attention_heads: int = 16, attention_head_dim: int = 88, in_channels: typing.Optional[int] = None, out_channels: typing.Optional[int] = None, num_layers: int = 1, dropout: float = 0.0, norm_num_groups: int = 32, cross_attention_dim: typing.Optional[int] = None, attention_bias: bool = False, sample_size: typing.Optional[int] = None, activation_fn: str = 'geglu', norm_elementwise_affine: bool = True, double_self_attention: bool = True, positional_embeddings: typing.Optional[str] = None, num_positional_embeddings: typing.Optional[int] = None )
Parameters

- num_attention_heads (int, optional, defaults to 16) — The number of heads to use for multi-head attention.
- attention_head_dim (int, optional, defaults to 88) — The number of channels in each head.
- in_channels (int, optional) — The number of channels in the input and output (specify if the input is continuous).
- num_layers (int, optional, defaults to 1) — The number of layers of Transformer blocks to use.
- dropout (float, optional, defaults to 0.0) — The dropout probability to use.
- cross_attention_dim (int, optional) — The number of encoder_hidden_states dimensions to use.
- attention_bias (bool, optional) — Configure if the TransformerBlock attention should contain a bias parameter.
- sample_size (int, optional) — The width of the latent images (specify if the input is discrete). This is fixed during training since it is used to learn a number of position embeddings.
- activation_fn (str, optional, defaults to "geglu") — Activation function to use in feed-forward. See diffusers.models.activations.get_activation for supported activation functions.
- norm_elementwise_affine (bool, optional) — Configure if the TransformerBlock should use learnable elementwise affine parameters for normalization.
- double_self_attention (bool, optional) — Configure if each TransformerBlock should contain two self-attention layers.
- positional_embeddings (str, optional) — The type of positional embeddings to apply to the sequence input before use.
- num_positional_embeddings (int, optional) — The maximum length of the sequence over which to apply positional embeddings.
A Transformer model for video-like data.
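For orientation, a minimal construction sketch (assuming diffusers v0.23.0, as linked above; the channel and head values below are illustrative choices, not taken from this page):

```python
from diffusers.models import TransformerTemporalModel

# Illustrative configuration: inner dim = num_attention_heads * attention_head_dim
# = 8 * 40 = 320, and in_channels is kept divisible by norm_num_groups (default 32).
temporal_transformer = TransformerTemporalModel(
    num_attention_heads=8,
    attention_head_dim=40,
    in_channels=320,
    num_layers=1,
)
```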
forward
[source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/models/transformer_temporal.py#L119)

( hidden_states: FloatTensor, encoder_hidden_states: typing.Optional[torch.LongTensor] = None, timestep: typing.Optional[torch.LongTensor] = None, class_labels: LongTensor = None, num_frames: int = 1, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, return_dict: bool = True ) → TransformerTemporalModelOutput or tuple
Parameters

- hidden_states (torch.LongTensor of shape (batch size, num latent pixels) if discrete, torch.FloatTensor of shape (batch size, channel, height, width) if continuous) — Input hidden_states.
- encoder_hidden_states (torch.LongTensor of shape (batch size, encoder_hidden_states dim), optional) — Conditional embeddings for the cross-attention layer. If not given, cross-attention defaults to self-attention.
- timestep (torch.LongTensor, optional) — Used to indicate the denoising step. Optional timestep to be applied as an embedding in AdaLayerNorm.
- class_labels (torch.LongTensor of shape (batch size, num classes), optional) — Used to indicate class-label conditioning. Optional class labels to be applied as an embedding in AdaLayerNormZero.
- num_frames (int, optional, defaults to 1) — The number of frames to be processed per batch. This is used to reshape the hidden states.
- cross_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.
- return_dict (bool, optional, defaults to True) — Whether or not to return a TransformerTemporalModelOutput instead of a plain tuple.
Returns

TransformerTemporalModelOutput or tuple

If return_dict is True, a TransformerTemporalModelOutput is returned, otherwise a tuple where the first element is the sample tensor.

The TransformerTemporal forward method.
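A minimal forward-pass sketch, reusing the hypothetical temporal_transformer instance from the constructor example above; the batch size, frame count, and spatial size are illustrative assumptions:

```python
import torch

# Frames are packed into the batch dimension:
# (batch_size * num_frames, channel, height, width).
batch_size, num_frames = 2, 8
hidden_states = torch.randn(batch_size * num_frames, 320, 32, 32)

with torch.no_grad():
    output = temporal_transformer(hidden_states, num_frames=num_frames)

# The returned sample keeps the packed input shape.
print(output.sample.shape)  # torch.Size([16, 320, 32, 32])
```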
TransformerTemporalModelOutput
class diffusers.models.transformer_temporal.TransformerTemporalModelOutput
[source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/models/transformer_temporal.py#L27)

( sample: FloatTensor )
Parameters

- sample (torch.FloatTensor of shape (batch_size x num_frames, num_channels, height, width)) — The hidden states output conditioned on encoder_hidden_states input.
The output of TransformerTemporalModel.
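A short sketch of consuming both return modes, under the same assumptions as the forward example above:

```python
# return_dict=True (the default) yields a TransformerTemporalModelOutput.
sample = temporal_transformer(hidden_states, num_frames=num_frames).sample

# return_dict=False yields a plain tuple whose first element is the sample tensor.
(sample,) = temporal_transformer(hidden_states, num_frames=num_frames, return_dict=False)
```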