Transformer Temporal
Translator: 片刻小哥哥
Project URL: https://huggingface.apachecn.org/docs/diffusers/api/models/transformer_temporal
Original URL: https://huggingface.co/docs/diffusers/api/models/transformer_temporal
A Transformer model for video-like data.
TransformerTemporalModel
class diffusers.models.TransformerTemporalModel
[source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/models/transformer_temporal.py#L39)

( num_attention_heads: int = 16, attention_head_dim: int = 88, in_channels: typing.Optional[int] = None, out_channels: typing.Optional[int] = None, num_layers: int = 1, dropout: float = 0.0, norm_num_groups: int = 32, cross_attention_dim: typing.Optional[int] = None, attention_bias: bool = False, sample_size: typing.Optional[int] = None, activation_fn: str = 'geglu', norm_elementwise_affine: bool = True, double_self_attention: bool = True, positional_embeddings: typing.Optional[str] = None, num_positional_embeddings: typing.Optional[int] = None )
Parameters
- num_attention_heads (`int`, *optional*, defaults to 16) — The number of heads to use for multi-head attention.
- attention_head_dim (`int`, *optional*, defaults to 88) — The number of channels in each head.
- in_channels (`int`, *optional*) — The number of channels in the input and output (specify if the input is continuous).
- num_layers (`int`, *optional*, defaults to 1) — The number of layers of Transformer blocks to use.
- dropout (`float`, *optional*, defaults to 0.0) — The dropout probability to use.
- cross_attention_dim (`int`, *optional*) — The number of `encoder_hidden_states` dimensions to use.
- attention_bias (`bool`, *optional*) — Configure if the `TransformerBlock` attention should contain a bias parameter.
- sample_size (`int`, *optional*) — The width of the latent images (specify if the input is discrete). This is fixed during training since it is used to learn a number of position embeddings.
- activation_fn (`str`, *optional*, defaults to `"geglu"`) — Activation function to use in feed-forward. See `diffusers.models.activations.get_activation` for supported activation functions.
- norm_elementwise_affine (`bool`, *optional*) — Configure if the `TransformerBlock` should use learnable elementwise affine parameters for normalization.
- double_self_attention (`bool`, *optional*) — Configure if each `TransformerBlock` should contain two self-attention layers.
- positional_embeddings (`str`, *optional*) — The type of positional embeddings to apply to the sequence input before use.
- num_positional_embeddings (`int`, *optional*) — The maximum length of the sequence over which to apply positional embeddings.
A Transformer model for video-like data.
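A minimal construction sketch (the hyperparameter values below are illustrative only, not the defaults of any particular pipeline). Note that the model applies a `GroupNorm` over the input channels, so `in_channels` should be divisible by `norm_num_groups` (32 by default):

```python
import torch
from diffusers.models import TransformerTemporalModel

# Illustrative sizes; the attention inner dimension is
# num_attention_heads * attention_head_dim = 8 * 32 = 256.
model = TransformerTemporalModel(
    num_attention_heads=8,
    attention_head_dim=32,
    in_channels=64,  # divisible by norm_num_groups (default 32)
    num_layers=1,
)
```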
forward
[source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/models/transformer_temporal.py#L119)

( hidden_states: FloatTensor, encoder_hidden_states: typing.Optional[torch.LongTensor] = None, timestep: typing.Optional[torch.LongTensor] = None, class_labels: LongTensor = None, num_frames: int = 1, cross_attention_kwargs: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, return_dict: bool = True ) → `TransformerTemporalModelOutput` or `tuple`
Parameters
- hidden_states (`torch.LongTensor` of shape `(batch size, num latent pixels)` if discrete, `torch.FloatTensor` of shape `(batch size, channel, height, width)` if continuous) — Input hidden_states.
- encoder_hidden_states (`torch.LongTensor` of shape `(batch size, encoder_hidden_states dim)`, *optional*) — Conditional embeddings for the cross-attention layer. If not given, cross-attention defaults to self-attention.
- timestep (`torch.LongTensor`, *optional*) — Used to indicate the denoising step. Optional timestep to be applied as an embedding in `AdaLayerNorm`.
- class_labels (`torch.LongTensor` of shape `(batch size, num classes)`, *optional*) — Used to indicate class-label conditioning. Optional class labels to be applied as an embedding in `AdaLayerZeroNorm`.
- num_frames (`int`, *optional*, defaults to 1) — The number of frames to be processed per batch. This is used to reshape the hidden states.
- cross_attention_kwargs (`dict`, *optional*) — A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined under `self.processor` in diffusers.models.attention_processor.
- return_dict (`bool`, *optional*, defaults to `True`) — Whether or not to return a `TransformerTemporalModelOutput` instead of a plain tuple.
Returns
`TransformerTemporalModelOutput` or `tuple`

If `return_dict` is `True`, a `TransformerTemporalModelOutput` is returned; otherwise, a `tuple` is returned where the first element is the sample tensor.

The `TransformerTemporalModel` forward method.
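A hedged sketch of calling `forward` on the illustrative model constructed above: frames are packed into the batch dimension, so `hidden_states` is shaped `(batch_size * num_frames, channels, height, width)`, and `num_frames` tells the model how to unpack it so attention runs across the frame axis at each spatial location:

```python
batch_size, num_frames = 2, 8

# Frames packed into the batch dimension: (batch * frames, C, H, W).
hidden_states = torch.randn(batch_size * num_frames, 64, 32, 32)

output = model(hidden_states, num_frames=num_frames)
print(output.sample.shape)  # torch.Size([16, 64, 32, 32]), same shape as the input
```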
TransformerTemporalModelOutput
class diffusers.models.transformer_temporal.TransformerTemporalModelOutput
[source](https://github.com/huggingface/diffusers/blob/v0.23.0/src/diffusers/models/transformer_temporal.py#L27)

( sample: FloatTensor )
Parameters
- sample (`torch.FloatTensor` of shape `(batch_size x num_frames, num_channels, height, width)`) — The hidden states output conditioned on the `encoder_hidden_states` input.

The output of `TransformerTemporalModel`.
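Continuing the same illustrative sketch, the output can be consumed either as this dataclass or, with `return_dict=False`, as a plain tuple:

```python
# Default: a TransformerTemporalModelOutput with a .sample attribute.
out = model(hidden_states, num_frames=num_frames)
sample = out.sample

# With return_dict=False, a plain tuple is returned; its first element is the sample.
(sample,) = model(hidden_states, num_frames=num_frames, return_dict=False)
```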
