aitemplate.frontend

Classes:

AvgPool2d(kernel_size, stride, padding)

Applies a 2D average pooling over an input signal composed of several input planes.

BertEmbeddings(hidden_size, vocab_size, ...)

Construct the embeddings from word, position and token_type embeddings.

Conv1d(in_channels, out_channels, kernel_size)

Conv1d module applies a 1D convolution over an input signal composed of several input planes.

Conv2d(in_channels, out_channels, ...[, ...])

Applies a 2D convolution over an input signal composed of several input planes.

Conv2dBias(in_channels, out_channels, ...[, ...])

Applies 2D convolution with bias.

Conv2dBiasAddHardswish(in_channels, ...[, ...])

Applies 2D convolution with bias + add + hardswish.

Conv2dBiasAddRelu(in_channels, out_channels, ...)

Applies 2D convolution with bias + add + relu.

Conv2dBiasFewChannels(in_channels, ...[, ...])

Applies 2D convolution with bias for few channels.

Conv2dBiasHardswish(in_channels, ...[, ...])

Applies 2D convolution with bias + hardswish.

Conv2dBiasHardswishFewChannels(in_channels, ...)

Applies 2D convolution with bias + hardswish for few channels.

Conv2dBiasRelu(in_channels, out_channels, ...)

Applies 2D convolution with bias + relu.

Conv2dBiasReluFewChannels(in_channels, ...)

Applies 2D convolution with bias + relu for few channels.

Conv2dBiasSigmoid(in_channels, out_channels, ...)

Applies 2D convolution with bias + sigmoid.

Conv2dDepthwise(in_channels, out_channels, ...)

Conv2dDepthwiseBias(in_channels, ...[, ...])

Conv3d(in_channels, out_channels, ...[, ...])

Applies a 3D convolution over an input signal composed of several input planes.

ConvTranspose2dBias(in_channels, ...[, ...])

Applies a 2D transposed convolution operator over an input image composed of several input planes.

ConvTranspose2dBiasRelu(in_channels, ...[, ...])

Applies a 2D transposed convolution with bias + relu.

CrossAttention(dim, seq_len, seq_len_kv, ...)

Cross Multi-head Attention.

DropPath([dtype])

DropPath placeholder

Dropout([p, dtype])

Dropout placeholder

Embedding(shape, dtype)

A simple lookup table that stores embeddings of a fixed dictionary and size.

FPNProposal(im_shape[, feat_strides, ...])

FPNRoiAlign(num_rois, pooled_size, ...)

Performs the multi-level Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

FlashAttention(batch_size, max_seq_len[, ...])

FlashAttention provides an implementation of a fused multi-head attention module.

Flatten([start_dim, end_dim])

Flattens input by reshaping it into a one-dimensional tensor.

GroupNorm(num_groups, num_channels[, eps, ...])

GroupNorm nn module

Identity([dtype])

The identity of the input.

LayerNorm(normalized_shape[, eps, dtype])

LayerNorm nn module

Linear(in_channels, out_channels[, bias, ...])

Applies a linear transformation to the incoming data: \(y = xA^T + b\)

MaxPool2d(kernel_size, stride[, padding])

Applies a 2D max pooling over an input signal composed of several input planes.

Module()

Base class for all neural network modules.

ModuleDict([modules])

Holds submodules in a dictionary.

ModuleList([modules])

Holds submodules in a list.

MultiScaleBlock(dim, dim_out, num_heads, ...)

Implementation of a multiscale vision transformer block.

MultiheadAttention(dim, batch_size, seq_len)

Multi-Head Attention.

Ndhwc3to8()

Pads the input data with ndhwc dimensions from 3 channels to 8 channels

Nhwc3to8()

Pads the input data with nhwc dimensions from 3 channels to 8 channels

Proposal(im_shape[, feat_stride, scales, ...])

Reshape()

Returns a tensor with the same data and number of elements as input, but with the specified shape.

RoiAlign(num_rois, pooled_size, ...)

Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

ScaledDotProductAttention()

Sequential()

A sequential container.

T5DenseGatedGeluDense(in_channels, out_channels)

T5DenseGatedGeluDense.

Tensor(shape[, name, src_ops, dst_ops, ...])

A Tensor represents a piece of data, which is used as an input / output of an Operator.

Upsampling2d(scale_factor, mode)

Applies a 2D bilinear upsampling to an input signal composed of several input channels.

Upsampling2dAdd(scale_factor, mode)

Applies Upsampling2d + add.

VanillaCrossAttention(dim, seq_len, ...[, ...])

Vanilla Cross Multi-head Attention.

VanillaMultiheadAttention(dim[, batch_size, ...])

Vanilla Multi-Head Attention.

View()

Placeholder for View layer.

avg_pool2d(kernel_size, stride, pad)

Applies a 2D average pooling over an input signal composed of several input planes.

conv3d(stride, pad[, dilate, group])

conv3d_bias(stride, pad[, dilate, group])

depthwise_conv3d(stride, pad[, dilate, ...])

flatten([start_dim, end_dim])

Flattens input by reshaping it into a one-dimensional tensor.

max_pool2d(kernel_size, stride, pad)

Applies a 2D max pooling over an input signal composed of several input planes.

multi_level_roi_align(num_rois, pooled_size, ...)

Performs the multi-level Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

ndhwc3to8()

Pad the 3-channel input data to 8-channel.

nhwc3to8()

reshape()

Returns a tensor with the same data and number of elements as input, but with the specified shape.

roi_align(num_rois, pooled_size, ...)

Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

squeeze(dim)

Examines the specified dimension and gets rid of it if it is of size 1.

unsqueeze(dim)

Adds a dimension of size 1 at a specified location.

upsampling2d(scale_factor, mode)

Applies a 2D bilinear upsampling to an input signal composed of several input channels.

upsampling2d_add(scale_factor, mode)

Fused op for bilinear_upsampling + add.

Functions:

detect_target(**kwargs)

Detect GPU target based on nvidia-smi and rocminfo

vanilla_attention(q, k, v[, scale, attn_mask])

Vanilla attention in the most basic form. q, k and v have shape (batch, seqlen, num_heads, head_dim). Either the batch or the sequence dimension can be variable, but not both. attn_mask is an attention mask that is added to the attention scores; use 0 and -inf to mask a sequence index.

class aitemplate.frontend.nn.AvgPool2d(kernel_size, stride, padding)[source]

Applies a 2D average pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, H, W, C)\), output \((N, H_{out}, W_{out}, C)\) and kernel_size \((kH, kW)\) can be precisely described as:

\[out(N_i, h, w, C_j) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, stride[0] \times h + m, stride[1] \times w + n, C_j)\]

If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.

Note

When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.

Parameters:
  • kernel_size – the size of the window to take an avg over

  • stride – the stride of the window

  • padding – implicit zero padding to be added on both sides

Methods:

forward(*args)

Applies AvgPool2d on the input.

forward(*args)[source]

Applies AvgPool2d on the input.
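
The pooling formula above can be sketched in plain Python. This is only an illustrative reference, not AITemplate's kernel: the helper name `avg_pool2d_reference` is hypothetical, the tensor is a nested list in NHWC order, a square window is assumed, and padded positions are assumed to count as zeros with a fixed divisor of kH * kW, as in the formula.

```python
def avg_pool2d_reference(x, kernel_size, stride, padding=0):
    """Average-pool a nested-list tensor laid out as [N][H][W][C].

    Hypothetical reference helper, not part of AITemplate. Padded positions
    contribute zeros and the divisor is always kH * kW.
    """
    kh = kw = kernel_size
    n, h, w, c = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1
    out = [[[[0.0] * c for _ in range(w_out)] for _ in range(h_out)]
           for _ in range(n)]
    for ni in range(n):
        for ho in range(h_out):
            for wo in range(w_out):
                for cj in range(c):
                    total = 0.0
                    for m in range(kh):
                        for k in range(kw):
                            hi = stride * ho + m - padding
                            wi = stride * wo + k - padding
                            # Positions outside the input are implicit zeros.
                            if 0 <= hi < h and 0 <= wi < w:
                                total += x[ni][hi][wi][cj]
                    out[ni][ho][wo][cj] = total / (kh * kw)
    return out


# One 2x2 image with a single channel, pooled by a 2x2 window.
x = [[[[1.0], [2.0]], [[3.0], [4.0]]]]
pooled = avg_pool2d_reference(x, kernel_size=2, stride=2)  # [[[[2.5]]]]
```

With `padding=1` the same input yields a 2x2 output whose top-left value is 0.25, because three of the four window positions fall in the zero padding.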

class aitemplate.frontend.nn.BertEmbeddings(hidden_size, vocab_size, max_position_embeddings, type_vocab_size, layer_norm_eps, hidden_dropout_prob, dtype='float16')[source]

Construct the embeddings from word, position and token_type embeddings.

Methods:

forward(input_ids, token_type_ids, position_ids)

Defines the computation performed at every call.

forward(input_ids, token_type_ids, position_ids)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.Conv1d(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, dtype: str = 'float16', bias: bool = False, name: str = 'conv1d')[source]

Conv1d module applies a 1D convolution over an input signal composed of several input planes.

\[\text{out}\left(B_i, \text{:}, \text{channels\_out}_j\right) = \text{bias}\left(\text{channels\_out}_j\right) + \sum_{k = 0}^{\text{channels\_in} - 1} \text{weight}\left(\text{channels\_out}_j, \text{:}, k\right) \star \text{input}\left(B_i, \text{:}, k\right)\]

The semantics are similar to PyTorch with the following exception: dims 1 and 2 of the weight, input and output are swapped (while dim 0 remains the same).

Methods:

forward(x)

Applies Conv1d on the input tensor of shape \((B, \text{seq\_in}, \text{channels\_in})\). The output has shape \((B, \text{seq\_out}, \text{channels\_out})\), where \(\text{seq\_out} = \left\lfloor\frac{\text{seq\_in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\).

forward(x: Tensor) Tensor[source]

Applies Conv1d on the input tensor of shape \((B, \text{seq\_in}, \text{channels\_in})\). The output has shape \((B, \text{seq\_out}, \text{channels\_out})\), where

\[\text{seq\_out} = \left\lfloor\frac{\text{seq\_in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]

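
The output-length formula can be checked with a few lines of Python. The helper below is a hypothetical illustration (it is not an AITemplate function); it relies on the fact that flooring the fraction and then adding 1 gives the same result as the formula.

```python
def conv1d_seq_out(seq_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output sequence length of a 1D convolution (hypothetical helper).

    Mirrors the seq_out formula; `//` is floor division.
    """
    return (seq_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Input of shape (B, 128, channels_in) with a kernel of size 3.
plain = conv1d_seq_out(128, kernel_size=3)                       # 126
strided = conv1d_seq_out(128, kernel_size=3, stride=2, padding=1)  # 64
```
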
class aitemplate.frontend.nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]

Applies a 2D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, H, W, C_{\text{in}})\) and output \((N, H_{\text{out}}, W_{\text{out}}, C_{\text{out}})\) can be precisely described as:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]

where \(\star\) is the valid 2D cross-correlation operator, \(N\) is a batch size, \(H\) is a height of input planes in pixels, \(W\) is width in pixels, and \(C\) denotes a number of channels.

  • stride controls the stride for the cross-correlation.

  • padding controls the amount of padding applied to the input.

  • dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int) – Size of the convolving kernel

  • stride (int) – Stride of the convolution

  • padding (int, optional) – Padding added to all four sides of the input. Default: 0

  • dilation (int, optional) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • dtype (string, optional) – Data type. Default: “float16”

Shape:
  • Input: \((N, H_{in}, W_{in}, C_{in})\)

  • Output: \((N, H_{out}, W_{out}, C_{out})\), where

    \[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
    \[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
weight

the learnable weights of the module of shape \((\text{out_channels}, \text{kernel_size}, \text{kernel_size}, \frac{\text{in_channels}}{\text{groups}})\).

Type:

Tensor

Examples:

>>> m = nn.Conv2d(16, 33, 3, 2)
>>> input = Tensor(shape=[20, 50, 100, 16])
>>> output = m(input)
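
As a worked instance of the Shape formulas, the sketch below computes the NHWC output shape for the example above. `conv2d_output_shape` is a hypothetical helper written for illustration, assuming a square kernel and the same stride, padding and dilation in both spatial dimensions.

```python
def conv2d_output_shape(input_shape, out_channels, kernel_size, stride,
                        padding=0, dilation=1):
    """Return [N, H_out, W_out, C_out] for an NHWC input (hypothetical helper)."""
    n, h_in, w_in, _c_in = input_shape

    def out_dim(size):
        # Same expression as H_out / W_out in the Shape section.
        return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

    return [n, out_dim(h_in), out_dim(w_in), out_channels]


# Matches nn.Conv2d(16, 33, 3, 2) applied to Tensor(shape=[20, 50, 100, 16]).
shape = conv2d_output_shape([20, 50, 100, 16], out_channels=33,
                            kernel_size=3, stride=2)  # [20, 24, 49, 33]
```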

Methods:

forward(*args)

Applies Conv2d on the input tensor.

forward(*args)[source]

Applies Conv2d on the input tensor.

class aitemplate.frontend.nn.Conv2dBias(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, dtype='float16')[source]

Applies 2D convolution with bias.

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int) – Size of the convolving kernel

  • stride (int) – Stride of the convolution

  • padding (int, optional) – Padding added to all four sides of the input. Default: 0

  • dilation (int, optional) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • dtype (string, optional) – Data type. Default: “float16”

Shape:
  • Input: \((N, H_{in}, W_{in}, C_{in})\)

  • Output: \((N, H_{out}, W_{out}, C_{out})\), where

    \[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
    \[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
weight

the learnable weights of the module of shape \((\text{out_channels}, \text{kernel_size}, \text{kernel_size}, \frac{\text{in_channels}}{\text{groups}})\).

Type:

Tensor

bias

the learnable bias of the module of shape (out_channels).

Type:

Tensor

Examples:

>>> m = nn.Conv2dBias(16, 33, 3, 2)
>>> input = Tensor(shape=[20, 50, 100, 16])
>>> output = m(input)
class aitemplate.frontend.nn.Conv2dBiasAddHardswish(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]

Applies 2D convolution with bias + add + hardswish.

weight

the learnable weights of the module of shape \((\text{out_channels}, \text{kernel_size}, \text{kernel_size}, \frac{\text{in_channels}}{\text{groups}})\).

Type:

Tensor

bias

the learnable bias of the module of shape (out_channels).

Type:

Tensor

Parameters:
  • input (Tensor) – the input tensor to apply 2D convolution on.

  • residual (Tensor) – the residual tensor to add after Conv2dBias.

Examples:

>>> m = nn.Conv2dBiasAddHardswish(128, 256, 3, 1, padding=1)
>>> input = Tensor(shape=[4, 28, 28, 128])
>>> residual = Tensor(shape=[4, 28, 28, 256])
>>> output = m(input, residual)
class aitemplate.frontend.nn.Conv2dBiasAddRelu(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]

Applies 2D convolution with bias + add + relu.

weight

the learnable weights of the module of shape \((\text{out_channels}, \text{kernel_size}, \text{kernel_size}, \frac{\text{in_channels}}{\text{groups}})\).

Type:

Tensor

bias

the learnable bias of the module of shape (out_channels).

Type:

Tensor

Parameters:
  • input (Tensor) – the input tensor to apply 2D convolution on.

  • residual (Tensor) – the residual tensor to add after Conv2dBias.

Examples:

>>> m = nn.Conv2dBiasAddRelu(128, 256, 3, 1, padding=1)
>>> input = Tensor(shape=[4, 28, 28, 128])
>>> residual = Tensor(shape=[4, 28, 28, 256])
>>> output = m(input, residual)
class aitemplate.frontend.nn.Conv2dBiasFewChannels(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, auto_padding=True, dtype='float16')[source]

Applies 2D convolution with bias for few channels.

This layer is equivalent to Conv2dBias but has improved performance for in_channels < 8.

class aitemplate.frontend.nn.Conv2dBiasHardswish(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]

Applies 2D convolution with bias + hardswish.

class aitemplate.frontend.nn.Conv2dBiasHardswishFewChannels(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, auto_padding=True, dtype='float16')[source]

Applies 2D convolution with bias + hardswish for few channels.

This layer is equivalent to Conv2dBiasHardswish but has improved performance for in_channels < 8.

class aitemplate.frontend.nn.Conv2dBiasRelu(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]

Applies 2D convolution with bias + relu.

class aitemplate.frontend.nn.Conv2dBiasReluFewChannels(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, auto_padding=True, dtype='float16')[source]

Applies 2D convolution with bias + relu for few channels.

This layer is equivalent to Conv2dBiasRelu but has improved performance for in_channels < 8.

class aitemplate.frontend.nn.Conv2dBiasSigmoid(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]

Applies 2D convolution with bias + sigmoid.

class aitemplate.frontend.nn.Conv2dDepthwise(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]
class aitemplate.frontend.nn.Conv2dDepthwiseBias(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]
class aitemplate.frontend.nn.Conv3d(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16', bias=False)[source]

Applies a 3D convolution over an input signal composed of several input planes.

  • stride controls the stride for the cross-correlation.

  • padding controls the amount of padding applied to the input.

  • dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or Tuple(int)) – Size of the convolving kernel

  • stride (int or Tuple(int)) – Stride of the convolution

  • padding (int or Tuple(int), optional) – Padding added to both sides of each spatial dimension (depth, height and width) of the input. Default: 0

  • dilation (int or Tuple(int), optional) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • dtype (string, optional) – Data type. Default: “float16”

  • bias (bool, optional) – Has bias or not. Default: False (Note that we only support bias for depthwise_conv3d for now)

Shape:
  • Input: \((N, D_{in}, H_{in}, W_{in}, C_{in})\)

  • Output: \((N, D_{out}, H_{out}, W_{out}, C_{out})\), where

    \[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
    \[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
    \[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel_size} - 1) - 1}{\text{stride}} + 1\right\rfloor\]
weight

the learnable weights of the module of shape \((\text{out_channels}, \text{kernel_size}[0], \text{kernel_size}[1], \text{kernel_size}[2], \frac{\text{in_channels}}{\text{groups}})\).

Type:

Tensor

Examples:

>>> m = nn.Conv3d(16, 33, 3, 2)
>>> input = Tensor(shape=[20, 50, 100, 100, 16])
>>> output = m(input)

Methods:

forward(*args)

Applies Conv3d on the input tensor.

forward(*args)[source]

Applies Conv3d on the input tensor.

class aitemplate.frontend.nn.ConvTranspose2dBias(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]

Applies a 2D transposed convolution operator over an input image composed of several input planes.

This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation as it does not compute a true inverse of convolution). For more information, see the visualizations here and the Deconvolutional Networks paper.

Parameters:
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int) – Size of the convolving kernel

  • stride (int) – Stride of the convolution

  • padding (int, optional) – Padding added to all four sides of the input. Default: 0

  • dilation (int, optional) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • dtype (string, optional) – Data type. Default: “float16”

Shape:
  • Input: \((N, H_{in}, W_{in}, C_{in})\)

  • Output: \((N, H_{out}, W_{out}, C_{out})\), where

    \[H_{out} = (H_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{kernel_size} - 1) + \text{output_padding} + 1\]
    \[W_{out} = (W_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{kernel_size} - 1) + \text{output_padding} + 1\]
weight

the learnable weights of the module of shape \((\text{out_channels}, \text{kernel_size}, \text{kernel_size}, \frac{\text{in_channels}}{\text{groups}})\).

Type:

Tensor

bias

the learnable bias of the module of shape (out_channels).

Type:

Tensor
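
The transposed-convolution output size in the Shape section can be evaluated as follows. The helper name is hypothetical, and `output_padding` is assumed to be 0 by default because it appears in the formula but not in the constructor signature shown above.

```python
def conv_transpose2d_out_dim(size_in, kernel_size, stride, padding=0,
                             dilation=1, output_padding=0):
    """Spatial output size of a 2D transposed convolution (hypothetical helper).

    Same expression as H_out / W_out in the Shape section. `output_padding`
    defaults to 0 here as an assumption.
    """
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# A 2x2 kernel with stride 2 doubles the spatial size: 14 -> 28.
doubled = conv_transpose2d_out_dim(14, kernel_size=2, stride=2)           # 28
# A 3x3 kernel with stride 2 and padding 1: 7 -> 13.
odd = conv_transpose2d_out_dim(7, kernel_size=3, stride=2, padding=1)     # 13
```
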

class aitemplate.frontend.nn.ConvTranspose2dBiasRelu(in_channels, out_channels, kernel_size, stride, padding=0, dilation=1, groups=1, dtype='float16')[source]

Applies a 2D transposed convolution with bias + relu.

class aitemplate.frontend.nn.CrossAttention(dim, seq_len, seq_len_kv, num_heads, qkv_bias=False, attn_drop=0.0, proj_drop=0.0, has_residual=True, causal=False, dtype='float16')[source]

Cross Multi-head Attention.

Allows the model to jointly attend to information from different representation subspaces as described in the paper: Attention Is All You Need.

Multi-Head Attention is defined as:

\[\text{MultiHead}(Q, K, V) = \text{Concat}(head_1,\dots,head_h)W^O\]

where \(head_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)\).

Parameters:
  • dim – total dimension of the model

  • batch_size – batch size

  • seq_len – sequence length

  • num_heads – Number of parallel attention heads.

  • qkv_bias – whether to add bias to QKV. Default: False

  • attn_drop – Dropout probability on attention output weights. Default: 0.0 (no dropout).

  • proj_drop – Dropout probability on projection layers. Default: 0.0 (no dropout).

  • has_residual – has or has no residual. Default: True.

  • causal – default: False.

  • mask_seq – sequence mask, default: 0.

Methods:

forward(*args)

forward pass for calling mha module

forward(*args)[source]

forward pass for calling mha module

class aitemplate.frontend.nn.DropPath(dtype='float16')[source]

DropPath placeholder

class aitemplate.frontend.nn.Dropout(p=0, dtype='float16')[source]

Dropout placeholder

Methods:

forward(*args)

Not implemented.

forward(*args)[source]

Not implemented.

class aitemplate.frontend.nn.Embedding(shape, dtype)[source]

A simple lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.

Parameters:
  • shape (List[int]) – denotes the shape of the embeddings which is typically [num_embeddings, embedding_dim] where num_embeddings is the size of the dictionary of embeddings, and embedding_dim is the size of each embedding vector.

  • dtype (string) – denotes the data type
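
Conceptually, the lookup described above selects rows of a `[num_embeddings, embedding_dim]` table by index. The following plain-Python sketch illustrates only that idea; `embedding_lookup` is a hypothetical helper, not the AITemplate operator.

```python
def embedding_lookup(table, indices):
    """Return the rows of `table` selected by `indices` (hypothetical helper).

    `table` is a nested list of shape [num_embeddings][embedding_dim].
    """
    return [table[i] for i in indices]


# A dictionary of 3 embeddings, each of dimension 2.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
vectors = embedding_lookup(table, [2, 0, 2])
# vectors == [[2.0, 2.1], [0.0, 0.1], [2.0, 2.1]]
```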

class aitemplate.frontend.nn.FPNProposal(im_shape, feat_strides=(4, 8, 16, 32, 64), scales=((32,), (64,), (128,), (256,), (512,)), ratios=(0.5, 1, 2), clip_box=True, nms_on=True, rpn_pre_nms_top_n=6000, rpn_post_nms_top_n=300, iou_threshold=0.3, rpn_min_size=0, level=-1, batch_size=1, dtype='float16')[source]

Methods:

forward(*args)

Defines the computation performed at every call.

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.FPNRoiAlign(num_rois, pooled_size, sampling_ratio, spatial_scale, position_sensitive, continuous_coordinate, im_shape)[source]

Performs the multi-level Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

  • num_rois identifies the number of RoIs in the input.

  • pooled_size identifies the size of the pooling section, i.e., the size of the output (in bins or pixels) after the pooling is performed, as (height, width).

  • sampling_ratio is the number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio sampling points per bin are used. If <= 0, then an adaptive number of grid points are used (computed as ceil(roi_width / output_width), and likewise for height).

  • spatial_scale is a scaling factor that maps the box coordinates to the input coordinates. For example, if your boxes are defined on the scale of a 224x224 image and your input is a 112x112 feature map (resulting from a 0.5x scaling of the original image), you’ll want to set this to 0.5.

  • position_sensitive, a bool value.

  • continuous_coordinate, a bool value.

  • im_shape, original image shape.

Parameters:
  • p1 (Tensor[N, H//4, W//4, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions (H//4) x (W//4).

  • p2 (Tensor[N, H//8, W//8, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions (H//8) x (W//8).

  • p3 (Tensor[N, H//16, W//16, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions (H//16) x (W//16).

  • p4 (Tensor[N, H//32, W//32, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions (H//32) x (W//32).

  • rois (Tensor[roi_batch, 5]) – the list of RoIs. Each RoI contains the index of the corresponding element in the batch, i.e. a number in [0, N - 1], and the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2.

Returns:

the fixed-size feature maps, i.e., the pooled RoIs.

Return type:

Tensor[num_rois * N, pooled_size, pooled_size, C]

Methods:

forward(*args)

Performs Multi Level RoiAlign on the input.

forward(*args)[source]

Performs Multi Level RoiAlign on the input.
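
To make the roles of spatial_scale and sampling_ratio concrete, the sketch below computes how many sampling points per bin would be used for one RoI. It is a hypothetical illustration based only on the description above (`roi_sampling_grid` is not an AITemplate function, and pooled_size is assumed to be a (height, width) pair).

```python
import math


def roi_sampling_grid(roi, spatial_scale, pooled_size, sampling_ratio):
    """Return (points_per_bin_h, points_per_bin_w) for one RoI.

    Hypothetical helper. `roi` is (x1, y1, x2, y2) in image coordinates and
    `pooled_size` is (height, width).
    """
    # Map the box from image coordinates to feature-map coordinates.
    x1, y1, x2, y2 = (v * spatial_scale for v in roi)
    pooled_h, pooled_w = pooled_size
    if sampling_ratio > 0:
        # Fixed grid: sampling_ratio x sampling_ratio points per bin.
        return sampling_ratio, sampling_ratio
    # Adaptive grid: ceil(roi_size / output_size) in each dimension.
    return math.ceil((y2 - y1) / pooled_h), math.ceil((x2 - x1) / pooled_w)


# A 112x56 box on the image maps to 56x28 on a half-resolution feature map.
adaptive = roi_sampling_grid((0, 0, 112, 56), 0.5, (7, 7), sampling_ratio=0)  # (4, 8)
fixed = roi_sampling_grid((0, 0, 112, 56), 0.5, (7, 7), sampling_ratio=2)     # (2, 2)
```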

class aitemplate.frontend.nn.FlashAttention(batch_size, max_seq_len, dropout=0, causal=False, dtype='float16')[source]

FlashAttention provides an implementation of a fused multi-head attention module:

\[\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d}}\right) V\]
\[\text{MultiHead}(Q, K, V) = \text{Concat}(head_1,\dots,head_h)W^O\]

where \(head_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)\).

Methods:

forward(*args)

forward pass for calling attention op

forward(*args)[source]

forward pass for calling attention op
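
For reference, the single-head Attention function in the formulas above can be written in plain Python as below. This sketch shows only the mathematical result on nested lists; it does not reproduce the fused, tiled computation that FlashAttention performs, and `attention_reference` is a hypothetical name.

```python
import math


def attention_reference(q, k, v):
    """softmax(q k^T / sqrt(d)) v for one head (hypothetical helper).

    q: [seq_q][d], k: [seq_k][d], v: [seq_k][d_v], all nested lists.
    """
    d = len(q[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d)
                  for k_row in k]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out


# A zero query attends equally to both keys, so the output averages the values.
q = [[0.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 3.0], [3.0, 5.0]]
result = attention_reference(q, k, v)  # [[2.0, 4.0]]
```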

class aitemplate.frontend.nn.Flatten(start_dim=0, end_dim=-1)[source]

Flattens input by reshaping it into a one-dimensional tensor. If start_dim or end_dim are passed, only dimensions starting with start_dim and ending with end_dim are flattened. The order of elements in input is unchanged.

Methods:

forward(*args)

Flattens the input with specified start and end dims.

forward(*args)[source]

Flattens the input with specified start and end dims.
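
The effect of start_dim and end_dim on the output shape can be sketched as follows. `flatten_shape` is a hypothetical helper that only computes shapes; it assumes negative dimensions count from the end, as the default end_dim=-1 suggests.

```python
def flatten_shape(shape, start_dim=0, end_dim=-1):
    """Shape after flattening dims start_dim..end_dim (hypothetical helper)."""
    rank = len(shape)
    # Normalize negative dimensions so that -1 refers to the last one.
    start = start_dim % rank
    end = end_dim % rank
    merged = 1
    for size in shape[start:end + 1]:
        merged *= size
    return list(shape[:start]) + [merged] + list(shape[end + 1:])


full = flatten_shape([2, 3, 4, 5])                            # [120]
middle = flatten_shape([2, 3, 4, 5], start_dim=1, end_dim=2)  # [2, 12, 5]
tail = flatten_shape([2, 3, 4, 5], start_dim=1)               # [2, 60]
```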

class aitemplate.frontend.nn.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True, dtype='float16', use_swish=False, **kwargs)[source]

GroupNorm nn module

Methods:

forward(*args)

Defines the computation performed at every call.

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.Identity(dtype='float16')[source]

The identity of the input.

Methods:

forward(*args)

Defines the computation performed at every call.

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.LayerNorm(normalized_shape, eps=1e-05, dtype='float16', **kwargs)[source]

LayerNorm nn module

Methods:

forward(*args)

Defines the computation performed at every call.

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.Linear(in_channels, out_channels, bias=True, specialization=None, dtype='float16', **kwargs)[source]

Applies a linear transformation to the incoming data: \(y = xA^T + b\)

Parameters:
  • in_channels – size of each input sample

  • out_channels – size of each output sample

  • bias – If set to False, the layer will not learn an additive bias. Default: True

  • specialization – elementwise operation to add after the linear operation, Default: None

  • dtype – data type, default: float16

Shape:

  • Input: \((*, H_{in})\) where \(*\) means any number of dimensions including none and \(H_{in} = \text{in_channels}\).

  • Output: \((*, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{out_channels}\).

weight

the learnable weights of the module of shape \((\text{out_channels}, \text{in_channels})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in\_channels}}\)

bias

the learnable bias of the module of shape \((\text{out_channels})\). If bias is True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{in_channels}}\)

Examples:

>>> m = nn.Linear(20, 30)
>>> input = Tensor(shape=[128, 20])
>>> output = m(input)
Tensor(shape=[128, 30])
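
The transformation \(y = xA^T + b\) with a weight of shape (out_channels, in_channels) can be spelled out in plain Python. This is a hypothetical reference sketch on nested lists, not the AITemplate GEMM implementation.

```python
def linear_reference(x, weight, bias=None):
    """Compute y = x @ weight^T + bias (hypothetical helper).

    x: [batch][in_channels], weight: [out_channels][in_channels],
    bias: [out_channels] or None.
    """
    out = []
    for row in x:
        # Each output feature is the dot product of the input row with one weight row.
        y = [sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in weight]
        if bias is not None:
            y = [yi + bi for yi, bi in zip(y, bias)]
        out.append(y)
    return out


x = [[1.0, 2.0]]                               # shape [1, 2]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # shape [3, 2]
bias = [0.5, 0.5, 0.5]
y = linear_reference(x, weight, bias)          # [[1.5, 2.5, 3.5]]
```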

Methods:

forward(*args)

Defines the computation performed at every call.

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.MaxPool2d(kernel_size, stride, padding=0)[source]

Applies a 2D max pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, H, W, C)\), output \((N, H_{out}, W_{out}, C)\) and kernel_size \((kH, kW)\) can be precisely described as:

\[\begin{split}\begin{aligned} out(N_i, h, w, C_j) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\ & \text{input}(N_i, \text{stride[0]} \times h + m, \text{stride[1]} \times w + n, C_j) \end{aligned}\end{split}\]

If padding is non-zero, then the input is implicitly padded with negative infinity on both sides for padding number of points.

Parameters:
  • kernel_size – the size of the window to take a max over

  • stride – the stride of the window

  • padding – implicit negative infinity padding to be added on both sides
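The pooling formula above can be written out as a plain-Python reference over nested lists laid out as [N][H][W][C]. This is a sketch, not the AITemplate kernel; `max_pool2d_nhwc` is a hypothetical name, and the output size `(H + 2 * padding - kH) // stride + 1` is an assumption (floor rounding) that this page does not state.

```python
import math


def max_pool2d_nhwc(x, kernel_size, stride, padding=0):
    """Reference max pooling over a nested-list tensor laid out as [N][H][W][C]."""
    kh, kw = kernel_size
    sh, sw = stride
    h, w, c = len(x[0]), len(x[0][0]), len(x[0][0][0])
    # Assumed output size (floor rounding); not taken from the documentation.
    h_out = (h + 2 * padding - kh) // sh + 1
    w_out = (w + 2 * padding - kw) // sw + 1

    def at(img, hi, wi, ci):
        # Positions that fall in the padded border read as negative infinity.
        if 0 <= hi < h and 0 <= wi < w:
            return img[hi][wi][ci]
        return -math.inf

    return [
        [
            [
                [
                    max(
                        at(img, sh * ho + m - padding, sw * wo + n - padding, ci)
                        for m in range(kh)
                        for n in range(kw)
                    )
                    for ci in range(c)
                ]
                for wo in range(w_out)
            ]
            for ho in range(h_out)
        ]
        for img in x
    ]


# One 4x4 image with a single channel holding the values 0..15.
image = [[[[4 * r + col] for col in range(4)] for r in range(4)]]
print(max_pool2d_nhwc(image, (2, 2), (2, 2)))
# [[[[5], [7]], [[13], [15]]]]
```

Because the border reads as negative infinity, padded positions can never win the maximum.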

Methods:

forward(*args)

Applies MaxPool2d on the input.

forward(*args)[source]

Applies MaxPool2d on the input.

class aitemplate.frontend.nn.Module[source]

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes:

import nn as nn
import nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note

As per the example above, an __init__() call to the parent class must be made before assignment on the child.

Variables:

training (bool) – Boolean representing whether this module is in training or evaluation mode.

Methods:

add_module(name, module)

Adds a child module to the current module.

buffers([recurse])

Returns an iterator over module buffers.

children()

Returns an iterator over immediate children modules.

extra_repr()

Set the extra representation of the module

forward(*input)

Defines the computation performed at every call.

get_buffer(target)

Returns the buffer given by target if it exists, otherwise throws an error.

get_parameter(target)

Returns the parameter given by target if it exists, otherwise throws an error.

get_submodule(target)

Returns the submodule given by target if it exists, otherwise throws an error.

modules()

Returns an iterator over all modules in the network.

name_parameter_tensor()

Set the name of the parameter to tensor's name

named_buffers([prefix, recurse])

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse])

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Returns an iterator over module parameters.

register_buffer(name, tensor[, persistent])

Adds a buffer to the module.

register_module(name, module)

Alias for add_module().

register_parameter(name, param)

Adds a parameter to the module.

add_module(name: str, module: Optional[Module]) None[source]

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters:
  • name (str) – name of the child module. The child module can be accessed from this module using the given name

  • module (Module) – child module to be added to the module.

buffers(recurse: bool = True) Iterator[Tensor][source]

Returns an iterator over module buffers.

Parameters:

recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields:

Tensor – module buffer

Example:

>>> for buf in model.buffers():
...     print(type(buf), buf.size())
<class 'Tensor'> (20,)
<class 'Tensor'> (20, 1, 5, 5)

children() Iterator[Module][source]

Returns an iterator over immediate children modules.

Yields:

Module – a child module

extra_repr() str[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(*input: Any) None

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_buffer(target: str) Tensor[source]

Returns the buffer given by target if it exists, otherwise throws an error.

See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.

Parameters:

target – The fully-qualified string name of the buffer to look for. (See get_submodule for how to specify a fully-qualified string.)

Returns:

The buffer referenced by target

Return type:

Tensor

Raises:

AttributeError – If the target string references an invalid path or resolves to something that is not a buffer

get_parameter(target: str) Parameter[source]

Returns the parameter given by target if it exists, otherwise throws an error.

See the docstring for get_submodule for a more detailed explanation of this method’s functionality as well as how to correctly specify target.

Parameters:

target – The fully-qualified string name of the Parameter to look for. (See get_submodule for how to specify a fully-qualified string.)

Returns:

The Parameter referenced by target

Return type:

nn.Parameter

Raises:

AttributeError – If the target string references an invalid path or resolves to something that is not an nn.Parameter

get_submodule(target: str) Module[source]

Returns the submodule given by target if it exists, otherwise throws an error.

For example, let’s say you have an nn.Module A that looks like this:

A(
    (net_b): Module(
        (net_c): Module(
            (conv): Conv2d(16, 33, kernel_size=(3, 3), stride=(2, 2))
        )
        (linear): Linear(in_features=100, out_features=200, bias=True)
    )
)

(The diagram shows an nn.Module A. A has a nested submodule net_b, which itself has two submodules net_c and linear. net_c then has a submodule conv.)

To check whether or not we have the linear submodule, we would call get_submodule("net_b.linear"). To check whether we have the conv submodule, we would call get_submodule("net_b.net_c.conv").

The runtime of get_submodule is bounded by the degree of module nesting in target. A query against named_modules achieves the same result, but it is O(N) in the number of transitive modules. So, for a simple check to see if some submodule exists, get_submodule should always be used.
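The cost argument above follows from how a dotted path is resolved: one attribute lookup per path component. A minimal sketch, assuming submodules are reachable as plain attributes; `Node` and `resolve` are hypothetical stand-ins, not the AITemplate implementation.

```python
class Node:
    """Hypothetical stand-in for a module that stores children as attributes."""

    def __init__(self, **children):
        for name, child in children.items():
            setattr(self, name, child)


def resolve(root, target):
    """Walk a dotted path one attribute at a time, e.g. "net_b.net_c.conv"."""
    current = root
    for atom in target.split("."):
        if not hasattr(current, atom):
            raise AttributeError(f"no attribute `{atom}` on the path `{target}`")
        current = getattr(current, atom)
    return current


conv, linear = Node(), Node()
a = Node(net_b=Node(net_c=Node(conv=conv), linear=linear))
print(resolve(a, "net_b.net_c.conv") is conv)  # True
print(resolve(a, "net_b.linear") is linear)    # True
```

The loop runs once per component of `target`, so the work depends on the nesting depth and not on the total number of modules in the tree.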

Parameters:

target – The fully-qualified string name of the submodule to look for. (See above example for how to specify a fully-qualified string.)

Returns:

The submodule referenced by target

Return type:

nn.Module

Raises:

AttributeError – If the target string references an invalid path or resolves to something that is not an nn.Module

modules() Iterator[Module][source]

Returns an iterator over all modules in the network.

Yields:

Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)

0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)

name_parameter_tensor()[source]

Set the name of the parameter to tensor’s name

named_buffers(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, Tensor]][source]

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters:
  • prefix (str) – prefix to prepend to all buffer names.

  • recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields:

(str, Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
...     if name in ['running_var']:
...         print(buf.size())

named_children() Iterator[Tuple[str, Module]][source]

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields:

(str, Module) – Tuple containing a name and child module

Example:

>>> for name, module in model.named_children():
...     if name in ['conv4', 'conv5']:
...         print(module)

named_modules(memo: Optional[Set[Module]] = None, prefix: str = '', remove_duplicate: bool = True)[source]

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Parameters:
  • memo – a memo to store the set of modules already added to the result

  • prefix – a prefix that will be added to the name of the module

  • remove_duplicate – whether to remove the duplicated module instances in the result or not

Yields:

(str, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)

0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
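The de-duplication shown in this output can be sketched as a depth-first walk that records visited instances in a memo set. This is an illustration only; `Node` and `named_nodes` are hypothetical names, and the real traversal may differ in detail.

```python
class Node:
    """Hypothetical stand-in for a module; children live in an ordered dict."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = dict(children or {})


def named_nodes(node, memo=None, prefix=""):
    """Yield (name, node) pairs depth-first, visiting each instance only once."""
    if memo is None:
        memo = set()
    if id(node) in memo:
        return
    memo.add(id(node))
    yield prefix, node
    for name, child in node.children.items():
        child_prefix = f"{prefix}.{name}" if prefix else name
        yield from named_nodes(child, memo, child_prefix)


shared = Node("Linear")
net = Node("Sequential", {"0": shared, "1": shared})
print([(name, n.label) for name, n in named_nodes(net)])
# [('', 'Sequential'), ('0', 'Linear')]
```

The second reference to `shared` (under the name '1') is skipped because the instance is already in the memo.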
named_parameters(prefix: str = '', recurse: bool = True) Iterator[Tuple[str, Parameter]][source]

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters:
  • prefix (str) – prefix to prepend to all parameter names.

  • recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields:

(str, Parameter) – Tuple containing the name and parameter

Example:

>>> for name, param in self.named_parameters():
...     if name in ['bias']:
...         print(param.size())

parameters(recurse: bool = True) Iterator[Parameter][source]

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters:

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields:

Parameter – module parameter

Example:

>>> for param in model.parameters():
...     print(type(param), param.size())
<class 'Tensor'> (20,)
<class 'Tensor'> (20, 1, 5, 5)

register_buffer(name: str, tensor: Optional[Tensor], persistent: bool = True) None[source]

Adds a buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting persistent to False. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s state_dict.

Buffers can be accessed as attributes using given names.

Parameters:
  • name (str) – name of the buffer. The buffer can be accessed from this module using the given name

  • tensor (Tensor or None) – buffer to be registered. If None, then operations that run on buffers, such as cuda, are ignored. If None, the buffer is not included in the module’s state_dict.

  • persistent (bool) – whether the buffer is part of this module’s state_dict.
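The effect of `persistent` and of registering `None` on the state_dict can be sketched with a small stand-in class. `BufferHolder` is hypothetical and mimics only the bookkeeping described above; it is not the AITemplate `Module`.

```python
class BufferHolder:
    """Hypothetical stand-in that mimics only the buffer bookkeeping."""

    def __init__(self):
        self._buffers = {}
        self._non_persistent = set()

    def register_buffer(self, name, tensor, persistent=True):
        self._buffers[name] = tensor
        if persistent:
            self._non_persistent.discard(name)
        else:
            self._non_persistent.add(name)

    def state_dict(self):
        # Skip buffers registered as None and buffers marked non-persistent.
        return {
            name: buf
            for name, buf in self._buffers.items()
            if buf is not None and name not in self._non_persistent
        }


holder = BufferHolder()
holder.register_buffer("running_mean", [0.0, 0.0])
holder.register_buffer("scratch", [1.0], persistent=False)
holder.register_buffer("unset", None)
print(sorted(holder.state_dict()))  # ['running_mean']
```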

Example:

>>> self.register_buffer('running_mean', zeros(num_features))

register_module(name: str, module: Optional[Module]) None[source]

Alias for add_module().

register_parameter(name: str, param: Optional[Parameter]) None[source]

Adds a parameter to the module.

The parameter can be accessed as an attribute using given name.

Parameters:
  • name (str) – name of the parameter. The parameter can be accessed from this module using the given name

  • param (Parameter or None) – parameter to be added to the module. If None, then operations that run on parameters, such as cuda, are ignored. If None, the parameter is not included in the module’s state_dict.

class aitemplate.frontend.nn.ModuleDict(modules: Optional[Mapping[str, Module]] = None)[source]

Holds submodules in a dictionary.

ModuleDict can be indexed like a regular Python dictionary, but modules it contains are properly registered, and will be visible by all Module methods.

ModuleDict is an ordered dictionary that respects

  • the order of insertion, and

  • in update(), the order of the merged OrderedDict, dict (started from Python 3.6) or another ModuleDict (the argument to update()).

Note that update() with other unordered mapping types (e.g., Python’s plain dict before Python version 3.6) does not preserve the order of the merged mapping.
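The ordering rules can be illustrated with a plain `collections.OrderedDict`, used here as a stand-in with strings in place of modules. That an overwritten key keeps its original position is standard `OrderedDict` behavior; whether ModuleDict behaves identically in that case is an assumption.

```python
from collections import OrderedDict

# Strings stand in for module instances.
modules = OrderedDict([("conv", "Conv2d"), ("pool", "MaxPool2d")])

# update() appends new keys in the order given and overwrites existing keys.
modules.update([("act", "ReLU"), ("conv", "Conv2dBias")])

print(list(modules.items()))
# [('conv', 'Conv2dBias'), ('pool', 'MaxPool2d'), ('act', 'ReLU')]
```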

Parameters:

modules (iterable, optional) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module)

Example:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.choices = nn.ModuleDict({
                'conv': nn.Conv2d(10, 10, 3),
                'pool': nn.MaxPool2d(3)
        })
        self.activations = nn.ModuleDict([
                ['lrelu', nn.LeakyReLU()],
                ['prelu', nn.PReLU()]
        ])

    def forward(self, x, choice, act):
        x = self.choices[choice](x)
        x = self.activations[act](x)
        return x

Methods:

clear()

Remove all items from the ModuleDict.

items()

Return an iterable of the ModuleDict key/value pairs.

keys()

Return an iterable of the ModuleDict keys.

pop(key)

Remove key from the ModuleDict and return its module.

update(modules)

Update the ModuleDict with the key-value pairs from a mapping or an iterable, overwriting existing keys.

values()

Return an iterable of the ModuleDict values.

clear() None[source]

Remove all items from the ModuleDict.

items() Iterable[Tuple[str, Module]][source]

Return an iterable of the ModuleDict key/value pairs.

keys() Iterable[str][source]

Return an iterable of the ModuleDict keys.

pop(key: str) Module[source]

Remove key from the ModuleDict and return its module.

Parameters:

key (str) – key to pop from the ModuleDict

update(modules: Mapping[str, Module]) None[source]

Update the ModuleDict with the key-value pairs from a mapping or an iterable, overwriting existing keys.

Note

If modules is an OrderedDict, a ModuleDict, or an iterable of key-value pairs, the order of new elements in it is preserved.

Parameters:

modules (iterable) – a mapping (dictionary) from string to Module, or an iterable of key-value pairs of type (string, Module)

values() Iterable[Module][source]

Return an iterable of the ModuleDict values.

class aitemplate.frontend.nn.ModuleList(modules: Optional[Iterable[Module]] = None)[source]

Holds submodules in a list.

ModuleList can be indexed like a regular Python list, but modules it contains are properly registered, and will be visible by all Module methods.

Parameters:

modules (iterable, optional) – an iterable of modules to add

Example:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x

Methods:

append(module)

Appends a given module to the end of the list.

extend(modules)

Appends modules from a Python iterable to the end of the list.

insert(index, module)

Insert a given module before a given index in the list.

append(module: Module) ModuleList[source]

Appends a given module to the end of the list.

Parameters:

module (nn.Module) – module to append

extend(modules: Iterable[Module]) ModuleList[source]

Appends modules from a Python iterable to the end of the list.

Parameters:

modules (iterable) – iterable of modules to append

insert(index: int, module: Module) None[source]

Insert a given module before a given index in the list.

Parameters:
  • index (int) – index to insert.

  • module (nn.Module) – module to insert

class aitemplate.frontend.nn.MultiScaleBlock(dim: int, dim_out: int, num_heads: int, seq_len: int, mlp_ratio: float = 4.0, qkv_bias: bool = False, dropout_rate: float = 0.0, droppath_rate: float = 0.0, act_layer: ~aitemplate.frontend.nn.module.Module = <class 'aitemplate.frontend.nn.activation.GELU'>, norm_layer: ~aitemplate.frontend.nn.module.Module = <class 'aitemplate.frontend.nn.layer_norm.LayerNorm'>, attn_norm_layer: ~aitemplate.frontend.nn.module.Module = <class 'aitemplate.frontend.nn.layer_norm.LayerNorm'>, kernel_q=(1, 1, 1), kernel_kv=(1, 1, 1), stride_q=(1, 1, 1), stride_kv=(1, 1, 1), pool_mode: str = 'conv', has_cls_embed: bool = True, pool_first: bool = False, residual_pool: bool = False, depthwise_conv: bool = True, bias_on: bool = True, separate_qkv: bool = False)[source]

Implementation of a multiscale vision transformer block. Each block contains a multiscale attention layer and an Mlp layer.

      Input
        |-------------------+
        ↓                   |
       Norm                 |
        ↓                   |
MultiScaleAttention        Pool
        ↓                   |
     DropPath               |
        ↓                   |
    Summation ←-------------+
        |
        |-------------------+
        ↓                   |
       Norm                 |
        ↓                   |
       Mlp                 Proj
        ↓                   |
     DropPath               |
        ↓                   |
    Summation  ←------------+
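Read top to bottom, the diagram describes two residual stages. A minimal sketch of that data flow, with every layer passed in as a plain callable; `multiscale_block` and its argument names are hypothetical, and scalars stand in for tensors.

```python
def multiscale_block(x, norm1, attn, pool_skip, norm2, mlp, proj,
                     drop_path=lambda t: t):
    """Compose the two residual stages drawn in the diagram."""
    # Stage 1: attention branch plus the pooled skip connection.
    x = pool_skip(x) + drop_path(attn(norm1(x)))
    # Stage 2: Mlp branch plus the projected skip connection.
    return proj(x) + drop_path(mlp(norm2(x)))


def identity(t):
    return t


result = multiscale_block(
    1,
    norm1=identity,
    attn=lambda t: 2 * t,
    pool_skip=identity,
    norm2=identity,
    mlp=lambda t: t + 1,
    proj=lambda t: 10 * t,
)
print(result)  # 34
```

With these toy layers, stage 1 gives 1 + 2 = 3 and stage 2 gives 30 + 4 = 34.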

Methods:

forward(x, t_shape, h_shape, w_shape)

Applies the block to the input tensor x.

forward(x: Tensor, t_shape: int, h_shape: int, w_shape: int) Tuple[Tensor, List[int]][source]

Parameters:
  • x (Tensor) – Input tensor.

  • t_shape, h_shape, w_shape (int) – The temporal, height, and width sizes of the input tensor (before flattening).

class aitemplate.frontend.nn.MultiheadAttention(dim, batch_size, seq_len, num_heads=8, qkv_bias=False, attn_drop=0.0, proj_drop=0.0, has_residual=True, causal=False, mask_seq=0, use_mem_eff=False, dtype='float16')[source]

Multi-Head Attention.

Allows the model to jointly attend to information from different representation subspaces as described in the paper: Attention Is All You Need.

Multi-Head Attention is defined as:

\[\text{MultiHead}(Q, K, V) = \text{Concat}(head_1,\dots,head_h)W^O\]

where \(head_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)\).

Parameters:
  • dim – total dimension of the model

  • batch_size – batch size

  • seq_len – sequence length

  • num_heads – Number of parallel attention heads. Default: 8

  • qkv_bias – whether to add bias to QKV. Default: False

  • attn_drop – Dropout probability on attention output weights. Default: 0.0 (no dropout).

  • proj_drop – Dropout probability on projection layers. Default: 0.0 (no dropout).

  • has_residual – whether the module has a residual connection. Default: True.

  • causal – default: False.

  • mask_seq – sequence mask, default: 0.
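The MultiHead formula above can be traced with a small plain-Python sketch of self-attention on nested lists. It assumes the scaled dot-product form of Attention from the cited paper, softmax(QK^T / sqrt(d_k))V, and obtains the per-head projections by slicing the columns of one full projection matrix; all function names are hypothetical and none of this is the AITemplate implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(x, w_q, w_k, w_v, w_o, num_heads):
    """Concat(head_1, ..., head_h) W^O, with heads taken as column slices."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    head_dim = len(q[0]) // num_heads
    heads = []
    for i in range(num_heads):
        cols = slice(i * head_dim, (i + 1) * head_dim)
        heads.append(
            attention([r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v])
        )
    concat = [sum((h[t] for h in heads), []) for t in range(len(x))]
    return matmul(concat, w_o)


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
out = multi_head(x, identity, identity, identity, identity, num_heads=2)
print(out)
```

With identity weights and two identical input rows, every attention weight is 0.5, so the output reproduces the input rows.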

Methods:

forward(*args)

forward pass for calling mha module

forward(*args)[source]

forward pass for calling mha module

class aitemplate.frontend.nn.Ndhwc3to8[source]

Pads the input data with ndhwc dimensions from 3 channels to 8 channels

Methods:

forward(*args)

Defines the computation performed at every call.

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.Nhwc3to8[source]

Pads the input data with nhwc dimensions from 3 channels to 8 channels

Methods:

forward(*args)

Defines the computation performed at every call.

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.Proposal(im_shape, feat_stride=16, scales=(32, 64, 128, 256, 512), ratios=(0.5, 1, 2), clip_box=True, nms_on=True, rpn_pre_nms_top_n=6000, rpn_post_nms_top_n=300, iou_threshold=0.3, rpn_min_size=0, level=-1, f_proc=None, batch_size=1, dtype='float16')[source]

Methods:

box_transform(bbox_deltas, anchors)

apply transformation for proposals

forward(*args)

Defines the computation performed at every call.

box_transform(bbox_deltas, anchors)[source]

apply transformation for proposals

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.Reshape[source]

Returns a tensor with the same data and number of elements as input, but with the specified shape. Inputs must be contiguous.

A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input.
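The inference of a -1 dimension can be sketched in a few lines of plain Python. `infer_shape` is a hypothetical helper that takes the element count directly; it is not part of AITemplate.

```python
import math


def infer_shape(num_elements, shape):
    """Resolve a single -1 entry in `shape` so the element count matches."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension may be -1")
    known = math.prod(d for d in shape if d != -1)
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if known == 0 or num_elements % known != 0:
        raise ValueError("cannot infer the -1 dimension")
    return [num_elements // known if d == -1 else d for d in shape]


print(infer_shape(24, [2, -1, 3]))  # [2, 4, 3]
```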

Methods:

forward(*args)

Reshapes the input to the given size.

forward(*args)[source]

Reshapes the input to the given size.

class aitemplate.frontend.nn.RoiAlign(num_rois, pooled_size, sampling_ratio, spatial_scale, position_sensitive, continuous_coordinate)[source]

Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

  • num_rois identifies the number of RoIs in the input.

  • pooled_size identifies the size of the pooling section, i.e., the size of the output (in bins or pixels) after the pooling is performed, as (height, width).

  • sampling_ratio is the number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio sampling points per bin are used. If <= 0, then an adaptive number of grid points are used (computed as ceil(roi_width / output_width), and likewise for height).

  • spatial_scale is a scaling factor that maps the box coordinates to the input coordinates. For example, if your boxes are defined on the scale of a 224x224 image and your input is a 112x112 feature map (resulting from a 0.5x scaling of the original image), you’ll want to set this to 0.5.

  • position_sensitive, a bool value.

  • continuous_coordinate, a bool value.
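The sampling_ratio and spatial_scale rules above amount to simple arithmetic. A sketch in plain Python; `sampling_grid` and `scale_box` are hypothetical helper names, not part of the operator's interface.

```python
import math


def sampling_grid(sampling_ratio, roi_height, roi_width, pooled_size):
    """Return (points_h, points_w) sampled inside each output bin."""
    pooled_h, pooled_w = pooled_size
    if sampling_ratio > 0:
        return sampling_ratio, sampling_ratio
    # Adaptive case: ceil(roi_size / output_size) along each axis.
    return math.ceil(roi_height / pooled_h), math.ceil(roi_width / pooled_w)


def scale_box(box, spatial_scale):
    """Map (x1, y1, x2, y2) from box coordinates to input coordinates."""
    return tuple(coord * spatial_scale for coord in box)


print(sampling_grid(2, 14.0, 28.0, (7, 7)))  # (2, 2)
print(sampling_grid(0, 14.0, 28.0, (7, 7)))  # (2, 4)
print(scale_box((0, 0, 224, 224), 0.5))      # (0.0, 0.0, 112.0, 112.0)
```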

Parameters:
  • x (Tensor[N, H, W, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions H x W.

  • rois (Tensor[roi_batch, 5]) – the list of RoIs. Each RoI contains the index of the corresponding element in the batch, i.e. a number in [0, N - 1], and the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2.

Returns:

the fixed-size feature maps, i.e., the pooled RoIs.

Return type:

Tensor[roi_batch, pooled_size, pooled_size, C]

Methods:

forward(*args)

Performs RoiAlign on the input.

forward(*args)[source]

Performs RoiAlign on the input.

class aitemplate.frontend.nn.ScaledDotProductAttention[source]

Methods:

forward(q, k, v)

Defines the computation performed at every call.

forward(q, k, v)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.Sequential(*args: Module)[source]
class aitemplate.frontend.nn.Sequential(arg: OrderedDict[str, Module])

A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an OrderedDict of modules can be passed in. The forward() method of Sequential accepts any input and forwards it to the first module it contains. It then “chains” outputs to inputs sequentially for each subsequent module, finally returning the output of the last module.

The value a Sequential provides over manually calling a sequence of modules is that it allows treating the whole container as a single module, such that performing a transformation on the Sequential applies to each of the modules it stores (which are each a registered submodule of the Sequential).

What’s the difference between a Sequential and a nn.ModuleList? A ModuleList is exactly what it sounds like: a list for storing Modules! On the other hand, the layers in a Sequential are connected in a cascading way.

Example:

# Using Sequential to create a small model. When `model` is run,
# input will first be passed to `Conv2d(1,20,5)`. The output of
# `Conv2d(1,20,5)` will be used as the input to the first
# `ReLU`; the output of the first `ReLU` will become the input
# for `Conv2d(20,64,5)`. Finally, the output of
# `Conv2d(20,64,5)` will be used as input to the second `ReLU`
model = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )

# Using Sequential with OrderedDict. This is functionally the
# same as the above code
from collections import OrderedDict

model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))
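The chaining behavior described above reduces to a loop over the stored layers. A minimal sketch with plain callables standing in for modules; `run_sequential` is a hypothetical name.

```python
def run_sequential(layers, value):
    """Feed `value` through each callable in order, chaining outputs to inputs."""
    for layer in layers:
        value = layer(value)
    return value


# Toy layers: double, clamp at zero (ReLU-like), add one.
layers = [lambda v: v * 2, lambda v: max(v, 0), lambda v: v + 1]
print(run_sequential(layers, -3))  # 1
```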

Methods:

append(module)

Appends a given module to the end.

forward(input)

Defines the computation performed at every call.

append(module: Module) Sequential[source]

Appends a given module to the end.

Parameters:

module (nn.Module) – module to append

forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.T5DenseGatedGeluDense(in_channels, out_channels, dtype='float16')[source]

T5DenseGatedGeluDense.

Methods:

forward(*args)

forward pass for calling T5 block

forward(*args)[source]

forward pass for calling T5 block

class aitemplate.frontend.nn.Tensor(shape: List[IntVar], name: Optional[str] = None, src_ops: Optional[Iterable[Node]] = None, dst_ops: Optional[Iterable[Node]] = None, dtype: str = 'float16', is_input: bool = False, is_output: bool = False, value: Optional[Any] = None, is_view_of: Optional[Any] = None, is_internal_constant: bool = False, skip_constant_folding: bool = False, check_nan_and_inf: bool = False, check_outputs: bool = False, original_name: Optional[str] = None)[source]

A Tensor represents a piece of data, which is used as an input / output of an Operator. Both Tensor and Operator are used at model compilation stage.

Methods:

dst_ops()

Returns a set of destination operators which read from this Tensor.

dtype()

Returns Tensor's data type str.

is_a_const_num()

Returns whether this Tensor represents a constant number.

is_jagged()

Whether the Tensor is jagged (the first dim is JaggedIntVar).

pseudo_code([with_shape])

Returns a string containing pseudo code of this object.

shape()

Returns the shape of the tensor.

size_bytes([alignment])

Returns actual size (in bytes) of this Tensor.

src_ops()

Returns a set of source operators which write to this Tensor.

dst_ops() Set[Operator][source]

Returns a set of destination operators which read from this Tensor.

dtype() str[source]

Returns Tensor’s data type str.

is_a_const_num() bool[source]

Returns whether this Tensor represents a constant number.

is_jagged() bool[source]

Whether the Tensor is jagged (the first dim is JaggedIntVar).

pseudo_code(with_shape=True) str[source]

Returns a string containing pseudo code of this object.

Parameters:

with_shape (bool) – Marks whether to include shape info in the returned pseudo code.

Returns:

Pseudo code.

Return type:

str

shape() List[IntVar][source]

Returns the shape of the tensor. It should not be used directly in IR.

size_bytes(alignment: int = 1) int[source]

Returns actual size (in bytes) of this Tensor.

src_ops() Set[Operator][source]

Returns a set of source operators which write to this Tensor.

class aitemplate.frontend.nn.Upsampling2d(scale_factor, mode)[source]

Applies a 2D bilinear upsampling to an input signal composed of several input channels.

To specify the scale, it takes the scale_factor as its constructor argument.

  • scale_factor (float): multiplier for spatial size.

  • mode (str): the upsampling algorithm: one of 'nearest', 'linear', 'bilinear', 'bicubic' and 'trilinear'. Currently, only the 'bilinear' and 'nearest' modes are supported.

Parameters:

input (Tensor [N, H, W, C]) – the input data.

Returns:

Tensor [N, H_out, W_out, C].
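The relation between the input and output shapes can be illustrated with a plain-Python sketch of the 'nearest' mode on nested lists laid out as [N][H][W][C]. `upsample_nearest_nhwc` is a hypothetical name, and the rules used here (output size `int(size * scale_factor)`, source index `int(dst / scale_factor)`) are assumptions, not taken from the AITemplate kernel.

```python
def upsample_nearest_nhwc(x, scale_factor):
    """Nearest-neighbor upsampling of a nested-list tensor laid out as [N][H][W][C]."""
    h, w = len(x[0]), len(x[0][0])
    # Assumed size rule: each spatial size is multiplied by scale_factor.
    h_out, w_out = int(h * scale_factor), int(w * scale_factor)
    return [
        [
            [
                list(
                    img[min(int(ho / scale_factor), h - 1)][
                        min(int(wo / scale_factor), w - 1)
                    ]
                )
                for wo in range(w_out)
            ]
            for ho in range(h_out)
        ]
        for img in x
    ]


small = [[[[1], [2]], [[3], [4]]]]  # shape [1, 2, 2, 1]
big = upsample_nearest_nhwc(small, 2)
print(len(big), len(big[0]), len(big[0][0]), len(big[0][0][0]))  # 1 4 4 1
```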

Methods:

forward(*args)

Defines the computation performed at every call.

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.Upsampling2dAdd(scale_factor, mode)[source]

Applies Upsampling2d + add.

Methods:

forward(*args)

Defines the computation performed at every call.

forward(*args)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class aitemplate.frontend.nn.VanillaCrossAttention(dim, seq_len, seq_len_kv, num_heads, qkv_bias=False, attn_drop=0.0, proj_drop=0.0, has_residual=True, causal=False)[source]

Vanilla Cross Multi-head Attention.

Allows the model to jointly attend to information from different representation subspaces as described in the paper: Attention Is All You Need.

Multi-Head Attention is defined as:

\[\text{MultiHead}(Q, K, V) = \text{Concat}(head_1,\dots,head_h)W^O\]

where \(head_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)\).

Parameters:
  • dim – total dimension of the model

  • batch_size – batch size

  • seq_len – sequence length

  • num_heads – Number of parallel attention heads. Default: 8

  • qkv_bias – whether to add bias to QKV. Default: False

  • attn_drop – Dropout probability on attention output weights. Default: 0.0 (no dropout).

  • proj_drop – Dropout probability on projection layers. Default: 0.0 (no dropout).

  • has_residual – whether the module has a residual connection. Default: True.

  • causal – default: False.

  • mask_seq – sequence mask, default: 0.

Methods:

forward(*args)

forward pass for calling mha module

forward(*args)[source]

forward pass for calling mha module

class aitemplate.frontend.nn.VanillaMultiheadAttention(dim, batch_size=-1, seq_len=-1, num_heads=8, qkv_bias=False, attn_drop=0.0, proj_drop=0.0, has_residual=True, causal=False, attn_mask: Optional[Tensor] = None, mask_seq=0)[source]

Vanilla Multi-Head Attention.

Allows the model to jointly attend to information from different representation subspaces as described in the paper: Attention Is All You Need.

Multi-Head Attention is defined as:

\[\text{MultiHead}(Q, K, V) = \text{Concat}(head_1,\dots,head_h)W^O\]

where \(head_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V)\).

Parameters:
  • dim – total dimension of the model

  • batch_size – batch size

  • seq_len – sequence length

  • num_heads – Number of parallel attention heads. Default: 8

  • qkv_bias – whether to add bias to QKV. Default: False

  • attn_drop – Dropout probability on attention output weights. Default: 0.0 (no dropout).

  • proj_drop – Dropout probability on projection layers. Default: 0.0 (no dropout).

  • has_residual – has or has no residual. Default: True.

  • causal – default: False.

  • attn_mask – Attention mask. If causal this should be a tensor of shape [1, seq_len, seq_len] filled with -inf and 0

  • mask_seq – sequence mask, default: 0.
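The attn_mask description above can be illustrated with a small, library-independent sketch. The helper name causal_mask is hypothetical, it builds nested Python lists rather than an AITemplate Tensor, and it assumes that position i may attend to positions j <= i.

```python
import math

def causal_mask(seq_len):
    """Build a [1, seq_len, seq_len] additive mask.

    Entries are 0 where attention is allowed (j <= i) and -inf where it
    is blocked (j > i), so adding the mask to the scores removes the
    blocked positions after softmax.
    """
    return [[[0.0 if j <= i else -math.inf for j in range(seq_len)]
             for i in range(seq_len)]]

mask = causal_mask(3)
```

Each row i of the mask keeps keys 0..i and blocks the rest.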

Methods:

forward(*args)

forward pass for calling mha module

forward(*args)[source]

forward pass for calling mha module

class aitemplate.frontend.nn.View[source]

Placeholder for View layer. The current implementation is the same as Reshape. Returns a tensor with the same data and number of elements as input, but with the specified shape. Inputs must be contiguous.

A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input.

Methods:

forward(*args)

Creates a view (copy) of the input with given shape.

forward(*args)[source]

Creates a view (copy) of the input with given shape.

class aitemplate.frontend.nn.avg_pool2d(kernel_size, stride, pad)[source]

Applies a 2D average pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, H, W, C)\), output \((N, H_{out}, W_{out}, C)\) and kernel_size \((kH, kW)\) can be precisely described as:

\[out(N_i, C_j, h, w) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)\]

If pad is non-zero, then the input is implicitly zero-padded on both sides for pad number of points.

  • kernel_size: the size of the window

  • stride: the stride of the window

  • pad: implicit zero padding to be added on both sides

Parameters:

input (Tensor [N, H, W, C]) – the input tensor.

Returns:

Tensor [N, H_out, W_out, C].
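The pooling formula above can be checked with a plain-Python sketch on a single H x W plane. The function name avg_pool2d_plane is hypothetical, and the sketch assumes a square window, the same stride on both axes, a floor-rounded output size, and a divisor of kH * kW even when the window overlaps the zero padding. It is not the kernel that AITemplate generates.

```python
def avg_pool2d_plane(plane, kernel_size, stride, pad):
    """Average-pool one H x W plane given as a list of rows."""
    h, w = len(plane), len(plane[0])
    # Implicit zero padding on both sides of each spatial axis.
    padded = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = float(plane[i][j])
    # Assumed output size: floor((size + 2 * pad - kernel) / stride) + 1.
    h_out = (h + 2 * pad - kernel_size) // stride + 1
    w_out = (w + 2 * pad - kernel_size) // stride + 1
    area = kernel_size * kernel_size
    out = []
    for y in range(h_out):
        row = []
        for x in range(w_out):
            total = sum(padded[y * stride + m][x * stride + n]
                        for m in range(kernel_size)
                        for n in range(kernel_size))
            row.append(total / area)
        out.append(row)
    return out

plane = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = avg_pool2d_plane(plane, kernel_size=2, stride=2, pad=0)
```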

class aitemplate.frontend.nn.conv3d(stride, pad, dilate=1, group=1)[source]

Methods:

gen_function()

Generates function source code string.

gen_profiler([workdir, ...])

Profiler generator.

profile([workdir, devices, ...])

Selects the fastest kernel configurations.

gen_function() str[source]

Generates function source code string.

Returns:

a string which contains C++ function implementation source code.

Return type:

str

Raises:

NotImplementedError

gen_profiler(workdir: Optional[str] = None, dynamic_profiling_strategy=DynamicProfileStrategy.HINTS) None[source]

Profiler generator.

Parameters:
  • workdir (str, optional, by default None) –

  • dynamic_profiling_strategy (DynamicProfileStrategy, optional) – A dynamic profiling strategy, used to filter generated profiles at compile time. See also: profile()

profile(workdir='./', devices=None, dynamic_profiling_strategy=DynamicProfileStrategy.HINTS)[source]

Selects the fastest kernel configurations.

Parameters:
  • workdir (str, optional) – The directory which contains source files, by default “./”

  • devices (list, optional) – A list of device ids which can be used for profiling.

  • dynamic_profiling_strategy (DynamicProfileStrategy, optional) – Profiling strategy used when there are dynamic dims. The signature above defaults to HINTS. With MAX, a dynamic range is profiled at its upper bound.

class aitemplate.frontend.nn.conv3d_bias(stride, pad, dilate=1, group=1)[source]

class aitemplate.frontend.nn.depthwise_conv3d(stride, pad, dilate=1, group=1, bias=False)[source]

Methods:

gen_function()

Generates function source code string.

gen_function() str[source]

Generates function source code string.

Returns:

a string which contains C++ function implementation source code.

Return type:

str

Raises:

NotImplementedError

aitemplate.frontend.nn.detect_target(**kwargs)[source]

Detect GPU target based on nvidia-smi and rocminfo

Returns:

CUDA or ROCM target

Return type:

Target

class aitemplate.frontend.nn.flatten(start_dim=0, end_dim=-1)[source]

Flattens input by reshaping it into a one-dimensional tensor. If start_dim or end_dim are passed, only dimensions starting with start_dim and ending with end_dim are flattened. The order of elements in input is unchanged.
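The effect of start_dim and end_dim on the output shape can be sketched with plain integers. The helper flatten_shape is hypothetical and only models static shapes; it does not touch AITemplate tensors or dynamic dims.

```python
from math import prod

def flatten_shape(shape, start_dim=0, end_dim=-1):
    """Return the shape after merging dims start_dim..end_dim (inclusive)."""
    ndim = len(shape)
    for d in (start_dim, end_dim):
        if not -ndim <= d < ndim:
            raise ValueError(f"dimension {d} out of range for rank {ndim}")
    # Normalize negative indices.
    start, end = start_dim % ndim, end_dim % ndim
    if start > end:
        raise ValueError("start_dim must not come after end_dim")
    merged = prod(shape[start:end + 1])
    return list(shape[:start]) + [merged] + list(shape[end + 1:])

full = flatten_shape([2, 3, 4])          # all dims merged
tail = flatten_shape([2, 3, 4], 1)       # dims 1..2 merged
head = flatten_shape([2, 3, 4], 0, 1)    # dims 0..1 merged
```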

Methods:

gen_function()

Generates function source code string.

gen_function() str[source]

Generates function source code string.

Returns:

a string which contains C++ function implementation source code.

Return type:

str

Raises:

NotImplementedError

class aitemplate.frontend.nn.max_pool2d(kernel_size, stride, pad)[source]

Applies a 2D max pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, H, W, C)\), output \((N, H_{out}, W_{out}, C)\) and kernel_size \((kH, kW)\) can be precisely described as:

\[\begin{split}\begin{aligned} out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\ & \text{input}(N_i, C_j, \text{stride[0]} \times h + m, \text{stride[1]} \times w + n) \end{aligned}\end{split}\]

If pad is non-zero, then the input is implicitly padded with negative infinity on both sides.

  • kernel_size: the size of the window

  • stride: the stride of the window

  • pad: implicit negative-infinity padding to be added on both sides

Parameters:

input (Tensor [N, H, W, C]) – the input tensor.

Returns:

Tensor [N, H_out, W_out, C].
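The difference between negative-infinity padding and zero padding shows up when every input value is negative. The sketch below uses a hypothetical helper, max_pool2d_plane, on a single H x W plane. It assumes a square window, one stride for both axes, and a floor-rounded output size, and it is not the generated kernel.

```python
import math

def max_pool2d_plane(plane, kernel_size, stride, pad):
    """Max-pool one H x W plane given as a list of rows."""
    h, w = len(plane), len(plane[0])

    def at(i, j):
        # Reads outside the plane behave like negative infinity.
        if 0 <= i < h and 0 <= j < w:
            return plane[i][j]
        return -math.inf

    h_out = (h + 2 * pad - kernel_size) // stride + 1
    w_out = (w + 2 * pad - kernel_size) // stride + 1
    return [[max(at(y * stride - pad + m, x * stride - pad + n)
                 for m in range(kernel_size)
                 for n in range(kernel_size))
             for x in range(w_out)]
            for y in range(h_out)]

pooled = max_pool2d_plane([[-1, -2], [-3, -4]], kernel_size=2, stride=1, pad=1)
```

With zero padding the border outputs would all be 0; with negative-infinity padding they stay equal to real input values.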

class aitemplate.frontend.nn.multi_level_roi_align(num_rois, pooled_size, sampling_ratio, spatial_scale, position_sensitive, continuous_coordinate, im_shape)[source]

Performs the multi-level Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

  • num_rois identifies the number of RoIs in the input.

  • pooled_size identifies the size of the pooling section, i.e., the size of the output (in bins or pixels) after the pooling is performed, as (height, width).

  • sampling_ratio is the number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio sampling points per bin are used. If <= 0, then an adaptive number of grid points are used (computed as ceil(roi_width / output_width), and likewise for height).

  • spatial_scale is a scaling factor that maps the box coordinates to the input coordinates. For example, if your boxes are defined on the scale of a 224x224 image and your input is a 112x112 feature map (resulting from a 0.5x scaling of the original image), you’ll want to set this to 0.5.

  • position_sensitive, a bool value.

  • continuous_coordinate, a bool value.

  • im_shape, original image shape.

Parameters:
  • p1 (Tensor[N, H//4, W//4, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions (H//4) x (W//4).

  • p2 (Tensor[N, H//8, W//8, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions (H//8) x (W//8).

  • p3 (Tensor[N, H//16, W//16, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions (H//16) x (W//16).

  • p4 (Tensor[N, H//32, W//32, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions (H//32) x (W//32).

  • rois (Tensor[roi_batch, 5]) – the list of RoIs. Each RoI contains the index of the corresponding element in the batch, i.e. a number in [0, N - 1], followed by the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2.

Returns:

the fixed-size feature maps, i.e., the pooled RoIs.

Return type:

Tensor[num_rois * N, pooled_size, pooled_size, C]
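The four feature-map inputs follow a fixed stride pattern relative to the original image. A minimal sketch of the expected shapes, using a hypothetical helper pyramid_shapes and the strides 4, 8, 16 and 32 listed in the parameter descriptions:

```python
def pyramid_shapes(n, h, w, c, strides=(4, 8, 16, 32)):
    """Return the [N, H//s, W//s, C] shapes expected for p1..p4."""
    return [(n, h // s, w // s, c) for s in strides]

# Example values (batch 2, 224 x 224 image, 256 channels) are illustrative.
shapes = pyramid_shapes(n=2, h=224, w=224, c=256)
```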

class aitemplate.frontend.nn.ndhwc3to8[source]

Pad the 3-channel input data to 8-channel.
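What the padding does to one pixel can be sketched in plain Python. The helper pad_channels is hypothetical, and filling the extra channels with zeros is an assumption that the description above does not state.

```python
def pad_channels(pixel, target=8):
    """Extend one pixel's channel list to `target` channels.

    Assumption: the added channels are filled with zeros.
    """
    if len(pixel) > target:
        raise ValueError("pixel already has more channels than target")
    return list(pixel) + [0.0] * (target - len(pixel))

padded = pad_channels([0.1, 0.2, 0.3])
```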

Methods:

gen_function()

Generates function source code string.

gen_function() str[source]

Generates function source code string.

Returns:

a string which contains C++ function implementation source code.

Return type:

str

Raises:

NotImplementedError

class aitemplate.frontend.nn.nhwc3to8[source]

class aitemplate.frontend.nn.reshape[source]

Returns a tensor with the same data and number of elements as input, but with the specified shape. Inputs must be contiguous.

A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input.
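How a -1 entry is resolved can be sketched with plain integers. The helper infer_shape is hypothetical and handles static sizes only; it is not the library's shape-inference code.

```python
from math import prod

def infer_shape(num_elements, shape):
    """Replace a single -1 in `shape` so the element count matches."""
    unknown = [i for i, d in enumerate(shape) if d == -1]
    if len(unknown) > 1:
        raise ValueError("at most one dimension may be -1")
    known = prod(d for d in shape if d != -1)
    resolved = list(shape)
    if unknown:
        if known == 0 or num_elements % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        resolved[unknown[0]] = num_elements // known
    elif known != num_elements:
        raise ValueError("shape does not match the number of elements")
    return resolved

new_shape = infer_shape(24, [2, -1, 4])
```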

Methods:

gen_function()

Generates function source code string.

gen_function() str[source]

Generates function source code string.

Returns:

a string which contains C++ function implementation source code.

Return type:

str

Raises:

NotImplementedError

class aitemplate.frontend.nn.roi_align(num_rois, pooled_size, sampling_ratio, spatial_scale, position_sensitive, continuous_coordinate)[source]

Performs Region of Interest (RoI) Align operator with average pooling, as described in Mask R-CNN.

  • num_rois identifies the number of RoIs in the input.

  • pooled_size identifies the size of the pooling section, i.e., the size of the output (in bins or pixels) after the pooling is performed, as (height, width).

  • sampling_ratio is the number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio sampling points per bin are used. If <= 0, then an adaptive number of grid points are used (computed as ceil(roi_width / output_width), and likewise for height).

  • spatial_scale is a scaling factor that maps the box coordinates to the input coordinates. For example, if your boxes are defined on the scale of a 224x224 image and your input is a 112x112 feature map (resulting from a 0.5x scaling of the original image), you’ll want to set this to 0.5.

  • position_sensitive, a bool value.

  • continuous_coordinate, a bool value.

Parameters:
  • x (Tensor[N, H, W, C]) – the feature map, i.e. a batch with N elements. Each element contains C feature maps of dimensions H x W.

  • rois (Tensor[roi_batch, 5]) – the list of RoIs. Each RoI contains the index of the corresponding element in the batch, i.e. a number in [0, N - 1], followed by the box coordinates in (x1, y1, x2, y2) format where the regions will be taken from. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2.

Returns:

the fixed-size feature maps, i.e., the pooled RoIs.

Return type:

Tensor[roi_batch, pooled_size, pooled_size, C]
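The roles of spatial_scale and sampling_ratio can be sketched for a single box. The helper roi_sampling_grid is hypothetical; it only computes the scaled box and the number of sampling points per bin as described above, and it leaves out the bilinear interpolation itself.

```python
import math

def roi_sampling_grid(box, spatial_scale, pooled_size, sampling_ratio):
    """Return the scaled box and (grid_h, grid_w) sampling points per bin.

    box is (x1, y1, x2, y2) in image coordinates and pooled_size is
    (height, width).
    """
    x1, y1, x2, y2 = (coord * spatial_scale for coord in box)
    roi_w, roi_h = x2 - x1, y2 - y1
    pooled_h, pooled_w = pooled_size
    if sampling_ratio > 0:
        # Fixed grid: sampling_ratio x sampling_ratio points per bin.
        grid_h = grid_w = sampling_ratio
    else:
        # Adaptive grid: ceil(roi_size / output_size) per axis.
        grid_h = math.ceil(roi_h / pooled_h)
        grid_w = math.ceil(roi_w / pooled_w)
    return (x1, y1, x2, y2), (grid_h, grid_w)

scaled_box, grid = roi_sampling_grid((0, 0, 224, 112), 0.5, (7, 7), 0)
```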

class aitemplate.frontend.nn.squeeze(dim: Optional[int])[source]

Examines the specified dimension and gets rid of it if it is of size 1.

>>> x = Tensor(shape=[IntImm(3), IntImm(2), IntImm(1)])
>>> squeeze(2)(x)
Tensor(shape=[IntImm(3), IntImm(2)])

>>> x = Tensor(shape=[IntImm(3), IntImm(2), IntImm(1)])
>>> squeeze(1)(x)
Tensor(shape=[IntImm(3), IntImm(2), IntImm(1)])

>>> x = Tensor(shape=[IntImm(4), IntImm(1), IntImm(3)])
>>> squeeze(-2)(x)
Tensor(shape=[IntImm(4), IntImm(3)])

>>> x = Tensor(shape=[IntImm(1), IntImm(1), IntImm(4)])
>>> squeeze(None)(x)
Tensor(shape=[IntImm(4)])

There are some additional assumptions for dynamic dims. Since our shape inference system cannot handle outputs with a variable number of dimensions, we assume that if a dynamic dim is squeezed, it contains no ones:

>>> x = Tensor(shape=[IntVar([3, 2]), IntImm(2)])
>>> y = Tensor(shape=[IntVar([1, 2]), IntImm(2)])
>>> squeeze(0)(x) # OK
Tensor(shape=[IntVar([3, 2]), IntImm(2)])
>>> squeeze(1)(y) # error!

  • dim (Optional[int]) : the dimension to get rid of. If None, get rid of all dimensions of size 1.

Parameters:

x (Tensor) – the source tensor to squeeze.

Returns:

the squeezed tensor.

Return type:

Tensor
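The static-shape cases from the examples above can be reproduced with a plain-Python sketch. The helper squeeze_shape is hypothetical and works on lists of integers; it does not model IntVar dynamic dims.

```python
def squeeze_shape(shape, dim=None):
    """Return the shape after squeezing `dim`, or all size-1 dims if None."""
    if dim is None:
        return [d for d in shape if d != 1]
    ndim = len(shape)
    if not -ndim <= dim < ndim:
        raise ValueError(f"dim {dim} out of range for rank {ndim}")
    axis = dim % ndim  # normalize negative indices
    if shape[axis] == 1:
        return list(shape[:axis]) + list(shape[axis + 1:])
    # A dimension whose size is not 1 is left untouched.
    return list(shape)

result = squeeze_shape([4, 1, 3], -2)
```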

Methods:

gen_function()

Generates function source code string.

gen_function() str[source]

Generates function source code string.

Returns:

a string which contains C++ function implementation source code.

Return type:

str

Raises:

NotImplementedError

class aitemplate.frontend.nn.unsqueeze(dim: int)[source]

Adds a dimension of size 1 at a specified location.

>>> x = Tensor(shape=[IntImm(4), IntImm(3)])
>>> unsqueeze(0)(x)
Tensor(shape=[IntImm(1), IntImm(4), IntImm(3)])
>>> unsqueeze(-1)(x)
Tensor(shape=[IntImm(4), IntImm(3), IntImm(1)])

Parameters:

dim (int) – Where to add the dimension, must be in range [-input_ndim - 1, input_ndim + 1)
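The valid range for dim and the handling of negative values can be sketched on plain shape lists. The helper unsqueeze_shape is hypothetical and assumes the half-open range [-ndim - 1, ndim + 1), where ndim is the rank of the input.

```python
def unsqueeze_shape(shape, dim):
    """Return the shape with a size-1 dimension inserted at `dim`."""
    ndim = len(shape)
    if not -ndim - 1 <= dim < ndim + 1:
        raise ValueError(f"dim {dim} out of range for rank {ndim}")
    # A negative dim counts from the end of the output shape.
    axis = dim if dim >= 0 else dim + ndim + 1
    return list(shape[:axis]) + [1] + list(shape[axis:])

front = unsqueeze_shape([4, 3], 0)
back = unsqueeze_shape([4, 3], -1)
```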

class aitemplate.frontend.nn.upsampling2d(scale_factor, mode)[source]

Applies a 2D bilinear upsampling to an input signal composed of several input channels.

To specify the scale, it takes the scale_factor as its constructor argument.

  • scale_factor (float): multiplier for spatial size.

Parameters:

input (Tensor [N, H, W, C]) – the input data.

Returns:

Tensor [N, H_out, W_out, C].

class aitemplate.frontend.nn.upsampling2d_add(scale_factor, mode)[source]

Fused op for bilinear_upsampling + add.

Applies a 2D bilinear upsampling to an input signal composed of several input channels, and adds a residual.

To specify the scale, it takes the scale_factor as its constructor argument.

  • scale_factor (float): multiplier for spatial size.

Parameters:
  • input (Tensor [N, H, W, C]) – the input data.

  • r (Tensor [N, H_out, W_out, C]) – the residual.

Returns:

Tensor [N, H_out, W_out, C].
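The shape relationship between the input and the residual can be sketched as follows. Both helpers are hypothetical, and the rule H_out = int(H * scale_factor) is an assumption, since the text above only calls scale_factor a multiplier for spatial size.

```python
def upsampled_shape(shape, scale_factor):
    """Return [N, H_out, W_out, C] for an NHWC input shape.

    Assumption: spatial sizes are multiplied by scale_factor and
    truncated to integers.
    """
    n, h, w, c = shape
    return [n, int(h * scale_factor), int(w * scale_factor), c]

def check_residual(input_shape, residual_shape, scale_factor):
    """Verify that the residual matches the upsampled output shape."""
    expected = upsampled_shape(input_shape, scale_factor)
    if list(residual_shape) != expected:
        raise ValueError(f"residual must have shape {expected}")
    return expected

out_shape = check_residual([1, 16, 16, 8], [1, 32, 32, 8], 2.0)
```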

aitemplate.frontend.nn.vanilla_attention(q: Tensor, k: Tensor, v: Tensor, scale: Optional[float] = None, attn_mask: Optional[Tensor] = None) Tensor[source]

Vanilla attention in the most basic form.

q, k, v: tensors of shape [batch, seqlen, num_heads, head_dim].

Either the batch or the sequence dimension can be variable (but not both).

attn_mask: attention mask that is added to the attention scores; use 0 and -inf to mask a sequence index.
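A reference sketch of the computation for a single batch element and a single head, written with plain Python lists. The function name attention is hypothetical. The default scale of 1 / sqrt(head_dim) is an assumption taken from the standard formulation, and every mask row is assumed to leave at least one position unmasked.

```python
import math

def attention(q, k, v, scale=None, attn_mask=None):
    """Scaled dot-product attention for one head.

    q: [seq_q][head_dim]; k, v: [seq_kv][head_dim];
    attn_mask: optional [seq_q][seq_kv] of 0 / -inf added to the scores.
    """
    if scale is None:
        scale = 1.0 / math.sqrt(len(q[0]))  # assumed default
    out = []
    for i, q_row in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row))
                  for k_row in k]
        if attn_mask is not None:
            scores = [s + m for s, m in zip(scores, attn_mask[i])]
        # Numerically stable softmax; masked entries become exp(-inf) = 0.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [wt / total for wt in weights]
        out.append([sum(wt * v_row[d] for wt, v_row in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
mask = [[0.0, -math.inf], [0.0, 0.0]]
out = attention(q, k, v, attn_mask=mask)
```

With this mask, the first query can only see the first key, so its output equals the first value row.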