aitemplate.backend
aitemplate.backend.task_runner
This module provides a general-purpose subprocess-based task runner.
Classes:
- BaseRunner: Generic subprocess task runner for different purposes.
- DeviceFarm: A stateful object that schedules and assigns tasks to the available devices.
- Task: An object containing a bash command, the process for the command, and the output of the process.
- class aitemplate.backend.task_runner.BaseRunner(devs: List[int], tag: str, timeout: int = 10)[source]
Generic subprocess task runner for different purposes
Methods:
- join(): Wait until all tasks are finished.
- pull(ftask_proc, fret_proc): Pull results from all tasks executed on the runner.
- push(idx, cmd): Push a task into the runner.
- reset(): Reset the runner; clear the task queue and device states.
- pull(ftask_proc: Callable, fret_proc: Callable) List [source]
Pull results from all tasks executed on the runner.
- Parameters:
ftask_proc (Callable) – Function to process each task’s output
fret_proc (Callable) – Function to extract returns from task
- Returns:
Aggregated returns from all tasks
- Return type:
List
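The push/join/pull flow can be sketched with a minimal stand-alone runner. The class below is an illustration of the pattern only, not AITemplate's BaseRunner; all names are hypothetical:

```python
import subprocess

class MiniRunner:
    """Minimal sketch of a push/join/pull subprocess runner (illustrative,
    not AITemplate's BaseRunner)."""

    def __init__(self):
        self._tasks = []  # (idx, process) pairs

    def push(self, idx, cmd):
        # Launch the command right away; a real runner schedules onto devices.
        proc = subprocess.Popen(
            cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        self._tasks.append((idx, proc))

    def join(self):
        # Wait until all tasks are finished (output here is small, so no deadlock).
        for _, proc in self._tasks:
            proc.wait()

    def pull(self, ftask_proc, fret_proc):
        # Process each task's output, then aggregate the per-task returns.
        results = []
        for idx, proc in self._tasks:
            out, err = proc.communicate()
            task = (idx, proc.returncode, out.decode(), err.decode())
            ftask_proc(task)           # e.g. parse stdout, flag failures
            results.append(fret_proc(task))
        return results

runner = MiniRunner()
runner.push(0, "echo hello")
runner.join()
rets = runner.pull(ftask_proc=lambda t: None, fret_proc=lambda t: t[1])
print(rets)  # [0] -- exit code of the single task
```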
- class aitemplate.backend.task_runner.DeviceFarm(devs: List[int])[source]
Device Farm is a stateful object that schedules and assigns tasks to the available devices. Devices are logical devices and can be CPUs or GPUs.
Methods:
- Return the next idle (available) device id.
- Reset all devices to idle.
- reset_dev_state(dev_id): Reset the given device's state to idle.
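The scheduling idea can be sketched as follows; the class and method names below mirror the descriptions above but are illustrative, not the actual DeviceFarm API:

```python
class MiniDeviceFarm:
    """Sketch of the DeviceFarm idea: track idle/busy state per logical device."""

    def __init__(self, devs):
        # Every device starts idle.
        self._state = {d: "idle" for d in devs}

    def next_idle_dev(self):
        # Hypothetical name for "return the next idle device id";
        # returns None when every device is busy.
        for dev_id, state in self._state.items():
            if state == "idle":
                self._state[dev_id] = "busy"
                return dev_id
        return None

    def reset_dev_state(self, dev_id):
        # Reset the given device back to idle.
        self._state[dev_id] = "idle"

farm = MiniDeviceFarm([0, 1])
a = farm.next_idle_dev()   # 0
b = farm.next_idle_dev()   # 1
c = farm.next_idle_dev()   # None: both devices busy
farm.reset_dev_state(a)
d = farm.next_idle_dev()   # 0 again, after being reset to idle
```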
- class aitemplate.backend.task_runner.Task(idx: Union[int, str], cmd: str, name: str, **kwargs)[source]
Task is an object containing a bash command, the process for the command, and the output of the process.
Methods:
- Return the assigned device id for the task.
- Check whether the task has failed.
- Check whether the task is finished.
- Check whether the task process is still running.
- Check whether the task has timed out.
- poll(current_time, timeout): Given the current time, check whether the task is running, finished, or timed out.
- pull(fproc): Pull stdout & stderr from the process, process them with fproc, and set the output for the task.
- assigned_dev() int [source]
Return the assigned device id for the task
- Returns:
Assigned device id
- Return type:
int
- is_failed() bool [source]
Check whether the task has failed
- Returns:
Whether the task has failed
- Return type:
bool
- is_finished() bool [source]
Check whether the task is finished
- Returns:
Whether the task is finished
- Return type:
bool
- is_running() bool [source]
Check whether the task process is still running.
- Returns:
Whether the task process is still running
- Return type:
bool
- is_timeout() bool [source]
Check whether the task has timed out
- Returns:
Whether the task has timed out
- Return type:
bool
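Taken together, the state checks above amount to a small decision function. The sketch below illustrates that logic under the assumption that a live process reports a returncode of None (as Popen.poll() does); it is not the actual Task.poll() implementation:

```python
def poll_task(start_time, returncode, current_time, timeout):
    """Sketch of the poll() decision: running, finished, failed, or timed out.
    returncode is None while the process is still running (as with Popen.poll())."""
    if returncode is None:
        if current_time - start_time > timeout:
            return "timeout"
        return "running"
    return "finished" if returncode == 0 else "failed"

# A process 5s into a 10s budget that has not yet exited:
print(poll_task(start_time=0, returncode=None, current_time=5, timeout=10))   # running
# The same process past its budget:
print(poll_task(start_time=0, returncode=None, current_time=11, timeout=10))  # timeout
# A process that exited with a non-zero code:
print(poll_task(start_time=0, returncode=1, current_time=5, timeout=10))      # failed
```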
aitemplate.backend.builder
Builder is a module to compile generated source code files into binary objects.
Classes:
- AITDebugSettings: This class contains the options for configuring debug settings.
- Builder: Builder is a module to compile generated source code files into binary objects.
- Path: PurePath subclass that can make system calls.
- Runner: A parallel runner for compiling tasks.
- Task: An object containing a bash command, the process for the command, and the output of the process.
Functions:
- is_cmake_compilation: Whether the model is compiled by invoking CMake rather than invoking make directly.
- process_return: Processes a finished task; raises a runtime error if the task timed out or failed.
- process_task: Extracts stdout and stderr from a finished task.
- sha1: Returns a sha1 hash object, optionally initialized with a string.
- write_binhash_file: Hashes all binary input files so we don't have to keep them (use case: constants.obj / constants.bin).
- class aitemplate.backend.builder.AITDebugSettings(check_all_nan_and_inf: bool = False, check_all_outputs: bool = False, gen_profiler_annotation: bool = False, dump_ait_to_py: Optional[str] = None, gen_standalone: bool = False)[source]
This class contains the options for configuring debug settings.
Arguments:
- check_all_nan_and_inf : bool (default: False)
Whether or not to check tensors for NaN or inf values during runtime.
- check_all_outputs : bool (default: False)
Whether or not to print tensors' values during runtime.
- gen_profiler_annotation : bool (default: False)
Whether or not to add profiling annotation primitives when doing codegen (e.g. NVTX for CUDA and rocTX for AMD). Currently only NVIDIA is supported.
- dump_ait_to_py : str, optional
The path where the AIT graph is dumped into a .py file.
- gen_standalone : bool (default: False)
Generate a standalone executable for the model.
- class aitemplate.backend.builder.Builder(n_jobs: int = -1, timeout: int = 180)[source]
Builder is a module to compile generated source code files into binary objects.
Methods:
- build_objs(files, cc_cmd[, binary_cc_cmd]): Generate a build task for each source code file, then build in parallel.
- build_so(target, objs): Generate a task to build all objects into a dynamic library.
- build_objs(files: List[Tuple[str, str]], cc_cmd: str, binary_cc_cmd: Optional[str] = None)[source]
Generate a build task for each source code file, then build them in parallel
- Parameters:
files (List[Tuple[str, str]]) – list of tuples of source code path and object file path
cc_cmd (str) – command line template for building objects
binary_cc_cmd (optional, str) – command line template for turning raw binary files (those ending in .bin) into objects. Since most compilation jobs will not need to compile these, this argument is optional.
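As a rough illustration, a command-line template of this kind might be expanded once per (source, object) pair. The placeholder syntax, compiler flags, and file names below are hypothetical, not AITemplate's actual cc_cmd:

```python
# Hypothetical command template with {src}/{obj} placeholders.
cc_cmd = "nvcc -c {src} -o {obj}"

# List of (source path, object path) tuples, as build_objs expects.
files = [("model/op_0.cu", "model/op_0.obj"),
         ("model/op_1.cu", "model/op_1.obj")]

# One concrete compile command per pair; a runner would execute these in parallel.
cmds = [cc_cmd.format(src=src, obj=obj) for src, obj in files]
print(cmds[0])  # nvcc -c model/op_0.cu -o model/op_0.obj
```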
- class aitemplate.backend.builder.Path(*args, **kwargs)[source]
PurePath subclass that can make system calls.
Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.
Methods:
- absolute(): Return an absolute version of this path.
- chmod(mode): Change the permissions of the path, like os.chmod().
- cwd(): Return a new path pointing to the current working directory (as returned by os.getcwd()).
- exists(): Whether this path exists.
- expanduser(): Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser()).
- glob(pattern): Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- group(): Return the group name of the file gid.
- home(): Return a new path pointing to the user's home directory (as returned by os.path.expanduser('~')).
- is_block_device(): Whether this path is a block device.
- is_char_device(): Whether this path is a character device.
- is_dir(): Whether this path is a directory.
- is_fifo(): Whether this path is a FIFO.
- is_file(): Whether this path is a regular file (also True for symlinks pointing to regular files).
- is_mount(): Check if this path is a POSIX mount point.
- is_socket(): Whether this path is a socket.
- is_symlink(): Whether this path is a symbolic link.
- iterdir(): Iterate over the files in this directory.
- lchmod(mode): Like chmod(), except if the path points to a symlink, the symlink's permissions are changed, rather than its target's.
- link_to(target): Make the target path a hard link pointing to this path.
- lstat(): Like stat(), except if the path points to a symlink, the symlink's status information is returned, rather than its target's.
- mkdir([mode, parents, exist_ok]): Create a new directory at this given path.
- open([mode, buffering, encoding, errors, ...]): Open the file pointed to by this path and return a file object, as the built-in open() function does.
- owner(): Return the login name of the file owner.
- read_bytes(): Open the file in bytes mode, read it, and close the file.
- read_text([encoding, errors]): Open the file in text mode, read it, and close the file.
- readlink(): Return the path to which the symbolic link points.
- rename(target): Rename this path to the target path.
- replace(target): Rename this path to the target path, overwriting if that path exists.
- resolve([strict]): Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
- rglob(pattern): Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
- rmdir(): Remove this directory.
- samefile(other_path): Return whether other_path is the same or not as this file (as returned by os.path.samefile()).
- stat(): Return the result of the stat() system call on this path, like os.stat() does.
- symlink_to(target[, target_is_directory]): Make this path a symlink pointing to the target path.
- touch([mode, exist_ok]): Create this file with the given access mode, if it doesn't exist.
- unlink([missing_ok]): Remove this file or link.
- write_bytes(data): Open the file in bytes mode, write to it, and close the file.
- write_text(data[, encoding, errors]): Open the file in text mode, write to it, and close the file.
- absolute()[source]
Return an absolute version of this path. This function works even if the path doesn’t point to anything.
No normalization is done, i.e. all ‘.’ and ‘..’ will be kept along. Use resolve() to get the canonical path to a file.
- classmethod cwd()[source]
Return a new path pointing to the current working directory (as returned by os.getcwd()).
- expanduser()[source]
Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser)
- glob(pattern)[source]
Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- classmethod home()[source]
Return a new path pointing to the user’s home directory (as returned by os.path.expanduser(‘~’)).
- is_file()[source]
Whether this path is a regular file (also True for symlinks pointing to regular files).
- iterdir()[source]
Iterate over the files in this directory. Does not yield any result for the special paths ‘.’ and ‘..’.
- lchmod(mode)[source]
Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.
- link_to(target)[source]
Make the target path a hard link pointing to this path.
Note this function does not make this path a hard link to target, despite the implication of the function and argument names. The order of arguments (target, link) is the reverse of Path.symlink_to, but matches that of os.link.
- lstat()[source]
Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.
- open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)[source]
Open the file pointed by this path and return a file object, as the built-in open() function does.
- read_text(encoding=None, errors=None)[source]
Open the file in text mode, read it, and close the file.
- rename(target)[source]
Rename this path to the target path.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- replace(target)[source]
Rename this path to the target path, overwriting if that path exists.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- resolve(strict=False)[source]
Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
- rglob(pattern)[source]
Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
- samefile(other_path)[source]
Return whether other_path is the same or not as this file (as returned by os.path.samefile()).
- symlink_to(target, target_is_directory=False)[source]
Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink.
- touch(mode=438, exist_ok=True)[source]
Create this file with the given access mode, if it doesn’t exist.
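Since this Path is a pathlib.Path subclass, standard pathlib usage applies. A short example exercising several of the methods listed above:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    base = Path(d)
    f = base / "notes.txt"
    f.write_text("hello")                    # create the file and write text
    assert f.is_file()
    assert f.read_text() == "hello"
    (base / "a" / "b").mkdir(parents=True)   # create nested directories
    found_names = [p.name for p in base.rglob("*.txt")]  # recursive glob
    f.unlink()                               # remove the file
    assert not f.exists()

print(found_names)  # ['notes.txt']
```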
- class aitemplate.backend.builder.Runner(devs: List[int], timeout: int = 10)[source]
A parallel runner for compiling tasks. Runner inherits from BaseRunner.
Methods:
- pull(): Pull building results.
- push(idx, cmd, target): Push a building task into the runner.
- class aitemplate.backend.builder.Task(idx: Union[int, str], cmd: str, name: str, **kwargs)[source]
Task is an object containing a bash command, the process for the command, and the output of the process.
Methods:
- Return the assigned device id for the task.
- Check whether the task has failed.
- Check whether the task is finished.
- Check whether the task process is still running.
- Check whether the task has timed out.
- poll(current_time, timeout): Given the current time, check whether the task is running, finished, or timed out.
- pull(fproc): Pull stdout & stderr from the process, process them with fproc, and set the output for the task.
- assigned_dev() int [source]
Return the assigned device id for the task
- Returns:
Assigned device id
- Return type:
int
- is_failed() bool [source]
Check whether the task has failed
- Returns:
Whether the task has failed
- Return type:
bool
- is_finished() bool [source]
Check whether the task is finished
- Returns:
Whether the task is finished
- Return type:
bool
- is_running() bool [source]
Check whether the task process is still running.
- Returns:
Whether the task process is still running
- Return type:
bool
- is_timeout() bool [source]
Check whether the task has timed out
- Returns:
Whether the task has timed out
- Return type:
bool
- aitemplate.backend.builder.is_cmake_compilation() bool [source]
Returns whether the model is compiled by invoking CMake rather than invoking make directly.
- aitemplate.backend.builder.process_return(task: Task) None [source]
This function processes a finished task. If the task timed out or failed, a RuntimeError is raised.
- Parameters:
task (Task) – A compiling task.
- Raises:
RuntimeError – Compiling failed.
- aitemplate.backend.builder.process_task(task: Task) None [source]
This function extracts stdout and stderr from a finished task. If the task process's return code is not 0, the task is marked as failed.
- Parameters:
task (Task) – A compiling task
- aitemplate.backend.builder.sha1(string=b'', *, usedforsecurity=True)
Returns a sha1 hash object; optionally initialized with a string
- aitemplate.backend.builder.write_binhash_file(build_dir, binhash_filename='constants.hash', filter_func: ~typing.Callable[[str], bool] = <function is_bin_file>)[source]
Hash all binary input files so we don't have to keep them (use case: constants.obj / constants.bin)
- Parameters:
build_dir (str) – Path to build directory
binhash_filename (str, optional) – File to be written within build_dir, defaults to “constants.hash”.
filter_func (Callable[[str], bool], optional) – Filter function to determine which files to hash. Defaults to is_bin_file.
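The idea can be sketched with hashlib; the function below is a simplified stand-in for the actual implementation (single aggregate digest, hypothetical file layout):

```python
import hashlib
import os
import tempfile

def write_binhash_sketch(build_dir, binhash_filename="constants.hash",
                         filter_func=lambda name: name.endswith(".bin")):
    # Hash every file selected by filter_func into one sha1 digest and
    # record it, so the binary originals need not be kept around.
    h = hashlib.sha1()
    for name in sorted(os.listdir(build_dir)):
        if filter_func(name):
            with open(os.path.join(build_dir, name), "rb") as fh:
                h.update(fh.read())
    out_path = os.path.join(build_dir, binhash_filename)
    with open(out_path, "w") as fh:
        fh.write(h.hexdigest())
    return out_path

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "constants.bin"), "wb") as fh:
        fh.write(b"\x00\x01\x02")
    with open(write_binhash_sketch(d)) as fh:
        digest = fh.read()

print(len(digest))  # 40: a sha1 hex digest
```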
aitemplate.backend.codegen
This module is for generating the final C++ source code in files from Tensor and Operators. Functions in this module will be used for generating function source code files, profiler source code files, and model driver source code files.
Classes:
- AITDebugSettings: This class contains the options for configuring debug settings.
- IntImm: An IntImm represents a static dimension.
- IntVar: An IntVar represents a dynamic dimension.
- IntVarTensor: A special tensor which represents an IntImm / IntVar.
- Operator: Base class for all operators.
- Path: PurePath subclass that can make system calls.
- TensorAccessor: A tensor accessor which manages how to access a Tensor.
- defaultdict: defaultdict(default_factory=None, /, [...]) --> dict with default factory.
Functions:
- check_not_null: Generate a nullptr check to be used by pointer initialization code.
- dtype_to_enumerator: Returns the string representation of the AITemplateDtype enum (defined in model_interface.h) for the given dtype str.
- gen_function_src: Generate function source code files for the given graph.
- gen_library_src: Generate model driver source code files for the given graph.
- gen_profiler: Generate operator profiler source code files for the given graph.
- Returns the size (in bytes) of the given dtype str.
- Generate a string setting a value in a map.
- Number of extra streams in multi-stream mode.
- Maximum number of parallel operators used in memory planning for simple multi-stream mode.
- Multi-stream mode.
- Generate a string that sets a value to something stored in a map.
- Make sure that no more than max_parallel_ops operators are run in parallel.
- class aitemplate.backend.codegen.AITDebugSettings(check_all_nan_and_inf: bool = False, check_all_outputs: bool = False, gen_profiler_annotation: bool = False, dump_ait_to_py: Optional[str] = None, gen_standalone: bool = False)[source]
This class contains the options for configuring debug settings.
Arguments:
- check_all_nan_and_inf : bool (default: False)
Whether or not to check tensors for NaN or inf values during runtime.
- check_all_outputs : bool (default: False)
Whether or not to print tensors' values during runtime.
- gen_profiler_annotation : bool (default: False)
Whether or not to add profiling annotation primitives when doing codegen (e.g. NVTX for CUDA and rocTX for AMD). Currently only NVIDIA is supported.
- dump_ait_to_py : str, optional
The path where the AIT graph is dumped into a .py file.
- gen_standalone : bool (default: False)
Generate a standalone executable for the model.
- class aitemplate.backend.codegen.IntImm(value: int, name: Optional[str] = None)[source]
An IntImm represents a static dimension. IntVar (see above) and IntImm are used together to represent a Tensor’s shape.
Methods:
- pseudo_code([with_shape]): Returns a string containing pseudo code of this object.
- value(): Returns the value of this IntImm.
- class aitemplate.backend.codegen.IntVar(values: List[int], name: Optional[str] = None, symbolic_value: Optional[Basic] = None)[source]
An IntVar represents a dynamic dimension. IntVar and IntImm (see below) are used together to represent a Tensor’s shape.
IntVar supports basic arithmetic operations, and returns the most conservative IntVar w.r.t. range of _attrs[“values”].
Methods:
- Returns the lower bound of this dynamic dim.
- pseudo_code([with_shape]): Returns a string containing pseudo code of this object.
- Returns the symbolic value of this dynamic dim.
- Returns the upper bound of this dynamic dim.
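"Most conservative" here is interval arithmetic over the dims' value ranges. A sketch of the idea, with plain (lower, upper) tuples standing in for IntVar and assuming non-negative dims:

```python
def add_ranges(a, b):
    # Adding two dynamic dims [a_min, a_max] + [b_min, b_max]
    # yields the widest range the sum could take.
    return (a[0] + b[0], a[1] + b[1])

def mul_ranges(a, b):
    # For non-negative dims the product range is element-wise.
    return (a[0] * b[0], a[1] * b[1])

batch = (1, 32)     # dynamic batch dim, like IntVar(values=[1, 32])
seq = (8, 128)      # dynamic sequence dim
print(add_ranges(batch, seq))  # (9, 160)
print(mul_ranges(batch, seq))  # (8, 4096)
```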
- class aitemplate.backend.codegen.IntVarTensor(int_var: IntVar, name: Optional[str] = None, src_ops: Optional[Set[Node]] = None, dst_ops: Optional[Set[Node]] = None, dtype: str = 'float16', is_input: bool = False, is_output: bool = False, value: Optional[Any] = None, is_view_of: Optional[Any] = None)[source]
A special tensor which represents an IntImm / IntVar. This Tensor can be used as inputs of some Operators (e.g. reshape, layernorm). An IntVarTensor instead of IntVar is used here to keep reference to src_ops and dst_ops.
Methods:
- pseudo_code([with_shape]): Returns a string containing pseudo code of this object.
- class aitemplate.backend.codegen.Operator[source]
Base class for all operators
Methods:
- gen_function(): Generates the function source code string.
- gen_profiler([workdir, ...]): Generates source files for profiling purposes.
- profile([workdir, devices, ...]): Selects the fastest kernel configurations.
- pseudo_code([with_shape]): Returns a string containing pseudo code of this object.
- replace_input_tensor(old_tensor, new_tensor): Replaces old_tensor in self._attrs["inputs"] with new_tensor.
- gen_function() str [source]
Generates the function source code string.
- Returns:
A string which contains the C++ function implementation source code
- Return type:
str
- Raises:
NotImplementedError –
- gen_profiler(workdir: Optional[str] = None, dynamic_profiling_strategy=None) None [source]
Generates source files for profiling purposes.
- Parameters:
workdir (str, optional) – The directory to generate source files.
dynamic_profiling_strategy (DynamicProfileStrategy, optional) – A dynamic profiling strategy, used to filter generated profiles at compile time. See also:
profile()
- profile(workdir='./', devices=None, dynamic_profiling_strategy=DynamicProfileStrategy.MAX) None [source]
Selects the fastest kernel configurations.
- Parameters:
workdir (str, optional) – The directory which contains source files, by default “./”
devices (list, optional) – A list of device ids which can be used for profiling.
dynamic_profiling_strategy (DynamicProfileStrategy, optional) – Profiling strategy used when there are dynamic dims. By default, MAX is used, i.e. to profile a dynamic range, an upper bound will be used.
- class aitemplate.backend.codegen.Path(*args, **kwargs)[source]
PurePath subclass that can make system calls.
Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.
Methods:
- absolute(): Return an absolute version of this path.
- chmod(mode): Change the permissions of the path, like os.chmod().
- cwd(): Return a new path pointing to the current working directory (as returned by os.getcwd()).
- exists(): Whether this path exists.
- expanduser(): Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser()).
- glob(pattern): Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- group(): Return the group name of the file gid.
- home(): Return a new path pointing to the user's home directory (as returned by os.path.expanduser('~')).
- is_block_device(): Whether this path is a block device.
- is_char_device(): Whether this path is a character device.
- is_dir(): Whether this path is a directory.
- is_fifo(): Whether this path is a FIFO.
- is_file(): Whether this path is a regular file (also True for symlinks pointing to regular files).
- is_mount(): Check if this path is a POSIX mount point.
- is_socket(): Whether this path is a socket.
- is_symlink(): Whether this path is a symbolic link.
- iterdir(): Iterate over the files in this directory.
- lchmod(mode): Like chmod(), except if the path points to a symlink, the symlink's permissions are changed, rather than its target's.
- link_to(target): Make the target path a hard link pointing to this path.
- lstat(): Like stat(), except if the path points to a symlink, the symlink's status information is returned, rather than its target's.
- mkdir([mode, parents, exist_ok]): Create a new directory at this given path.
- open([mode, buffering, encoding, errors, ...]): Open the file pointed to by this path and return a file object, as the built-in open() function does.
- owner(): Return the login name of the file owner.
- read_bytes(): Open the file in bytes mode, read it, and close the file.
- read_text([encoding, errors]): Open the file in text mode, read it, and close the file.
- readlink(): Return the path to which the symbolic link points.
- rename(target): Rename this path to the target path.
- replace(target): Rename this path to the target path, overwriting if that path exists.
- resolve([strict]): Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
- rglob(pattern): Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
- rmdir(): Remove this directory.
- samefile(other_path): Return whether other_path is the same or not as this file (as returned by os.path.samefile()).
- stat(): Return the result of the stat() system call on this path, like os.stat() does.
- symlink_to(target[, target_is_directory]): Make this path a symlink pointing to the target path.
- touch([mode, exist_ok]): Create this file with the given access mode, if it doesn't exist.
- unlink([missing_ok]): Remove this file or link.
- write_bytes(data): Open the file in bytes mode, write to it, and close the file.
- write_text(data[, encoding, errors]): Open the file in text mode, write to it, and close the file.
- absolute()[source]
Return an absolute version of this path. This function works even if the path doesn’t point to anything.
No normalization is done, i.e. all ‘.’ and ‘..’ will be kept along. Use resolve() to get the canonical path to a file.
- classmethod cwd()[source]
Return a new path pointing to the current working directory (as returned by os.getcwd()).
- expanduser()[source]
Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser)
- glob(pattern)[source]
Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- classmethod home()[source]
Return a new path pointing to the user’s home directory (as returned by os.path.expanduser(‘~’)).
- is_file()[source]
Whether this path is a regular file (also True for symlinks pointing to regular files).
- iterdir()[source]
Iterate over the files in this directory. Does not yield any result for the special paths ‘.’ and ‘..’.
- lchmod(mode)[source]
Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.
- link_to(target)[source]
Make the target path a hard link pointing to this path.
Note this function does not make this path a hard link to target, despite the implication of the function and argument names. The order of arguments (target, link) is the reverse of Path.symlink_to, but matches that of os.link.
- lstat()[source]
Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.
- open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)[source]
Open the file pointed by this path and return a file object, as the built-in open() function does.
- read_text(encoding=None, errors=None)[source]
Open the file in text mode, read it, and close the file.
- rename(target)[source]
Rename this path to the target path.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- replace(target)[source]
Rename this path to the target path, overwriting if that path exists.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- resolve(strict=False)[source]
Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
- rglob(pattern)[source]
Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
- samefile(other_path)[source]
Return whether other_path is the same or not as this file (as returned by os.path.samefile()).
- symlink_to(target, target_is_directory=False)[source]
Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink.
- touch(mode=438, exist_ok=True)[source]
Create this file with the given access mode, if it doesn’t exist.
- class aitemplate.backend.codegen.TensorAccessor(original_tensor: Tensor)[source]
A tensor accessor which manages how to access a Tensor. Must always be used together with a Tensor.
Methods:
- gen_stride_str(dim, dim_names): Returns the str to calculate the stride of a certain dim.
- is_rightmost_dim_contiguous(cat_dim): Check if the rightmost dimension would be contiguous after concatenation along a given cat_dim.
- stride(dim): Returns the stride (a number) for the given dim.
- try_get_stride_strs(dim[, dim_names]): Tries to return a list of stride strs for the given dim.
- update_base_tensor(new_tensor, stride_dim, ...): Updates the TensorAccessor with a new base tensor.
- update_base_tensor_shape(new_tensor): Updates the TensorAccessor's actual shape.
- gen_stride_str(dim: int, dim_names: List[str]) str [source]
Returns the str to calculate the stride of a certain dim. This is a temporary solution to get around dynamic shapes problems with tensor_accessor. dim_names is a list of str, such as [“B”, “M”, “K”] for the first input to bmm_rcr.
Note that both dim and dim_names are based on self.original_shapes.
Throws RuntimeError if such a stride number cannot be computed.
- is_rightmost_dim_contiguous(cat_dim: int) bool [source]
Check if the rightmost dimension would be contiguous after concatenation along a given cat_dim. This is a necessary condition for GEMM+concat fusion, since GEMM doesn't support a discontinuous rightmost dimension for row-major output. The rightmost dimension is contiguous iff the concat dimension corresponds to one of the dimensions in the original shape and it's the first dimension in its group of actual dimensions.
- stride(dim: int) int [source]
Returns stride (a number) for the given dim. Note that dim is based on self.original_shapes. This API assumes that all dims after dim are static (IntImm).
Throws RuntimeError if such a stride number cannot be computed.
- try_get_stride_strs(dim: int, dim_names: Optional[List[str]] = None) Optional[List[str]] [source]
Tries to return a list of stride strs for the given dim. Note that both dim and dim_names are based on self.original_shapes.
Returns None if this function fails to calculate stride strs.
- aitemplate.backend.codegen.check_not_null(tensor: Tensor, tensor_idx: Optional[int] = None, skip_if_lower_bound_is_zero: bool = False) str [source]
Generate a nullptr check to be used by pointer initialization code.
If skip_if_lower_bound_is_zero == True, no code will be generated when the Tensor has at least one dynamic dim with a lower bound of zero. This is most useful for outputs; we put the nullptr checks at the start of the inference, but we won’t know output shapes until after Run() finishes. We therefore just relax the check for these outputs - only allow them to be null if their lower bound is zero, otherwise never allow them to be null.
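A sketch of what such a generated check might look like; the helper's signature and the emitted C++ text below are illustrative, not the actual codegen output:

```python
def check_not_null_sketch(name, lower_bounds, skip_if_lower_bound_is_zero=False):
    """Return a C++ nullptr-check snippet for a tensor pointer, or an empty
    string when the check is relaxed for a possibly-empty output."""
    if skip_if_lower_bound_is_zero and any(b == 0 for b in lower_bounds):
        return ""  # the output may legitimately be empty, so allow nullptr
    return (f'if ({name} == nullptr) {{\n'
            f'  throw std::runtime_error("{name} is null!");\n'
            f'}}')

# Output with a dynamic dim whose lower bound is zero: check is skipped.
print(repr(check_not_null_sketch("output0", [0, 16],
                                 skip_if_lower_bound_is_zero=True)))  # ''
# Regular input: a nullptr check is emitted.
print(check_not_null_sketch("input0", [1, 16]))
```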
- class aitemplate.backend.codegen.defaultdict
defaultdict(default_factory=None, /, [...]) --> dict with default factory
The default factory is called without arguments to produce a new value when a key is not present, in __getitem__ only. A defaultdict compares equal to a dict with the same items. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.
Methods:
- copy(): Return a shallow copy of D.
Attributes:
- default_factory: Factory for default value called by __missing__().
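Standard defaultdict behavior, matching the description above:

```python
from collections import defaultdict

# Group words by first letter; a missing key triggers __missing__,
# which calls the factory (list) to create a fresh value.
groups = defaultdict(list)
for word in ["ant", "bee", "bat", "ape"]:
    groups[word[0]].append(word)

print(dict(groups))            # {'a': ['ant', 'ape'], 'b': ['bee', 'bat']}
print(groups.default_factory)  # <class 'list'>
print(groups["z"])             # [] -- created on first access via __missing__
```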
- aitemplate.backend.codegen.dtype_to_enumerator(dtype: str) str [source]
Returns the string representation of the AITemplateDtype enum (defined in model_interface.h) for the given dtype str.
- Parameters:
dtype (str) – A data type string.
- Returns:
the AITemplateDtype enum string representation.
- Return type:
str
- aitemplate.backend.codegen.gen_function_src(sorted_graph: List[Tensor], workdir: str, model_name: str = '') List[Tuple[str, str]] [source]
Generate functions source code files for the given graph
- Parameters:
sorted_graph (List[Tensor]) – The network after running toposort transformation
workdir (str) – Target directory for generated C++ source code files
model_name (str, optional) – Sub working directory in the workdir for the given model, by default “”
- Returns:
List of tuple (source file path, object file path)
- Return type:
List[Tuple[str, str]]
- aitemplate.backend.codegen.gen_library_src(sorted_graph: List[Tensor], max_blob_size: int, max_constant_blob_size: int, workspace: Workspace, workdir: str, output_tensors: List[Tensor], model_name: str = '', debug_settings: AITDebugSettings = AITDebugSettings(check_all_nan_and_inf=False, check_all_outputs=False, gen_profiler_annotation=False, dump_ait_to_py=None, gen_standalone=False), additional_unbound_constants: Optional[List[Tensor]] = None) List[Tuple[str, str]] [source]
Generate model driver source code files for the given graph
- Parameters:
sorted_graph (List[Tensor]) – The network after running toposort transformation
max_blob_size (int) – Total memory for input/output tensor and intermediate results, calculated by memory planning transformation
workspace (Workspace) – Workspace sizes, computed by memory planning
workdir (str) – Target directory for generated C++ source code files
model_name (str, optional) – Sub working directory in the workdir for the given model, by default “”
debug_settings (AITDebugSettings) – specify debug settings such as where to dump AITemplate model Python file, etc.
- Returns:
List of tuple (source file path, object file path)
- Return type:
List[Tuple[str, str]]
- aitemplate.backend.codegen.gen_profiler(sorted_graph: List[Tensor], workdir: str, dynamic_profiling_strategy)[source]
Generate operator profiler source code files for the given graph
- Parameters:
sorted_graph (List[Tensor]) – The network after running toposort transformation
workdir (str) – Target directory for generated C++ source code files
dynamic_profiling_strategy (DynamicProfileStrategy, optional) – A dynamic profiling strategy, used to filter generated profiles at compile time. Passed through to the gen_profiler kernels of the nodes in the graph. See also:
profile()
- aitemplate.backend.codegen.get_dtype_size(dtype: str) int [source]
Returns size (in bytes) of the given dtype str.
- Parameters:
dtype (str) – A data type string.
- Returns:
Size (in bytes) of this dtype.
- Return type:
int
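The lookup behind get_dtype_size can be pictured as a simple table; the sketch below is a hypothetical stand-in, and the exact set of dtype strings the real function accepts is an assumption:

```python
# Hypothetical dtype-string-to-byte-size table; the supported names
# are an assumption, not the real function's full list.
_DTYPE_SIZES = {
    "bool": 1,
    "float16": 2,
    "bfloat16": 2,
    "float32": 4,
    "int32": 4,
    "int64": 8,
}

def get_dtype_size(dtype: str) -> int:
    """Return the size in bytes of the given dtype string."""
    if dtype not in _DTYPE_SIZES:
        raise KeyError(f"unsupported dtype: {dtype}")
    return _DTYPE_SIZES[dtype]

print(get_dtype_size("float16"))  # → 2
```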
- aitemplate.backend.codegen.map_set(map_name: str, key_name: str, value_name: Optional[str] = None, indent: str = ' ') str [source]
Generate a string setting a value in a map.
If value_name is given, sets map_name[“key_name”] = value_name. Otherwise, sets map_name[“key_name”] = key_name. Special maps like dim_map may make additional modifications to the LHS of this expression.
- Parameters:
map_name (str) – The map to use
key_name (str) – The key to set. Will be put into quotes.
value_name (Optional[str]) – If set, force map_name[“key_name”] = value_name
indent (str) – For formatting
- Returns:
The formatted map set statement.
- Return type:
str
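A simplified sketch of the kind of string this helper produces. It ignores the special handling of maps like dim_map, and the exact formatting of the real function (indent width, trailing semicolon) is an assumption:

```python
from typing import Optional

def map_set(map_name: str, key_name: str,
            value_name: Optional[str] = None, indent: str = "    ") -> str:
    # The key is quoted; the value defaults to the (unquoted) key name.
    value = value_name if value_name is not None else key_name
    return f'{indent}{map_name}["{key_name}"] = {value};'

print(map_set("dim_map", "batch_size"))        # →     dim_map["batch_size"] = batch_size;
print(map_set("dim_map", "batch_size", "bs"))  # →     dim_map["batch_size"] = bs;
```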
- aitemplate.backend.codegen.multistream_additional_streams() int [source]
Number of extra streams in multi-stream mode.
This option is independent from AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS.
For example, say, there are 100 ops that can be run in parallel.
Example 1: AIT_MULTISTREAM_EXTRA_STREAMS=4 and AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS=100. In this case 5 streams will be used (1 base and 4 extra), every stream gets 20 operators and no inter-stream barriers are used. Memory planning is done for 100 parallel ops.
Example 2: AIT_MULTISTREAM_EXTRA_STREAMS=4 and AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS=5. In this case 5 streams will be used (1 base and 4 extra), there will be 20 waves separated by inter-stream barriers, every stream gets 1 operator for every wave. Memory planning is done for 20 waves of 5 parallel ops each.
- aitemplate.backend.codegen.multistream_max_mem_parallel_ops() int [source]
Maximum number of parallel operators used in memory planning for the simple multi-stream mode. Larger values imply a higher level of possible parallelism, but also higher memory allocations.
This option is independent from AIT_MULTISTREAM_EXTRA_STREAMS.
For example, say, there are 100 ops that can be run in parallel.
Example 1: AIT_MULTISTREAM_EXTRA_STREAMS=4 and AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS=100. In this case 5 streams will be used (1 base and 4 extra), every stream gets 20 operators and no inter-stream barriers are used. Memory planning is done for 100 parallel ops.
Example 2: AIT_MULTISTREAM_EXTRA_STREAMS=4 and AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS=5. In this case 5 streams will be used (1 base and 4 extra), there will be 20 waves separated by inter-stream barriers, every stream gets 1 operator for every wave. Memory planning is done for 20 waves of 5 parallel ops each.
- aitemplate.backend.codegen.multistream_mode() int [source]
Multi-stream mode. 0 - no multistream. 1 - simple multistream. Default: 0.
- aitemplate.backend.codegen.set_value_from_map(map_name: Any, var_name: Any, indent: str = ' ') str [source]
Generate a string that sets a value to something stored in a map.
- Parameters:
map_name (str) – The map to use
var_name (str) – The var_name, used as the name of the value and the key.
indent (str) – For formatting
- Returns:
The formatted statement.
- Return type:
str
- aitemplate.backend.codegen.split_simple_multistream_parallel_ops(ops_by_order, max_parallel_ops: int)[source]
Make sure that no more than max_parallel_ops operators are run in parallel.
Say, on the first step op1, op2 and op3 can be executed in parallel. On the second one, it is op4 and op5. On the third one it is op6, op7, op8, op9. Then, ops_by_order is something like
{ 1: [op1, op2, op3], 2: [op4, op5], 3: [op6, op7, op8, op9] }
- Given max_parallel_ops=2, the output will be:
[[op1, op2], [op3], [op4, op5], [op6, op7], [op8, op9]]
- Parameters:
ops_by_order (Dict[int, List[Operator]]) – A dictionary, its keys represent the execution order and its values represent operators that are executed in parallel.
max_parallel_ops (int) – Number of operators that are allowed to be run in parallel
- Returns:
Transformed sequence of operators to execute.
- Return type:
List[List[Operator]]
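The transformation above amounts to chunking each parallel group in execution order; a minimal sketch reproducing the documented example:

```python
def split_simple_multistream_parallel_ops(ops_by_order, max_parallel_ops):
    # Walk the execution orders in sequence and split each parallel
    # group into chunks of at most max_parallel_ops operators.
    result = []
    for order in sorted(ops_by_order):
        ops = ops_by_order[order]
        for i in range(0, len(ops), max_parallel_ops):
            result.append(ops[i:i + max_parallel_ops])
    return result

ops_by_order = {1: ["op1", "op2", "op3"], 2: ["op4", "op5"],
                3: ["op6", "op7", "op8", "op9"]}
print(split_simple_multistream_parallel_ops(ops_by_order, 2))
# → [['op1', 'op2'], ['op3'], ['op4', 'op5'], ['op6', 'op7'], ['op8', 'op9']]
```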
aitemplate.backend.profiler_cache
SQLite backend for conv/gemm profiling cache
Classes:
|
Enum for cache mode |
|
Local SQLite profile cache database. |
- class aitemplate.backend.profiler_cache.CacheMode(value)[source]
Enum for cache mode
Profiling cache can be stored locally or remotely. In LOCAL mode, the cache is stored in a SQLite database. In REMOTE mode, the profiled results can be queried via a RESTful API.
REMOTE mode is not implemented yet.
- class aitemplate.backend.profiler_cache.ProfileCacheDB(target: str, path: Optional[str] = None, uri: Optional[str] = None, port: Optional[str] = None)[source]
Local SQLite profile cache database.
Methods:
insert_conv
(args)Insert a conv op epilogue into the cache; conv and gemm share the same SQL table.
insert_conv3d
(args)Insert a conv3d op epilogue into the cache; conv and gemm share the same SQL table.
insert_gemm
(args)Insert a gemm op epilogue into the cache.
insert_normalization
(args)Insert a normalization op epilogue into the cache; the same SQL table is shared with conv and gemm.
query_conv
(args)Query a conv op epilogue from the cache; conv and gemm share the same SQL table.
query_conv3d
(args)Query a conv3d op epilogue from the cache; conv and gemm share the same SQL table.
query_gemm
(args)Query a gemm op epilogue from the cache.
query_normalization
(args)Query a normalization op epilogue from the cache.
- insert_conv(args: Dict[str, Any]) None [source]
Insert a conv op epilogue into the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv record entry
- insert_conv3d(args: Dict[str, Any]) None [source]
Insert a conv3d op epilogue into the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv3d record entry
- insert_gemm(args: Dict[str, Any]) None [source]
Insert a gemm op epilogue into the cache.
- Parameters:
args (Dict) – Gemm record entry
- insert_normalization(args: Dict[str, Any]) None [source]
Insert a normalization op epilogue into the cache; the same SQL table is shared with conv and gemm.
- Parameters:
args (Dict) – Normalization record entry
- query_conv(args: Dict[str, Any]) Tuple[str, int] [source]
Query a conv op epilogue from the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv query entry
- Returns:
Profiling results
- Return type:
Tuple
- query_conv3d(args: Dict[str, Any]) Tuple[str, int] [source]
Query a conv3d op epilogue from the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv3d query entry
- Returns:
Profiling results
- Return type:
Tuple
aitemplate.backend.profiler_runner
A subprocess-based multi-GPU runner for auto-tuning
Classes:
|
Object to store profiling result |
|
A parallel runner that executes profilers on multiple GPUs. It uses a process pool, avoiding process creation overhead; the pool size equals the number of provided GPUs, so ideally each process executes a profiler on its dedicated GPU. |
|
Create a queue object with a given maximum size. |
|
A parallel runner for multiple GPUs profiling tasks. |
Functions:
|
Detect GPU target based on nvidia-smi and rocminfo |
|
Generate profile result from a profiling task |
|
Extract kernel execution time and workspace from task process outputs |
|
Delay execution for a given number of seconds. |
- class aitemplate.backend.profiler_runner.ProfileResult(op_config, duration, workspace)
Object to store profiling result
Attributes:
Alias for field number 1
Alias for field number 0
Alias for field number 2
- duration
Alias for field number 1
- op_config
Alias for field number 0
- workspace
Alias for field number 2
- class aitemplate.backend.profiler_runner.ProfilerRunner(devices: List[str], postprocessing_delegate, timeout: int = 500)[source]
A parallel runner that executes profilers on multiple GPUs. It uses a process pool, avoiding process creation overhead. The size of the process pool equals the number of provided GPUs, so ideally each process executes a profiler on its dedicated GPU. This property has not been properly verified yet; however, the results are empirically better than with the previous runner.
Methods:
join
()Wait for subprocess completion or timeout; postprocess the profiler results with delegate(s)
push
(cmds, process_result_callback)Schedule the profiler for execution in a separate process and call the callback after subprocess completion
- join()[source]
Wait for subprocess completion or timeout; postprocess the profiler results with delegate(s)
- push(cmds: List[str], process_result_callback: Callable)[source]
Schedule the profiler for execution in a separate process and call the callback after subprocess completion
- Parameters:
cmds (List[str]) – argv for the launched profiler
process_result_callback (Callable) – Called after subprocess completion in the main process (but possibly not the main thread). Currently used to aggregate profiler results, so the callable takes result and postprocessing_delegate parameters. It is also used to propagate the profiler launch context to the aggregation point, namely the split_k value for the gemm profilers.
- class aitemplate.backend.profiler_runner.Queue(maxsize=0)[source]
Create a queue object with a given maximum size.
If maxsize is <= 0, the queue size is infinite.
Methods:
empty
()Return True if the queue is empty, False otherwise (not reliable!).
full
()Return True if the queue is full, False otherwise (not reliable!).
get
([block, timeout])Remove and return an item from the queue.
Remove and return an item from the queue without blocking.
join
()Blocks until all items in the Queue have been gotten and processed.
put
(item[, block, timeout])Put an item into the queue.
put_nowait
(item)Put an item into the queue without blocking.
qsize
()Return the approximate size of the queue (not reliable!).
Indicate that a formerly enqueued task is complete.
- empty()[source]
Return True if the queue is empty, False otherwise (not reliable!).
This method is likely to be removed at some point. Use qsize() == 0 as a direct substitute, but be aware that either approach risks a race condition where a queue can grow before the result of empty() or qsize() can be used.
To create code that needs to wait for all queued tasks to be completed, the preferred technique is to use the join() method.
- full()[source]
Return True if the queue is full, False otherwise (not reliable!).
This method is likely to be removed at some point. Use qsize() >= n as a direct substitute, but be aware that either approach risks a race condition where a queue can shrink before the result of full() or qsize() can be used.
- get(block=True, timeout=None)[source]
Remove and return an item from the queue.
If optional args ‘block’ is true and ‘timeout’ is None (the default), block if necessary until an item is available. If ‘timeout’ is a non-negative number, it blocks at most ‘timeout’ seconds and raises the Empty exception if no item was available within that time. Otherwise (‘block’ is false), return an item if one is immediately available, else raise the Empty exception (‘timeout’ is ignored in that case).
- get_nowait()[source]
Remove and return an item from the queue without blocking.
Only get an item if one is immediately available. Otherwise raise the Empty exception.
- join()[source]
Blocks until all items in the Queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate the item was retrieved and all work on it is complete.
When the count of unfinished tasks drops to zero, join() unblocks.
- put(item, block=True, timeout=None)[source]
Put an item into the queue.
If optional args ‘block’ is true and ‘timeout’ is None (the default), block if necessary until a free slot is available. If ‘timeout’ is a non-negative number, it blocks at most ‘timeout’ seconds and raises the Full exception if no free slot was available within that time. Otherwise (‘block’ is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (‘timeout’ is ignored in that case).
- put_nowait(item)[source]
Put an item into the queue without blocking.
Only enqueue the item if a free slot is immediately available. Otherwise raise the Full exception.
- task_done()[source]
Indicate that a formerly enqueued task is complete.
Used by Queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.
If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).
Raises a ValueError if called more times than there were items placed in the queue.
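The task_done()/join() protocol described above can be exercised with the standard library Queue directly:

```python
import queue
import threading

q = queue.Queue()
results = []

def worker():
    while True:
        item = q.get()
        if item is None:          # sentinel: stop the worker
            q.task_done()
            break
        results.append(item * 2)
        q.task_done()             # mark this fetched task complete

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    q.put(i)                      # each put() raises the unfinished count
q.put(None)
q.join()                          # unblocks once every item got task_done()
t.join()
print(sorted(results))  # → [0, 2, 4]
```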
- class aitemplate.backend.profiler_runner.Runner(devs: List[int], op_name: str, timeout: int = 30)[source]
A parallel runner for multi-GPU profiling tasks. Runner inherits from BaseRunner.
Methods:
pull
()Pull results from all profiling tasks assigned to runner.
push
(idx, cmd[, return_ops])Push a new profiling task into the runner's queue
- pull()[source]
Pull results from all profiling tasks assigned to runner.
- Returns:
Profiling results of all successful tasks.
- Return type:
List[Tuple[Union[int, str], ProfileResult]]
- push(idx: Union[int, str], cmd: str, return_ops: Optional[List[str]] = None)[source]
Push a new profiling task into the runner’s queue
- Parameters:
idx (Union[int, str]) – Profiling task id (usually is algorithm id or name)
cmd (str) – Bash command to execute the profiling task
return_ops (List[str]) – Names of the ops to return the profiling results for. If specified, pull returns a list with the ProfileResult for each op in return_ops instead of a single (best) ProfileResult instance.
- aitemplate.backend.profiler_runner.detect_target(**kwargs)[source]
Detect GPU target based on nvidia-smi and rocminfo
- Returns:
CUDA or ROCM target
- Return type:
Target
- aitemplate.backend.profiler_runner.process_return(task: Task) Tuple[Union[int, str], ProfileResult] [source]
Generate profile result from a profiling task
- Parameters:
task (Task) – A profiling task
- Returns:
Tuple of task idx (usually the algorithm name/id) and profiling result
- Return type:
Tuple[Union[int, str], ProfileResult]
- aitemplate.backend.profiler_runner.process_task(task: Task) None [source]
Extract kernel execution time and workspace from task process outputs
- Parameters:
task (Task) – A profiling task
- aitemplate.backend.profiler_runner.sleep(seconds)
Delay execution for a given number of seconds. The argument may be a floating point number for subsecond precision.
aitemplate.backend.registry
Registry is a design pattern to map a string key to a function. The registry decorator is mainly used for backend functions.
Functions:
|
Get a function from registry by using a key |
|
Register a new function |
- aitemplate.backend.registry.get(func_name: str) Callable [source]
Get a function from registry by using a key
Example
func = registry.get("func_name")
func(args)
- Parameters:
func_name (str) – Key for function in registry
- Returns:
Function associated with the key
- Return type:
Callable
- Raises:
RuntimeError – If the key is not found in the registry, a RuntimeError is raised
- aitemplate.backend.registry.reg(func_name: str, func: Optional[Callable] = None) Callable [source]
Register a new function
Example
@registry.reg("func_name")
def func(args):
    ...
- Parameters:
func_name (str) – Registry key for the function
func (Callable, optional) – Function to be registered, by default None
- Returns:
Function in registry
- Return type:
Callable
- Raises:
RuntimeError – If the same key is already registered, a RuntimeError is raised
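A minimal sketch of the registry pattern these two functions implement. This is a hypothetical stand-in, not the real aitemplate.backend.registry module; the "cuda.gemm" key and gemm_codegen function are illustrative only:

```python
from typing import Callable, Optional

_registry = {}

def reg(func_name: str, func: Optional[Callable] = None) -> Callable:
    """Register func under func_name; usable as a decorator."""
    def decorator(f: Callable) -> Callable:
        if func_name in _registry:
            raise RuntimeError(f"{func_name} is already registered")
        _registry[func_name] = f
        return f
    return decorator(func) if func is not None else decorator

def get(func_name: str) -> Callable:
    """Look up a registered function; raise RuntimeError if absent."""
    if func_name not in _registry:
        raise RuntimeError(f"{func_name} not found in registry")
    return _registry[func_name]

@reg("cuda.gemm")
def gemm_codegen():
    return "generated gemm source"

print(get("cuda.gemm")())  # → generated gemm source
```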
aitemplate.backend.target
Target object for AITemplate.
Functions:
|
Create a CUDA target. |
|
Create a ROCM target. |
Classes:
|
Enum where members are also (and must be) ints |
|
Local SQLite profile cache database. |
|
Enum for target type. |
- aitemplate.backend.target.CUDA(template_path: str = '/home/runner/work/AITemplate/AITemplate/3rdparty/cutlass', arch: str = '80', **kwargs)[source]
Create a CUDA target.
- class aitemplate.backend.target.IntEnum(value)[source]
Enum where members are also (and must be) ints
- class aitemplate.backend.target.ProfileCacheDB(target: str, path: Optional[str] = None, uri: Optional[str] = None, port: Optional[str] = None)[source]
Local SQLite profile cache database.
Methods:
insert_conv
(args)Insert a conv op epilogue into the cache; conv and gemm share the same SQL table.
insert_conv3d
(args)Insert a conv3d op epilogue into the cache; conv and gemm share the same SQL table.
insert_gemm
(args)Insert a gemm op epilogue into the cache.
insert_normalization
(args)Insert a normalization op epilogue into the cache; the same SQL table is shared with conv and gemm.
query_conv
(args)Query a conv op epilogue from the cache; conv and gemm share the same SQL table.
query_conv3d
(args)Query a conv3d op epilogue from the cache; conv and gemm share the same SQL table.
query_gemm
(args)Query a gemm op epilogue from the cache.
query_normalization
(args)Query a normalization op epilogue from the cache.
- insert_conv(args: Dict[str, Any]) None [source]
Insert a conv op epilogue into the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv record entry
- insert_conv3d(args: Dict[str, Any]) None [source]
Insert a conv3d op epilogue into the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv3d record entry
- insert_gemm(args: Dict[str, Any]) None [source]
Insert a gemm op epilogue into the cache.
- Parameters:
args (Dict) – Gemm record entry
- insert_normalization(args: Dict[str, Any]) None [source]
Insert a normalization op epilogue into the cache; the same SQL table is shared with conv and gemm.
- Parameters:
args (Dict) – Normalization record entry
- query_conv(args: Dict[str, Any]) Tuple[str, int] [source]
Query a conv op epilogue from the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv query entry
- Returns:
Profiling results
- Return type:
Tuple
- query_conv3d(args: Dict[str, Any]) Tuple[str, int] [source]
Query a conv3d op epilogue from the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv3d query entry
- Returns:
Profiling results
- Return type:
Tuple