aitemplate.backend
aitemplate.backend.task_runner
This module provides a general-purpose subprocess-based task runner.
Classes:
- BaseRunner: Generic subprocess task runner for different purposes.
- DeviceFarm: A stateful object that schedules and assigns tasks to the available devices.
- Task: An object containing a bash command, the process for the command, and the output of the process.
- class aitemplate.backend.task_runner.BaseRunner(devs: List[int], tag: str, timeout: int = 10)[source]
Generic subprocess task runner for different purposes
Methods:
- join(): Wait until all tasks are finished.
- pull(ftask_proc, fret_proc): Pull results from all tasks executed on the runner.
- push(idx, cmd): Push a task into the runner.
- reset(): Reset the runner; clear the task queue and device states.
- pull(ftask_proc: Callable, fret_proc: Callable) List [source]
Pull results from all tasks executed on the runner.
- Parameters:
ftask_proc (Callable) – Function to process each task’s output
fret_proc (Callable) – Function to extract returns from task
- Returns:
Aggregated returns from all tasks
- Return type:
List
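The push/join/pull flow can be sketched with a minimal stand-alone runner. The class below is an illustration of the pattern only, not AITemplate's BaseRunner; all names are hypothetical:

```python
import subprocess

class MiniRunner:
    """Minimal sketch of a push/join/pull subprocess runner (illustrative,
    not AITemplate's BaseRunner)."""

    def __init__(self):
        self._tasks = []  # (idx, process) pairs

    def push(self, idx, cmd):
        # Launch the command right away; a real runner schedules onto devices.
        proc = subprocess.Popen(
            cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        self._tasks.append((idx, proc))

    def join(self):
        # Wait until all tasks are finished (output here is small, so no deadlock).
        for _, proc in self._tasks:
            proc.wait()

    def pull(self, ftask_proc, fret_proc):
        # Process each task's output, then aggregate the per-task returns.
        results = []
        for idx, proc in self._tasks:
            out, err = proc.communicate()
            task = (idx, proc.returncode, out.decode(), err.decode())
            ftask_proc(task)           # e.g. parse stdout, flag failures
            results.append(fret_proc(task))
        return results

runner = MiniRunner()
runner.push(0, "echo hello")
runner.join()
rets = runner.pull(ftask_proc=lambda t: None, fret_proc=lambda t: t[1])
print(rets)  # [0] -- exit code of the single task
```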
- class aitemplate.backend.task_runner.DeviceFarm(devs: List[int])[source]
Device Farm is a stateful object that schedules and assigns tasks to the available devices. Devices are logical devices and can be CPUs or GPUs.
Methods:
- Return the next idle (available) device id.
- Reset all devices to idle.
- reset_dev_state(dev_id): Reset the given device's state to idle.
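The scheduling idea can be sketched as follows; the class and method names below mirror the descriptions above but are illustrative, not the actual DeviceFarm API:

```python
class MiniDeviceFarm:
    """Sketch of the DeviceFarm idea: track idle/busy state per logical device."""

    def __init__(self, devs):
        # Every device starts idle.
        self._state = {d: "idle" for d in devs}

    def next_idle_dev(self):
        # Hypothetical name for "return the next idle device id";
        # returns None when every device is busy.
        for dev_id, state in self._state.items():
            if state == "idle":
                self._state[dev_id] = "busy"
                return dev_id
        return None

    def reset_dev_state(self, dev_id):
        # Reset the given device back to idle.
        self._state[dev_id] = "idle"

farm = MiniDeviceFarm([0, 1])
a = farm.next_idle_dev()   # 0
b = farm.next_idle_dev()   # 1
c = farm.next_idle_dev()   # None: both devices busy
farm.reset_dev_state(a)
d = farm.next_idle_dev()   # 0 again, after being reset to idle
```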
- class aitemplate.backend.task_runner.Task(idx: Union[int, str], cmd: str, name: str, **kwargs)[source]
Task is an object containing a bash command, the process for the command, and the output of the process.
Methods:
- Return the assigned device id for the task.
- Check whether the task has failed.
- Check whether the task is finished.
- Check whether the task process is still running.
- Check whether the task has timed out.
- poll(current_time, timeout): Given the current time, check whether the task is running, finished, or timed out.
- pull(fproc): Pull stdout & stderr from the process, process them with fproc, and set the output for the task.
- assigned_dev() int [source]
Return the assigned device id for the task
- Returns:
Assigned device id
- Return type:
int
- is_failed() bool [source]
Check whether the task has failed
- Returns:
Whether the task has failed
- Return type:
bool
- is_finished() bool [source]
Check whether the task is finished
- Returns:
Whether the task is finished
- Return type:
bool
- is_running() bool [source]
Check whether the task process is still running.
- Returns:
Whether the task process is still running
- Return type:
bool
- is_timeout() bool [source]
Check whether the task has timed out
- Returns:
Whether the task has timed out
- Return type:
bool
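Taken together, the state checks above amount to a small decision function. The sketch below illustrates that logic under the assumption that a live process reports a returncode of None (as Popen.poll() does); it is not the actual Task.poll() implementation:

```python
def poll_task(start_time, returncode, current_time, timeout):
    """Sketch of the poll() decision: running, finished, failed, or timed out.
    returncode is None while the process is still running (as with Popen.poll())."""
    if returncode is None:
        if current_time - start_time > timeout:
            return "timeout"
        return "running"
    return "finished" if returncode == 0 else "failed"

# A process 5s into a 10s budget that has not yet exited:
print(poll_task(start_time=0, returncode=None, current_time=5, timeout=10))   # running
# The same process past its budget:
print(poll_task(start_time=0, returncode=None, current_time=11, timeout=10))  # timeout
# A process that exited with a non-zero code:
print(poll_task(start_time=0, returncode=1, current_time=5, timeout=10))      # failed
```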
aitemplate.backend.builder
Builder is a module to compile generated source code files into binary objects.
Classes:
- AITDebugSettings: This class contains the options for configuring debug settings.
- Builder: Builder is a module to compile generated source code files into binary objects.
- Path: PurePath subclass that can make system calls.
- Runner: A parallel runner for compiling tasks.
- Task: An object containing a bash command, the process for the command, and the output of the process.
Functions:
- is_cmake_compilation: Whether the model is compiled by invoking CMake rather than invoking make directly.
- process_return: Processes a finished task; raises a runtime error if the task timed out or failed.
- process_task: Extracts stdout and stderr from a finished task.
- sha1: Returns a sha1 hash object, optionally initialized with a string.
- write_binhash_file: Hashes all binary input files so we don't have to keep them (use case: constants.obj / constants.bin).
- class aitemplate.backend.builder.AITDebugSettings(check_all_nan_and_inf: bool = False, check_all_outputs: bool = False, gen_profiler_annotation: bool = False, dump_ait_to_py: Optional[str] = None, gen_standalone: bool = False)[source]
This class contains the options for configuring debug settings.
Arguments:
- check_all_nan_and_inf : bool (default: False)
Whether or not to check tensors for NaN or inf values during runtime.
- check_all_outputs : bool (default: False)
Whether or not to print tensors' values during runtime.
- gen_profiler_annotation : bool (default: False)
Whether or not to add profiling annotation primitives when doing codegen (e.g. NVTX for CUDA and rocTX for AMD). Currently only NVIDIA is supported.
- dump_ait_to_py : str, optional
The path where the AIT graph is dumped into a .py file.
- gen_standalone : bool (default: False)
Generate a standalone executable for the model.
- class aitemplate.backend.builder.Builder(n_jobs: int = -1, timeout: int = 180)[source]
Builder is a module to compile generated source code files into binary objects.
Methods:
- build_objs(files, cc_cmd[, binary_cc_cmd]): Generate a build task for each source code file, then build in parallel.
- build_so(target, objs): Generate a task to build all objects into a dynamic library.
- build_objs(files: List[Tuple[str, str]], cc_cmd: str, binary_cc_cmd: Optional[str] = None)[source]
Generate a build task for each source code file, then build them in parallel
- Parameters:
files (List[Tuple[str, str]]) – list of tuples of source code path and object file path
cc_cmd (str) – command line template for building objects
binary_cc_cmd (optional, str) – command line template for turning raw binary files (those ending in .bin) into objects. Since most compilation jobs will not need to compile these, this argument is optional.
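As a rough illustration, a command-line template of this kind might be expanded once per (source, object) pair. The placeholder syntax, compiler flags, and file names below are hypothetical, not AITemplate's actual cc_cmd:

```python
# Hypothetical command template with {src}/{obj} placeholders.
cc_cmd = "nvcc -c {src} -o {obj}"

# List of (source path, object path) tuples, as build_objs expects.
files = [("model/op_0.cu", "model/op_0.obj"),
         ("model/op_1.cu", "model/op_1.obj")]

# One concrete compile command per pair; a runner would execute these in parallel.
cmds = [cc_cmd.format(src=src, obj=obj) for src, obj in files]
print(cmds[0])  # nvcc -c model/op_0.cu -o model/op_0.obj
```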
- class aitemplate.backend.builder.Path(*args, **kwargs)[source]
PurePath subclass that can make system calls.
Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.
Methods:
- absolute(): Return an absolute version of this path.
- chmod(mode): Change the permissions of the path, like os.chmod().
- cwd(): Return a new path pointing to the current working directory (as returned by os.getcwd()).
- exists(): Whether this path exists.
- expanduser(): Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser()).
- glob(pattern): Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- group(): Return the group name of the file gid.
- home(): Return a new path pointing to the user's home directory (as returned by os.path.expanduser('~')).
- is_block_device(): Whether this path is a block device.
- is_char_device(): Whether this path is a character device.
- is_dir(): Whether this path is a directory.
- is_fifo(): Whether this path is a FIFO.
- is_file(): Whether this path is a regular file (also True for symlinks pointing to regular files).
- is_mount(): Check if this path is a POSIX mount point.
- is_socket(): Whether this path is a socket.
- is_symlink(): Whether this path is a symbolic link.
- iterdir(): Iterate over the files in this directory.
- lchmod(mode): Like chmod(), except if the path points to a symlink, the symlink's permissions are changed, rather than its target's.
- link_to(target): Make the target path a hard link pointing to this path.
- lstat(): Like stat(), except if the path points to a symlink, the symlink's status information is returned, rather than its target's.
- mkdir([mode, parents, exist_ok]): Create a new directory at this given path.
- open([mode, buffering, encoding, errors, ...]): Open the file pointed to by this path and return a file object, as the built-in open() function does.
- owner(): Return the login name of the file owner.
- read_bytes(): Open the file in bytes mode, read it, and close the file.
- read_text([encoding, errors]): Open the file in text mode, read it, and close the file.
- readlink(): Return the path to which the symbolic link points.
- rename(target): Rename this path to the target path.
- replace(target): Rename this path to the target path, overwriting if that path exists.
- resolve([strict]): Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
- rglob(pattern): Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
- rmdir(): Remove this directory.
- samefile(other_path): Return whether other_path is the same or not as this file (as returned by os.path.samefile()).
- stat(): Return the result of the stat() system call on this path, like os.stat() does.
- symlink_to(target[, target_is_directory]): Make this path a symlink pointing to the target path.
- touch([mode, exist_ok]): Create this file with the given access mode, if it doesn't exist.
- unlink([missing_ok]): Remove this file or link.
- write_bytes(data): Open the file in bytes mode, write to it, and close the file.
- write_text(data[, encoding, errors]): Open the file in text mode, write to it, and close the file.
- absolute()[source]
Return an absolute version of this path. This function works even if the path doesn’t point to anything.
No normalization is done, i.e. all ‘.’ and ‘..’ will be kept along. Use resolve() to get the canonical path to a file.
- classmethod cwd()[source]
Return a new path pointing to the current working directory (as returned by os.getcwd()).
- expanduser()[source]
Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser)
- glob(pattern)[source]
Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- classmethod home()[source]
Return a new path pointing to the user’s home directory (as returned by os.path.expanduser(‘~’)).
- is_file()[source]
Whether this path is a regular file (also True for symlinks pointing to regular files).
- iterdir()[source]
Iterate over the files in this directory. Does not yield any result for the special paths ‘.’ and ‘..’.
- lchmod(mode)[source]
Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.
- link_to(target)[source]
Make the target path a hard link pointing to this path.
Note this function does not make this path a hard link to target, despite the implication of the function and argument names. The order of arguments (target, link) is the reverse of Path.symlink_to, but matches that of os.link.
- lstat()[source]
Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.
- open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)[source]
Open the file pointed by this path and return a file object, as the built-in open() function does.
- read_text(encoding=None, errors=None)[source]
Open the file in text mode, read it, and close the file.
- rename(target)[source]
Rename this path to the target path.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- replace(target)[source]
Rename this path to the target path, overwriting if that path exists.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- resolve(strict=False)[source]
Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
- rglob(pattern)[source]
Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
- samefile(other_path)[source]
Return whether other_path is the same or not as this file (as returned by os.path.samefile()).
- symlink_to(target, target_is_directory=False)[source]
Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink.
- touch(mode=438, exist_ok=True)[source]
Create this file with the given access mode, if it doesn’t exist.
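Since this Path is a pathlib.Path subclass, standard pathlib usage applies. A short example exercising several of the methods listed above:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    base = Path(d)
    f = base / "notes.txt"
    f.write_text("hello")                    # create the file and write text
    assert f.is_file()
    assert f.read_text() == "hello"
    (base / "a" / "b").mkdir(parents=True)   # create nested directories
    found_names = [p.name for p in base.rglob("*.txt")]  # recursive glob
    f.unlink()                               # remove the file
    assert not f.exists()

print(found_names)  # ['notes.txt']
```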
- class aitemplate.backend.builder.Runner(devs: List[int], timeout: int = 10)[source]
A parallel runner for compiling tasks. Runner inherits from BaseRunner.
Methods:
- pull(): Pull building results.
- push(idx, cmd, target): Push a building task into the runner.
- class aitemplate.backend.builder.Task(idx: Union[int, str], cmd: str, name: str, **kwargs)[source]
Task is an object containing a bash command, the process for the command, and the output of the process.
Methods:
- Return the assigned device id for the task.
- Check whether the task has failed.
- Check whether the task is finished.
- Check whether the task process is still running.
- Check whether the task has timed out.
- poll(current_time, timeout): Given the current time, check whether the task is running, finished, or timed out.
- pull(fproc): Pull stdout & stderr from the process, process them with fproc, and set the output for the task.
- assigned_dev() int [source]
Return the assigned device id for the task
- Returns:
Assigned device id
- Return type:
int
- is_failed() bool [source]
Check whether the task has failed
- Returns:
Whether the task has failed
- Return type:
bool
- is_finished() bool [source]
Check whether the task is finished
- Returns:
Whether the task is finished
- Return type:
bool
- is_running() bool [source]
Check whether the task process is still running.
- Returns:
Whether the task process is still running
- Return type:
bool
- is_timeout() bool [source]
Check whether the task has timed out
- Returns:
Whether the task has timed out
- Return type:
bool
- aitemplate.backend.builder.is_cmake_compilation() bool [source]
Returns whether the model is compiled by invoking CMake rather than invoking make directly.
- aitemplate.backend.builder.process_return(task: Task) None [source]
This function processes a finished task. If the task timed out or failed, a RuntimeError is raised.
- Parameters:
task (Task) – A compiling task.
- Raises:
RuntimeError – Compiling failed.
- aitemplate.backend.builder.process_task(task: Task) None [source]
This function extracts stdout and stderr from a finished task. If the task process's return code is not 0, the task is marked as failed.
- Parameters:
task (Task) – A compiling task
- aitemplate.backend.builder.sha1(string=b'', *, usedforsecurity=True)
Returns a sha1 hash object; optionally initialized with a string
- aitemplate.backend.builder.write_binhash_file(build_dir, binhash_filename='constants.hash', filter_func: ~typing.Callable[[str], bool] = <function is_bin_file>)[source]
Hash all binary input files so we don't have to keep them (use case: constants.obj / constants.bin)
- Parameters:
build_dir (str) – Path to build directory
binhash_filename (str, optional) – File to be written within build_dir, defaults to “constants.hash”.
filter_func (Callable[[str], bool], optional) – Filter function to determine which files to hash. Defaults to is_bin_file.
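The idea can be sketched with hashlib; the function below is a simplified stand-in for the actual implementation (single aggregate digest, hypothetical file layout):

```python
import hashlib
import os
import tempfile

def write_binhash_sketch(build_dir, binhash_filename="constants.hash",
                         filter_func=lambda name: name.endswith(".bin")):
    # Hash every file selected by filter_func into one sha1 digest and
    # record it, so the binary originals need not be kept around.
    h = hashlib.sha1()
    for name in sorted(os.listdir(build_dir)):
        if filter_func(name):
            with open(os.path.join(build_dir, name), "rb") as fh:
                h.update(fh.read())
    out_path = os.path.join(build_dir, binhash_filename)
    with open(out_path, "w") as fh:
        fh.write(h.hexdigest())
    return out_path

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "constants.bin"), "wb") as fh:
        fh.write(b"\x00\x01\x02")
    with open(write_binhash_sketch(d)) as fh:
        digest = fh.read()

print(len(digest))  # 40: a sha1 hex digest
```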
aitemplate.backend.codegen
This module is for generating the final C++ source code in files from Tensor and Operators. Functions in this module will be used for generating function source code files, profiler source code files, and model driver source code files.
Classes:
- AITDebugSettings: This class contains the options for configuring debug settings.
- IntImm: An IntImm represents a static dimension.
- IntVar: An IntVar represents a dynamic dimension.
- IntVarTensor: A special tensor which represents an IntImm / IntVar.
- Operator: Base class for all operators.
- Path: PurePath subclass that can make system calls.
- TensorAccessor: A tensor accessor which manages how to access a Tensor.
- defaultdict: defaultdict(default_factory=None, /, [...]) --> dict with default factory.
Functions:
- check_not_null: Generate a nullptr check to be used by pointer initialization code.
- dtype_to_enumerator: Returns the string representation of the AITemplateDtype enum (defined in model_interface.h) for the given dtype str.
- gen_function_src: Generate function source code files for the given graph.
- gen_library_src: Generate model driver source code files for the given graph.
- gen_profiler: Generate operator profiler source code files for the given graph.
- Returns the size (in bytes) of the given dtype str.
- Generate a string setting a value in a map.
- Number of extra streams in multi-stream mode.
- Maximum number of parallel operators used in memory planning for simple multi-stream mode.
- Multi-stream mode.
- Generate a string that sets a value to something stored in a map.
- Make sure that no more than max_parallel_ops operators are run in parallel.
- class aitemplate.backend.codegen.AITDebugSettings(check_all_nan_and_inf: bool = False, check_all_outputs: bool = False, gen_profiler_annotation: bool = False, dump_ait_to_py: Optional[str] = None, gen_standalone: bool = False)[source]
This class contains the options for configuring debug settings.
Arguments:
- check_all_nan_and_inf : bool (default: False)
Whether or not to check tensors for NaN or inf values during runtime.
- check_all_outputs : bool (default: False)
Whether or not to print tensors' values during runtime.
- gen_profiler_annotation : bool (default: False)
Whether or not to add profiling annotation primitives when doing codegen (e.g. NVTX for CUDA and rocTX for AMD). Currently only NVIDIA is supported.
- dump_ait_to_py : str, optional
The path where the AIT graph is dumped into a .py file.
- gen_standalone : bool (default: False)
Generate a standalone executable for the model.
- class aitemplate.backend.codegen.IntImm(value: int, name: Optional[str] = None)[source]
An IntImm represents a static dimension. IntVar (see above) and IntImm are used together to represent a Tensor’s shape.
Methods:
- pseudo_code([with_shape]): Returns a string containing pseudo code of this object.
- value(): Returns the value of this IntImm.
- class aitemplate.backend.codegen.IntVar(values: List[int], name: Optional[str] = None, symbolic_value: Optional[Basic] = None)[source]
An IntVar represents a dynamic dimension. IntVar and IntImm (see below) are used together to represent a Tensor’s shape.
IntVar supports basic arithmetic operations, and returns the most conservative IntVar w.r.t. range of _attrs[“values”].
Methods:
- Returns the lower bound of this dynamic dim.
- pseudo_code([with_shape]): Returns a string containing pseudo code of this object.
- Returns the symbolic value of this dynamic dim.
- Returns the upper bound of this dynamic dim.
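"Most conservative" here is interval arithmetic over the dims' value ranges. A sketch of the idea, with plain (lower, upper) tuples standing in for IntVar and assuming non-negative dims:

```python
def add_ranges(a, b):
    # Adding two dynamic dims [a_min, a_max] + [b_min, b_max]
    # yields the widest range the sum could take.
    return (a[0] + b[0], a[1] + b[1])

def mul_ranges(a, b):
    # For non-negative dims the product range is element-wise.
    return (a[0] * b[0], a[1] * b[1])

batch = (1, 32)     # dynamic batch dim, like IntVar(values=[1, 32])
seq = (8, 128)      # dynamic sequence dim
print(add_ranges(batch, seq))  # (9, 160)
print(mul_ranges(batch, seq))  # (8, 4096)
```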
- class aitemplate.backend.codegen.IntVarTensor(int_var: IntVar, name: Optional[str] = None, src_ops: Optional[Set[Node]] = None, dst_ops: Optional[Set[Node]] = None, dtype: str = 'float16', is_input: bool = False, is_output: bool = False, value: Optional[Any] = None, is_view_of: Optional[Any] = None)[source]
A special tensor which represents an IntImm / IntVar. This Tensor can be used as inputs of some Operators (e.g. reshape, layernorm). An IntVarTensor instead of IntVar is used here to keep reference to src_ops and dst_ops.
Methods:
- pseudo_code([with_shape]): Returns a string containing pseudo code of this object.
- class aitemplate.backend.codegen.Operator[source]
Base class for all operators
Methods:
- gen_function(): Generates the function source code string.
- gen_profiler([workdir, ...]): Generates source files for profiling purposes.
- profile([workdir, devices, ...]): Selects the fastest kernel configurations.
- pseudo_code([with_shape]): Returns a string containing pseudo code of this object.
- replace_input_tensor(old_tensor, new_tensor): Replaces old_tensor in self._attrs["inputs"] with new_tensor.
- gen_function() str [source]
Generates the function source code string.
- Returns:
A string which contains the C++ function implementation source code
- Return type:
str
- Raises:
NotImplementedError –
- gen_profiler(workdir: Optional[str] = None, dynamic_profiling_strategy=None) None [source]
Generates source files for profiling purposes.
- Parameters:
workdir (str, optional) – The directory to generate source files.
dynamic_profiling_strategy (DynamicProfileStrategy, optional) – A dynamic profiling strategy, used to filter generated profiles at compile time. See also:
profile()
- profile(workdir='./', devices=None, dynamic_profiling_strategy=DynamicProfileStrategy.MAX) None [source]
Selects the fastest kernel configurations.
- Parameters:
workdir (str, optional) – The directory which contains source files, by default “./”
devices (list, optional) – A list of device ids which can be used for profiling.
dynamic_profiling_strategy (DynamicProfileStrategy, optional) – Profiling strategy used when there are dynamic dims. By default, MAX is used, i.e. to profile a dynamic range, an upper bound will be used.
- class aitemplate.backend.codegen.Path(*args, **kwargs)[source]
PurePath subclass that can make system calls.
Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.
Methods:
- absolute(): Return an absolute version of this path.
- chmod(mode): Change the permissions of the path, like os.chmod().
- cwd(): Return a new path pointing to the current working directory (as returned by os.getcwd()).
- exists(): Whether this path exists.
- expanduser(): Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser()).
- glob(pattern): Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- group(): Return the group name of the file gid.
- home(): Return a new path pointing to the user's home directory (as returned by os.path.expanduser('~')).
- is_block_device(): Whether this path is a block device.
- is_char_device(): Whether this path is a character device.
- is_dir(): Whether this path is a directory.
- is_fifo(): Whether this path is a FIFO.
- is_file(): Whether this path is a regular file (also True for symlinks pointing to regular files).
- is_mount(): Check if this path is a POSIX mount point.
- is_socket(): Whether this path is a socket.
- is_symlink(): Whether this path is a symbolic link.
- iterdir(): Iterate over the files in this directory.
- lchmod(mode): Like chmod(), except if the path points to a symlink, the symlink's permissions are changed, rather than its target's.
- link_to(target): Make the target path a hard link pointing to this path.
- lstat(): Like stat(), except if the path points to a symlink, the symlink's status information is returned, rather than its target's.
- mkdir([mode, parents, exist_ok]): Create a new directory at this given path.
- open([mode, buffering, encoding, errors, ...]): Open the file pointed to by this path and return a file object, as the built-in open() function does.
- owner(): Return the login name of the file owner.
- read_bytes(): Open the file in bytes mode, read it, and close the file.
- read_text([encoding, errors]): Open the file in text mode, read it, and close the file.
- readlink(): Return the path to which the symbolic link points.
- rename(target): Rename this path to the target path.
- replace(target): Rename this path to the target path, overwriting if that path exists.
- resolve([strict]): Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
- rglob(pattern): Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
- rmdir(): Remove this directory.
- samefile(other_path): Return whether other_path is the same or not as this file (as returned by os.path.samefile()).
- stat(): Return the result of the stat() system call on this path, like os.stat() does.
- symlink_to(target[, target_is_directory]): Make this path a symlink pointing to the target path.
- touch([mode, exist_ok]): Create this file with the given access mode, if it doesn't exist.
- unlink([missing_ok]): Remove this file or link.
- write_bytes(data): Open the file in bytes mode, write to it, and close the file.
- write_text(data[, encoding, errors]): Open the file in text mode, write to it, and close the file.
- absolute()[source]
Return an absolute version of this path. This function works even if the path doesn’t point to anything.
No normalization is done, i.e. all ‘.’ and ‘..’ will be kept along. Use resolve() to get the canonical path to a file.
- classmethod cwd()[source]
Return a new path pointing to the current working directory (as returned by os.getcwd()).
- expanduser()[source]
Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser)
- glob(pattern)[source]
Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- classmethod home()[source]
Return a new path pointing to the user’s home directory (as returned by os.path.expanduser(‘~’)).
- is_file()[source]
Whether this path is a regular file (also True for symlinks pointing to regular files).
- iterdir()[source]
Iterate over the files in this directory. Does not yield any result for the special paths ‘.’ and ‘..’.
- lchmod(mode)[source]
Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.
- link_to(target)[source]
Make the target path a hard link pointing to this path.
Note this function does not make this path a hard link to target, despite the implication of the function and argument names. The order of arguments (target, link) is the reverse of Path.symlink_to, but matches that of os.link.
- lstat()[source]
Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.
- open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)[source]
Open the file pointed by this path and return a file object, as the built-in open() function does.
- read_text(encoding=None, errors=None)[source]
Open the file in text mode, read it, and close the file.
- rename(target)[source]
Rename this path to the target path.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- replace(target)[source]
Rename this path to the target path, overwriting if that path exists.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- resolve(strict=False)[source]
Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
- rglob(pattern)[source]
Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
- samefile(other_path)[source]
Return whether other_path is the same or not as this file (as returned by os.path.samefile()).
- symlink_to(target, target_is_directory=False)[source]
Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink.
- touch(mode=438, exist_ok=True)[source]
Create this file with the given access mode, if it doesn’t exist.
- class aitemplate.backend.codegen.TensorAccessor(original_tensor: Tensor)[source]
A tensor accessor which manages how to access a Tensor. Must always be used together with a Tensor.
Methods:
- gen_stride_str(dim, dim_names): Returns the str to calculate the stride of a certain dim.
- is_rightmost_dim_contiguous(cat_dim): Check if the rightmost dimension would be contiguous after concatenation along a given cat_dim.
- stride(dim): Returns the stride (a number) for the given dim.
- try_get_stride_strs(dim[, dim_names]): Tries to return a list of stride strs for the given dim.
- update_base_tensor(new_tensor, stride_dim, ...): Updates the TensorAccessor with a new base tensor.
- update_base_tensor_shape(new_tensor): Updates the TensorAccessor's actual shape.
- gen_stride_str(dim: int, dim_names: List[str]) str [source]
Returns the str to calculate the stride of a certain dim. This is a temporary solution to get around dynamic shapes problems with tensor_accessor. dim_names is a list of str, such as [“B”, “M”, “K”] for the first input to bmm_rcr.
Note that both dim and dim_names are based on self.original_shapes.
Throws RuntimeError if such a stride number cannot be computed.
- is_rightmost_dim_contiguous(cat_dim: int) bool [source]
Check if the rightmost dimension would be contiguous after concatenation along a given cat_dim. This is a necessary condition for GEMM+concat fusion, since GEMM doesn't support a discontinuous rightmost dimension for row-major output. The rightmost dimension is contiguous iff the concat dimension corresponds to one of the dimensions in the original shape and it's the first dimension in its group of actual dimensions.
- stride(dim: int) int [source]
Returns stride (a number) for the given dim. Note that dim is based on self.original_shapes. This API assumes that all dims after dim are static (IntImm).
Throws RuntimeError if such a stride number cannot be computed.
- try_get_stride_strs(dim: int, dim_names: Optional[List[str]] = None) Optional[List[str]] [source]
Tries to return a list of stride strs for the given dim. Note that both dim and dim_names are based on self.original_shapes.
Returns None if this function fails to calculate stride strs.
- aitemplate.backend.codegen.check_not_null(tensor: Tensor, tensor_idx: Optional[int] = None, skip_if_lower_bound_is_zero: bool = False) str [source]
Generate a nullptr check to be used by pointer initialization code.
If skip_if_lower_bound_is_zero == True, no code will be generated when the Tensor has at least one dynamic dim with a lower bound of zero. This is most useful for outputs; we put the nullptr checks at the start of the inference, but we won’t know output shapes until after Run() finishes. We therefore just relax the check for these outputs - only allow them to be null if their lower bound is zero, otherwise never allow them to be null.
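A sketch of what such a generated check might look like; the helper's signature and the emitted C++ text below are illustrative, not the actual codegen output:

```python
def check_not_null_sketch(name, lower_bounds, skip_if_lower_bound_is_zero=False):
    """Return a C++ nullptr-check snippet for a tensor pointer, or an empty
    string when the check is relaxed for a possibly-empty output."""
    if skip_if_lower_bound_is_zero and any(b == 0 for b in lower_bounds):
        return ""  # the output may legitimately be empty, so allow nullptr
    return (f'if ({name} == nullptr) {{\n'
            f'  throw std::runtime_error("{name} is null!");\n'
            f'}}')

# Output with a dynamic dim whose lower bound is zero: check is skipped.
print(repr(check_not_null_sketch("output0", [0, 16],
                                 skip_if_lower_bound_is_zero=True)))  # ''
# Regular input: a nullptr check is emitted.
print(check_not_null_sketch("input0", [1, 16]))
```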
- class aitemplate.backend.codegen.defaultdict
defaultdict(default_factory=None, /, [...]) --> dict with default factory
The default factory is called without arguments to produce a new value when a key is not present, in __getitem__ only. A defaultdict compares equal to a dict with the same items. All remaining arguments are treated the same as if they were passed to the dict constructor, including keyword arguments.
Methods:
- copy(): Return a shallow copy of D.
Attributes:
- default_factory: Factory for default value called by __missing__().
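Standard defaultdict behavior, matching the description above:

```python
from collections import defaultdict

# Group words by first letter; a missing key triggers __missing__,
# which calls the factory (list) to create a fresh value.
groups = defaultdict(list)
for word in ["ant", "bee", "bat", "ape"]:
    groups[word[0]].append(word)

print(dict(groups))            # {'a': ['ant', 'ape'], 'b': ['bee', 'bat']}
print(groups.default_factory)  # <class 'list'>
print(groups["z"])             # [] -- created on first access via __missing__
```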
- aitemplate.backend.codegen.dtype_to_enumerator(dtype: str) str [source]
Returns the string representation of the AITemplateDtype enum (defined in model_interface.h) for the given dtype str.
- Parameters:
dtype (str) – A data type string.
- Returns:
the AITemplateDtype enum string representation.
- Return type:
str
- aitemplate.backend.codegen.gen_function_src(sorted_graph: List[Tensor], workdir: str, model_name: str = '') List[Tuple[str, str]] [source]
Generate functions source code files for the given graph
- Parameters:
sorted_graph (List[Tensor]) – The network after running toposort transformation
workdir (str) – Target directory for generated C++ source code files
model_name (str, optional) – Sub working directory in the workdir for the given model, by default “”
- Returns:
List of tuple (source file path, object file path)
- Return type:
List[Tuple[str, str]]
- aitemplate.backend.codegen.gen_library_src(sorted_graph: List[Tensor], max_blob_size: int, max_constant_blob_size: int, workspace: Workspace, workdir: str, output_tensors: List[Tensor], model_name: str = '', debug_settings: AITDebugSettings = AITDebugSettings(check_all_nan_and_inf=False, check_all_outputs=False, gen_profiler_annotation=False, dump_ait_to_py=None, gen_standalone=False), additional_unbound_constants: Optional[List[Tensor]] = None) List[Tuple[str, str]] [source]
Generate model driver source code files for the given graph
- Parameters:
sorted_graph (List[Tensor]) – The network after running toposort transformation
max_blob_size (int) – Total memory for input/output tensor and intermediate results, calculated by memory planning transformation
workspace (Workspace) – Workspace sizes, computed by memory planning
workdir (str) – Target directory for generated C++ source code files
model_name (str, optional) – Sub working directory in the workdir for the given model, by default “”
debug_settings (AITDebugSettings) – specify debug settings such as where to dump AITemplate model Python file, etc.
- Returns:
List of tuple (source file path, object file path)
- Return type:
List[Tuple[str, str]]
- aitemplate.backend.codegen.gen_profiler(sorted_graph: List[Tensor], workdir: str, dynamic_profiling_strategy)[source]
Generate operator profiler source code files for the given graph
- Parameters:
sorted_graph (List[Tensor]) – The network after running toposort transformation
workdir (str) – Target directory for generated C++ source code files
dynamic_profiling_strategy (DynamicProfileStrategy, optional) – A dynamic profiling strategy, used to filter generated profiles at compile time. Passed through to the gen_profiler kernels of the nodes in the graph. See also:
profile()
- aitemplate.backend.codegen.get_dtype_size(dtype: str) int [source]
Returns size (in bytes) of the given dtype str.
- Parameters:
dtype (str) – A data type string.
- Returns:
Size (in bytes) of this dtype.
- Return type:
int
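The lookup behind get_dtype_size can be pictured as a simple table; the sketch below is a hypothetical stand-in, and the exact set of dtype strings the real function accepts is an assumption:

```python
# Hypothetical dtype-string-to-byte-size table; the supported names
# are an assumption, not the real function's full list.
_DTYPE_SIZES = {
    "bool": 1,
    "float16": 2,
    "bfloat16": 2,
    "float32": 4,
    "int32": 4,
    "int64": 8,
}

def get_dtype_size(dtype: str) -> int:
    """Return the size in bytes of the given dtype string."""
    if dtype not in _DTYPE_SIZES:
        raise KeyError(f"unsupported dtype: {dtype}")
    return _DTYPE_SIZES[dtype]

print(get_dtype_size("float16"))  # → 2
```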
- aitemplate.backend.codegen.map_set(map_name: str, key_name: str, value_name: Optional[str] = None, indent: str = ' ') str [source]
Generate a string setting a value in a map.
If value_name is given, sets map_name[“key_name”] = value_name. Otherwise, sets map_name[“key_name”] = key_name. Special maps like dim_map may make additional modifications to the LHS of this expression.
- Parameters:
map_name (str) – The map to use
key_name (str) – The key to set. Will be put into quotes.
value_name (Optional[str]) – If set, force map_name[“key_name”] = value_name
indent (str) – For formatting
- Returns:
The formatted map set statement.
- Return type:
str
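A simplified sketch of the kind of string this helper produces. It ignores the special handling of maps like dim_map, and the exact formatting of the real function (indent width, trailing semicolon) is an assumption:

```python
from typing import Optional

def map_set(map_name: str, key_name: str,
            value_name: Optional[str] = None, indent: str = "    ") -> str:
    # The key is quoted; the value defaults to the (unquoted) key name.
    value = value_name if value_name is not None else key_name
    return f'{indent}{map_name}["{key_name}"] = {value};'

print(map_set("dim_map", "batch_size"))        # →     dim_map["batch_size"] = batch_size;
print(map_set("dim_map", "batch_size", "bs"))  # →     dim_map["batch_size"] = bs;
```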
- aitemplate.backend.codegen.multistream_additional_streams() int [source]
Number of extra streams in multi-stream mode.
This option is independent from AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS.
For example, say, there are 100 ops that can be run in parallel.
Example 1: AIT_MULTISTREAM_EXTRA_STREAMS=4 and AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS=100. In this case 5 streams will be used (1 base and 4 extra), every stream gets 20 operators and no inter-stream barriers are used. Memory planning is done for 100 parallel ops.
Example 2: AIT_MULTISTREAM_EXTRA_STREAMS=4 and AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS=5. In this case 5 streams will be used (1 base and 4 extra), there will be 20 waves separated by inter-stream barriers, every stream gets 1 operator for every wave. Memory planning is done for 20 waves of 5 parallel ops each.
- aitemplate.backend.codegen.multistream_max_mem_parallel_ops() int [source]
Maximum number of parallel operators used in memory planning for the simple multi-stream mode. Larger values imply a higher level of possible parallelism, but also higher memory allocations.
This option is independent from AIT_MULTISTREAM_EXTRA_STREAMS.
For example, say, there are 100 ops that can be run in parallel.
Example 1: AIT_MULTISTREAM_EXTRA_STREAMS=4 and AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS=100. In this case 5 streams will be used (1 base and 4 extra), every stream gets 20 operators and no inter-stream barriers are used. Memory planning is done for 100 parallel ops.
Example 2: AIT_MULTISTREAM_EXTRA_STREAMS=4 and AIT_MULTISTREAM_MAX_MEM_PARALLEL_OPS=5. In this case 5 streams will be used (1 base and 4 extra), there will be 20 waves separated by inter-stream barriers, every stream gets 1 operator for every wave. Memory planning is done for 20 waves of 5 parallel ops each.
- aitemplate.backend.codegen.multistream_mode() int [source]
Multi-stream mode. 0 - no multistream. 1 - simple multistream. Default: 0.
- aitemplate.backend.codegen.set_value_from_map(map_name: Any, var_name: Any, indent: str = ' ') str [source]
Generate a string that sets a value to something stored in a map.
- Parameters:
map_name (str) – The map to use
var_name (str) – The var_name, used as the name of the value and the key.
indent (str) – For formatting
- Returns:
The formatted statement.
- Return type:
str
- aitemplate.backend.codegen.split_simple_multistream_parallel_ops(ops_by_order, max_parallel_ops: int)[source]
Make sure that no more than max_parallel_ops operators are run in parallel.
Say, on the first step op1, op2 and op3 can be executed in parallel. On the second one, it is op4 and op5. On the third one it is op6, op7, op8, op9. Then, ops_by_order is something like
{ 1: [op1, op2, op3], 2: [op4, op5], 3: [op6, op7, op8, op9] }
- Given max_parallel_ops=2, the output will be:
[[op1, op2], [op3], [op4, op5], [op6, op7], [op8, op9]]
- Parameters:
ops_by_order (Dict[int, List[Operator]]) – A dictionary, its keys represent the execution order and its values represent operators that are executed in parallel.
max_parallel_ops (int) – Number of operators that are allowed to be run in parallel
- Returns:
Transformed sequence of operators to execute.
- Return type:
List[List[Operator]]
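The transformation above amounts to chunking each parallel group in execution order; a minimal sketch reproducing the documented example:

```python
def split_simple_multistream_parallel_ops(ops_by_order, max_parallel_ops):
    # Walk the execution orders in sequence and split each parallel
    # group into chunks of at most max_parallel_ops operators.
    result = []
    for order in sorted(ops_by_order):
        ops = ops_by_order[order]
        for i in range(0, len(ops), max_parallel_ops):
            result.append(ops[i:i + max_parallel_ops])
    return result

ops_by_order = {1: ["op1", "op2", "op3"], 2: ["op4", "op5"],
                3: ["op6", "op7", "op8", "op9"]}
print(split_simple_multistream_parallel_ops(ops_by_order, 2))
# → [['op1', 'op2'], ['op3'], ['op4', 'op5'], ['op6', 'op7'], ['op8', 'op9']]
```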
aitemplate.backend.profiler_cache
SQLite backend for conv/gemm profiling cache
Classes:
|
Enum for cache mode |
|
Local SQLite profile cache database. |
- class aitemplate.backend.profiler_cache.CacheMode(value)[source]
Enum for cache mode
Profiling cache can be stored locally or remotely. In LOCAL mode, the cache is stored in a SQLite database. In REMOTE mode, the profiled results can be queried via a RESTful API.
REMOTE mode is not implemented yet.
- class aitemplate.backend.profiler_cache.ProfileCacheDB(target: str, path: Optional[str] = None, uri: Optional[str] = None, port: Optional[str] = None)[source]
Local SQLite profile cache database.
Methods:
insert_conv
(args)Insert a conv op epilogue into the cache; conv and gemm share the same SQL table.
insert_conv3d
(args)Insert a conv3d op epilogue into the cache; conv and gemm share the same SQL table.
insert_gemm
(args)Insert a gemm op epilogue into the cache.
insert_normalization
(args)Insert a normalization op epilogue into the cache; the same SQL table is shared with conv and gemm.
query_conv
(args)Query a conv op epilogue from the cache; conv and gemm share the same SQL table.
query_conv3d
(args)Query a conv3d op epilogue from the cache; conv and gemm share the same SQL table.
query_gemm
(args)Query a gemm op epilogue from the cache.
query_normalization
(args)Query a normalization op epilogue from the cache.
- insert_conv(args: Dict[str, Any]) None [source]
Insert a conv op epilogue into the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv record entry
- insert_conv3d(args: Dict[str, Any]) None [source]
Insert a conv3d op epilogue into the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv3d record entry
- insert_gemm(args: Dict[str, Any]) None [source]
Insert a gemm op epilogue into the cache.
- Parameters:
args (Dict) – Gemm record entry
- insert_normalization(args: Dict[str, Any]) None [source]
Insert a normalization op epilogue into the cache; the same SQL table is shared with conv and gemm.
- Parameters:
args (Dict) – Normalization record entry
- query_conv(args: Dict[str, Any]) Tuple[str, int] [source]
Query a conv op epilogue from the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv query entry
- Returns:
Profiling results
- Return type:
Tuple
- query_conv3d(args: Dict[str, Any]) Tuple[str, int] [source]
Query a conv3d op epilogue from the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv3d query entry
- Returns:
Profiling results
- Return type:
Tuple
aitemplate.backend.profiler_runner
A subprocess-based multi-GPU runner for auto-tuning
Classes:
|
Object to store profiling result |
|
A parallel runner that executes profilers on multiple GPUs. It uses a process pool, avoiding process creation overhead; the pool size equals the number of provided GPUs, so ideally each process executes a profiler on its dedicated GPU. |
|
Create a queue object with a given maximum size. |
|
A parallel runner for multiple GPUs profiling tasks. |
Functions:
|
Detect GPU target based on nvidia-smi and rocminfo |
|
Generate profile result from a profiling task |
|
Extract kernel execution time and workspace from task process outputs |
|
Delay execution for a given number of seconds. |
- class aitemplate.backend.profiler_runner.ProfileResult(op_config, duration, workspace)
Object to store profiling result
Attributes:
Alias for field number 1
Alias for field number 0
Alias for field number 2
- duration
Alias for field number 1
- op_config
Alias for field number 0
- workspace
Alias for field number 2
- class aitemplate.backend.profiler_runner.ProfilerRunner(devices: List[str], postprocessing_delegate, timeout: int = 500)[source]
A parallel runner that executes profilers on multiple GPUs. It uses a process pool, avoiding process creation overhead. The size of the process pool equals the number of provided GPUs, so ideally each process executes a profiler on its dedicated GPU. This property has not been properly verified yet; however, the results are empirically better than with the previous runner.
Methods:
join
()Wait for subprocess completion or timeout; postprocess the profiler results with delegate(s)
push
(cmds, process_result_callback)Schedule the profiler for execution in a separate process and call the callback after subprocess completion
- join()[source]
Wait for subprocess completion or timeout; postprocess the profiler results with delegate(s)
- push(cmds: List[str], process_result_callback: Callable)[source]
Schedule the profiler for execution in a separate process and call the callback after subprocess completion
- Parameters:
cmds (List[str]) – argv for the launched profiler
process_result_callback (Callable) – Called after subprocess completion in the main process (but possibly not the main thread). Currently used to aggregate profiler results, so the callable takes result and postprocessing_delegate parameters. It is also used to propagate the profiler launch context to the aggregation point, namely the split_k value for the gemm profilers.
- class aitemplate.backend.profiler_runner.Queue(maxsize=0)[source]
Create a queue object with a given maximum size.
If maxsize is <= 0, the queue size is infinite.
Methods:
empty
()Return True if the queue is empty, False otherwise (not reliable!).
full
()Return True if the queue is full, False otherwise (not reliable!).
get
([block, timeout])Remove and return an item from the queue.
Remove and return an item from the queue without blocking.
join
()Blocks until all items in the Queue have been gotten and processed.
put
(item[, block, timeout])Put an item into the queue.
put_nowait
(item)Put an item into the queue without blocking.
qsize
()Return the approximate size of the queue (not reliable!).
Indicate that a formerly enqueued task is complete.
- empty()[source]
Return True if the queue is empty, False otherwise (not reliable!).
This method is likely to be removed at some point. Use qsize() == 0 as a direct substitute, but be aware that either approach risks a race condition where a queue can grow before the result of empty() or qsize() can be used.
To create code that needs to wait for all queued tasks to be completed, the preferred technique is to use the join() method.
- full()[source]
Return True if the queue is full, False otherwise (not reliable!).
This method is likely to be removed at some point. Use qsize() >= n as a direct substitute, but be aware that either approach risks a race condition where a queue can shrink before the result of full() or qsize() can be used.
- get(block=True, timeout=None)[source]
Remove and return an item from the queue.
If optional args ‘block’ is true and ‘timeout’ is None (the default), block if necessary until an item is available. If ‘timeout’ is a non-negative number, it blocks at most ‘timeout’ seconds and raises the Empty exception if no item was available within that time. Otherwise (‘block’ is false), return an item if one is immediately available, else raise the Empty exception (‘timeout’ is ignored in that case).
- get_nowait()[source]
Remove and return an item from the queue without blocking.
Only get an item if one is immediately available. Otherwise raise the Empty exception.
- join()[source]
Blocks until all items in the Queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate the item was retrieved and all work on it is complete.
When the count of unfinished tasks drops to zero, join() unblocks.
- put(item, block=True, timeout=None)[source]
Put an item into the queue.
If optional args ‘block’ is true and ‘timeout’ is None (the default), block if necessary until a free slot is available. If ‘timeout’ is a non-negative number, it blocks at most ‘timeout’ seconds and raises the Full exception if no free slot was available within that time. Otherwise (‘block’ is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (‘timeout’ is ignored in that case).
- put_nowait(item)[source]
Put an item into the queue without blocking.
Only enqueue the item if a free slot is immediately available. Otherwise raise the Full exception.
- task_done()[source]
Indicate that a formerly enqueued task is complete.
Used by Queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.
If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).
Raises a ValueError if called more times than there were items placed in the queue.
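The task_done()/join() protocol described above can be exercised with the standard library Queue directly:

```python
import queue
import threading

q = queue.Queue()
results = []

def worker():
    while True:
        item = q.get()
        if item is None:          # sentinel: stop the worker
            q.task_done()
            break
        results.append(item * 2)
        q.task_done()             # mark this fetched task complete

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    q.put(i)                      # each put() raises the unfinished count
q.put(None)
q.join()                          # unblocks once every item got task_done()
t.join()
print(sorted(results))  # → [0, 2, 4]
```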
- class aitemplate.backend.profiler_runner.Runner(devs: List[int], op_name: str, timeout: int = 30)[source]
A parallel runner for multi-GPU profiling tasks. Runner inherits from BaseRunner.
Methods:
pull
()Pull results from all profiling tasks assigned to runner.
push
(idx, cmd[, return_ops])Push a new profiling task into the runner's queue
- pull()[source]
Pull results from all profiling tasks assigned to runner.
- Returns:
Profiling results of all successful tasks.
- Return type:
List[Tuple[Union[int, str], ProfileResult]]
- push(idx: Union[int, str], cmd: str, return_ops: Optional[List[str]] = None)[source]
Push a new profiling task into the runner’s queue
- Parameters:
idx (Union[int, str]) – Profiling task id (usually is algorithm id or name)
cmd (str) – Bash command to execute the profiling task
return_ops (List[str]) – Names of the ops to return the profiling results for. If specified, pull returns a list with the ProfileResult for each op in return_ops instead of a single (best) ProfileResult instance.
- aitemplate.backend.profiler_runner.detect_target(**kwargs)[source]
Detect GPU target based on nvidia-smi and rocminfo
- Returns:
CUDA or ROCM target
- Return type:
Target
- aitemplate.backend.profiler_runner.process_return(task: Task) Tuple[Union[int, str], ProfileResult] [source]
Generate profile result from a profiling task
- Parameters:
task (Task) – A profiling task
- Returns:
Tuple of task idx (usually the algorithm name/id) and profiling result
- Return type:
Tuple[Union[int, str], ProfileResult]
- aitemplate.backend.profiler_runner.process_task(task: Task) None [source]
Extract kernel execution time and workspace from task process outputs
- Parameters:
task (Task) – A profiling task
- aitemplate.backend.profiler_runner.sleep(seconds)
Delay execution for a given number of seconds. The argument may be a floating point number for subsecond precision.
aitemplate.backend.registry
Registry is a design pattern to map a string key to a function. The registry decorator is mainly used for backend functions.
Functions:
|
Get a function from registry by using a key |
|
Register a new function |
- aitemplate.backend.registry.get(func_name: str) Callable [source]
Get a function from registry by using a key
Example
func = registry.get("func_name")
func(args)
- Parameters:
func_name (str) – Key for function in registry
- Returns:
Function associated with the key
- Return type:
Callable
- Raises:
RuntimeError – If the key is not found in the registry, a RuntimeError is raised
- aitemplate.backend.registry.reg(func_name: str, func: Optional[Callable] = None) Callable [source]
Register a new function
Example
@registry.reg("func_name")
def func(args):
    ...
- Parameters:
func_name (str) – Registry key for the function
func (Callable, optional) – Function to be registered, by default None
- Returns:
Function in registry
- Return type:
Callable
- Raises:
RuntimeError – If the same key is already registered, a RuntimeError is raised
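A minimal sketch of the registry pattern these two functions implement. This is a hypothetical stand-in, not the real aitemplate.backend.registry module; the "cuda.gemm" key and gemm_codegen function are illustrative only:

```python
from typing import Callable, Optional

_registry = {}

def reg(func_name: str, func: Optional[Callable] = None) -> Callable:
    """Register func under func_name; usable as a decorator."""
    def decorator(f: Callable) -> Callable:
        if func_name in _registry:
            raise RuntimeError(f"{func_name} is already registered")
        _registry[func_name] = f
        return f
    return decorator(func) if func is not None else decorator

def get(func_name: str) -> Callable:
    """Look up a registered function; raise RuntimeError if absent."""
    if func_name not in _registry:
        raise RuntimeError(f"{func_name} not found in registry")
    return _registry[func_name]

@reg("cuda.gemm")
def gemm_codegen():
    return "generated gemm source"

print(get("cuda.gemm")())  # → generated gemm source
```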
aitemplate.backend.target
Target object for AITemplate.
Functions:
|
Create a CUDA target. |
|
Create a ROCM target. |
Classes:
|
Enum where members are also (and must be) ints |
|
Local SQLite profile cache database. |
|
Enum for target type. |
- aitemplate.backend.target.CUDA(template_path: str = '/home/runner/work/AITemplate/AITemplate/3rdparty/cutlass', arch: str = '80', **kwargs)[source]
Create a CUDA target.
- class aitemplate.backend.target.IntEnum(value)[source]
Enum where members are also (and must be) ints
- class aitemplate.backend.target.ProfileCacheDB(target: str, path: Optional[str] = None, uri: Optional[str] = None, port: Optional[str] = None)[source]
Local SQLite profile cache database.
Methods:
insert_conv
(args)Insert a conv op epilogue into the cache; conv and gemm share the same SQL table.
insert_conv3d
(args)Insert a conv3d op epilogue into the cache; conv and gemm share the same SQL table.
insert_gemm
(args)Insert a gemm op epilogue into the cache.
insert_normalization
(args)Insert a normalization op epilogue into the cache; the same SQL table is shared with conv and gemm.
query_conv
(args)Query a conv op epilogue from the cache; conv and gemm share the same SQL table.
query_conv3d
(args)Query a conv3d op epilogue from the cache; conv and gemm share the same SQL table.
query_gemm
(args)Query a gemm op epilogue from the cache.
query_normalization
(args)Query a normalization op epilogue from the cache.
- insert_conv(args: Dict[str, Any]) None [source]
Insert a conv op epilogue into the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv record entry
- insert_conv3d(args: Dict[str, Any]) None [source]
Insert a conv3d op epilogue into the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv3d record entry
- insert_gemm(args: Dict[str, Any]) None [source]
Insert a gemm op epilogue into the cache.
- Parameters:
args (Dict) – Gemm record entry
- insert_normalization(args: Dict[str, Any]) None [source]
Insert a normalization op epilogue into the cache; the same SQL table is shared with conv and gemm.
- Parameters:
args (Dict) – Normalization record entry
- query_conv(args: Dict[str, Any]) Tuple[str, int] [source]
Query a conv op epilogue from the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv query entry
- Returns:
Profiling results
- Return type:
Tuple
- query_conv3d(args: Dict[str, Any]) Tuple[str, int] [source]
Query a conv3d op epilogue from the cache; conv and gemm share the same SQL table.
- Parameters:
args (Dict) – Conv3d query entry
- Returns:
Profiling results
- Return type:
Tuple