aitemplate.testing

detect_target

Automatically detect the GPU target for testing

Functions:

detect_target(**kwargs)

Detect GPU target based on nvidia-smi and rocminfo

aitemplate.testing.detect_target.detect_target(**kwargs)[source]

Detect GPU target based on nvidia-smi and rocminfo

Returns:

CUDA or ROCM target

Return type:

Target
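
The detection idea can be illustrated with a minimal, self-contained sketch: probe for the vendor CLI tools on the PATH and pick a target accordingly. The function name detect_gpu_vendor and the injectable which parameter are hypothetical, for illustration only; the real detect_target returns an AITemplate Target object rather than a string.

```python
import shutil


def detect_gpu_vendor(which=shutil.which):
    """Illustrative sketch: pick a GPU target by probing for vendor CLIs.

    Mirrors the idea behind detect_target (checking for nvidia-smi and
    rocminfo); `which` is injectable here purely to make the sketch testable.
    """
    if which("nvidia-smi") is not None:
        return "cuda"  # NVIDIA driver tooling found
    if which("rocminfo") is not None:
        return "rocm"  # AMD ROCm tooling found
    raise RuntimeError("No supported GPU tooling found (nvidia-smi or rocminfo)")
```

In practice you simply call `detect_target()` and pass the resulting target to compilation, e.g. `target = detect_target()`.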

benchmark_pt

Helper functions for benchmarking eager PyTorch

Functions:

benchmark_torch_function(iters, function, ...)

Function for benchmarking a PyTorch function.

aitemplate.testing.benchmark_pt.benchmark_torch_function(iters: int, function, *args, **kwargs) float[source]

Function for benchmarking a PyTorch function.

Parameters:
  • iters (int) – Number of iterations.

  • function (callable) – The function to benchmark.

  • args (Any) – Positional arguments to pass to function.

Returns:

Runtime per iteration in ms.

Return type:

float
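
The shape of such a benchmark helper can be sketched as follows. This is a CPU-side sketch only (the function name benchmark_function is hypothetical): the real benchmark_torch_function times GPU work, which requires CUDA-event or stream synchronization rather than plain wall-clock timing.

```python
import time


def benchmark_function(iters, function, *args, **kwargs):
    """Time `function` over `iters` calls; return mean runtime per call in ms.

    Illustrative CPU-side sketch of the warmup-then-measure pattern.
    """
    function(*args, **kwargs)  # one warmup call to exclude one-time setup costs
    start = time.perf_counter()
    for _ in range(iters):
        function(*args, **kwargs)
    return (time.perf_counter() - start) / iters * 1e3  # ms per iteration
```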

benchmark_ait

Functions:

make_input_output_pools(*, pool_size, ...)

Make input and output pools for benchmarking.

run_benchmark(*, ait_module, inputs_pool, ...)

Run the benchmark.

run_module_with_pools(*, ait_module, ...[, ...])

Run the module with the given inputs and outputs pools.

aitemplate.testing.benchmark_ait.make_input_output_pools(*, pool_size, eval_pt_func, input_filter_func, output_filter_func)[source]

Make input and output pools for benchmarking. The rationale is to avoid retrieving the same input from the device cache, for a fair performance assessment.

Parameters:
  • pool_size (int) – The size of the pool.

  • eval_pt_func (callable) – A callable that returns a dict of inputs and outputs.

  • input_filter_func (callable) – A callable that takes a key and a value and returns True if the key-value pair from the eval_pt_func result should be included in the input pool.

  • output_filter_func (callable) – A callable that takes a key and a value and returns True if the key-value pair from the eval_pt_func result should be included in the output pool.

Returns:

  • inputs_pool (List[Dict[str, torch.Tensor]]) – A list of inputs to pass into Model.RunWithTensors.

  • outputs_pool (List[Dict[str, torch.Tensor]]) – A list of outputs to pass into Model.RunWithTensors.
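
The pool-splitting logic can be sketched with plain dicts standing in for torch.Tensor values (the function name make_pools is hypothetical): call eval_pt_func once per pool slot and route each key-value pair through the two filter predicates.

```python
def make_pools(*, pool_size, eval_pt_func, input_filter_func, output_filter_func):
    """Illustrative sketch of input/output pool construction.

    Each call to eval_pt_func yields a fresh dict, so consecutive benchmark
    iterations do not hit the same cached device buffers.
    """
    inputs_pool, outputs_pool = [], []
    for _ in range(pool_size):
        result = eval_pt_func()
        inputs_pool.append({k: v for k, v in result.items() if input_filter_func(k, v)})
        outputs_pool.append({k: v for k, v in result.items() if output_filter_func(k, v)})
    return inputs_pool, outputs_pool
```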

aitemplate.testing.benchmark_ait.run_benchmark(*, ait_module, inputs_pool, outputs_pool, num_iters, num_warmup_iters, stream: Optional[Stream] = None, sync: bool = False, graph_mode: bool = False)[source]

Run the benchmark.

Parameters:
  • ait_module (Model) – The AIT module to run.

  • inputs_pool (List[Dict[str, torch.Tensor]]) – A list of inputs to pass into Model.RunWithTensors.

  • outputs_pool (List[Dict[str, torch.Tensor]]) – A list of outputs to pass into Model.RunWithTensors.

  • num_iters (int) – The number of iterations to run.

  • num_warmup_iters (int) – The number of warmup iterations to run.

  • stream (Optional[torch.cuda.Stream]) – The CUDA stream to run the module on; if None, use the default stream.

  • sync (bool) – Whether to synchronize the CUDA stream after each iteration.

  • graph_mode (bool) – Whether to run the module in graph mode.

Returns:

The average time per iteration in milliseconds.

Return type:

float
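
The warmup-then-measure loop over the pools can be sketched as below. Here run_fn is a hypothetical stand-in for the module's Model.RunWithTensors call, and wall-clock timing stands in for the stream synchronization the real run_benchmark performs.

```python
import time
from itertools import cycle, islice


def run_benchmark_sketch(*, run_fn, inputs_pool, outputs_pool, num_iters, num_warmup_iters):
    """Illustrative sketch: warmup iterations, then timed iterations,
    cycling through the pools so no two consecutive runs reuse buffers.
    Returns the average time per timed iteration in ms."""
    pairs = cycle(zip(inputs_pool, outputs_pool))
    for inputs, outputs in islice(pairs, num_warmup_iters):
        run_fn(inputs, outputs)  # warmup: not timed
    start = time.perf_counter()
    for inputs, outputs in islice(pairs, num_iters):
        run_fn(inputs, outputs)
    return (time.perf_counter() - start) / num_iters * 1e3
```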

aitemplate.testing.benchmark_ait.run_module_with_pools(*, ait_module, inputs_pool, outputs_pool, num_iters, stream_ptr: Optional[int] = None, sync: bool = False, graph_mode: bool = False)[source]

Run the module with the given inputs and outputs pools.

Parameters:
  • ait_module (Model) – The AIT module to run.

  • inputs_pool (List[Dict[str, torch.Tensor]]) – A list of inputs to pass into Model.RunWithTensors.

  • outputs_pool (List[Dict[str, torch.Tensor]]) – A list of outputs to pass into Model.RunWithTensors.

  • num_iters (int) – The number of iterations to run.

  • stream_ptr (Optional[int]) – The CUDA stream pointer to run the module on; if None, use the legacy stream.

  • sync (bool) – Whether to synchronize the CUDA stream after each iteration.

  • graph_mode (bool) – Whether to run the module in graph mode.
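
The pool-cycling pattern itself can be sketched independently of timing (run_fn below is a hypothetical stand-in for the module's Model.RunWithTensors call): iterate num_iters times, wrapping around the pools so each iteration gets a distinct input/output pair.

```python
from itertools import cycle, islice


def run_with_pools(*, run_fn, inputs_pool, outputs_pool, num_iters):
    """Illustrative sketch: run num_iters calls, cycling through the
    input/output pools so consecutive iterations use different buffers."""
    for inputs, outputs in islice(cycle(zip(inputs_pool, outputs_pool)), num_iters):
        run_fn(inputs, outputs)
```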