8-bit Inference with TensorRT

Szymon Migacz, NVIDIA
May 8, 2017

Intro

Goal: Convert FP32 CNNs into INT8 without significant accuracy loss.
Why: INT8 math has higher throughput and lower memory requirements.
Challenge: INT8 has significantly lower precision and dynamic range than FP32.
Solution: Minimize loss of information when quantizing trained model weights to INT8 and during INT8 computation of activations.
Result: The method was implemented in TensorRT. It does not require any additional fine tuning.

Outline: INT8 compute, Quantization, Calibration, Workflow in TensorRT, Results

INT8 Inference

Challenge: INT8 has significantly lower precision and dynamic range compared to FP32.
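To make the precision loss concrete, here is a minimal sketch of quantizing FP32 values to INT8 and back. It assumes a simple symmetric linear mapping whose threshold is the largest absolute value in the input; that choice, and the names `quantize_int8` and `dequantize`, are illustrative assumptions and not the calibration procedure the presentation describes.

```python
def quantize_int8(values, threshold):
    """Map floats in [-threshold, threshold] onto integers in [-127, 127].

    Assumption: symmetric linear scaling with a single scale factor.
    Values outside the threshold saturate at -127 or 127.
    """
    scale = threshold / 127.0
    quantized = []
    for v in values:
        q = round(v / scale)
        q = max(-127, min(127, q))  # saturate to the INT8 range used here
        quantized.append(q)
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.0, 0.25, 0.003]
threshold = max(abs(w) for w in weights)

codes, scale = quantize_int8(weights, threshold)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print("codes:   ", codes)
print("restored:", restored)
print("errors:  ", errors)
```

With this mapping, any in-range value is recovered to within half a quantization step, and values much smaller than one step (such as 0.003 here) collapse to zero. That is the loss of information the quantization scheme has to keep small.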

High-throughput INT8 math: requires sm_61+ (Pascal TitanX, GTX 1080, Tesla P4, P40 and others). It is a four-way byte dot product accumulated in a 32-bit result.
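The arithmetic of a four-way byte dot product with a 32-bit accumulator can be emulated in a few lines. This is a plain-Python sketch of the math only, not GPU code; the function names `dot4_accumulate` and `int8_dot` are hypothetical, and signed bytes with wrap-around on overflow are assumed.

```python
def dot4_accumulate(a, b, acc):
    """Emulate one four-way signed-byte dot product added to a 32-bit accumulator.

    a, b: sequences of exactly four integers in [-128, 127].
    acc:  running signed 32-bit accumulator.
    """
    assert len(a) == 4 and len(b) == 4
    for x in list(a) + list(b):
        assert -128 <= x <= 127, "operands must fit in a signed byte"
    total = acc + sum(x * y for x, y in zip(a, b))
    # Wrap to the signed 32-bit range (assumed overflow behaviour).
    return (total + 2**31) % 2**32 - 2**31


def int8_dot(a, b):
    """Dot product of two INT8 vectors, processed four elements at a time."""
    assert len(a) == len(b) and len(a) % 4 == 0
    acc = 0
    for i in range(0, len(a), 4):
        acc = dot4_accumulate(a[i:i + 4], b[i:i + 4], acc)
    return acc


print(dot4_accumulate([1, 2, 3, 4], [5, 6, 7, 8], 10))
print(int8_dot([127] * 8, [127] * 8))
```

The wide accumulator matters because a single product of two signed bytes can reach 127 * 127 = 16129, so the sum of four such products already exceeds what a signed 16-bit value can hold.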
