Transcription of 8-bit Inference with TensorRT
8-bit Inference with TensorRT
Szymon Migacz, NVIDIA
May 8, 2017

Intro
Goal: Convert FP32 CNNs into INT8 without significant accuracy loss.
Why: INT8 math has higher throughput and lower memory requirements.
Challenge: INT8 has significantly lower precision and dynamic range than FP32.
Solution: Minimize loss of information when quantizing trained model weights to INT8 and during INT8 computation of activations.
Result: Method was implemented in TensorRT. It does not require any additional fine tuning.

Outline
INT8 compute
Quantization
Calibration
Workflow in TensorRT
Results

INT8 Inference
Challenge: INT8 has significantly lower precision and dynamic range compared to FP32.
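To make the precision and dynamic range challenge concrete, here is a minimal sketch of a generic symmetric linear quantizer. It is only an illustration: the function names, the scale value, and the clamp to [-127, 127] are assumptions for this example, not the method or API that TensorRT uses.

```python
def quantize_int8(values, scale):
    """Map floats to INT8 codes with q = round(x / scale), clamped to [-127, 127].

    'scale' is a hypothetical per-tensor scale factor chosen by the caller.
    """
    codes = []
    for x in values:
        q = int(round(x / scale))
        codes.append(max(-127, min(127, q)))  # saturate values outside the range
    return codes


def dequantize_int8(codes, scale):
    """Map INT8 codes back to floats; information lost in quantization stays lost."""
    return [q * scale for q in codes]


if __name__ == "__main__":
    scale = 0.5
    original = [1.0, -2.0, 100.0]
    codes = quantize_int8(original, scale)
    restored = dequantize_int8(codes, scale)
    print(codes)     # 100.0 does not fit and saturates at code 127
    print(restored)  # the saturated value comes back as 127 * scale
```

With a fixed scale, only 255 distinct values are representable, so anything outside the range saturates and anything between two steps is rounded. That is the information loss the slides aim to minimize.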
High-throughput INT8 math
Requires sm_61+ (Pascal TitanX, GTX 1080, Tesla P4, P40 and others).
Four-way byte dot product accumulated in a 32-bit result.
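The four-way byte dot product can be emulated in a few lines, as a sketch of the arithmetic only. The function names `dp4a` and `int8_dot` are chosen here for illustration, and the wrap-around to a signed 32-bit result is an assumption of this emulation. On sm_61+ GPUs, CUDA exposes the hardware operation as the `__dp4a` intrinsic.

```python
def dp4a(a, b, acc=0):
    """Emulate a four-way signed byte dot product accumulated into 32 bits.

    a, b: sequences of exactly four integers in the INT8 range [-128, 127].
    acc:  running 32-bit accumulator.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("dp4a takes exactly four bytes per operand")
    for x in (*a, *b):
        if not -128 <= x <= 127:
            raise ValueError("operand out of INT8 range")
    total = acc + sum(x * y for x, y in zip(a, b))
    # Wrap to signed 32-bit (assumed two's-complement behaviour).
    return (total + 2**31) % 2**32 - 2**31


def int8_dot(a, b):
    """Dot product of two equal-length INT8 vectors, four elements per step."""
    if len(a) != len(b) or len(a) % 4 != 0:
        raise ValueError("lengths must match and be a multiple of four")
    acc = 0
    for i in range(0, len(a), 4):
        acc = dp4a(a[i:i + 4], b[i:i + 4], acc)
    return acc


if __name__ == "__main__":
    print(dp4a([1, 2, 3, 4], [5, 6, 7, 8]))
    print(int8_dot([1] * 8, [2] * 8))
```

Accumulating in 32 bits matters because a single product of two INT8 values can reach 16384 in magnitude, which already overflows both 8 and 16 bits once a few products are summed.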