AI model efficiency is crucial for making AI ubiquitous, leading to smarter devices and enhanced lives. Besides the performance benefit, quantized neural networks also increase power efficiency for two reasons: reduced memory access costs and increased compute efficiency.
The quantization work done by the Qualcomm AI Research team is crucial for implementing machine learning algorithms on low-power edge devices. In network quantization, we focus both on pushing the state of the art (SOTA) in compression and on making quantized inference as easy to access as possible. For example, our SOTA work on oscillations in quantization-aware training pushes the boundaries of what is possible with INT4 quantization. Furthermore, for ease of deployment, integer formats such as INT16 and INT8 give accuracy comparable to their floating-point counterparts, FP16 and FP8, while delivering significantly better performance per watt. Researchers and developers can apply this quantization research to optimize and deploy their models across devices with open-source tools like the AI Model Efficiency Toolkit (AIMET).
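As a concrete illustration, here is a minimal sketch of simulating INT8 quantization with AIMET's PyTorch interface. The class and method names (QuantizationSimModel, compute_encodings) follow AIMET's documented API, but exact signatures can vary between releases, and the model and calibration data below are placeholders rather than anything from the webinar.

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# Placeholder model and calibration data -- substitute your own trained
# network and a representative (unlabeled is fine) data loader.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 3, 224, 224)
calibration_batches = [torch.randn(4, 3, 224, 224) for _ in range(8)]

# Wrap the model in a quantization simulator: weights and activations are
# fake-quantized to 8 bits so INT8 accuracy can be checked before deployment.
sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=8,    # weight bit-width
                           default_output_bw=8)   # activation bit-width

def pass_calibration_data(sim_model, _):
    # Run a few batches through the model so AIMET can observe value
    # ranges and compute quantization encodings (scale / offset).
    with torch.no_grad():
        for batch in calibration_batches:
            sim_model(batch)

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=None)

# `sim.model` now behaves like the INT8-quantized network for evaluation.
```

The same simulated model is the starting point for quantization-aware training: fine-tuning proceeds through the fake-quantization operations so the weights adapt to the reduced precision.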
In this webinar you will learn about:
• The state-of-the-art in AI model efficiency from Qualcomm AI Research’s latest papers
• How 4-bit integer weight quantization is possible without sacrificing much accuracy using our advanced quantization-aware training (QAT) techniques
• The benefits of quantization and why we recommend integer inference (INT4, INT8, INT16) over floating-point inference (FP8, FP16, FP32); see the sketch after this list
• How Qualcomm’s quantization research translates to real-life applications
• The tools available for AI developers to implement their models on low-power edge devices: AIMET and the AIMET Model Zoo
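To make the integer-inference point concrete, below is a minimal, framework-agnostic sketch of asymmetric affine quantization, the scheme underlying integer formats such as INT8: a tensor is stored as integers together with a floating-point scale and zero-point. The function names and toy data are illustrative only, not part of AIMET.

```python
import numpy as np

def quantize_affine(x, num_bits=8):
    """Asymmetric affine quantization: q = round(x / scale) + zero_point,
    clipped to the integer range of the chosen bit-width."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = max(float(x.max() - x.min()) / (qmax - qmin), 1e-8)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(6).astype(np.float32)
q, scale, zp = quantize_affine(weights, num_bits=8)
recovered = dequantize_affine(q, scale, zp)
print("max abs error:", np.abs(weights - recovered).max())
```

Because the heavy arithmetic runs on small integers, with the scale and zero-point applied cheaply at the edges, integer inference reduces both memory traffic and compute energy relative to floating point, which is the performance-per-watt advantage described above.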
Presenters: Tijmen Blankevoort and Chirag Patel