AImotive’s new aiWare3P powers production NN acceleration for L2-L3 automotive AI
Latest release of award-winning aiWare3 hardware IP offers improved host CPU offload, lower memory bandwidth and upgraded SDK tools, enabling scalable, low-power, high-performance solutions from 1 to 100+ TOPS
Budapest, Hungary, 10th December 2019 — AImotive, one of the world’s leading suppliers of modular automated driving technologies, today announced that it has begun shipment of the latest release of its award-winning(**) aiWare NN (Neural Network) hardware inference engine IP to its lead customers. The aiWare3P IP core incorporates numerous new features that result in significantly improved performance, lower power consumption, greater host CPU offload and simpler layout for larger chip designs.
“Our production-ready aiWare3P release brings together everything we know about accelerating neural networks for vision-based automotive AI inference applications,” said Marton Feher, senior vice president of hardware engineering for AImotive. “We now have one of the automotive industry’s most efficient and compelling NN acceleration solutions for volume production L2/L2+/L3 AI. When complemented by AImotive’s significant automated driving expertise, we believe we offer the most technology-rich automotive-focused solutions available today.”
The aiWare3P hardware IP core offers up to 16 TMAC/s (>32 TOPS) at 2 GHz, with multi-core and multi-chip implementations scaling up to several hundred INT8 TOPS with little or no loss of efficiency. The core is designed for AEC-Q100 extended temperature operation and includes a range of features to enable users to achieve ASIL-B and above certification. Key features include:
- Higher efficiency for a wider range of NN functions, thanks to improved on-chip data reuse and movement, more sophisticated scheduling algorithms and upgraded external memory bandwidth management
- Support for a larger set of embedded activation and pooling functions, ensuring that most NNs execute entirely within the aiWare3P core without host CPU intervention
- Real-time data compression for all external memory operations
- Advanced cross-coupling between C-LAM convolution engines and F-LAM function engines, to further increase execution efficiency
- Physical tile-based microarchitecture, enabling easier physical implementation of large aiWare cores
- Logical tile-based data management, enabling scalability to >100 TOPS without the need for complex caches, NoCs or other multi-core processor-based approaches
- Upgraded SDK, including improved compiler and performance analysis tools for both offline estimation and real-time target hardware analysis
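The headline throughput figures above follow from simple arithmetic: one MAC (multiply-accumulate) counts as two operations, so the MAC array size and clock rate determine both ratings. A brief sanity check (the MAC-unit count is our inference from the stated figures, not a number given in this release):

```python
# Sanity check of the quoted throughput: MAC/s = units * clock,
# and each MAC counts as 2 ops, so TOPS = 2 * TMAC/s.
mac_units = 8192        # assumed: 16 TMAC/s / 2 GHz implies 8192 parallel MACs
clock_hz = 2e9          # 2 GHz, as stated in the release

tmacs = mac_units * clock_hz / 1e12   # tera-MACs per second
tops = 2 * tmacs                      # INT8 tera-ops per second

print(tmacs, tops)  # 16.384 TMAC/s, 32.768 TOPS -- consistent with ">32 TOPS"
```

The same arithmetic explains the later "more than 50 TMAC/s (>100 TOPS)" scaling claim.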
The aiWare3P hardware IP is being deployed in a range of L2/L2+ production solutions, as well as being adopted for studies of more advanced heterogeneous sensor applications. Customers include Nextchip for their forthcoming Apache5 Imaging Edge Processor, and ON Semiconductor for their collaborative project with AImotive to demonstrate advanced heterogeneous sensor fusion capabilities.
As part of their commitment to open benchmarking with workloads that reflect real applications, AImotive will release a full update to their public benchmark results in January 2020, based on the aiWare3P IP core.
The aiWare3P RTL will be shipping to all customers from December 2019.
The aiWare hardware NN accelerator has been proven through extensive benchmarks* to be among the most efficient hardware NN accelerator architectures available for high-resolution automotive vision applications. The aiWare hardware IP supports a range of advanced architectural features that make it an ideal component within ISO 26262 ASIL A, B and above certified subsystems.
Designed to scale up to more than 50 TMAC/s (>100 TOPS) at high efficiency and low power, aiWare’s low-level micro-architecture ensures the IP core requires far fewer host CPU and shared memory resources than other hardware NN accelerators. Built from the ground up for highly deterministic dataflow management, the unique, highly parallel memory-centric architecture features up to 100x more on-chip memory bandwidth than other hardware NN accelerators, ensuring up to 95% sustained efficiency for complex DNNs used with large inputs such as HD cameras.
Supporting Khronos’ NNEF as well as open standard ONNX inputs, the comprehensive aiWare SDK directly compiles binaries with no need for low-level programming of DSPs or MCUs. It includes automated FP32 to INT8 conversion with little or no loss of accuracy, alongside a growing portfolio of sophisticated DNN performance analysis tools.
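The automated FP32 to INT8 conversion mentioned above is a standard quantization step. As an illustration only (this is a generic symmetric per-tensor scheme, not AImotive's actual SDK tooling, and all names here are our own), the core idea can be sketched as:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map FP32 values onto [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0   # one scale factor for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from the INT8 tensor and its scale."""
    return q.astype(np.float32) * scale

# Round-trip a random weight tensor; rounding error is bounded by scale / 2.
w = np.random.randn(64).astype(np.float32)
q, scale = quantize_int8(w)
err = np.max(np.abs(dequantize(q, scale) - w))
```

With 8-bit symmetric quantization the worst-case per-value error is half the scale step, which is why such conversion typically costs little accuracy for well-conditioned weight tensors.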
* See the latest aiWare benchmarks at www.aimotive.com/benchmarks
** aiWare won best vision processor award at the Embedded Vision Summit 2018 — see https://www.embedded-vision.com/product-awards-2018-winners