Wednesday October 23, 2024 3:15pm - 3:55pm PDT
High-Performance and Efficient 512-b & 1024-b VLEN Vector Processor and AI-Related Accelerator - Nathan Ma, Nuclei System Technology
In this presentation, we delve into the powerful synergy between RISC-V vector processing and AI acceleration, with a spotlight on the transformative RVV 1.0 extension (specifically at VLEN=512b and 1024b). RISC-V becomes even more impactful with the introduction of RVV 1.0, which is designed specifically to elevate vector processing capabilities. In 2024, we released our Intelligence Class Core IP series, focused on AI applications and other workloads that require intensive parallel vector computing.
Enhancing RISC-V ISA to Support Sub-FP8 Quantization for Machine Learning Models - Mengshiun Yu & Jhih-Kuan Lin, National Tsing Hua University
In this session we'll present our research, which proposes extending the RISC-V Instruction Set Architecture (ISA) to support sub-FP8 quantized data formats, optimizing AI and machine learning models for low-power edge devices. The study develops new instructions that enable a RISC-V CPU core to handle data types below FP8, such as 6-bit and 4-bit formats. These improvements enhance AI workload performance and energy efficiency, allowing complex machine learning tasks to be performed locally on edge devices like smartphones, IoT devices, and wearables. The proposed ISA extension supports mixed-precision workloads and ensures backward compatibility with existing hardware for easy adoption. The research includes designing a new sub-FP8 extension with computational, configuration, load/store, and conversion instructions. The design is demonstrated with two examples using assembly code: one for adding two FP8 (E5M2) values and another for performing saxpy computation with the vector extension.
The Efficient Way to Design a RISC-V Edge AI Processor with Software Hardware Co-Design Methodology - Meng Zhang, Terapines Technology (Wuhan) Co., Ltd
This talk will show you how to improve the performance of an AI model running on a virtualized RISC-V architecture using a software-hardware co-design methodology. With this method, a single person can go all the way from micro-architecture design, to adding customized instructions in the compiler, debugger, and simulator, to profiling AI model performance on a virtualized platform, in as little as a few hours, without knowing how to customize a compiler, debugger, or simulator, as all of that is done automatically in our software-hardware co-design flow.
Creating Custom RISC-V Processors Using ASIP Design Tools: A Neural Network Acceleration Case Study - Gert Goossens, Synopsys
The AI revolution has triggered increased awareness of application-specific instruction-set processors (ASIPs). A RISC-V architecture can be extended with specialized datapaths, storages, and custom instructions to accelerate AI workloads. New instructions can be encoded in RISC-V's reserved opcode space or in additional parallel slots of an extended long instruction word. Notwithstanding the specialization, compatibility with and reuse of the RISC-V ecosystem are maintained.
Synopsys’ ASIP Designer tool-suite enables the design of custom RISC-V processors. Starting from a formal ISA model, it assists designers in selecting ISA extensions, generates an SDK with an optimizing compiler supporting the extensions, and produces an efficient RTL implementation.
We illustrate this approach with the design of a custom RISC-V processor to accelerate convolutional neural network algorithms for edge AI, with programming support for TensorFlow Lite for Microcontrollers (TFLM). ISA specialization includes the introduction of 4-lane SIMD with a local vector memory, 4 specialized convolution units with 16 multipliers each, dedicated accumulator registers, and 2-way instruction-level parallelism.
Towards an Integrated Matrix Extension: Workload Analysis of CNN Inference with QEMU TCG Plugins - Matheus Ferst, Instituto de Pesquisas ELDORADO
Following the gap analysis done in the second half of 2023, the SIG-Vector has been working on specifying instructions to accelerate matrix operations. Two Task Groups were proposed to explore different approaches. The "Attached Matrix Extension" (AME) group is working on a set of instructions independent of other extensions, requiring new registers to hold matrix data. The "Integrated Matrix Extension" (IME) group proposes reusing the vector registers introduced by the V extension. The AME solution is similar to how other architectures added matrix operations, like Intel's AMX and ARM's SME, while the IME proposal resembles how the POWER architecture added matrix operations. The IME might also help applications that interleave matrix and vector operations by avoiding data movement between different types of registers.
To verify how commonly this happens in AI/ML workloads, we developed a QEMU TCG plugin to instrument the inference of eight CNN models optimized to use the IME-like POWER10 matrix instructions. The results also show some types of vector operations that interact with matrix data and would be helpful in an AME implementation to avoid sending data back to memory.
Enhancing the Future of AI/ML with Attached Matrix Extension - Jing Qiu, Alibaba
We've now updated the Xuantie Attached Matrix Extension ISA to keep pace with rapid advances in AI.
The new matrix ISA uses 64-bit instructions. These self-contained long instructions can support more architectural registers, facilitate sparse operations, and include longer immediates and more metadata. This enhanced encoding scheme increases both the flexibility and efficiency of matrix computations. Another enhancement is the introduction of structured sparsity techniques that allow for variable sparsity ratios (N:M sparsity) across the k dimension. The new extension also supports innovative data types, such as int4/fp8, commonly used in large language models. In addition to multi-precision, it also supports mixed-precision operations.
Speakers
technology expert, Alibaba
Jing Qiu is a technology expert in the CPU R&D department at Alibaba. His current work focuses on the design and specification of the matrix-related and AI domain-specific architecture of the Xuantie processors. He received his Ph.D. in Circuits and Systems from Zhejiang University...
Executive Director of Engineering, Synopsys
Gert Goossens is an Executive Director of Engineering at Synopsys, where he is currently leading the company’s tool development group for Application-Specific Instruction-set Processors (ASIPs). Previously, he was a co-founder and the CEO of Target Compiler Technologies, the company...
Senior Director of Strategy and Business Development, Nuclei System Technology
Nathan Ma started his career at Marvell and SiFive before joining Nuclei as Senior Director of Strategy and Business Development. Nathan now manages Nuclei's fundraising, technical marketing, and global business development.
graduate student, National Tsing Hua University
Jhih-Kuan Lin is a dedicated graduate student at the Parallel and Distributed Systems Laboratory (PLLAB) in the Department of Computer Science at National Tsing Hua University (NTHU). Jhih-Kuan Lin's research focuses on the cutting-edge development and optimization of the RISC-V...
Ph.D. candidate, National Tsing Hua University
Meng-Shiun Yu is currently a Ph.D. candidate in the Department of Computer Science at National Tsing Hua University, Taiwan. His research interests include compiler optimization for deep neural networks and computer vision, and compiler construction for hardware accelerators. Currently...
Software Engineer, Terapines Technology (Wuhan) Co., Ltd
Software Engineer at Terapines Technology (Wuhan) Co., Ltd
Software Developer, Instituto de Pesquisas ELDORADO
Matheus is a software developer at the Embedded Computing Department of Instituto de Pesquisas Eldorado. He graduated in Computer Engineering at Universidade Tecnológica Federal do Paraná and holds a Master's in Electrical Engineering from the same institution. He is also an open-source...