Difference between revisions of "Acceleration and Transprecision"
From iis-projects
Line 1: | Line 1: | ||
− | [[File:NVIDIA Tesla V100.jpg]] | + | [[File:NVIDIA Tesla V100.jpg|thumb|right|A NVIDIA Tesla V100 GP-GPU. This cutting-edge accelerator provides huge computational power on a [https://arstechnica.com/gadgets/2017/05/nvidia-tesla-v100-gpu-details/ massive 800 mm² die].]] |
− | [[File:Google Cloud TPU.jpg]] | + | [[File:Google Cloud TPU.jpg|thumb|right|Google's Cloud TPU (Tensor Processing Unit). This machine learning accelerator can do one thing extremely well: multiply-accumulate operations.]] |
====Francesco Conti==== | ====Francesco Conti==== |
Revision as of 16:22, 7 November 2017
Contents
Francesco Conti
- fconti@iis.ee.ethz.ch
- ETZ J78
Stefan Mach
- smach@iis.ee.ethz.ch
- ETZ J89
Fabian Schuiki
- fschuiki@iis.ee.ethz.ch
- ETZ J89
Available Projects
- Extending our FPU with Internal High-Precision Accumulation (M)
- Low Precision Ara for ML
- Hardware Exploration of Shared-Exponent MiniFloats (M)
- Approximate Matrix Multiplication based Hardware Accelerator to achieve the next 10x in Energy Efficiency: Full System Intregration
- Extended Verification for Ara
- Ibex: Tightly-Coupled Accelerators and ISA Extensions
- RVfplib
- Scalable Heterogeneous L1 Memory Interconnect for Smart Accelerator Coupling in Ultra-Low Power Multicores
Projects In Progress
- Fault-Tolerant Floating-Point Units (M)
- Virtual Memory Ara
- New RVV 1.0 Vector Instructions for Ara
- Big Data Analytics Benchmarks for Ara
- An all Standard-Cell Based Energy Efficient HW Accelerator for DSP and Deep Learning Applications
Completed Projects
- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S)
- Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S)
- Optimizing the Pipeline in our Floating Point Architectures (1S)
- Streaming Integer Extensions for Snitch (M/1-2S)
- A Unified Compute Kernel Library for Snitch (1-2S)
- NVDLA meets PULP
- Hardware Accelerators for Lossless Quantized Deep Neural Networks
- Floating-Point Divide & Square Root Unit for Transprecision
- Low-Energy Cluster-Coupled Vector Coprocessor for Special-Purpose PULP Acceleration
- Design and Implementation of an Approximate Floating Point Unit