Shady Agwa - EVE: Ephemeral Vector Engines

Conference paper

Khalid Al-Hawaj, T. Ta, Nick Cebry, Shady Agwa, O. Afuye, Eric Hall, Courtney Golden, A. Apsel, C. Batten
International Symposium on High-Performance Computer Architecture, 2023


Cite

APA
Al-Hawaj, K., Ta, T., Cebry, N., Agwa, S., Afuye, O., Hall, E., … Batten, C. (2023). EVE: Ephemeral Vector Engines. International Symposium on High-Performance Computer Architecture.

Chicago/Turabian
Al-Hawaj, Khalid, T. Ta, Nick Cebry, Shady Agwa, O. Afuye, Eric Hall, Courtney Golden, A. Apsel, and C. Batten. “EVE: Ephemeral Vector Engines.” International Symposium on High-Performance Computer Architecture (2023).

MLA
Al-Hawaj, Khalid, et al. “EVE: Ephemeral Vector Engines.” International Symposium on High-Performance Computer Architecture, 2023.

BibTeX

@inproceedings{khalid2023a,
  title = {EVE: Ephemeral Vector Engines},
  year = {2023},
  booktitle = {International Symposium on High-Performance Computer Architecture},
  author = {Al-Hawaj, Khalid and Ta, T. and Cebry, Nick and Agwa, Shady and Afuye, O. and Hall, Eric and Golden, Courtney and Apsel, A. and Batten, C.}
}

Abstract

There has been a resurgence of interest in vector architectures, as evidenced by the recent adoption of vector extensions in mainstream instruction set architectures. Traditionally, vector engines leverage this abstraction by exploiting its inherent regularity to increase performance and efficiency. Recent work on SRAM-based compute-in-memory has shown promise in reducing the area overhead of these engines. In this work, we propose ephemeral vector engines (EVE), which leverage SRAM-based compute-in-memory techniques as well as bit-peripheral computations to facilitate efficient vector execution. EVE uses a novel approach of bit-hybrid execution, striking a balance between throughput and latency. Evaluated on the Rodinia and RiVEC benchmark suites, EVE achieves almost 8× speed-up compared to an out-of-order processor and 4.59× compared to an integrated vector unit. EVE achieves speed-ups comparable to an aggressive decoupled vector unit and increases area-normalized performance by over 2×. By repurposing SRAM arrays in the L2 cache to create ephemeral vector execution units, EVE efficiently achieves high performance while incurring as little as 11.7% area overhead.
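The abstract names bit-hybrid execution without describing its mechanics. As a rough illustration only, and assuming that "bit-hybrid" means processing each n-bit element in k-bit slices (somewhere between bit-serial, k = 1, and bit-parallel, k = n), the sketch below adds two unsigned integers one slice per step with a carry passed between steps, and reports how many steps that takes. The function name `bit_hybrid_add` and the slice width `k` are hypothetical and do not describe EVE's actual microarchitecture.

```python
from math import ceil


def bit_hybrid_add(a: int, b: int, n_bits: int, k: int) -> tuple[int, int]:
    """Add two unsigned n_bits-wide integers, k bits per step.

    Hypothetical model (not EVE's design): k = 1 behaves like a
    bit-serial adder, k = n_bits like a bit-parallel adder.
    Returns (sum modulo 2**n_bits, number of k-bit steps taken).
    Assumes 0 <= a, b < 2**n_bits.
    """
    if not 1 <= k <= n_bits:
        raise ValueError("k must be between 1 and n_bits")
    mask = (1 << k) - 1          # selects one k-bit slice
    steps = ceil(n_bits / k)     # latency grows as the slice narrows
    carry = 0
    result = 0
    for step in range(steps):
        shift = step * k
        a_slice = (a >> shift) & mask
        b_slice = (b >> shift) & mask
        s = a_slice + b_slice + carry
        result |= (s & mask) << shift  # keep the low k bits of this slice
        carry = s >> k                 # carry ripples into the next step
    # Drop any bits produced beyond n_bits (wrap-around, like fixed-width hardware).
    return result & ((1 << n_bits) - 1), steps


if __name__ == "__main__":
    for k in (1, 4, 32):
        total, steps = bit_hybrid_add(0xDEADBEEF, 0x12345678, 32, k)
        print(f"k={k:2d}: sum={total:#010x} in {steps} step(s)")
```

Every slice width yields the same sum; only the number of sequential steps changes. Narrower slices need less logic per element, which is the kind of throughput-versus-latency trade-off the abstract refers to, but the actual costs depend on the hardware and are not modeled here.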