Shady Agwa - Implementing Low-Diameter On-Chip Networks for Manycore

Journal article

Yanghui Ou, Shady Agwa, C. Batten
ACM/IEEE International Symposium on Networks-on-Chips, 2020

Cite

APA
Ou, Y., Agwa, S., & Batten, C. (2020). Implementing Low-Diameter On-Chip Networks for Manycore Processors Using a Tiled Physical Design Methodology. ACM/IEEE International Symposium on Networks-on-Chips.

Chicago/Turabian
Ou, Yanghui, Shady Agwa, and C. Batten. “Implementing Low-Diameter On-Chip Networks for Manycore Processors Using a Tiled Physical Design Methodology.” ACM/IEEE International Symposium on Networks-on-Chips (2020).

MLA
Ou, Yanghui, et al. “Implementing Low-Diameter On-Chip Networks for Manycore Processors Using a Tiled Physical Design Methodology.” ACM/IEEE International Symposium on Networks-on-Chips, 2020.

BibTeX

@article{yanghui2020a,
  title = {Implementing Low-Diameter On-Chip Networks for Manycore Processors Using a Tiled Physical Design Methodology},
  year = {2020},
  journal = {ACM/IEEE International Symposium on Networks-on-Chips},
  author = {Ou, Yanghui and Agwa, Shady and Batten, C.}
}

Abstract

Manycore processors are now integrating up to 1000 simple cores into a single die, yet these processors still rely on high-diameter mesh on-chip networks (OCNs) without complex flow control or custom circuits, for three reasons: (1) manycores require simple, low-area routers; (2) manycores usually use standard-cell-based design; and (3) manycores use a tiled physical design methodology. In this paper, we explore mesh and torus topologies with internal concentration and/or ruche channels that require low area overhead and can be implemented using a traditional standard-cell-based tiled physical design methodology. We use a combination of analytical and RTL modeling along with layout-level results for both hard macros and a 3×3 mm, 256-terminal OCN in a 14-nm technology for twelve topologies. Critically, the networks we study use a tiled physical design methodology, meaning they: (1) tile a homogeneous hard macro across the chip; (2) implement chip top-level routing between hard macros via short wires to neighboring macros; and (3) use timing closure for the hard macro to quickly close timing at the chip top-level. Our results suggest that a concentration factor of four and a ruche factor of two in a 2D-mesh topology can reduce latency by over 2× at similar area and bisection bandwidth for both small and large messages compared to a 2D-mesh baseline.
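To give a rough feel for why concentration and ruche channels lower network diameter, the following minimal sketch counts average router-to-router hops for a 256-terminal 2D mesh under uniform random traffic. It is not the paper's analytical model. The function names (`hops_1d`, `avg_hops`) are hypothetical, and the sketch assumes a square router grid, dimension-order routing, and a greedy policy that takes a ruche (skip) channel whenever the remaining distance in a dimension is at least the ruche factor, with no backtracking. Hop count is only one component of latency; serialization, channel, and router pipeline delays are ignored here.

```python
from itertools import product


def hops_1d(distance: int, ruche: int) -> int:
    """Hops needed to cover `distance` routers in one dimension.

    Assumption: ruche channels skip `ruche` routers at a time and are taken
    greedily; the remainder is covered with ordinary nearest-neighbor hops.
    A ruche factor of 0 or 1 means no ruche channels.
    """
    if ruche <= 1:
        return distance
    return distance // ruche + distance % ruche


def avg_hops(terminals: int, concentration: int, ruche: int) -> float:
    """Average router-to-router hops over all ordered router pairs
    (self pairs included) on a square 2D mesh."""
    routers = terminals // concentration
    side = round(routers ** 0.5)
    if side * side != routers:
        raise ValueError("sketch assumes a square router grid")
    total = 0
    # Enumerate every (source, destination) pair of router coordinates.
    for sx, sy, dx, dy in product(range(side), repeat=4):
        total += hops_1d(abs(sx - dx), ruche) + hops_1d(abs(sy - dy), ruche)
    return total / side ** 4


if __name__ == "__main__":
    for c, r in [(1, 1), (4, 1), (4, 2)]:
        print(f"concentration={c} ruche={r}: {avg_hops(256, c, r):.3f} hops")
```

Under these assumptions the sketch reports 10.625 average hops for the plain 16×16 mesh, 5.25 with a concentration factor of four (an 8×8 router grid), and 3.125 when a ruche factor of two is added. These figures illustrate the trend only and should not be read as the latency results reported in the paper.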