
An architecture interface and offload model for low-overhead, near-data, distributed accelerators

No full text available

  • Title:
    An architecture interface and offload model for low-overhead, near-data, distributed accelerators
  • Author: Baskaran, Saambhavi ; Kandemir, Mahmut Taylan ; Sampson, Jack
  • Subjects: Computational modeling ; Computer architecture ; Data models ; distributed accelerator ; Distributed databases ; Energy efficiency ; heterogeneous architecture interface ; near-data offload ; Out of order ; Semantics
  • Is part of: 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022, p.1160-1177
  • Description: The performance and energy costs of coordinating and performing data movement have led to proposals adding compute units and/or specialized access units to the memory hierarchy. However, current on-chip offload models are restricted to fixed compute and access-pattern types, which limits software-driven optimizations and the applicability of such an offload interface to heterogeneous accelerator resources. This paper presents a computation offload interface for multi-core systems augmented with distributed on-chip accelerators. With energy efficiency as the primary goal, we define mechanisms to identify offload partitioning, create a low-overhead execution model to sequence these fine-grained operations, and evaluate a set of workloads to identify the complexity needed to achieve distributed near-data execution. We demonstrate that our model and interface, combining features of dataflow in parallel with near-data processing engines, can be profitably applied to memory hierarchies augmented with either specialized compute substrates or lightweight near-memory cores. We differentiate the benefits stemming from each of: elevating data access semantics, near-data computation, inter-accelerator coordination, and compute/access logic specialization. Experimental results indicate a geometric mean (energy-efficiency improvement; speedup; data-movement reduction) of (3.3×; 1.59×; 2.4×), (2.46×; 1.43×; 3.5×), and (1.46×; 1.65×; 1.48×) compared to an out-of-order processor, a monolithic accelerator with centralized accesses, and a monolithic accelerator with decentralized accesses, respectively. Evaluating both lightweight-core and CGRA-fabric implementations highlights the model's flexibility and quantifies the benefits of compute specialization for energy efficiency and speedup at 1.23× and 1.43×, respectively.
  • Publisher: IEEE
  • Language: English
