Compiler-Assisted Data Streaming for Regular Code Structures
by Nuno Neves, Pedro Tomás and Nuno Roma
Abstract:
The performance of modern processors is often limited by execution stalls resulting from long memory access latencies. Compile-time optimizations, deep cache hierarchies and prefetching mechanisms already provide significant performance gains, by performing memory accesses in parallel with computation. However, they are reaching a throughput improvement limit. Hence, new solutions that effectively exploit the memory access patterns to improve processing throughput are required. To achieve this objective, a new compiler-assisted data streaming method is proposed. It leverages static analysis and code transformations with an on-chip data streaming support as a viable alternative to prefetching mechanisms for regular code structures. Static analysis is used to identify and encode memory accesses with a dedicated representation. Then, a code transformation algorithm detaches data indexation and address calculation from computation, allowing for a significant code reduction. An on-chip data stream controller, attached to the L1 data cache, is used to autonomously generate memory accesses from the pattern representation and reorganize the data transfers in streams, with the aid of stream buffers. When compared with state-of-the-art prefetchers, the proposed solution provides up to 26% of code reduction, an IPC improvement of 2.4x, and an average performance improvement of 40%.
Reference:
N. Neves, P. Tomás and N. Roma, "Compiler-Assisted Data Streaming for Regular Code Structures", IEEE Transactions on Computers, vol. 70, no. 3, Mar. 2021, pp. 483–494.
Bibtex Entry:
@Article{tc19,
  author     = {Nuno Neves and Pedro Tomás and Nuno Roma},
  journal    = {IEEE Transactions on Computers},
  title      = {Compiler-Assisted Data Streaming for Regular Code Structures},
  year       = {2021},
  issn       = {1557-9956},
  month      = mar,
  number     = {3},
  pages      = {483--494},
  volume     = {70},
  abstract   = {The performance of modern processors is often limited by execution stalls resulting from long memory access latencies. Compile-time optimizations, deep cache hierarchies and prefetching mechanisms already provide significant performance gains, by performing memory accesses in parallel with computation. However, they are reaching a throughput improvement limit. Hence, new solutions that effectively exploit the memory access patterns to improve processing throughput are required. To achieve this objective, a new compiler-assisted data streaming method is proposed. It leverages static analysis and code transformations with an on-chip data streaming support as a viable alternative to prefetching mechanisms for regular code structures. Static analysis is used to identify and encode memory accesses with a dedicated representation. Then, a code transformation algorithm detaches data indexation and address calculation from computation, allowing for a significant code reduction. An on-chip data stream controller, attached to the L1 data cache, is used to autonomously generate memory accesses from the pattern representation and reorganize the data transfers in streams, with the aid of stream buffers. When compared with state-of-the-art prefetchers, the proposed solution provides up to 26% of code reduction, an IPC improvement of 2.4x, and an average performance improvement of 40%.},
  doi        = {10.1109/TC.2020.2990302},
  keywords   = {Prefetching;Tools;Static analysis;Runtime;Throughput;System-on-chip;Compiler Static Analysis;Data Streaming;Regular Code Structures;Indirect Memory Accesses},
  readstatus = {read},
  url        = {nfvr_pubs/tc19.pdf},
}