The rapid advancement of the internet and data-collecting devices has sparked increasing demand for computing solutions that are both energy-efficient and high-performance. Coarse-grained reconfigurable arrays (CGRAs) have gained traction as this low-power alternative, offering accelerators capable of supporting the compute-intensive process of collecting, processing, and communicating data via mobile devices. This is achieved by mapping compute-intensive loops onto a 2-D grid of Processing Elements (PEs), with functional units that receive instructions from the memory and compute arithmetic operations. A significant limitation, however, with existing CGRA mapping techniques is that, when an attempt to map loops onto the CGRA fails, these techniques either discard the current mapping and restart anew, or backtrack to the previously mapped node. Restarting techniques do not learn anything from the failure, while backtracking recursively unmaps the last mapped node, even when it may not be the root cause of the problem; nevertheless, both methods result in reduced overall mapping quality and significant increases in compilation time.
Researchers at Arizona State University have developed PathSeeker — a mapping algorithm that analyzes mapping failures and performs local adjustments to the schedule to obtain a mapping. Instead of backtracking or restarting the mapping like previous methods, PathSeeker analyzes the predecessor and successor nodes to locate the underlying reason for the failed mapping. It then explores local transformations for the failed node(s) to achieve a valid mapping. If these local transformations fail, PathSeeker iteratively explores different PE positions of the other nodes in the time-slot of the failed node, the predecessor, and successor until a valid mapping is found.
Preliminary results found significantly improved mapping quality and a remarkable reduction in compilation time compared to existing, state-of-the-art approaches, with PathSeeker achieving a 28% better performance at a 550x compilation speedup over GraphMinor and a 3% improvement at a 10x compilation speedup over RAMP.
Related publications: PathSeeker: A Fast-Mapping Algorithm for CGRAs
Potential Applications:
- Coarse-grained reconfigurable arrays (CGRAs)
- Parallel computation compilers
- Machine Learning compilers
- Convolution Neural Networks (CNNs)
- Deep Neural Networks (DNNs)
Benefits and Advantages:
- Improved compilation time
- Higher quality of mapping