Domain-specific systems-on-chip (DSSoCs), a class of heterogeneous many-core systems, are recognized as a key approach to narrowing the performance and energy-efficiency gap between custom hardware accelerators and programmable processors. Reaching the full potential of these architectures depends critically on scheduling applications onto the available resources optimally at runtime. Existing optimization-based techniques cannot achieve this objective at runtime due to the combinatorial nature of the task scheduling problem.
The success of DSSoCs hinges on satisfying two intertwined requirements. First, the available processing elements (PEs) must be utilized optimally at runtime to execute the incoming tasks. For instance, scheduling all tasks to general-purpose cores may work, but it diminishes the benefits of the special-purpose PEs. Likewise, a static task-to-PE mapping could unnecessarily stall parallel instances of the same task. Second, to make DSSoCs practical, acceleration of the domain-specific applications must be transparent to application developers.
Researchers at Arizona State University, the University of Wisconsin-Madison, the University of Texas at Austin, the University of Arizona, and Carnegie Mellon University have developed a hierarchical imitation learning (IL)-based scheduler that learns from an Oracle to maximize the performance of multiple domain-specific applications. Unlike common heterogeneous many-core systems, which emphasize optimization-based scheduling and rely on manual tuning, this innovation implements a runtime scheduling platform using a classification-based approach. Imitation learning can therefore exploit the effectiveness of machine learning (ML): complex scheduling decisions are represented as a set of features, from which the most suitable PE is selected for task execution.
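The idea of learning a classification policy from an Oracle can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the inventors' implementation: the platform layout (`CLUSTERS`), the execution times (`EXEC_US`), the earliest-finish-time rule used as a stand-in Oracle, and the nearest-neighbour classifier that stands in for whatever classifier the actual framework uses. The two-level decision (cluster first, then PE) is one plausible reading of "hierarchical".

```python
import random

# Hypothetical platform: names and timings are illustrative, not from the source.
CLUSTERS = {"cpu": ["cpu0", "cpu1"], "acc": ["fft0"]}
PES = [pe for pes in CLUSTERS.values() for pe in pes]
CLUSTER_OF = {pe: c for c, pes in CLUSTERS.items() for pe in pes}
# Assumed execution time (microseconds) of each task type on each cluster.
EXEC_US = {"fft": {"cpu": 40.0, "acc": 8.0}, "filter": {"cpu": 12.0, "acc": 30.0}}
TASKS = sorted(EXEC_US)


def features(task, busy):
    """Encode a state: one-hot task type plus the time until each PE is free."""
    return [1.0 if task == t else 0.0 for t in TASKS] + [busy[pe] for pe in PES]


def oracle(task, busy):
    """Stand-in Oracle (assumption): choose the PE with the earliest finish time."""
    return min(PES, key=lambda pe: busy[pe] + EXEC_US[task][CLUSTER_OF[pe]])


def nearest_label(samples, x):
    """1-nearest-neighbour classifier over (feature_vector, label) pairs."""
    def dist(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], x))
    return min(samples, key=dist)[1]


def collect_demonstrations(n, rng):
    """Sample random states and record the Oracle's decision for each one."""
    demos = []
    for _ in range(n):
        task = rng.choice(TASKS)
        busy = {pe: rng.uniform(0.0, 50.0) for pe in PES}
        demos.append((features(task, busy), oracle(task, busy)))
    return demos


def train(demos):
    """Build two policy levels: cluster selection, then PE selection per cluster."""
    cluster_data = [(x, CLUSTER_OF[pe]) for x, pe in demos]
    pe_data = {c: [(x, pe) for x, pe in demos if CLUSTER_OF[pe] == c]
               for c in CLUSTERS}
    return cluster_data, pe_data


def predict(policy, x):
    """Hierarchical decision: pick a cluster first, then a PE inside it."""
    cluster_data, pe_data = policy
    cluster = nearest_label(cluster_data, x)
    return nearest_label(pe_data[cluster], x)


def schedule(policy, task, busy):
    """Runtime entry point: map the current state to a PE using the learned policy."""
    return predict(policy, features(task, busy))


if __name__ == "__main__":
    rng = random.Random(0)
    policy = train(collect_demonstrations(500, rng))
    held_out = collect_demonstrations(200, rng)
    agree = sum(predict(policy, x) == pe for x, pe in held_out) / len(held_out)
    print(f"agreement with stand-in Oracle on held-out states: {agree:.2%}")
```

At runtime only `schedule` is called, so the cost of a decision is a classifier lookup rather than a combinatorial search; the Oracle is consulted only offline to label training states.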
• Domain-specific systems-on-chip (DSSoCs) including applications in wireless communications and radar domains
Benefits and Advantages
• Extensive evaluations with six streaming applications show over 99% accuracy in approximating the Oracle with imitation learning policies
• Scheduling decisions are made within 1.1 μs on an Arm A53 core, outperforming the Completely Fair Scheduler (CFS)
• To the best of the inventors’ knowledge, this is the first imitation learning-based scheduling framework for heterogeneous many-core systems capable of handling multiple applications exhibiting streaming behavior
Related publication: Runtime Task Scheduling using Imitation Learning for Heterogeneous Many-Core Systems