Background
Multi-agent reinforcement learning (MARL) involves multiple agents interacting within a common environment to jointly maximize a long-term reward. The key challenge in MARL is its combinatorial nature: the joint state-action space grows exponentially with the number of agents, resulting in high computational complexity. Moreover, each agent's behavior depends on what the other agents are concurrently learning, which makes the learning problem non-stationary. In more complex tasks, each agent may also receive only sparse rewards over long horizons, creating additional challenges for effective learning.
Invention Description
Researchers at Arizona State University have developed a decentralized graph-based reinforcement learning using reward machines (DGRM) framework that enables a collection of agents to solve complex, temporally extended tasks under coupled dynamics with improved computational efficiency. The framework uses reward machines (RMs) to describe the environment, track each agent's progress through its task, and encode a sparse reward function for each agent. This ensures consistent reward expectations and supports decentralized problem-solving for complex, temporally extended tasks.
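The reward-machine idea above can be sketched as a small finite-state machine whose transitions fire on high-level events and emit rewards. This is an illustrative sketch only; the class and event names are assumptions, not the authors' implementation.

```python
# Minimal reward-machine sketch (names are hypothetical, not the DGRM code).
# A reward machine is a finite-state machine over high-level events: appending
# its current state to an agent's observation makes an otherwise
# non-Markovian, temporally extended reward Markovian.

class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance on an observed event; unknown events self-loop with 0 reward."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

# Example task: reach waypoint "a", then deliver at "b".
rm = RewardMachine(
    transitions={
        ("u0", "a"): ("u1", 0.0),      # progress: waypoint reached
        ("u1", "b"): ("u_done", 1.0),  # task complete: sparse reward emitted
    },
    initial_state="u0",
)
print(rm.step("b"))  # 0.0 -- visiting "b" before "a" makes no progress
print(rm.step("a"))  # 0.0 -- advances the machine to u1
print(rm.step("b"))  # 1.0 -- task complete
```

Note that the same environment event ("b") yields different rewards depending on the machine's state, which is exactly how the RM tracks progress through a temporally extended task.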
Potential Applications:
- Decision-making frameworks for autonomous systems
- Robotics & unmanned vehicles
- Wireless communication networks
Benefits and Advantages:
- Consistent reward expectations – transforms non-Markovian rewards into a Markovian setting
- Decentralized approach – mitigates sparse rewards and non-stationarity
- Enhanced coordination among multiple agents – includes high-level task specifications
- Reduces computational complexity – incorporates localized policy learning and truncated Q-functions
- Improves scalability – can be applied across multiple complex settings
Related Publication: Decentralized Graph-Based Multi-Agent Reinforcement Learning using Reward Machines