Greedy Action Selection and Pessimistic Q-Value Updating in Multi-Agent Reinforcement Learning with Sparse Interaction
Blog Article
Although multi-agent reinforcement learning (MARL) is a promising approach for learning a collaborative action policy that enables each agent to accomplish specified tasks, it suffers from an exponentially growing state-action space. This state-action space can be dramatically reduced by assuming sparse interaction. We previously proposed three methods for improving the performance of coordinating Q-learning (CQ-learning), a typical method for multi-agent reinforcement learning with sparse interaction: greedily selecting actions, switching between Q-value update equations on the basis of the state of each agent in the next step, and their combination. We have now modified the learning algorithm used in the combination of these two methods so that it can cope with interference among more than two agents.
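To make the idea more concrete, here is a minimal tabular sketch of what greedy action selection combined with a pessimistic (minimum-based) Q-value target might look like. The table names, the two-table structure, and the switching rule between the single-agent and augmented updates are illustrative assumptions for a toy grid setting, not the exact CQ-learning procedure described above.

```python
import numpy as np

# Illustrative sketch: greedy action selection plus a pessimistic
# (minimum-over-candidates) bootstrap target. The switching rule between
# the single-agent and the augmented (interaction) update is a simplified
# assumption, not the authors' exact algorithm.

N_STATES, N_ACTIONS = 25, 4
ALPHA, GAMMA = 0.1, 0.95

q_single = np.zeros((N_STATES, N_ACTIONS))     # Q-table over local states
q_augmented = np.zeros((N_STATES, N_ACTIONS))  # Q-table over augmented states


def select_action(state: int, interacting: bool) -> int:
    """Greedy action selection: pick the argmax of the active Q-table."""
    table = q_augmented if interacting else q_single
    return int(np.argmax(table[state]))


def update(state: int, action: int, reward: float,
           next_state: int, next_interacting: bool) -> None:
    """Switch the update equation based on the next-step state of the agent.

    When the next state may involve interaction, bootstrap pessimistically
    with the smaller of the two candidate next-state value estimates.
    """
    if next_interacting:
        next_value = min(q_single[next_state].max(),
                         q_augmented[next_state].max())
        target = reward + GAMMA * next_value
        q_augmented[state, action] += ALPHA * (target - q_augmented[state, action])
    else:
        target = reward + GAMMA * q_single[next_state].max()
        q_single[state, action] += ALPHA * (target - q_single[state, action])
```

The two-table layout reflects the sparse-interaction assumption: agents act on their local Q-table most of the time and fall back to the augmented table only in states where interference with other agents is detected.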
Evaluation of this enhanced method on two additional maze games from three perspectives (the number of steps to reach the goal, the number of augmented states, and the computational cost) demonstrated that the modified algorithm improves the performance of CQ-learning.