This summer I worked with Professor Esra Kadioglu. Our research is about coverage path planning for one or more drones. Finding the shortest path for multiple drones to cover a field is a computationally “hard” problem. Instead of using a conventional path-planning algorithm, we posed the following question: can a drone learn to find a coverage path using Reinforcement Learning (RL)? RL mimics how people learn new things: we reward the agent’s good behavior and punish its bad behavior.

In our research, we used OpenAI’s Gym toolkit to set up an environment in which multiple drones travel and find coverage paths on a grid world. At each step, we ask each agent to make a move. As the agent is trained, its moves become less random and more “rewarding”. We give a positive reward when an agent explores an unvisited cell and a negative reward when it revisits a cell; revisiting a cell means a longer path and wastes the drone’s battery. We tested our environment using the Actor-Critic algorithm provided by Stable Baselines, and we were able to find a collaborative, non-colliding coverage path for a team of 2 drones on a 10×10 field.

The research made me realize that innovative thinking is critical: when the traditional approach does not work, we take a novel route to attack the challenge.
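To give a flavor of the reward scheme described above, here is a minimal sketch of a single-agent grid-coverage environment. It follows Gym’s `reset`/`step` interface but avoids the dependency; the class name, reward values (+1 for a new cell, −1 for a revisit), and grid size are illustrative assumptions, not our exact research code.

```python
import numpy as np


class CoverageGridEnv:
    """Sketch of a grid-coverage environment in the Gym reset/step style.

    The agent earns +1 for covering an unvisited cell and -1 for
    revisiting one (a revisit means a longer path and wasted battery).
    The episode ends when every cell has been visited.
    """

    # Action encoding (assumed): 0=up, 1=down, 2=left, 3=right
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=10):
        self.size = size
        self.reset()

    def reset(self):
        # Start in the top-left corner with only that cell covered.
        self.visited = np.zeros((self.size, self.size), dtype=bool)
        self.pos = (0, 0)
        self.visited[self.pos] = True
        return self.pos

    def step(self, action):
        dr, dc = self.MOVES[action]
        # Clamp to the field boundary so the drone never leaves the grid.
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        if self.visited[r, c]:
            reward = -1.0          # revisit: penalize the longer path
        else:
            reward = 1.0           # new cell: reward exploration
            self.visited[r, c] = True
        done = bool(self.visited.all())  # full coverage achieved
        return self.pos, reward, done, {}
```

A trained policy (e.g. Stable Baselines’ Actor-Critic) would replace the random action choice an untrained agent starts with; the multi-drone version additionally has to penalize collisions between agents.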