UVM CS302 Modeling Complex Systems, 2018.
Prof. Laurent Hébert-Dufresne.
Using policy gradient reinforcement learning (RL) to learn the rules of a stochastic cellular automaton (CA).
The agent is rewarded for matching Lotka-Volterra predator-prey dynamics and for transitions that are "spatially appropriate".
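As a rough illustration of the first reward term, here is a minimal sketch of how a CA transition could be scored against Lotka-Volterra dynamics. Everything in it is an assumption made for illustration, not the project's actual code: the cell encoding (`EMPTY`, `PREY`, `PREDATOR`), the function names, the rate parameters, the forward-Euler step, and the squared-error form of the reward. The "spatially appropriate" term is not covered because the description does not define it.

```python
import numpy as np

# Hypothetical cell states; the original project's encoding is not known.
EMPTY, PREY, PREDATOR = 0, 1, 2


def lotka_volterra_step(prey, pred, alpha=1.0, beta=1.0,
                        gamma=1.0, delta=1.0, dt=0.01):
    """One forward-Euler step of the Lotka-Volterra equations:
    dx/dt = alpha*x - beta*x*y,  dy/dt = delta*x*y - gamma*y.
    The rate parameters and dt are placeholder values."""
    d_prey = alpha * prey - beta * prey * pred
    d_pred = delta * prey * pred - gamma * pred
    return prey + dt * d_prey, pred + dt * d_pred


def densities(grid):
    """Fraction of cells occupied by prey and by predators."""
    return float(np.mean(grid == PREY)), float(np.mean(grid == PREDATOR))


def dynamics_reward(grid_before, grid_after, **lv_params):
    """Negative squared error between the densities the CA produced and
    the densities one Lotka-Volterra step would predict (assumed form)."""
    prey0, pred0 = densities(grid_before)
    target_prey, target_pred = lotka_volterra_step(prey0, pred0, **lv_params)
    prey1, pred1 = densities(grid_after)
    return -((prey1 - target_prey) ** 2 + (pred1 - target_pred) ** 2)


if __name__ == "__main__":
    before = np.array([[PREY, PREDATOR], [EMPTY, EMPTY]])
    after = before.copy()
    print(dynamics_reward(before, after))
```

In a policy gradient setup, a scalar like this would be the per-step return that weights the log-probability of the transition rule the agent sampled; the reward is at most zero and approaches zero as the CA's population densities track the ODE.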
For comparison, a hand-engineered stochastic predator-prey CA can be seen here:
[ Link ]