Deep reinforcement learning to control Rayleigh-Bénard convection (Part 1/4: baseline convergence)