Most state-of-the-art learning-based monocular depth estimators do not consider generalization and benchmark their performance on publicly available datasets only after scenario-specific fine-tuning. Generalization can be achieved by training on several heterogeneous datasets, but collecting and labeling them is costly. In this work, we propose two Deep Neural Networks (one CNN-based and one LSTM-based) for monocular depth estimation on drones, which we train on heterogeneous synthetic datasets (drones flying in forest and urban scenarios) generated using Unreal Engine. We show that, although trained only on synthetic data, the networks generalize well to different, unseen real-world scenarios (KITTI and newly collected datasets from Zurich, Switzerland, and Perugia, Italy) without any fine-tuning. We achieve performance comparable to state-of-the-art methods, with very small depth errors even at distances of up to 40 meters. We also show that the networks cope well with the varying height, yaw, pitch, and roll of the drone, which state-of-the-art depth-estimation methods do not handle, as they mostly target automotive scenarios. Finally, we show that the LSTM network estimates absolute scale well with little additional computational overhead. We release the Unreal Engine 3D models and all the collected datasets (from Switzerland and Italy) freely to the public.
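To make the idea concrete, here is a minimal sketch of a monocular depth estimation network in PyTorch. This is not the authors' architecture: the class name DepthCNN, the layer sizes, and the input resolution are illustrative assumptions; the paper's CNN and LSTM variants differ in design and are trained on the synthetic Unreal Engine data described above.

```python
# Minimal sketch of a CNN-based monocular depth estimator (assumed
# architecture, not the one from the paper): an encoder-decoder that
# maps a single RGB frame to a dense depth map.
import torch
import torch.nn as nn

class DepthCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: three strided convolutions halve the resolution each time.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions upsample back to input size,
        # ending in a single-channel depth prediction.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        # x: (B, 3, H, W) RGB frame -> (B, 1, H, W) depth map
        return self.decoder(self.encoder(x))

# Example forward pass on a dummy 256x160 frame.
frame = torch.randn(1, 3, 160, 256)
depth = DepthCNN()(frame)
print(depth.shape)  # torch.Size([1, 1, 160, 256])
```

A recurrent variant in the spirit of the paper's LSTM network would process a sequence of frames so that temporal context can help recover absolute scale; one simple (assumed) way to do this is to run pooled encoder features through an nn.LSTM before decoding.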
Reference:
M. Mancini, G. Costante, P. Valigi, T.A. Ciarfuglia, J. Delmerico, D. Scaramuzza
Towards Domain Independence for Learning-Based Monocular Depth Estimation
IEEE Robotics and Automation Letters (RA-L), 2017.
PDF: [ Link ]
Datasets: [ Link ]
Our research page on deep learning:
[ Link ]
Robotics and Perception Group, University of Zurich, 2017
[ Link ]