For the complete workspace, check out my GitHub.
The project implements a complete vision-based human-following system on a GoBilda mobile robot, combining classical control, imitation learning, and ROS2 integration. A DepthAI camera provides person detections through a MobileNet-based model, from which the system extracts two key signals: the horizontal bounding-box center cx and the bounding-box height s, which serves as a proxy for distance. A custom ROS2 logging node records synchronized detection data and teleoperation commands at 10 Hz, producing a dataset that is later cleaned, normalized, and augmented with temporal history.
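The sketch below shows the shape such a logging node can take; it is a minimal illustration, not the project's exact code. It assumes the detection signals are republished as a Float32MultiArray [cx, s] on a hypothetical /detection_signal topic and that teleoperation arrives as a Twist on /cmd_vel; the actual message types and topic names in the repository may differ.

```python
import csv

import rclpy
from rclpy.node import Node
from std_msgs.msg import Float32MultiArray
from geometry_msgs.msg import Twist


class FollowLogger(Node):
    """Log the latest detection signals and teleop command at a fixed 10 Hz."""

    def __init__(self):
        super().__init__('follow_logger')
        self.cx, self.s = None, None   # latest detection signals (cx, s)
        self.v, self.w = 0.0, 0.0      # latest teleop command (linear, angular)
        self.create_subscription(Float32MultiArray, '/detection_signal',
                                 self.det_cb, 10)
        self.create_subscription(Twist, '/cmd_vel', self.cmd_cb, 10)
        self.file = open('follow_log.csv', 'w', newline='')
        self.writer = csv.writer(self.file)
        self.writer.writerow(['t', 'cx', 's', 'v', 'w'])
        self.create_timer(0.1, self.log_row)  # 10 Hz logging tick

    def det_cb(self, msg):
        self.cx, self.s = msg.data[0], msg.data[1]

    def cmd_cb(self, msg):
        self.v, self.w = msg.linear.x, msg.angular.z

    def log_row(self):
        if self.cx is None:
            return  # wait for the first detection before logging
        t = self.get_clock().now().nanoseconds * 1e-9
        self.writer.writerow([t, self.cx, self.s, self.v, self.w])
        self.file.flush()  # keep rows on disk even after an abrupt shutdown


def main():
    rclpy.init()
    rclpy.spin(FollowLogger())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```

Sampling on a timer rather than in the callbacks is what yields the synchronized, fixed-rate dataset: each row pairs the most recent detection with the most recent command at the same instant.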
Data preprocessing transforms the raw pixel measurements into normalized control errors and stacks the two error signals over three consecutive timesteps into a six-dimensional feature vector. Teleoperation commands are discretized into four action classes, framing human following as a classification task. Two MLP models are trained and evaluated; the deeper 64–32 architecture is selected for its better separation between forward and turning actions and its higher overall accuracy.
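A sketch of this preprocessing and one plausible realization of the selected network is shown below, under assumed conventions: errors normalized to roughly [-1, 1], S_REF as the target box height, and forward / left / right / stop as the four classes. The image width, thresholds, and class ordering are illustrative, not the project's exact values.

```python
import numpy as np
import torch.nn as nn

W_IMG, S_REF = 640.0, 220.0   # image width and target box height (assumed)


def make_features(cx_hist, s_hist):
    """Build the 6-D feature vector from three consecutive timesteps."""
    e_cx = (np.asarray(cx_hist, float) - W_IMG / 2) / (W_IMG / 2)  # heading error
    e_s = (S_REF - np.asarray(s_hist, float)) / S_REF              # distance error
    # Interleave per timestep: [e_cx(t-2), e_s(t-2), ..., e_cx(t), e_s(t)]
    return np.stack([e_cx, e_s], axis=1).reshape(-1)


def discretize(v, w, v_th=0.05, w_th=0.1):
    """Map a (linear, angular) teleop command to one of four action classes."""
    if abs(w) > w_th:
        return 1 if w > 0 else 2   # turn left / turn right
    return 0 if v > v_th else 3    # forward / stop


# One plausible form of the deeper 64-32 classifier over the 6-D input.
policy = nn.Sequential(
    nn.Linear(6, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 4),              # logits over the four action classes
)

x = make_features([300, 310, 330], [180, 185, 190])  # shape (6,)
y = discretize(0.3, 0.0)                             # -> 0 (forward)
```

The three-timestep history is what gives the classifier short-term temporal context, letting it distinguish, for instance, a person drifting toward the image edge from one standing still near it.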
A deterministic proportional controller is also developed as a baseline. Both the learned policy and the proportional controller are deployed as ROS2 nodes that subscribe to detections, compute control actions, and publish velocity commands in real time. In indoor corridor tests the two controllers behave remarkably similarly, confirming the effectiveness of the imitation-learning pipeline, and outdoor experiments further validate robustness under challenging lighting and background conditions.
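The baseline reduces to two proportional terms: angular velocity from the horizontal centering error and linear velocity from the distance (box-height) error. The sketch below illustrates this, reusing the same assumed topics and normalization as above; the gains, limits, and sign conventions are illustrative, not the project's tuned values.

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import Float32MultiArray
from geometry_msgs.msg import Twist

KP_V, KP_W = 0.6, 1.2          # proportional gains (assumed)
V_MAX, W_MAX = 0.4, 1.0        # velocity limits in m/s and rad/s (assumed)
W_IMG, S_REF = 640.0, 220.0    # image width and target box height (assumed)


class PFollower(Node):
    """Deterministic baseline: P-control on centering and distance errors."""

    def __init__(self):
        super().__init__('p_follower')
        self.pub = self.create_publisher(Twist, '/cmd_vel', 10)
        self.create_subscription(Float32MultiArray, '/detection_signal',
                                 self.det_cb, 10)

    def det_cb(self, msg):
        cx, s = msg.data[0], msg.data[1]
        e_cx = (cx - W_IMG / 2) / (W_IMG / 2)   # heading error in [-1, 1]
        e_s = (S_REF - s) / S_REF               # distance error (positive = too far)
        cmd = Twist()
        cmd.linear.x = max(-V_MAX, min(V_MAX, KP_V * e_s))
        # Negative sign steers toward the image center (ROS: positive z = left).
        cmd.angular.z = max(-W_MAX, min(W_MAX, -KP_W * e_cx))
        self.pub.publish(cmd)


def main():
    rclpy.init()
    rclpy.spin(PFollower())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```

Because the learned policy was trained on teleoperation driven by the same two error signals, close agreement between this controller and the MLP in corridor tests is exactly the expected outcome.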
Overall, the project delivers a fully operational human-following pipeline covering perception, data collection, imitation-based policy learning, baseline control, and real-time robotic deployment.