Vision-Based Robotics


Vision-based robotics revolves around teaching robots to process and use visual information. A combination of hardware and software allows a robot to make decisions based on visual data. The technology is advancing quickly and pushing robotics forward with it, providing new solutions and opportunities across many industries.

At SuperDroid Robots, we are integrating state-of-the-art machine vision algorithms on our platforms with the popular Robot Operating System (ROS) framework. We have experience using a broad range of sensor arrays for autonomous applications. If you need a custom autonomous solution, please fill out our custom request form.


The sections below describe useful implementations of vision-based algorithms for robotics. Vision-based algorithms offer an affordable alternative to GPS and LiDAR systems.


[Image: Intel vision tracking camera]

SLAM (Simultaneous Localization and Mapping) algorithms often rely on expensive LiDAR sensors and equipment. Research into vision-based alternatives resulted in Visual SLAM, or vSLAM. This lower-cost alternative processes each video frame, identifying, extracting, and tracking visual features from one frame to the next. With a 3D camera, a robot can use vSLAM to estimate its 3D position and orientation, which helps it better understand its physical surroundings. An autonomous robot needs this information to plan a path to its destination.
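As a rough illustration of how tracked features can yield a position and orientation estimate, the sketch below fits a rotation and translation to matched feature positions from two consecutive frames. It is simplified to 2D (a robot moving on a flat floor) and assumes feature extraction and matching have already been done. The function name `estimate_rigid_motion_2d` and its inputs are hypothetical, not part of any particular vSLAM library.

```python
import math


def estimate_rigid_motion_2d(prev_pts, curr_pts):
    """Least-squares rotation and translation mapping prev_pts onto curr_pts.

    Both arguments are equal-length lists of (x, y) feature positions in the
    robot frame, where prev_pts[i] and curr_pts[i] are the same tracked feature.
    Returns (theta, tx, ty) such that curr ~= R(theta) * prev + (tx, ty).
    """
    n = len(prev_pts)
    if n < 2 or n != len(curr_pts):
        raise ValueError("need at least two matched feature pairs")

    # Centroids of both point sets.
    pcx = sum(p[0] for p in prev_pts) / n
    pcy = sum(p[1] for p in prev_pts) / n
    ccx = sum(c[0] for c in curr_pts) / n
    ccy = sum(c[1] for c in curr_pts) / n

    # Accumulate dot and cross products of the centered point pairs.
    dot = 0.0
    cross = 0.0
    for (px, py), (cx, cy) in zip(prev_pts, curr_pts):
        ax, ay = px - pcx, py - pcy
        bx, by = cx - ccx, cy - ccy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    # Best-fit rotation angle, then the translation that aligns the centroids.
    theta = math.atan2(cross, dot)
    tx = ccx - (math.cos(theta) * pcx - math.sin(theta) * pcy)
    ty = ccy - (math.sin(theta) * pcx + math.cos(theta) * pcy)
    return theta, tx, ty
```

The result describes how the observed features appear to move relative to the robot; the robot's own motion is the inverse of that transform. A real vSLAM pipeline works in full 3D and also rejects bad feature matches, which this sketch does not attempt.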

[Image: ZED vision camera]

You can improve this method using data from IMUs and wheel encoders. This data helps keep the position and orientation estimates reliable and limits drift. When drift does accumulate, it can be corrected once the sensor observes a familiar area: the stored map serves as ground truth, and the robot's estimated pose is adjusted to match it. This is often referred to as loop closure.
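To give a sense of how IMU and wheel encoder data can contribute a motion estimate, here is a minimal dead-reckoning sketch for a differential-drive robot. It blends the heading change reported by a gyro with the one implied by the encoders using a fixed weight. The function names, the `alpha` weight, and its default value are assumptions chosen for illustration; production systems more often use a Kalman-style filter.

```python
import math


def wrap_angle(a):
    """Wrap an angle to the range [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def update_pose(pose, d_left, d_right, track_width, gyro_rate, dt, alpha=0.98):
    """Advance (x, y, heading) by one step of fused dead reckoning.

    d_left, d_right: wheel travel since the last step in meters (from encoders).
    track_width: distance between the two drive wheels in meters.
    gyro_rate: yaw rate from the IMU in rad/s.
    alpha: weight given to the gyro heading change; the encoders get 1 - alpha.
    """
    x, y, heading = pose

    # Distance traveled by the robot center and the two heading-change estimates.
    d_center = (d_left + d_right) / 2.0
    dtheta_enc = (d_right - d_left) / track_width
    dtheta_gyro = gyro_rate * dt

    # Simple weighted blend of the two heading changes (assumed fusion rule).
    dtheta = alpha * dtheta_gyro + (1.0 - alpha) * dtheta_enc

    # Integrate position using the heading at the midpoint of the step.
    mid_heading = heading + dtheta / 2.0
    x += d_center * math.cos(mid_heading)
    y += d_center * math.sin(mid_heading)
    return (x, y, wrap_angle(heading + dtheta))
```

An estimate like this still drifts over time, which is why it is typically combined with the visual estimate and corrected by loop closure rather than used alone.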

Free Space Segmentation

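Free space segmentation is an important concept in autonomy. It identifies which parts of the scene a robot can safely move through, providing the data navigation algorithms need to plan safe paths to a given destination. This is a difficult task for vision systems, but there are two common ways to address it: a geometric approach and a machine learning approach.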

A geometric approach evaluates a point cloud generated by at least one 3D camera. This point cloud helps determine which terrain the robot is capable of traversing. This solution usually divides the 3D points into clusters. A gradient or surface normal is then calculated for each cluster and compared against the orientation and capabilities of the robot. From there, each cluster is classified as either free space or an obstacle.
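The geometric steps above can be sketched roughly as follows. This example clusters points into square grid cells, fits a plane to each cell, and compares the plane's slope with a maximum slope the robot can climb. The grid clustering, the extra height-range check for walls, the function names, and all threshold values are assumptions made for illustration. The points are assumed to already be expressed in a gravity-aligned frame with z pointing up.

```python
import math
from collections import defaultdict


def fit_plane_slope(points):
    """Fit z = a*x + b*y + c by least squares; return slope in degrees or None."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    mz = sum(p[2] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    sxz = sum((p[0] - mx) * (p[2] - mz) for p in points)
    syz = sum((p[1] - my) * (p[2] - mz) for p in points)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # points are collinear, so no unique plane exists
    a = (sxz * syy - syz * sxy) / det
    b = (syz * sxx - sxz * sxy) / det
    # Angle between the plane's surface normal and the vertical axis.
    return math.degrees(math.atan(math.hypot(a, b)))


def classify_cells(points, cell_size=0.5, max_slope_deg=15.0, max_step=0.2):
    """Label each grid cell as 'free', 'obstacle', or 'unknown'."""
    cells = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y, z))

    labels = {}
    for key, pts in cells.items():
        if len(pts) < 3:
            labels[key] = "unknown"  # too few points to fit a plane
            continue
        zs = [p[2] for p in pts]
        if max(zs) - min(zs) > max_step:
            labels[key] = "obstacle"  # tall structure such as a wall or curb
            continue
        slope = fit_plane_slope(pts)
        if slope is None:
            labels[key] = "unknown"
        elif slope <= max_slope_deg:
            labels[key] = "free"
        else:
            labels[key] = "obstacle"
    return labels
```

In practice `max_slope_deg` and `max_step` would come from the capabilities of the specific robot, such as its ground clearance and the steepest incline it can climb.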

A machine learning approach can use a monocular camera but is a more complicated process. The robot learns and predicts where it's allowed to go based on collected data and key markers. You'll need knowledge of neural networks, such as how to train, test, and deploy them. This approach varies from application to application and requires extensive data collection. Analysis of the operating environment is crucial before deploying the system. We won't go into the specifics of training and deploying neural networks here.

Object Detection

Object detection can provide valuable input so a robot can make intelligent decisions. This is the process of taking camera frames, locating the objects in view, and assigning labels to them. A convolutional neural network is commonly used to make accurate predictions, and training one involves a large amount of data collection and annotation. This vision implementation is key to how the robot reacts to its environment and can open up a variety of uses.

You can use either a 3D camera or a monocular camera for object detection. Using a 3D camera allows the robot to gauge its distance from an identified object. This extra data point gives the robot more to work with when making decisions.
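As a minimal sketch of how a 3D camera's depth data might be combined with a detection, the function below takes a depth image and a bounding box and returns the median valid depth inside the box. The function name, the depth image layout (rows of depth values in meters, with 0 or NaN meaning no reading), and the box format are assumptions for illustration, not the output format of any specific camera or detector.

```python
import math
from statistics import median


def object_distance(depth_image, box):
    """Median valid depth in meters inside a bounding box, or None.

    depth_image: list of rows of depth values in meters; 0 or NaN = no reading.
    box: (x_min, y_min, x_max, y_max) in pixels, with the max edges exclusive.
    """
    x_min, y_min, x_max, y_max = box
    height = len(depth_image)
    width = len(depth_image[0]) if height else 0

    # Clamp the box to the image so partially visible detections still work.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)

    # Keep only pixels with a usable depth reading.
    values = [
        d
        for row in depth_image[y_min:y_max]
        for d in row[x_min:x_max]
        if d > 0 and math.isfinite(d)
    ]
    return median(values) if values else None
```

The median is used here instead of the mean because a bounding box usually includes some background pixels, and the median is less affected by those outliers.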

[Image: bounding box object detection example]