AIhub.org

The role of computer vision in autonomous vehicles

by Md Ali Azam

10 September 2020

Recent advances in computer vision have revolutionized many areas of research, including robotics, automation, and self-driving vehicles. The self-driving car industry has grown markedly in recent years, enabled in no small part by state-of-the-art computer vision techniques. However, many challenges remain in the field. One of the most difficult problems in autonomous driving is perception. Once an autonomous vehicle has an accurate perception of the world around it, planning and control become easier. This article focuses on perception: what computer vision and neural networks can contribute towards fully autonomous self-driving vehicles.

Autonomous driving is a very challenging problem. Researchers and the auto industry have been working on developing autonomous vehicles for decades. General Motors, for example, produced a self-driving car in 1958 that was guided by a radio-controlled electromagnetic field. A number of car companies improved the technology based upon this idea. However, the challenge of achieving full autonomy remained. The journey reached an interesting point in 2005, when a few teams of scientists were able to complete the DARPA Grand Challenge, which involved a desert course of roughly 212 kilometres. Continuous efforts from many scientists have since made it possible to trial autonomous vehicles on public roads.

Following the DARPA challenge in 2005, researchers highlighted how critical it is for the vehicle to perceive the world around it. Since then, many companies have begun developing autonomous cars with a primary focus on vision-based perception. These companies have varying strategies for achieving perception around the car. Most use some combination of RADAR, LIDAR, SONAR and cameras. Tesla is the only large company that does not use LIDAR in its autonomous cars, primarily focusing on RADAR and cameras, and also making use of SONAR to detect near-field objects. Despite the variation between companies, almost all of them bring computer vision technologies to the fore.

Despite recent progress, autonomous driving still faces great challenges in representing the 3D world around a vehicle using computer vision alone. Accurate representation is difficult because cameras generate 2D images and do not directly provide depth information about objects. Although many papers have been published on 3D reconstruction from multiple 2D images taken by cameras at different locations, 3D reconstruction is computationally expensive [1]. Therefore, some companies use RADAR and LIDAR for depth perception of objects in the scene.
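
To make the depth ambiguity concrete, here is a minimal Python sketch of an idealized pinhole camera. The function name `project_to_pixel` and the intrinsics (focal length in pixels, principal point) are illustrative assumptions, not values from any particular vehicle or camera. It shows that two points on the same viewing ray, at different distances, land on the same pixel.

```python
def project_to_pixel(point_xyz, focal_px=800.0, cx=640.0, cy=360.0):
    """Project a 3D point in the camera frame (z forward) onto the image plane.

    focal_px, cx and cy are made-up intrinsics chosen for illustration.
    """
    x, y, z = point_xyz
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    # Perspective division: the depth z is divided out and lost.
    return (focal_px * x / z + cx, focal_px * y / z + cy)


# Two points on the same ray through the camera centre, 5 m and 20 m away.
near_point = (1.0, 0.5, 5.0)
far_point = (4.0, 2.0, 20.0)

near_pixel = project_to_pixel(near_point)
far_pixel = project_to_pixel(far_point)
print(near_pixel, far_pixel)
```

Both calls return the same pixel coordinates, so a single image cannot tell the two points apart. A second viewpoint, a learned prior, or a range sensor is needed to recover the missing depth.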

RADAR is cheap, but only gives us the range of an object. LIDAR, on the other hand, is expensive but provides a 3D point cloud around a vehicle with great accuracy. One benefit of LIDAR is that it has better resolution than RADAR. A disadvantage is that it performs poorly in opaque media. As such, LIDAR cannot be relied on in foggy weather, for example.
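
As a rough illustration of what a point cloud is, the sketch below converts individual LIDAR returns (a measured range plus the known beam angles) into 3D points. The function name, the angle conventions (azimuth measured from the forward x-axis towards the left, elevation measured up from the horizontal) and the sample values are assumptions made for this example. Real sensors document their own conventions.

```python
import math


def lidar_return_to_xyz(range_m, azimuth_deg, elevation_deg):
    """Convert one range/angle measurement into a 3D point.

    Assumed frame: x forward, y left, z up (an illustrative convention).
    """
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)
    horizontal = range_m * math.cos(elevation)  # distance along the ground plane
    return (
        horizontal * math.cos(azimuth),
        horizontal * math.sin(azimuth),
        range_m * math.sin(elevation),
    )


# A few hypothetical returns: (range in metres, azimuth, elevation).
returns = [(10.0, 0.0, 0.0), (12.5, 30.0, -2.0), (8.0, -45.0, 1.0)]
point_cloud = [lidar_return_to_xyz(*r) for r in returns]
for point in point_cloud:
    print(tuple(round(v, 3) for v in point))
```

A range value on its own places the object somewhere on a sphere around the sensor. It is the combination of range with beam direction that turns each return into a point in space.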

Another shortcoming of LIDAR is that it is sometimes difficult or impossible to understand exactly what a detected object is. For example, if LIDAR sees a lightweight object on the road such as a plastic bag, it gives us just the point cloud. It can be hard to tell from this whether the object is a plastic bag, a rock, or some other heavy object. The action an autonomous vehicle takes will differ significantly depending on what it believes the object to be. We do not want our vehicle to hit a heavy rock. However, in the case of a plastic bag on the road, the vehicle does not even need to slow down. An advantage of using computer vision is that it is possible to detect the difference between a plastic bag and a rock.

Let us look at a specific scenario: a biker is riding in the right lane and looking to the left to check whether a car is approaching from behind. From the LIDAR point cloud, the vehicle might be able to understand that there is a biker in the right lane. However, the advantage of using vision is that it could additionally tell us which direction the biker is looking in. If the vehicle knows that the biker is looking towards the left lane, it might be able to predict that the biker is planning to merge into that lane, and that it needs to slow down to leave enough space for the biker. Vision could also detect whether a pedestrian is distracted by their phone and approaching the vehicle's lane.

Computer vision can give us a lot of information. However, accurate depth perception is still a challenge. There are some techniques for depth estimation and 3D reconstruction from vision only. Using multiple 2D images, it is possible to reconstruct a scene in 3D. One such approach is multi-view stereo (MVS). First, the 2D images are analyzed using structure from motion (SfM), which estimates the camera pose for each image and also produces a sparse 3D point cloud. The multi-view stereo technique then uses the images, the camera poses and this sparse point cloud to build a dense 3D point cloud. Research has shown that scenes can be well represented in 3D using these techniques [2].
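
The geometric step that underlies both SfM and MVS is triangulation: once the camera poses are known, a point seen in two images lies where the two viewing rays meet. The sketch below is a simplified midpoint triangulation for a single pair of rays. It is not the pipeline used in [2], and the function names and camera positions are hypothetical values chosen for illustration.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(origin1, dir1, origin2, dir2):
    """Return the 3D point closest to two viewing rays (midpoint method).

    Each ray is origin + s * direction; directions need not be normalized.
    """
    w = [p - q for p, q in zip(origin1, origin2)]
    a, b, c = dot(dir1, dir1), dot(dir1, dir2), dot(dir2, dir2)
    d, e = dot(dir1, w), dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is not observable")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [o + s * k for o, k in zip(origin1, dir1)]
    p2 = [o + t * k for o, k in zip(origin2, dir2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]


# Two hypothetical camera centres 0.5 m apart observing the same scene point.
true_point = (1.0, 2.0, 10.0)
camera1, camera2 = (0.0, 0.0, 0.0), (0.5, 0.0, 0.0)
ray1 = [p - c for p, c in zip(true_point, camera1)]
ray2 = [p - c for p, c in zip(true_point, camera2)]

estimate = triangulate_midpoint(camera1, ray1, camera2, ray2)
print([round(v, 6) for v in estimate])
```

With noise-free rays the estimate coincides with the true point. In practice the rays come from noisy pixel matches and estimated poses, so they rarely intersect exactly. This is one reason full pipelines refine many points and cameras jointly, which adds to the computational cost.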

Depth perception can also be achieved using neural networks. The authors of [3] proposed SfM-Net to estimate the depth of objects given a sequence of frames. SfM-Net can be trained with various degrees of supervision, e.g., self-supervised by the reprojection photometric error, supervised by ego-motion, or supervised by depth. Self-supervised networks can learn depth from raw videos without any labels [4]. The neural network predicts depth in every single frame of the video, and the training objective is for these predictions to be consistent (and correct) over time. In this way, the network predicts the depth for all the pixels.
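
The idea behind the reprojection photometric error can be sketched in a heavily simplified setting: a single image row, and a camera that has moved sideways by a known distance between two frames. This is not SfM-Net's implementation. The function names, the 1D setup and all numbers are assumptions made for illustration. Each target pixel is looked up in the other frame at the position implied by its predicted depth, and the intensity difference is the training signal.

```python
def sample_linear(row, x):
    """Linearly interpolate a 1D intensity row at position x (None if outside)."""
    if x < 0 or x > len(row) - 1:
        return None
    left = int(x)
    right = min(left + 1, len(row) - 1)
    frac = x - left
    return (1 - frac) * row[left] + frac * row[right]


def photometric_error(target_row, source_row, depth_row, focal_px, baseline_m):
    """Mean absolute intensity difference after warping source into target.

    Assumes the source camera sits baseline_m to the right of the target
    camera, so a target pixel u with depth Z appears in the source row at
    u - focal_px * baseline_m / Z.
    """
    total, count = 0.0, 0
    for u, (intensity, depth) in enumerate(zip(target_row, depth_row)):
        warped = sample_linear(source_row, u - focal_px * baseline_m / depth)
        if warped is None:  # pixel left the field of view; skip it
            continue
        total += abs(intensity - warped)
        count += 1
    return total / count if count else float("inf")


# Synthetic scene: a flat wall 25 m away, which shifts the row by 2 pixels.
target = [10.0, 40.0, 20.0, 90.0, 60.0, 30.0, 80.0, 50.0, 70.0, 15.0, 55.0, 35.0]
source = target[2:] + [0.0, 0.0]
focal_px, baseline_m = 100.0, 0.5

error_correct = photometric_error(target, source, [25.0] * 12, focal_px, baseline_m)
error_wrong = photometric_error(target, source, [50.0] * 12, focal_px, baseline_m)
print(error_correct, error_wrong)
```

The correct depth makes the warped row match the target, while the wrong depth leaves a visible mismatch. A network trained to minimize this kind of error is pushed towards depths that explain how pixels move between frames, which is why no depth labels are needed.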

In summary, accurate computer vision is a necessity for achieving fully autonomous vehicles, and the neural networks used must provide a complete representation of the surrounding environment. There have been huge advances in recent years, but there is still much work to be done.

[1] SUN3D: A Database of Big Spaces Reconstructed using SfM and Object Labels, J. Xiao et al. (2013).
[2] Building Rome in a Day, S. Agarwal et al. (2011).
[3] SfM-Net: Learning of Structure and Motion from Video, S. Vijayanarasimhan et al. (2017).
[4] Andrej Karpathy speaks at the Tesla Autonomy Day (2019).

Md Ali Azam is studying Electrical Engineering and working as a research assistant at South Dakota School of Mines and Technology (SDSMT).
