The role of computer vision in autonomous vehicles


by Md Ali Azam
10 September 2020





Recent advances in computer vision have revolutionized many areas of research, including robotics, automation, and self-driving vehicles. The self-driving car industry has grown markedly in recent years, in no small part enabled by state-of-the-art computer vision techniques. However, many challenges remain, and one of the most difficult is perception. Once autonomous vehicles have an accurate perception of the world around them, planning and control become easier. This article focuses on perception, and on the capabilities of computer vision and neural networks for fully autonomous self-driving vehicles.

Autonomous driving is a very challenging problem. Researchers and the auto industry have been working on developing autonomous vehicles for decades. One such company is General Motors, which in 1958 produced a self-driving car guided by a radio-controlled electromagnetic field. A number of car companies improved upon this idea, but the challenge of achieving full autonomy remained. The journey reached a milestone in 2005, when a few teams were able to complete the DARPA Grand Challenge, a 212 kilometre (132 mile) desert course. Continuous effort from many scientists has since made it possible to trial autonomous vehicles on public roads.

Following the 2005 DARPA challenge, researchers highlighted how critical perception of the world around the vehicle is. Since then, many companies have begun developing autonomous cars with vision-based perception as a primary focus. Strategies for achieving perception vary: most companies use some combination of RADAR, LIDAR, SONAR, and cameras. Tesla is the only large company that does not use LIDAR in its autonomous cars; it relies primarily on RADAR and cameras, with SONAR for detecting near-field objects. Despite this variation, almost all companies place computer vision technologies at the fore.

Despite recent progress, representing the 3D world around a vehicle using computer vision alone remains a major challenge. Accurate representation is difficult because cameras produce 2D images and do not directly provide the depth of objects. Although many papers have been published on 3D reconstruction from multiple 2D images taken from cameras at different locations, 3D reconstruction is computationally expensive [1]. Therefore, some companies use RADAR and LIDAR for depth perception of objects in the scene.

RADAR is cheap, but it only gives us the range of an object. LIDAR, on the other hand, is expensive but provides an accurate 3D point cloud of the vehicle's surroundings, with better resolution than RADAR. Its disadvantage is poor performance in opaque media: LIDAR cannot be relied upon in foggy weather, for example.

Another shortcoming of LIDAR is that it is sometimes difficult or impossible to tell exactly what a detected object is. If LIDAR sees a lightweight object on the road, such as a plastic bag, it gives us just a point cloud, and it may be impossible to tell whether that point cloud belongs to a plastic bag or to something heavy like a rock. The action an autonomous vehicle takes differs significantly depending on what it determines the object to be: we do not want the vehicle to hit a heavy rock, but for a plastic bag on the road it does not even need to slow down. An advantage of computer vision is that it can detect the difference between a plastic bag and a rock.
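To make this concrete, here is a minimal sketch of vision-based classification of a detected road object, using a pretrained ImageNet classifier from torchvision as a stand-in (the model choice and the file name road_object.jpg are illustrative assumptions; a production perception stack would use a detector trained on driving data). Conveniently, the ImageNet label set happens to include a "plastic bag" class.

```python
# A minimal sketch of classifying a detected road object with a pretrained
# ImageNet model (assumes torchvision >= 0.13). Illustrative only: a real
# autonomous-driving stack uses detectors trained on driving data.
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()  # resizing + normalization for this model

# "road_object.jpg" is a hypothetical crop of the object seen on the road.
img = preprocess(Image.open("road_object.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(img), dim=1)
conf, idx = probs.max(dim=1)
print(f"{weights.meta['categories'][idx.item()]} ({conf.item():.0%} confidence)")
```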

Let us look at a specific scenario: a biker is riding in the right lane and looking to the left to see whether a car is approaching from behind. The vehicle might be able to tell from the LIDAR point cloud that there is a biker in the right lane, but vision could additionally tell us which direction the biker is looking in. If the vehicle knows the biker is looking at the left lane, it can predict that the biker is planning to merge into the left lane, and slow down to leave enough space. Vision could also detect that a pedestrian is distracted by their phone and drifting towards the vehicle's lane.

Computer vision can give us a lot of information, but accurate depth perception remains a challenge. There are techniques for depth estimation and 3D reconstruction from vision alone: using multiple 2D images, it is possible to reconstruct a scene in 3D. One of these approaches is multi-view stereo (MVS). First, multiple 2D images are analyzed and, using structure from motion (SfM), the camera pose of each image is estimated; SfM also produces a sparse 3D point cloud. Multi-view stereo then uses this point cloud and the different camera poses to build a dense 3D point cloud. Research has shown that scenes can be represented well in 3D using these techniques [2].
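As a rough illustration of the SfM step, here is a minimal two-view sketch using OpenCV: features are matched between two frames, the relative camera pose is recovered from the essential matrix, and the matches are triangulated into a sparse 3D point cloud. The file names and intrinsic matrix K are hypothetical placeholders, and a full MVS pipeline would go on to densify this cloud across many views.

```python
# A minimal two-view structure-from-motion sketch using OpenCV.
# Assumes two overlapping frames and known camera intrinsics K
# (the values below are hypothetical placeholders).
import cv2
import numpy as np

K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# 1. Detect and match local features between the two views.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2. Estimate the relative camera pose (the "motion" in structure from motion).
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate the matches into a sparse 3D point cloud (the "structure").
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d = (pts4d[:3] / pts4d[3]).T  # homogeneous -> Euclidean coordinates
print(f"Recovered {len(pts3d)} sparse 3D points")
```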

Depth perception can also be achieved using neural networks. The authors of [3] proposed SfM-Net to estimate the depth of objects given a sequence of frames. SfM-Net can be trained with various degrees of supervision, e.g., self-supervised by the reprojection photometric error, supervised by ego-motion, or supervised by depth. In the self-supervised setting, the network is trained on raw video without any labels and still learns depth [4]: it predicts a depth for every pixel of every frame, with the objective of being consistent (and correct) over time.
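The core of this self-supervised setting is the reprojection photometric error. A sketch of that loss in PyTorch might look as follows; the tensor shapes, variable names, and the simple L1 penalty are illustrative assumptions, and SfM-Net and its successors add motion masks, smoothness terms, and more robust image losses on top.

```python
# A simplified sketch of the self-supervised photometric loss used to train
# monocular depth networks (in the spirit of SfM-Net [3]).
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """Warp `source` into the `target` view using predicted depth and pose,
    then penalize the per-pixel photometric difference.

    target, source: (B, 3, H, W) adjacent video frames
    depth:          (B, 1, H, W) depth predicted for the target frame
    pose:           (B, 3, 4) predicted relative camera motion [R|t]
    K:              (B, 3, 3) camera intrinsics
    """
    B, _, H, W = target.shape
    # Pixel grid in homogeneous coordinates: (B, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1)

    # Back-project pixels to 3D with the predicted depth, move them with the
    # predicted pose, and re-project into the source view.
    cam = torch.inverse(K) @ pix * depth.view(B, 1, -1)
    cam = pose[:, :, :3] @ cam + pose[:, :, 3:]
    proj = K @ cam
    u = proj[:, 0] / (proj[:, 2] + 1e-7)
    v = proj[:, 1] / (proj[:, 2] + 1e-7)

    # Normalize to [-1, 1] and bilinearly sample the source frame.
    grid = torch.stack([2 * u / (W - 1) - 1, 2 * v / (H - 1) - 1], dim=-1)
    warped = F.grid_sample(source, grid.view(B, H, W, 2), align_corners=True)

    # If depth and pose are both correct, the warped source should look
    # like the target; the per-pixel error supervises both predictions.
    return (warped - target).abs().mean()
```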

In summary, to achieve fully autonomous vehicles, accurate computer vision is a necessity, and the neural networks used must provide a complete representation of the surrounding environment. There have been huge advances in recent years, but there is still much work to be done.

[1] SUN3D: A Database of Big Spaces Reconstructed using SfM and Object Labels, J. Xiao et al. (2013).
[2] Building Rome in a Day, S. Agarwal et al. (2011).
[3] SfM-Net: Learning of Structure and Motion from Video, S. Vijayanarasimhan et al. (2017).
[4] Tesla Autonomy Day presentation, A. Karpathy (2019).




Md Ali Azam is studying Electrical Engineering and working as a research assistant at South Dakota School of Mines and Technology (SDSMT).