Computer Vision is a branch of Artificial Intelligence that aims to program computers to perceive and understand visual information in the same way that humans can. In a nutshell, Computer Vision researchers aim to make computers do what humans do effortlessly with their eyes – see and understand the world. It is the technology that makes autonomous vehicles, robots and image search possible today.
“Perception” problems, such as Computer Vision, appear easy to some because they involve replicating what humans do so effortlessly. However, this is precisely what makes them so difficult. Since we process visual information without conscious thought, we do not know what algorithms to program into our computers to make them “see” like us. This is different from a problem like playing chess, where there is a set number of rules and pieces to move. Although keeping track of all the possible moves is very difficult for a human, a computer can evaluate vastly more of them, and as a result, a computer called “Deep Blue” beat the world’s best chess player, Garry Kasparov, as far back as 1997.
Computer Vision is a different kind of problem though – we do not know what the “rules” for seeing and understanding are, and so cannot program this knowledge explicitly into our computers. As a result, many modern Computer Vision techniques rely on analysing patterns in large amounts of visual data (such as photographs and videos) using a method known as “Machine Learning.” Through this process, we hope that computers will learn the “rules” of vision by themselves.
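To give a flavour of what “learning from examples” means – and this is only a toy sketch, not any real production system – here is a tiny nearest-neighbour classifier. Instead of being told the rules for what a “vertical bar” or “horizontal bar” looks like, it is shown a few labelled example images and classifies a new image by finding the most similar example. All of the data below is invented for illustration.

```python
# A toy illustration of learning "rules" from examples rather than
# programming them by hand: a nearest-neighbour classifier that tells
# tiny 3x3 "images" of vertical bars from horizontal bars.
# (All data here is made up for illustration.)

TRAINING_DATA = [
    # (image as a flat list of 9 pixels, label)
    ([0, 1, 0,
      0, 1, 0,
      0, 1, 0], "vertical"),
    ([0, 0, 0,
      1, 1, 1,
      0, 0, 0], "horizontal"),
    ([1, 0, 0,
      1, 0, 0,
      1, 0, 0], "vertical"),
    ([1, 1, 1,
      0, 0, 0,
      0, 0, 0], "horizontal"),
]

def distance(a, b):
    # Count how many pixels differ between two images.
    return sum(1 for x, y in zip(a, b) if x != y)

def classify(image):
    # Predict the label of the most similar training image.
    _, label = min(
        (distance(image, example), lbl)
        for example, lbl in TRAINING_DATA
    )
    return label

# A new image the classifier has never seen: a vertical bar with one
# noisy pixel in the bottom-right corner.
test_image = [0, 1, 0,
              0, 1, 0,
              0, 1, 1]
print(classify(test_image))  # vertical
```

Real systems use millions of photographs and far more sophisticated models, but the principle is the same: the behaviour comes from the data, not from hand-written rules.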
There are numerous Computer Vision systems in use today, and many more related technologies due to be released in the near future. Here are a few examples:
Making Sense of all the Visual Data on the Internet
The explosion of images and videos on the internet, via services such as social media and YouTube, is making Computer Vision increasingly relevant as we need to develop automated algorithms to organise and understand the billions of images and videos out there.
State-of-the-art Computer Vision algorithms are employed every time you search for an image on Google. Initially, the search engine retrieved images based on their textual descriptions. However, Google’s algorithms now analyse the actual pixels of an image to return the best results whilst also filtering out indecent content. Similarly, Facebook has face recognition algorithms which help to identify and tag people in photographs. (Similar face recognition systems are also employed in airports and security systems.)
Enabling Autonomous Robots and Vehicles
Computer Vision algorithms are employed extensively in driverless vehicles and autonomous robots, since they enable these machines to understand their surrounding environment.
Driverless cars need to be able to precisely localise nearby objects in their environment. Here the colour of each pixel represents a different object class (purple = road, red = person etc). (Cityscapes Dataset)
One of the key problems in autonomous navigation is Semantic Image Segmentation. To avoid obstacles, robots need to know precisely where objects are. The task of Semantic Segmentation involves labelling every single pixel in an image with its object category to provide this fine-grained knowledge to robots. An online demo of such a system, where you can try out your own images, is available here. [Disclaimer: This is work done by my research group].
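To make the idea of “labelling every single pixel” concrete, here is a minimal sketch of what a segmentation output looks like. The class ids and colour names below are invented for illustration (they are not the Cityscapes labels or any real system’s output); a real network would produce one class id per pixel of a full photograph.

```python
# A sketch of what a semantic segmentation output looks like: every
# pixel of the image is assigned a class label. The class ids and
# colours here are made up for illustration.

CLASS_COLOURS = {
    0: "purple",  # road
    1: "red",     # person
    2: "grey",    # building
}

# A tiny 4x6 "label map": a segmentation network would produce one
# class id for every pixel of the input photograph.
label_map = [
    [2, 2, 2, 2, 2, 2],
    [2, 2, 1, 2, 2, 2],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def colourise(labels):
    # Replace each class id with its display colour, giving the kind
    # of coloured overlay shown in the Cityscapes example image.
    return [[CLASS_COLOURS[c] for c in row] for row in labels]

for row in colourise(label_map):
    print(" ".join(row))
```

Reading off the colours, a robot can see not just *that* a person is present, but exactly *which* pixels they occupy – the fine-grained knowledge needed to avoid obstacles.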
Another challenge is that photographs are only a two-dimensional representation of the world. As humans, we can easily tell which objects are in front of others, and roughly estimate how far away things are from each other. A large branch of active Computer Vision research is devoted to 3D reconstruction – recreating three-dimensional representations of the world (a field which makes little use of Machine Learning, relying instead on theory from optics and mathematics). 3D reconstruction algorithms allow robots to build detailed 3D maps of the environment they are exploring. If you are interested, the source code for an open-source reconstruction engine can be found here.
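One of the classic ideas from the optics behind 3D reconstruction is stereo triangulation, and it can be sketched in a few lines. Two cameras a known distance apart see the same point at slightly different image positions; the size of that shift (the “disparity”) tells you how far away the point is. The camera numbers below are invented for illustration, not taken from any particular system.

```python
# A sketch of stereo triangulation, one core idea behind 3D
# reconstruction. Two cameras a known distance apart (the "baseline")
# see the same point at slightly different image positions; the shift
# between the two views (the "disparity") reveals depth:
#
#     depth = focal_length * baseline / disparity
#
# The numbers below are invented for illustration.

def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    # Nearby objects shift a lot between the two views (large
    # disparity); distant objects barely move (small disparity).
    return focal_length_px * baseline_m / disparity_px

focal_length_px = 700.0   # camera focal length, measured in pixels
baseline_m = 0.5          # distance between the two cameras, in metres

# A point that shifts 35 pixels between the left and right images:
print(depth_from_disparity(focal_length_px, baseline_m, 35.0))  # 10.0 metres
```

This is exactly how human depth perception from two eyes works; repeating the calculation for every matched point across two photographs yields a 3D point cloud of the scene.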
Helping the partially sighted to see
If we can program computers to see the world around them, then we can use these algorithms to augment the vision of those who are partially sighted. These “smart specs” are equipped with cameras and a portable onboard computer, and help to enhance the vision of the partially sighted by highlighting salient objects in the environment. These glasses are currently undergoing field trials and are due to go on sale in the United Kingdom in 2016.
The onboard computer and cameras on these smart glasses enhance the vision of the partially sighted. (VA-ST)
How can I learn more about Computer Vision?
Computer Vision is generally a postgraduate specialisation, and requires skills in programming, Machine Learning and mathematics. Some useful resources are Andrew Ng’s online course on Machine Learning, this Computer Vision course from Brown University, and this freely available book by Richard Szeliski. These resources assume strong programming skills; if you still need to build those, Hyperion’s courses are a great place to start.
Comment with your views on this article in the comments section below, and follow Hyperion Hub developments in the future if you’d like to see more articles like these for the South African market.