# Introduction to Computer Vision using OpenCV and Python


In this guide, we will first give a brief overview of Deep Learning. Then, we will discuss the purpose of Computer Vision in Python. After that, we will learn the basics of handling image data with the OpenCV library by reading, creating and displaying images. We will explain fundamental Computer Vision tasks such as object recognition and semantic segmentation, and we will also cover feature extraction, edge detection, face detection and object classification.

# Prerequisites

Before starting this guide, it is essential to be familiar with the basics of Python programming and Image Processing concepts.

# Guide map

The guide is structured according to the following map:

  1. Introduction;
  2. A brief introduction to Deep Learning;
  3. Computer vision tasks;
  4. Computer Vision Systems;
  5. Python libraries for Computer Vision;
  6. OpenCV library on Windows and Ubuntu;
  7. Processing images with OpenCV;
  8. Use cases for Computer Vision;
  9. Conclusion.

# 1. Introduction

Computer Vision is a branch of Computer Science that aims to build intelligent systems able to understand the content of images as humans perceive it. The data may come in different modalities, such as sequences of images (video), views from multiple sensors (cameras), or multidimensional data from a biomedical imaging device. The discipline integrates methods for acquiring, processing, analyzing and understanding images of the real world. It is also about describing and reconstructing the properties of the world that we perceive in images, such as edges, lighting, color and patterns. Image understanding consists of decoding image data into meaningful information, using models built with the help of engineering, physics, statistics and learning theory. Computer Vision is intended to simulate human vision, including the ability to learn, make decisions and react based on visual information. It is a part of Artificial Intelligence, closely related to Image Processing, and generally aims to reproduce intelligent human capabilities. Object recognition is one of the fundamental tasks of Computer Vision. Its difficulty depends on how the objects are represented (single images or video sequences): human beings can recognize many objects, even when their size and lighting vary greatly.

*Figure: Computer Vision*

Some examples of Computer Vision applications:

  • Any application that can recognize objects or humans in an image;
  • Automatic control applications (industrial robots, vehicles);
  • Object modelling applications (industrial inspection, medical image analysis);
  • Applications that track moving objects.

Useful books for learning various aspects of Computer Vision: *Multiple View Geometry in Computer Vision* and *Computer Vision: Algorithms and Applications*.

# 2. A brief introduction to Deep Learning

# 2.1. What is Deep Learning?

Deep Learning is a Machine Learning approach that has greatly improved performance in many fields such as Computer Vision, Speech Recognition and Machine Translation. By learning directly from raw data, deep learning techniques help solve many challenges in economic sectors such as health, transport and finance.

The favourable conditions that allowed the rise of Deep Learning:

  • Availability of very large spatio-temporal datasets (Big Data);
  • Availability of high-performance computing (GPU);
  • Flexibility of new training models (Deep Neural Networks).

# 2.2. Deep Learning Frameworks

In this section, we present the most popular frameworks for Deep Learning.

| Framework | Features | Supported languages |
|---|---|---|
| TensorFlow | Highly flexible system architecture | Python, C++, R |
| Caffe | Speed, transposability and applicability to modelling Convolutional Neural Networks (CNN) | C, C++, Python, MATLAB |
| CNTK | Easy training and combination of popular model types across servers | Python, C++, command-line interface |
| Torch/PyTorch | Makes the entire deep modelling process simpler and more transparent | Lua (Torch), Python (PyTorch) |
| Keras | Simple interface for quick prototyping of effective neural networks that run on top of TensorFlow | Python |

# 3. Computer Vision tasks

In this section, we will successively examine some tasks of Computer Vision, in particular Image Recognition, Semantic Segmentation, Image Retrieval, Image Restoration, Object Recognition, Video Tracking, and so on.

*Figure: Computer Vision tasks*

# 3.1. Image Recognition

Traditionally, the central problem of Computer Vision is deciding whether or not an image contains a given object. Humans solve this task with little effort, but computers still cannot solve it effectively in the general case. Existing methods work well only in specific cases: they match certain features (edges, shapes, etc.), often under specific lighting conditions, with a known background and a fixed camera position.

# Types of recognition

A - Recognition: One or more predefined or learned objects (or object classes) are recognized, often from different camera viewpoints and at different locations in the image.

B - Identification: An individual instance of an object is recognized. For example: identify a specific person's face or a specific vehicle.

C - Detection: The image data are scanned for a specific condition. For example: check for the presence of diseased cells in a medical image, or check whether a car is present on a highway.


# 3.2. Image Retrieval

In content-based image retrieval, the query is an image, and the system returns the set of images in a visual dataset whose content is most similar to it. Content-based visual information retrieval applies computer vision to this problem of searching for images in large datasets. The following figure represents the general process of retrieving images from their content.

*Figure: Image retrieval*
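To make the query-by-image idea concrete, here is a small sketch that uses a classic hand-crafted descriptor (per-channel color histograms compared by histogram intersection) instead of the deep features a modern system would typically use. The helper names, the 8-bin histograms and the tiny solid-color "database" are assumptions for illustration.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenate one histogram per channel into a normalized feature vector."""
    channels = [
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(image.shape[2])
    ]
    feature = np.concatenate(channels).astype(np.float64)
    return feature / feature.sum()

def retrieve(query, database):
    """Rank database images by histogram intersection with the query (higher = more similar)."""
    q = color_histogram(query)
    scores = {name: np.minimum(q, color_histogram(img)).sum() for name, img in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

def solid(bgr, size=(32, 32)):
    """Build a uniform test image of the given BGR color."""
    return np.full(size + (3,), bgr, dtype=np.uint8)

database = {"red": solid((0, 0, 200)), "green": solid((0, 200, 0)), "blue": solid((200, 0, 0))}
query = solid((10, 5, 210))  # a slightly different shade of red
ranking = retrieve(query, database)
print(ranking)
```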


# 3.3. Image Restoration

Image restoration is the process of recovering an original image from a degraded version of it. The original image can be restored using prior knowledge of the damage or distortions that caused the deterioration, such as noise, blur, scratches, dust and stains. Restoration also applies to images degraded when they were captured, for example by the weather conditions, and to images degraded by scanning.


# 3.4. Object Recognition

Object recognition is the Computer Vision task of detecting a particular object in an image or video. Humans can recognize many objects in images with little effort, even when their appearance varies (viewpoint, size, scale) or when they are moved or rotated. Humans can even recognize objects that are partially hidden, but this remains a challenge for computer vision systems. The object recognition process is given by the following figure:

*Figure: Object recognition*

Unlike classical Machine Learning approaches, which rely on hand-crafted features, the Deep Learning paradigm learns feature representations end to end, directly from raw data, without a separate hand-designed feature extraction step.


# 3.5. Semantic Segmentation

Semantic segmentation is a Computer Vision task, commonly solved with Deep Learning, that assigns a label or category to each pixel in an image, so that pixels are grouped into distinct classes. For example, an autonomous vehicle must be able to recognize vehicles, pedestrians, traffic signs, sidewalks and other components of the road environment. Semantic segmentation is involved in a wide range of solutions such as autonomous driving, diagnostic imaging and industrial inspection. Splitting an image into two classes is a simple example of semantic segmentation, but there is no restriction on the number of categories: it can be chosen according to the image content. For example, an image could be segmented into 4 classes: person, sky, sea and background. The example in the following figure is based on the ICCV 2015 paper *Conditional Random Fields as Recurrent Neural Networks*, which combines deep learning techniques and probabilistic graphical models for semantic image segmentation.

*Figure: Semantic Segmentation*
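The "one label per pixel" idea can be shown with plain NumPy: a segmentation network typically outputs one score map per class, and each pixel receives the class with the highest score. In this sketch the class names and the score values for a tiny 2x3 image are hypothetical stand-ins for a real network's output.

```python
import numpy as np

CLASSES = ["background", "person", "sky"]  # hypothetical label set

# Hypothetical per-class scores for a 2x3 image, shape (num_classes, H, W)
scores = np.array([
    [[0.7, 0.1, 0.2], [0.6, 0.1, 0.1]],   # background
    [[0.2, 0.8, 0.1], [0.3, 0.8, 0.2]],   # person
    [[0.1, 0.1, 0.7], [0.1, 0.1, 0.7]],   # sky
])

# Each pixel receives the label of its highest-scoring class
label_map = scores.argmax(axis=0)          # shape (H, W)
print(label_map)

# Number of pixels assigned to each class
counts = np.bincount(label_map.ravel(), minlength=len(CLASSES))
print(dict(zip(CLASSES, counts.tolist())))
```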


# 3.6. Video Tracking

Video tracking is the process of locating a moving object (or several moving objects) over time using static or mobile cameras. It has many uses, such as human-computer interaction, security, augmented reality, medical imaging and video editing. Tracking can be time-consuming because of the amount of data in a video and the need for complex algorithms to identify and follow objects. Its aim is to follow the desired object across a sequence of successive images. Tracking is difficult when the object moves fast relative to the frame rate, and even more so when it changes direction as it moves. For this reason, tracking systems use a motion model that describes how the image of the object may change for its different possible motions. The following figure illustrates the overall scheme of the object tracking process:

*Figure: Object Tracking*
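A deliberately simple sketch of a motion model: assume constant velocity, predict where the object should be in the next frame, then associate the track with the nearest detection. The helper names and detection coordinates are hypothetical; practical trackers usually rely on a Kalman filter (OpenCV provides `cv2.KalmanFilter`) or on learned models.

```python
import numpy as np

def predict_next(track):
    """Constant-velocity motion model: next = last + (last - previous)."""
    last = np.asarray(track[-1], dtype=float)
    previous = np.asarray(track[-2], dtype=float)
    return last + (last - previous)

def associate(track, detections):
    """Pick the detection closest to the predicted position (nearest neighbour)."""
    predicted = predict_next(track)
    distances = [np.linalg.norm(np.asarray(d, dtype=float) - predicted) for d in detections]
    return detections[int(np.argmin(distances))], predicted

# Past (x, y) centers of the tracked object in three successive frames
track = [(0, 0), (2, 1), (4, 2)]
# Candidate detections in the new frame (hypothetical values)
detections = [(20, 20), (6.5, 3.2), (0, 1)]

match, predicted = associate(track, detections)
print("predicted:", predicted, "matched detection:", match)
track.append(match)  # extend the track with the associated detection
```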


# 4. Computer Vision Systems

Computer vision systems are very diverse: they range from large, sophisticated systems that perform general and complete tasks to small systems that perform specific and simple ones. Most computer vision systems include the following stages:

# 4.1. Collecting images

Images are acquired with one or more image sensors, such as digital cameras, range (distance) sensors, radar and ultrasonic sensors.

# 4.2. Pre-processing operations

Before applying a computer vision algorithm to extract valuable information, the data usually need to be pre-processed so that they satisfy the algorithm's specific assumptions. Some examples of these operations include:

  1. Resample the image to ensure that its coordinate system is correct.
  2. Reduce noise so that the sensor does not introduce false information.
  3. Enhance the contrast so that the relevant information can be detected.

# 4.3. Features extraction

Features are extracted from the raw data at different levels of abstraction. They fall into two categories:

  1. Global features, such as color and shape.
  2. Local features, such as edges and points. More complex features, related to colors and patterns, can also be obtained.

# 4.4. Segmentation

At some point in the processing, we decide which points or regions of the image are relevant for subsequent operations. For example: select a set of key points, or segment one or more image regions that contain the object of interest.

# 4.5. High-level processing operations

At this stage, the input data usually consist of a small set of data, such as a set of points or an image region assumed to contain the object of interest. The remaining operations are:

  1. Verify that the data satisfy the assumptions of the intended application.
  2. Estimate application-specific parameters, such as the orientation (pose) or size of the object.
  3. Classify the recognized objects into several classes.

# 5. Python libraries for Computer Vision

The main toolkits for image processing in Python are OpenCV, scikit-image and Pillow. The general-purpose scientific libraries NumPy and SciPy also provide some image processing tools. All these libraries interoperate easily because images can be exchanged between them as NumPy arrays. A grayscale image is usually stored in a 2-dimensional NumPy array of integers or floating-point values with H rows and W columns (W = width, H = height). A color image is stored in a 3-dimensional NumPy array of shape (H, W, 3).

  • OpenCV is a rich and widely used computer vision library written in C++.
  • Pillow is a fork of PIL (Python Imaging Library). It is specific to Python but mainly written in C. It supports basic operations on images, including reading/writing, transformations, histograms and filtering.
  • Scikit-image is an actively developed library. Its advantage is that it is written in Python and Cython (Python with static types, compiled for speed), which makes its code easy to read.
  • [scipy.ndimage](https://docs.scipy.org/doc/scipy/reference/ndimage.html): SciPy's ndimage module provides a number of functions for filtering, geometric transformations and interpolation, mathematical morphology and statistics.
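A short NumPy sketch of the array layouts described above. It also shows two conventions that often surprise newcomers: pixels are indexed as `[row, column]` (that is, `[y, x]`), and OpenCV stores color channels in BGR order. The float conversion reflects scikit-image's convention that floating-point images hold values in [0, 1]; the image size is an arbitrary assumption.

```python
import numpy as np

H, W = 480, 640
gray = np.zeros((H, W), dtype=np.uint8)        # grayscale: H rows x W columns
color = np.zeros((H, W, 3), dtype=np.uint8)    # color: H x W x 3 channels

# Pixels are indexed as [row, column], i.e. [y, x]
color[10, 20] = (255, 0, 0)   # in OpenCV's BGR order this pixel is blue

# Floating-point images in scikit-image are conventionally in [0, 1]
color_float = color.astype(np.float64) / 255.0

print(gray.shape, color.shape, color_float.dtype, color_float[10, 20])
```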

# 6. OpenCV library on Windows and Ubuntu

Gary Bradski started OpenCV at Intel in 1999. OpenCV supports a variety of languages such as C++ and Python; OpenCV-Python is its Python API, which combines the ease of use of Python with the speed of the underlying C++ code. It is a library of Python bindings designed to solve computer vision problems. It relies on NumPy and its array structures: images are NumPy arrays, which makes it easy to integrate OpenCV with other libraries such as SciPy and Matplotlib.

Operations on images are ultimately mathematical operations on arrays of pixel values. Programmers cannot be expected to re-implement these operations every time they work with images, hence the development of the OpenCV library, which provides functions for the most common image operations.

# Windows

To install the Python(x,y) distribution, download it from [here](https://python-xy.github.io/downloads.html) (it is also possible to download each file individually). First, download the file that contains the collection of the OpenCV library. Then, install Python(x,y) as shown in the figures. Alternatively, with a standard Python 3 installation, the prebuilt OpenCV package can be installed with `pip install opencv-python`.

# Ubuntu

There are two ways to install OpenCV on Linux systems. The first one consists of installing pre-compiled packages from the repositories. For instance, on Ubuntu it is sufficient to execute the following command:

```bash
sudo apt-get install libopencv-dev python3-opencv
```

On older Ubuntu releases, the Python 2 bindings were packaged as `python-opencv`.

The second method consists of building the library from its source code, which lets you choose the version of the library, including the latest one. The package names used below are those of older Ubuntu releases; some of them have been renamed or removed in recent releases.

Open a terminal and run the following commands:

```bash
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential cmake git pkg-config
sudo apt-get install libjpeg8-dev libtiff4-dev libjasper-dev libpng12-dev
sudo apt-get install libatlas-base-dev gfortran
# Install pip (this get-pip.py variant supports Python 2.7)
wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
sudo python get-pip.py
sudo pip install virtualenv virtualenvwrapper
sudo rm -rf ~/.cache/pip
# Configure virtualenv and virtualenvwrapper
echo 'export WORKON_HOME=$HOME/.virtualenvs' >> ~/.bashrc
echo 'source /usr/local/bin/virtualenvwrapper.sh' >> ~/.bashrc
source ~/.bashrc
mkvirtualenv cv
# Install the Python 2.7 development headers
sudo apt-get install python2.7-dev
# Install NumPy
pip install numpy
# Download the OpenCV source code
cd ~
git clone https://github.com/opencv/opencv.git
cd opencv
git checkout 3.0.0
```

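Next, download the opencv_contrib repository. It provides some features, such as SIFT and SURF, that were part of OpenCV 2.4 but were moved out of the main library in OpenCV 3.0. Then configure, build and install OpenCV:

```bash
cd ~
git clone https://github.com/opencv/opencv_contrib.git
cd opencv_contrib
git checkout 3.0.0
# Configure the build, including the contrib modules
cd ~/opencv
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D OPENCV_EXTRA_MODULES_PATH=$HOME/opencv_contrib/modules ..
# Compile (adjust -j to the number of CPU cores), then install
make -j4
sudo make install
sudo ldconfig
# Link the compiled bindings into the "cv" virtual environment
cd ~/.virtualenvs/cv/lib/python2.7/site-packages/
ln -s /usr/local/lib/python2.7/site-packages/cv2.so cv2.so
```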

# 7. Processing images with OpenCV

Now that OpenCV is installed, let's start using it.

# 7.1. Reading images in Python

To read an image, use the `cv2.imread()` function. Here we assume that the current working directory contains the image; otherwise, pass the full path to the file.

```python
import cv2
img = cv2.imread('img.jpg')
```

Optionally, a flag can be passed as the second argument:

  • cv2.IMREAD_COLOR: loads a color image, ignoring any transparency (this is the default);
  • cv2.IMREAD_GRAYSCALE: loads the image in grayscale;
  • cv2.IMREAD_UNCHANGED: loads the image as is, including its alpha channel.

The integers 1, 0 and -1 can be used instead of these flags, respectively:

```python
img = cv2.imread('img.jpg', 0)
```

Note that passing an invalid image path does not raise an error: `cv2.imread()` simply returns `None`.

# 7.2. Displaying images in Python

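The `cv2.imshow()` function displays an image in a window that automatically fits the size of the image. The first argument is the window name and the second one is the image. `cv2.waitKey()` must then be called so that the window is actually drawn and the program waits for a key press.

```python
import cv2
img = cv2.imread('img.jpg')
cv2.imshow('Images', img)
cv2.waitKey(0)  # 0 = wait indefinitely for a key press
```

Note that each distinct window name creates a separate window: displaying two images under two different names opens two windows, whereas reusing the same name replaces the content of the existing window. The `cv2.destroyAllWindows()` function closes all the windows we have created, and `cv2.destroyWindow()` closes a specific window given its name.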

# 7.3. Creating images in Python

To save an image to a file, use the `cv2.imwrite()` function. The first argument is the file name and the second one is the image to be saved; the file format is chosen from the extension.

```python
img = cv2.imread('img.jpg', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('img_gray.png', img)
```

This stores the grayscale image as "img_gray.png" in the current directory.

# 7.4. Displaying images using Matplotlib

We can also display the image with the Matplotlib library:

```python
import matplotlib.pyplot as plt

plt.imshow(img, cmap="gray", interpolation="bilinear")
plt.xticks([]), plt.yticks([])  # hide the tick marks on both axes
plt.show()
```

# 7.5. Core operations on images

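Let's now look at some basic operations on images.

```python
import cv2

img = cv2.imread('img.jpg')
y, x = 100, 50
```

Read the color values of the pixel at row y and column x (OpenCV stores the channels in BGR order):

```python
(b, g, r) = img[y, x]
```

Extract a 100x100 region of interest (ROI) whose top-left corner is at (x, y), then display the image and the ROI:

```python
roi = img[y:y + 100, x:x + 100]
cv2.imshow('image', img)
cv2.imshow('ROI', roi)
cv2.waitKey(0)
```

Set all the pixels of the ROI to a new color. Because the ROI is a view into `img` rather than a copy, the change also appears in the original image:

```python
roi[:, :] = (55, 44, 87)
cv2.imshow('New image', img)
cv2.waitKey(0)
```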
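The view-versus-copy behaviour is plain NumPy, so it can be checked without any image file or window. In this sketch a black 200x200 array stands in for `img`; `np.shares_memory` confirms that the sliced ROI points into the original buffer, while `.copy()` creates an independent array.

```python
import numpy as np

img = np.zeros((200, 200, 3), dtype=np.uint8)
y, x = 100, 50

roi = img[y:y + 100, x:x + 100]          # slicing returns a view on img
roi[:, :] = (55, 44, 87)                 # ...so this also paints img
print(img[y, x], np.shares_memory(roi, img))

independent = img[y:y + 100, x:x + 100].copy()   # an explicit copy
independent[:, :] = (0, 0, 0)                    # img is left unchanged
print(img[y, x], np.shares_memory(independent, img))
```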

# 8. Use cases for Computer Vision

In this section, we will look at some Computer Vision tasks performed with OpenCV and Python: edge detection, face detection, feature detection and description, and object classification.

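# 8.1. Edge detection

OpenCV's `cv2.Canny()` function detects the edges of objects, so that we can display only their outlines. Its two numeric arguments are the hysteresis thresholds applied to the gradient intensity:

```python
import cv2

img = cv2.imread('img.jpg')
edges = cv2.Canny(img, 415, 512)  # lower and upper hysteresis thresholds
cv2.imwrite('edge_img.jpg', edges)
cv2.imshow('edges', edges)
cv2.waitKey(0)
```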

# 8.2. Face detection

OpenCV also makes it possible to detect faces in images. Here we use a Haar cascade classifier, a pretrained face detection model.

Face detection consists of locating the position of faces in an image, for example in order to normalize the size of the face area before further processing. The script below processes every image in the current directory and saves a copy in which the detected faces are outlined.

```python
import os
import cv2

def face_detection(image, image_out, show=False):
    # Load the image in memory
    img = cv2.imread(image)
    if img is None:
        print("could not read", image)
        return
    # Load the face detection model; the XML file ships with OpenCV
    # (data/haarcascades folder) and must be reachable from the working directory
    face_model = cv2.CascadeClassifier("haarcascade_frontalface_alt2.xml")
    if face_model.empty():
        raise IOError("could not load haarcascade_frontalface_alt2.xml")
    # Detect the face(s) on the grayscale version of the image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_model.detectMultiScale(gray)
    print("number of faces", len(faces), "image size", img.shape, "image", image)
    # Draw a bounding box around each face; each face is (x, y, w, h)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 3)
    # Store the final result
    cv2.imwrite(image_out, img)
    # To see the image: wait up to 5 seconds, press ESC to close the window
    if show:
        cv2.imshow("face", img)
        if cv2.waitKey(5000) == 27:
            cv2.destroyWindow("face")

if __name__ == "__main__":
    # Process every image of the current directory
    for file in os.listdir("."):
        if file.startswith("face"):
            continue  # already processed
        if os.path.splitext(file)[-1].lower() in [".jpg", ".jpeg", ".png"]:
            face_detection(file, "face_" + file)
```

The script draws a blue rectangle (bounding box) around each face detected in the image; the color (255, 0, 0) is blue because OpenCV uses BGR order.

# 8.3. Feature Detection and Description

In this section, we briefly describe the SIFT (Scale-Invariant Feature Transform) algorithm. Its main idea is to transform an image into a set of feature vectors (descriptors) that are, ideally, invariant to geometric transformations such as rotation and scaling. The algorithm first detects interest points (keypoints); then, for each keypoint, it computes a feature vector whose components describe the neighbourhood of that point (128 values in SIFT). Matching these vectors between images makes it possible to detect an object.

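SIFT in OpenCV and Python:

```python
import cv2

img = cv2.imread('my_img.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# OpenCV >= 4.4: SIFT is in the main module.
# OpenCV 3.x built with opencv_contrib: use cv2.xfeatures2d.SIFT_create() instead.
sift = cv2.SIFT_create()
kp = sift.detect(gray, None)
# Draw the detected keypoints on the image
img = cv2.drawKeypoints(gray, kp, img)
```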



# 8.4. Object Classification

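To identify simple objects in an image, it can be enough to detect their edges and shapes during feature extraction. How will we proceed to recognize objects? We perform three steps: (1) extract the contours of the shapes in the image, (2) approximate each contour with a polygon and (3) classify each shape according to its number of vertices.

  • Let's start by importing and loading an image.

```python
import numpy as np
import cv2

image = cv2.imread('my_image.bmp')
```

  • Step 1: Edge detection. To improve edge detection, we convert the color image to grayscale and threshold it, then extract the outer contours of the shapes.

```python
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Shapes darker than 250 become white (255) on a black background
ret, thresh = cv2.threshold(gray, 250, 255, cv2.THRESH_BINARY_INV)
# [-2] selects the contours in both OpenCV 3.x and 4.x, whose return values differ
edges = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
```

  • Step 2: Edge estimation. For each contour, we approximate its outline with a polygon and compute its centre from the image moments.
  • Step 3: Pattern classification. All you need to do is count the vertices of each approximated polygon. Both steps run inside the same loop over the contours:

```python
for cnt in edges:
    # Step 2: polygonal approximation and centroid of the contour
    perimeter = cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, 0.01 * perimeter, True)
    M = cv2.moments(cnt)
    if M["m00"] == 0:  # skip degenerate contours with zero area
        continue
    cX = int(M["m10"] / M["m00"])
    cY = int(M["m01"] / M["m00"])

    # Step 3: classify the shape from its number of vertices
    if len(approx) == 3:
        shape = "triangle"
    elif len(approx) == 4:
        (x, y, w, h) = cv2.boundingRect(approx)
        ratio = w / float(h)
        shape = "square" if 0.95 <= ratio <= 1.05 else "rectangle"
    elif len(approx) == 5:
        shape = "pentagon"
    elif len(approx) == 6:
        shape = "hexagon"
    else:
        shape = "circle"  # many vertices: treat the contour as a circle
    cv2.putText(image, shape, (cX, cY), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)
```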

We just have to display the result (for example with `cv2.imshow()` followed by `cv2.waitKey(0)`) to check our work.


# 9. Conclusions

In this guide, we discussed Computer Vision using OpenCV and Python. We presented some fundamental Computer Vision tasks, such as Object Recognition and Semantic Segmentation, and worked through examples of edge detection, face detection, feature extraction and object classification.