# Introduction to Computer Vision using OpenCV and Python
In this guide, we give a brief overview of Deep Learning and then discuss the purpose of Computer Vision in Python. After that, we cover the basics of handling image data with the OpenCV library by creating and displaying images. The fundamental tasks of Computer Vision, such as object recognition and semantic segmentation, are explained. We also cover feature extraction, edge and face detection, and object classification.
Before starting this guide, it is essential to be familiar with the basics of Python programming and Image Processing concepts.
# Guide map
The guide is structured according to the following map:
- A brief introduction to Deep Learning;
- Computer vision tasks;
- Computer Vision Systems;
- Python libraries for Computer Vision;
- OpenCV library on Windows and Ubuntu;
- Processing images with OpenCV;
- Use cases for Computer Vision.
# 1. Introduction:
Computer Vision is a branch of Computer Science that aims to build intelligent systems able to understand the content of images as humans perceive it. The data may come in different modalities, such as sequential images (video), images from multiple sensors (cameras), or multidimensional data from a biomedical camera. It is the discipline that integrates methods for acquiring, processing, analyzing and understanding large-scale images from the real world. It is also about describing and reconstructing the world that we perceive in images through properties such as edges, lighting, color and pattern. Images are recognized by decoding image-based data into meaningful information, using models built with the help of engineering, physics, statistics and learning theory. The field is intended to simulate human vision, including the ability to learn, make decisions and react based on visual information. Computer Vision draws on both Artificial Intelligence and Image Processing, which generally aim to simulate intelligent human capabilities. Object recognition is one of its fundamental tasks. It depends on how the objects are represented, whether as still images or as video sequences. Human beings are able to recognize many objects even when their appearance varies greatly in size and lighting.
Some examples of Computer Vision applications:
- Any application that can recognize objects or humans in an image;
- Automatic control applications (industrial robots, vehicles);
- Object modelling (industrial inspection, medical image analysis);
- Applications that track a moving object.
Useful books for learning various aspects of Computer Vision: "Multiple View Geometry in Computer Vision" and "Computer Vision: Algorithms and Applications".
# 2. A brief introduction to Deep Learning
# 2.1. What is Deep Learning?
Deep Learning is a Machine Learning approach that has greatly improved performance in many fields such as Computer Vision, Speech Recognition and Machine Translation. Applied to raw data, deep learning techniques help solve many challenges in economic sectors such as health, transport and finance.
The favourable conditions that allowed the rise of Deep Learning:
- Availability of very large spatio-temporal datasets (Big Data);
- Availability of high-performance computing (GPU);
- Flexibility of new training models (Deep Neural Networks).
# 2.2. Deep Learning Frameworks
In this section, we present the most popular frameworks for Deep Learning.

| Framework | Main strength | Interfaces |
|---|---|---|
| TensorFlow | Highly flexible system architecture | Python, C++ and R |
| Caffe | Speed, transposability and applicability in modelling Convolutional Neural Networks (CNN) | C, C++, Python, MATLAB |
| CNTK | Easy training and combination of popular model types across servers | Python, C++ and the command line interface |
| Torch/PyTorch | Makes the entire deep modelling process simpler and more transparent | Lua, Python |
| Keras | Provides a simple interface for quick prototyping of effective neural networks that can work with TensorFlow | Python |
# 3. Computer vision tasks:
In this section, we will successively examine some tasks of Computer Vision, in particular Image Recognition, Semantic Segmentation, Image Retrieval, Image Restoration, Object Recognition, Video Tracking, and so on.
# 3.1. Image Recognition
Traditionally, Computer Vision is about deciding whether or not an image contains a given object. Human beings solve this task with little effort, but computers still cannot solve it effectively in the general case. Existing methods work best when matching specific features (edges, shapes, etc.), and often only under specific conditions: defined lighting, a known background and a fixed camera position.
# Types of recognition:
A - Identification: predefined objects are identified, often from different camera viewpoints and at different locations.
B - Selection: a unique instance of an object is identified. For example: identify a specific person's face, or a specific type of person or car.
C - Examination: image data is scanned for a specific object or condition. For example: check for the presence of diseased cells in a medical image, or check whether a car is present on a highway.
Some projects and scripts that address Image Recognition with a Deep Learning paradigm:
- Fast MPN-COV;
- Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding;
- Fine-grained classification.
# 3.2. Image Retrieval:
Given a query image, an image retrieval system searches a visual dataset and returns a set of similar images, based on their content and on concepts related to the query. Content-based visual information retrieval applies computer vision to the problem of finding images in large datasets. The following figure represents the general process of content-based image retrieval.
Some projects and scripts that address Image Retrieval with a Deep Learning paradigm:
- Deep Local Feature (DeLF);
- MILDNet;
- MultiGrain.
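The retrieval step described above can be sketched as a nearest-neighbour search over feature vectors. This is only a minimal sketch, assuming each image has already been reduced to a fixed-length descriptor; the 4-dimensional vectors and the `retrieve` helper are hypothetical and exist purely for illustration.

```python
import numpy as np

def retrieve(query, database, top_k=2):
    """Return the indices of the top_k database vectors most similar to the query."""
    # Normalize so that the dot product equals the cosine similarity.
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    similarities = db @ q
    # Sort by decreasing similarity and keep the best matches.
    return np.argsort(-similarities)[:top_k]

# Hypothetical descriptors for four stored images (one row per image).
database = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0, 0.0])

ranked = retrieve(query, database)
print(ranked)
```

In a Deep Learning setting, the descriptors would typically come from a trained network rather than being written by hand.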
# 3.3. Image Restoration:
Image restoration is the process of recovering an original image from a degraded version of it. The original can be restored using prior knowledge of the damage or distortion that caused the degradation, such as scratches, dust and stains. Restoration also covers images that were distorted by the conditions under which they were captured, such as bad weather, even with sophisticated cameras, as well as scanned images.
Some projects and scripts that address Image Restoration with a Deep Learning paradigm:
- Image Super-Resolution in Keras 2+;
- RED-net;
- Noise2Noise.
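As a small illustration of restoration, the sketch below removes simulated dust-like noise with a 3x3 median filter. It is a classical technique swapped in for illustration, not one of the Deep Learning projects listed above; the image, the noise level and the helper names are all assumptions.

```python
import numpy as np

def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    # Stack the nine shifted copies of the image, then take the median.
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(shifted), axis=0).astype(image.dtype)

def mean_abs_error(a, b):
    """Average absolute difference between two images."""
    return float(np.mean(np.abs(a.astype(np.int16) - b.astype(np.int16))))

rng = np.random.default_rng(0)
clean = np.full((64, 64), 128, dtype=np.uint8)

# Simulate dust and scratches: set about 5% of the pixels to black or white.
noisy = clean.copy()
mask = rng.random(clean.shape) < 0.05
noisy[mask] = rng.choice(np.array([0, 255], dtype=np.uint8), size=int(mask.sum()))

restored = median_filter_3x3(noisy)
error_before = mean_abs_error(noisy, clean)
error_after = mean_abs_error(restored, clean)
print(error_before, error_after)
```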
# 3.4. Object Recognition:
Object recognition is a branch of Computer Vision dedicated to detecting a particular object in an image or video. Humans can recognize many objects in images with little effort, even when the objects vary in viewpoint, size or scale, or when they are moved or rotated. Humans can even recognize objects that are partially hidden, whereas this task remains a challenge for computer vision systems. The object recognition process is given by the following figure:
Unlike traditional Machine Learning pipelines, the Deep Learning paradigm learns feature representations end to end from raw data, without a separate hand-crafted feature extraction step.
Some projects and scripts that address Object Recognition with a Deep Learning paradigm:
- TF-slim;
- DenseNet;
- DeepBeliefSDK.
# 3.5. Semantic Segmentation
Semantic segmentation is a task, commonly solved with Deep Learning, that assigns a label or category to each pixel in an image. It makes it possible to group pixels into distinct classes. For example, an autonomous vehicle must be able to recognize vehicles, pedestrians, traffic signs, sidewalks and other components of the road environment. Semantic segmentation is involved in a wide range of applications such as autonomous driving, diagnostic imaging and industrial control. Splitting an image into two classes is the simplest example of semantic segmentation, but there is no restriction on the number of categories: the number of classes can be changed to suit the image content. For example, an image could be segmented into 4 classes: person, sky, sea and background. The example in the following figure is based on the ICCV 2015 paper Conditional Random Fields as Recurrent Neural Networks, which combines deep learning techniques and probabilistic graphical models for semantic image segmentation.
Some projects and scripts that address Semantic Segmentation with a Deep Learning paradigm:
- PSPNet;
- TorchSeg;
- Deeplab.
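The idea of assigning one label per pixel can be sketched with plain NumPy. This is a minimal sketch, assuming a model has already produced one score per class for every pixel; the scores below are made up, and `scores_to_labels` is a hypothetical helper rather than part of any of the projects listed above.

```python
import numpy as np

# Class list following the 4-class example in the text.
CLASSES = ["background", "person", "sky", "sea"]

def scores_to_labels(scores):
    """Turn per-pixel class scores of shape (H, W, C) into a label map of shape (H, W)."""
    return np.argmax(scores, axis=-1)

# Fake scores for a 2x3 image; a real model would produce these.
scores = np.zeros((2, 3, len(CLASSES)))
scores[0, :, 2] = 1.0    # top row: sky
scores[1, :2, 3] = 1.0   # bottom-left pixels: sea
scores[1, 2, 1] = 1.0    # bottom-right pixel: person

labels = scores_to_labels(scores)
# Number of pixels assigned to each class.
pixel_counts = np.bincount(labels.ravel(), minlength=len(CLASSES))
print(labels)
print(pixel_counts)
```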
# 3.6. Video Tracking
Video tracking is the process of locating a moving object (or several moving objects) over time using static or mobile cameras. It has many uses, such as human-computer interaction, security, augmented reality, medical imaging and video editing. Tracking can be time-consuming because of the amount of data in a video and the complex algorithms needed to identify and follow objects. The aim is to follow the target object through a sequence of successive images. This is difficult when the object moves fast relative to the frame rate, and even more difficult when it changes orientation as it moves. For this reason, tracking systems apply a motion model that describes how the image of the object may change as it moves. The following figure illustrates the overall scheme of the object tracking process:
Some projects and scripts that address Video Tracking with a Deep Learning paradigm:
- GOT-10k Python Toolkit;
- SiamMask;
- Deep SORT;
- Object tracking (tutorial).
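The role of a motion model in tracking can be illustrated with a very simple one. The sketch below assumes constant velocity between frames and associates the prediction with the nearest detection; the positions are made-up values and both helper functions are hypothetical, so this is an illustration of the idea rather than the method used by the projects above.

```python
import numpy as np

def predict_next(positions):
    """Predict the next (x, y) position assuming constant velocity."""
    positions = np.asarray(positions, dtype=float)
    velocity = positions[-1] - positions[-2]
    return positions[-1] + velocity

def associate(prediction, detections):
    """Return the index of the detection closest to the predicted position."""
    detections = np.asarray(detections, dtype=float)
    distances = np.linalg.norm(detections - prediction, axis=1)
    return int(np.argmin(distances))

# Hypothetical track: the object moves 5 px right and 2 px down per frame.
track = [(10, 20), (15, 22), (20, 24)]
prediction = predict_next(track)

# Candidate detections in the next frame (made-up values).
detections = [(80, 80), (26, 25), (20, 24)]
best = associate(prediction, detections)
print(prediction, best)
```

Real trackers usually replace the constant-velocity assumption with a richer model, for example a Kalman filter.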
# 4. Computer Vision Systems
Computer vision systems are very diverse. They range from large, sophisticated systems that perform general and complete tasks to small systems that perform specific and simple ones. Most computer vision systems include the following stages:
# 4.1. Collecting images
The image is generated by one or more image sensors. These include digital cameras, range sensors, radars and ultrasonic cameras.
# 4.2. Pre-processing operations
Before a computer vision algorithm can be applied to extract valuable information, the data usually has to be processed to ensure that it satisfies the assumptions made by the algorithm. Some examples of these operations include:
- Re-sample the image to ensure that its coordinate system is correct.
- Reduce noise to ensure that the sensor does not introduce false information.
- Enhance contrast to ensure that the relevant information can be detected.
# 4.3. Features extraction
Visual features are extracted from the raw data at different levels of abstraction. These features are categorized into:
- Global features such as color and shape.
- Local features such as edges and points. More complex features related to colors and patterns can be obtained.
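To make the distinction concrete, the sketch below computes one global feature (a color histogram of the whole image) and one local feature (gradient magnitude, which is non-zero only near edges). The synthetic image and the bin count are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic color image: a dark background with a colored square in the middle.
image = np.zeros((32, 32, 3), dtype=np.uint8)
image[8:24, 8:24] = (200, 50, 50)

# Global feature: one 8-bin histogram per channel describes the whole image.
global_feature = np.concatenate([
    np.histogram(image[:, :, c], bins=8, range=(0, 256))[0] for c in range(3)
])

# Local feature: the gradient magnitude is high only near edges.
gray = image.mean(axis=2)
gy, gx = np.gradient(gray)
edge_strength = np.hypot(gx, gy)
edge_pixels = np.argwhere(edge_strength > 0)

print(global_feature.shape, len(edge_pixels))
```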
# 4.4. Segmentation
Regions of the image that are relevant for subsequent operations are identified. For example: select a set of key points, or segment one or more image regions that contain the object of interest.
# 4.5. High-level processing operations
At this stage, the input is typically a small set of data, such as a set of points or a portion of the image that is suspected to contain the object of interest. The remaining operations are:
- Ensure that the collected data are consistent with the hypotheses of the intended application.
- Estimate application-specific parameters, such as object orientation or size.
- Classify the recognized objects into several classes.
# 5. Python libraries for Computer Vision
The main toolkits for image processing in Python are OpenCV, scikit-image and Pillow. The more general Python libraries (NumPy and SciPy) also provide some image processing tools. All these libraries interoperate easily because images are stored in, or can be converted to, NumPy arrays. A grayscale image is usually stored in a 2-dimensional NumPy array of integer or real values with H rows and W columns (H = height, W = width). A color image is stored in a 3-dimensional NumPy array of shape (H, W, 3).
- OpenCV is a library that is written in C++, which is rich and widely used in computer vision.
- Pillow is a fork of PIL (Python Imaging Library). It is specific to Python but is mainly written in C. It supports basic operations on images including read/write, transformations, histograms and filtering.
- Scikit-image is a fairly recent and actively developed library. Its advantage is that it is written in Python and Cython (typed Python compiled for speed), which makes its code easy to read.
- [Scipy.ndimage](http://docs.scipy.org/doc/scipy/reference/ndimage.html): SciPy's ndimage module provides a number of functions for filtering, interpolation, mathematical morphology and statistics.
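The array layout described at the start of this section can be checked directly with NumPy. This is a small sketch; the image size is an arbitrary assumption.

```python
import numpy as np

height, width = 120, 160

# Grayscale image: H rows and W columns, one intensity per pixel.
gray = np.zeros((height, width), dtype=np.uint8)

# Color image: same layout with three channels per pixel.
color = np.zeros((height, width, 3), dtype=np.uint8)

# Indexing is [row, column], i.e. [y, x].
gray[10, 20] = 255
color[10, 20] = (255, 0, 0)

print(gray.shape, color.shape)
```

Note that the row index comes first, so a pixel at horizontal position x and vertical position y is addressed as `image[y, x]`.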
# 6. OpenCV library on Windows and Ubuntu
Gary Bradski started OpenCV at Intel in 1999. The library can be used from a variety of languages such as C++ and Python. OpenCV-Python is the Python API of OpenCV: it combines the performance of the C++ implementation with the convenience of Python. It is distributed as a library of binary bindings designed to address computer vision problems. It is based on NumPy and its array structures, which means that it integrates easily with other libraries such as SciPy and Matplotlib.
All operations on images are ultimately mathematical operations. Programmers cannot be expected to reimplement them every time they work with images, hence the development of the OpenCV library, which provides functions for the most common image operations.
To install on Windows, download Python(x,y) [here](https://python-xy.github.io/downloads.html) (it is also possible to download each file individually). First, download the file that contains the collection of the OpenCV library. Then, install Python(x,y) as shown in the figures:
There are two ways to install OpenCV on Linux systems. The first is to install pre-compiled packages from the repositories. On Ubuntu, for instance, it is sufficient to execute the following command:

    sudo apt-get install libopencv-dev python-opencv

The second method is to compile the library from source (this method allows you to obtain the latest version of the library).
Open a terminal and proceed as follows:

    sudo apt-get update
    sudo apt-get upgrade
    sudo apt-get install build-essential cmake git pkg-config
    sudo apt-get install libjpeg8-dev libtiff4-dev libjasper-dev libpng12-dev
    sudo apt-get install libatlas-base-dev gfortran
    # Install pip
    wget https://bootstrap.pypa.io/get-pip.py
    sudo python get-pip.py
    sudo pip install virtualenv virtualenvwrapper
    sudo rm -rf ~/.cache/pip
    # Set up virtualenv and virtualenvwrapper
    export WORKON_HOME=$HOME/.virtualenvs
    source /usr/local/bin/virtualenvwrapper.sh
    source ~/.bashrc
    mkvirtualenv cv
    # Install the Python 2.7 headers
    sudo apt-get install python2.7-dev
    # Install NumPy
    pip install numpy
    # Download the OpenCV library
    cd ~
    git clone https://github.com/Itseez/opencv.git
    cd opencv
    git checkout 3.0.0

After that, download the opencv_contrib package. It provides features such as SIFT and SURF, which were part of the OpenCV 2.4.2 library and were then removed from the main library in OpenCV 3.0.

    cd ~
    git clone https://github.com/Itseez/opencv_contrib.git
    cd opencv_contrib
    git checkout 3.0.0
    cd ~/opencv
    mkdir build
    cd build
    cmake -D CMAKE_BUILD_TYPE=RELEASE \
        -D CMAKE_INSTALL_PREFIX=/usr/local \
        -D INSTALL_C_EXAMPLES=ON \
        -D INSTALL_PYTHON_EXAMPLES=ON \
        -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
        -D BUILD_EXAMPLES=ON ..
    make
    sudo make install
    sudo ldconfig
    cd ~/.virtualenvs/cv/lib/python2.7/site-packages/
    ln -s /usr/local/lib/python2.7/site-packages/cv2.so cv2.so

# 7. Processing images with OpenCV
Now that we have successfully installed OpenCV, let's start using it.
# 7.1. Reading images in Python
To read an image, we use the cv2.imread() function. The examples below assume that the working directory is the one that contains the image.

    import cv2
    img = cv2.imread('img.jpg')

Optionally, a flag can be passed as the second argument:
- cv2.IMREAD_COLOR: loads a color image, ignoring any transparency;
- cv2.IMREAD_GRAYSCALE: loads a grayscale image;
- cv2.IMREAD_UNCHANGED: loads the image as is, including the alpha channel.

The integers 1, 0 and -1 can be used instead of these constants:

    img = cv2.imread('img.jpg', 0)

Note that passing an invalid image path does not raise an error: cv2.imread() simply returns None.
# 7.2. Displaying images in Python
The cv2.imshow() function displays an image in a window that automatically fits the image size. The first argument is the window name and the second one is the image.

    img = cv2.imread('img.jpg')
    cv2.imshow('Images', img)
    cv2.waitKey(0)  # keep the window open until a key is pressed

Note that calling cv2.imshow() with different window names opens several windows at once, whereas reusing a name updates the existing window.
The cv2.destroyAllWindows() function destroys all the windows that we have created.
cv2.destroyWindow() destroys a specific window, identified by its name.
# 7.3. Creating images in Python
To do this, we use the cv2.imwrite() function. The first argument is the file name and the second one is the image to be saved.
For example, cv2.imwrite('img_gray.png', img) stores the grayscale image as "img_gray.png" in the current location.
# 7.4. Displaying images using Matplotlib
Using the Matplotlib library, we can also display the image:

    import matplotlib.pyplot as plt
    plt.imshow(img, cmap="gray", interpolation="bilinear")
    plt.xticks([]), plt.yticks([])  # hide the tick marks on both axes
    plt.show()

# 7.5. Core operations on images
Let's now look at the basic operations that can be applied to an image.

    import cv2
    img = cv2.imread('img.jpg')
    y, x = 100, 50

Reading the color values at position (y, x); OpenCV stores the channels in blue, green, red order:

    (b, g, r) = img[y, x]

Region of interest of size 100x100 whose top-left corner is at (x, y):

    roi = img[y:y+100, x:x+100]
    cv2.imshow('image', img)
    cv2.imshow('ROI', roi)

Filling the region of interest with a new color (roi is a view, so img is modified as well):

    roi[:, :] = (55, 44, 87)
    cv2.imshow('New image', img)
    cv2.waitKey(0)

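The reason the full image changes when the region of interest is filled is that NumPy slicing returns a view rather than a copy. The sketch below shows the difference on a synthetic black image; the variable names are illustrative.

```python
import numpy as np

img = np.zeros((200, 200, 3), dtype=np.uint8)
y, x = 100, 50

# Slicing returns a view: roi shares its memory with img.
roi = img[y:y + 100, x:x + 100]
roi[:, :] = (55, 44, 87)
changed_through_view = tuple(int(v) for v in img[y, x])

# Use .copy() when the original image must stay untouched.
img2 = np.zeros((200, 200, 3), dtype=np.uint8)
roi_copy = img2[y:y + 100, x:x + 100].copy()
roi_copy[:, :] = (55, 44, 87)
unchanged_with_copy = tuple(int(v) for v in img2[y, x])

print(changed_through_view, unchanged_with_copy)
```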
# 8. Use cases for Computer Vision
In this section, we will look at some tasks related to computer vision, such as edge detection, face detection, feature detection and description, and object classification, performed with OpenCV and Python.
# 8.1.Edge detection
In OpenCV, we can display only the edges of objects with the cv2.Canny() function:

    import cv2
    img = cv2.imread('img.jpg')
    # 512 and 415 are the two hysteresis thresholds
    cv2.imwrite('edge_img.jpg', cv2.Canny(img, 512, 415))
    cv2.imshow('edges', cv2.imread('edge_img.jpg'))
    cv2.waitKey(0)

# 8.2. Face detection
OpenCV can also detect faces in images, using a Haar cascade classifier. The goal is to locate the position of faces in an image, for example in order to standardize the size of the face area.

    import os
    import cv2

    def face_detection(image, image_out, show=False):
        # Load the image in memory
        img = cv2.imread(image)
        # Load the face detection model (the XML file must be in the working directory)
        face_model = cv2.CascadeClassifier("haarcascade_frontalface_alt2.xml")
        # Detect the face(s)
        faces = face_model.detectMultiScale(img)
        # Draw a bounding box around each face
        print("number of faces", len(faces), "image size", img.shape, "image", image)
        for (x, y, w, h) in faces:
            cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 3)
        # Store the final result
        cv2.imwrite(image_out, img)
        # Show the image; press ESC to close it early
        if show:
            cv2.imshow("face", img)
            if cv2.waitKey(5000) == 27:
                cv2.destroyWindow("face")

    if __name__ == "__main__":
        # Apply the detection to every image in the current directory
        for file in os.listdir("."):
            if file.startswith("face"):
                continue  # already processed
            if os.path.splitext(file)[-1].lower() in [".jpg", ".jpeg", ".png"]:
                face_detection(file, "face_" + file)

As you can see, it draws a blue rectangle (bounding box) around each face in the image.
# 8.3. Feature Detection and Description
In this section, we give a brief description of the SIFT (Scale-Invariant Feature Transform) algorithm. The main idea is to transform an image into a set of feature vectors (descriptors) that are ideally invariant to geometric transformations such as rotation and scaling. This involves detecting interest points (keypoints), which later make it possible to detect an object. Each detected point is described by a feature vector whose components are specific to that point.
SIFT in OpenCV and Python:

    import cv2
    img = cv2.imread('my_img.jpg')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # OpenCV 3.x with opencv_contrib; OpenCV 4.4+ provides cv2.SIFT_create() instead
    sift = cv2.xfeatures2d.SIFT_create()
    kp = sift.detect(gray, None)
    img = cv2.drawKeypoints(gray, kp, None)
    cv2.imwrite('sift_img.jpg', img)

# 8.4. Object Classification
To identify an object in an image, it can be enough to detect its contours and shape when extracting features. How do we proceed to recognize objects? We perform 3 steps: (1) detect the contours in the image, (2) approximate each contour and (3) classify the resulting shapes.
Let's start by importing the libraries and loading an image.

    import numpy as np
    import cv2
    image = cv2.imread('my_image.bmp')

Step 1: Contour detection. To make the contours easier to find, we convert the color image to grayscale and then apply a threshold.

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    ret, thresh = cv2.threshold(gray, 250, 255, cv2.THRESH_BINARY_INV)
    # findContours returns 3 values in OpenCV 3.x and 2 in other versions;
    # [-2] is the contour list in every case
    contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

Step 2: Contour approximation. Each contour is approximated by a polygon, and its centre is computed from the image moments.

    for cnt in contours:
        perimeter = cv2.arcLength(cnt, True)
        approx = cv2.approxPolyDP(cnt, 0.01 * perimeter, True)
        M = cv2.moments(cnt)
        if M["m00"] == 0:
            continue  # skip degenerate contours to avoid dividing by zero
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])
        cv2.drawContours(image, [cnt], -1, (0, 255, 0), 2)

Step 3: Pattern classification. All that remains is to count the vertices of each approximated shape. The following code belongs inside the for loop above:

        if len(approx) == 3:
            shape = "triangle"
        elif len(approx) == 4:
            (x, y, w, h) = cv2.boundingRect(approx)
            ratio = w / float(h)
            if ratio >= 0.95 and ratio <= 1.05:
                shape = "square"
            else:
                shape = "rectangle"
        elif len(approx) == 5:
            shape = "pentagon"
        elif len(approx) == 6:
            shape = "hexagon"
        else:
            shape = "circle"
        cv2.putText(image, shape, (cX, cY), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)

We just have to display the result to check our work:

    cv2.imshow('Final_image', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

# 9. Conclusions
In this guide, we discussed the topic of Computer Vision using OpenCV and Python. We presented some fundamental tasks of Computer Vision such as Object Recognition and Semantic Segmentation. We also examined some case studies about the process of edge and face detection, feature extraction and object classification.