A review of the current state of computer vision
Today, computer vision is experiencing a period of significant progress on problems that researchers have been working on for decades. Meaningful advances have been made in object detection, segmentation, and classification, image search by example, image generation, and other tasks.
For many years, most image-analysis approaches have been based on a transition from the image to a feature space that captures the essential information about the whole image or its characteristics in a denser representation than the image itself. After the transition, the features are compared by an appropriate method, and a decision is made according to the problem being solved. Besides being informative and of reduced dimension, the obtained features should be as insensitive as possible, i.e., invariant, to geometric transformations of the image, changes in lighting, noise, and local occlusions, which arise, for example, when objects partially overlap or an object extends partly beyond the image.
There is no universal way to describe images with a feature set: the choice of feature space and of the method for processing it depends on the specific problem. The weakness of classical feature approaches is the need for complex configuration, in which various parameters must be set on the basis of heuristic information; the values of these parameters significantly affect the final result.
In recent years, neural networks have provided a significant breakthrough in computer vision. In 2012, the AlexNet neural network took part in the annual ImageNet competition and showed the best result on the object classification problem, with an error rate of 15.3% against 26.2% for the runner-up. Then, in 2019, the classification quality achieved with neural networks reached human level. Combining the achievements of the classical analytical feature approach with the neural network one therefore holds great promise.
Even now, however, there are many tasks where the classical feature approach is necessary. Tasks such as cartography and the stitching of panoramic images depend on solving the problem of normalization (compensation) of the geometric transformations present.
Image normalization problem and approaches to solve it
In this article, image normalization means the process of compensating the geometric transformations that distinguish one image from another. This problem has been investigated for a long time. The fundamental works on normalization propose two main approaches, tracking and parametric, and consider several methods for each of them.
The tracking approach implies gradual compensation of geometric transformations over many steps. At each step, the processed image is compared with the pattern and then undergoes a small geometric transformation that compensates only a part of the whole transformation, bringing the processed image closer to the pattern. After all steps, the processed image coincides with the pattern, and the parameters of the overall geometric transformation are defined. This approach is applied in tracking and targeting tasks.
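For a minimal illustration of this idea, consider a translation-only sketch in Python with NumPy (a hypothetical example of ours, not a method from the cited works): at each step a small corrective shift is estimated from the image gradients and applied, bringing the processed image closer to the pattern.

```python
import numpy as np

def render(cx, cy, n=64, s=40.0):
    # Synthetic smooth "image": a Gaussian blob centred at (cx, cy).
    y, x = np.mgrid[0:n, 0:n]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / s)

pattern = render(32.0, 32.0)           # the pattern image
true_shift = np.array([2.5, -1.5])     # unknown translation (dx, dy)

est = np.zeros(2)                      # accumulated compensation
for _ in range(20):
    # Re-render the processed image with the current compensation applied.
    moved = render(32.0 + true_shift[0] - est[0],
                   32.0 + true_shift[1] - est[1])
    gy, gx = np.gradient(moved)        # image gradients
    err = (pattern - moved).ravel()    # residual w.r.t. the pattern
    J = np.stack([gx.ravel(), gy.ravel()], axis=1)
    # One small Gauss-Newton step compensating part of the shift.
    step, *_ = np.linalg.lstsq(J, err, rcond=None)
    est += step
```

After the loop, `est` approximates the unknown shift, i.e., the parameters of the overall transformation have been accumulated from many partial compensations.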
The parametric approach aims to determine the parameters of the entire geometric transformation at once. The found transformation is then compensated, and the processed image turns into the pattern. This approach is used more widely.
For both approaches, some researchers propose an integrated method of constructing functionals based on moments of different orders; still, only cases of simple geometric transformations are considered. Moreover, for all integrated methods a significant problem is the background, which can be partially or completely changed. Other research papers propose solving the normalization problem under complex geometric transformations and local occlusions using one-dimensional normalizations and the decomposition of complex transformation groups into compositions of simple ones. However, such methods solve the problem only partially. This article is devoted to analyzing normalization based on descriptors of image key points.
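To make the moment idea concrete for the simplest case, a translation, normalization can be sketched as follows (an illustrative NumPy fragment of ours, not the method of the cited works): the centroid given by the zeroth- and first-order moments is moved to a fixed position.

```python
import numpy as np

def centroid(img):
    # Zeroth-order moment m00 and first-order moments m10, m01
    # give the centre of mass of the image intensity.
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    return np.array([(x * img).sum() / m00, (y * img).sum() / m00])

def normalize_translation(img, target=(32.0, 32.0)):
    # Compensate the translation by shifting the centroid onto `target`
    # (integer pixel shift only, for brevity).
    shift = np.round(np.asarray(target) - centroid(img)).astype(int)
    return np.roll(img, (shift[1], shift[0]), axis=(0, 1))
```

A changed background directly corrupts the moments, which is exactly the weakness of the integrated methods noted above.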
Construction of image features based on descriptors
In the classical approach, the solution of a large number of tasks is based on detecting key points and describing their neighborhoods with feature vectors, followed by processing of the obtained vectors. Thus, there is a transition from the image to a space of key-point feature vectors.
An algorithm that finds key points is called a detector, and an algorithm that produces descriptions of the found points is called a descriptor. The term descriptor is also used for the feature vector of a key point itself.
Over the long existence of computer vision tasks, a significant number of algorithms have been developed to detect and describe key points; they differ in their degree of invariance to geometric transformations, changes in lighting, and viewing angles, and in their computational cost. Implementations of most of these algorithms are available in popular software libraries. For instance, the open library OpenCV contains SURF, SIFT, ORB, BRISK, KAZE, AKAZE, LATCH, VGG, LUCID, DAISY, FREAK, and other descriptors.
Research on descriptor-based image normalization
1. Normalization of geometric transformations based on descriptors.
The normalization method based on descriptors uses the basic property of the projective transformation
$$ H = \begin{pmatrix}h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33}\end{pmatrix}$$
namely, the possibility of obtaining its parameters from the coordinates of 4 points before and after the transformation:
$$ \left\{ \begin{aligned} h_{13} &= x_{A'}; \\\\ h_{23} &= y_{A'}; \\\\ h_{31} &= {d_1(y_{C'}-y_{B'})-d_2(x_{C'}-x_{B'}) \over AB(d_3 - d_4)}; \\\\ h_{32} &= {d_1(y_{C'}-y_{D'})-d_2(x_{C'}-x_{D'}) \over AD(d_4 - d_3)}; \\\\ h_{11} &= {x_{D'}(h_{31}AB+1)-h_{13} \over AB}; \\\\ h_{21} &= {y_{D'}(h_{31}AB+1)-h_{23} \over AB}; \\\\ h_{12} &= {x_{B'}(h_{32}AD+1)-h_{13} \over AD}; \\\\ h_{22} &= {y_{B'}(h_{32}AD+1)-h_{23} \over AD};\end{aligned}\right. $$
where h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32} are the parameters of the projective transformation (with the normalization h_{33} = 1); A(x_{A}, y_{A}), B(x_{B}, y_{B}), C(x_{C}, y_{C}), D(x_{D}, y_{D}) and A'(x_{A'}, y_{A'}), B'(x_{B'}, y_{B'}), C'(x_{C'}, y_{C'}), D'(x_{D'}, y_{D'}) are the 4 points before and after the transformation on images B_{1} and B_{2} respectively; AB, AD are segment lengths; and
$$ \begin{aligned} d_{1} &= x_{D'} - x_{C'} + x_{B'} - h_{13}; \\\\ d_{2} &= y_{D'} - y_{C'} + y_{B'} - h_{23}; \\\\ d_{3} &= (x_{C'} - x_{D'})(y_{C'} - y_{B'}); \\\\ d_{4} &= (y_{C'} - y_{D'})(x_{C'} - x_{B'}). \end{aligned} $$
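The system above can be implemented directly, as in the following Python/NumPy sketch of ours. It assumes the corner layout A = (0, 0), D = (AB, 0), C = (AB, AD), B = (0, AD) on image B_1 (our assumption, chosen so that the formulas are self-consistent); the function name is illustrative.

```python
import numpy as np

def homography_from_rect(AB, AD, Ap, Bp, Cp, Dp):
    # Ap..Dp are the transformed points A'..D' on image B2;
    # assumed layout on B1: A=(0,0), D=(AB,0), C=(AB,AD), B=(0,AD).
    xA, yA = Ap; xB, yB = Bp; xC, yC = Cp; xD, yD = Dp
    h13, h23 = xA, yA                         # A maps from the origin
    d1 = xD - xC + xB - h13
    d2 = yD - yC + yB - h23
    d3 = (xC - xD) * (yC - yB)
    d4 = (yC - yD) * (xC - xB)
    h31 = (d1 * (yC - yB) - d2 * (xC - xB)) / (AB * (d3 - d4))
    h32 = (d1 * (yC - yD) - d2 * (xC - xD)) / (AD * (d4 - d3))
    h11 = (xD * (h31 * AB + 1) - h13) / AB
    h21 = (yD * (h31 * AB + 1) - h23) / AB
    h12 = (xB * (h32 * AD + 1) - h13) / AD
    h22 = (yB * (h32 * AD + 1) - h23) / AD
    return np.array([[h11, h12, h13],
                     [h21, h22, h23],
                     [h31, h32, 1.0]])
```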
In practice, however, the corresponding points on images B_{1} and B_{2} are unknown. Descriptors, which detect and describe key points, can be used to establish the correspondences. The key points will be found with some inaccuracy, because detectors and descriptors are sensitive to significant geometric and lighting transformations, and the search for corresponding points based on descriptor similarity will also produce false pairs. Therefore, in practice it is desirable to use more than 4 pairs of corresponding points to determine the geometric transformation parameters more accurately.
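With more than 4 pairs, the parameters can be estimated in the least-squares sense, for example with the standard direct linear transform, sketched below in NumPy (in practice a robust estimator such as RANSAC, e.g. cv2.findHomography in OpenCV, is layered on top to reject false pairs):

```python
import numpy as np

def fit_homography(src, dst):
    # Direct linear transform: each correspondence (x, y) -> (u, v)
    # contributes two linear equations in the 9 entries of H.
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The least-squares solution is the right singular vector
    # belonging to the smallest singular value of A.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # normalize so that h33 = 1
```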
The normalization algorithm used in this work consists of the following steps:

1. search for key points and their description with feature vectors (descriptors) on images B_{1} and B_{2};

2. definition of the correspondence between the key points on images B_{1} and B_{2};

3. determination of the parameters of the geometric transformation that distinguishes image B_{2} from image B_{1};

4. normalization of image B_{2} to image B_{1} (direct normalization), or of image B_{1} to image B_{2} (inverse normalization).
Further, let us consider each step in detail.
Step 1. Key points definition and their description
To define key points and describe them, this work considers full-cycle algorithms (detector-descriptor algorithms), which perform both key point detection and description.
Based on a multi-source analysis, the detector-descriptor algorithms SURF-128, SURF-64, SIFT, BRISK, ORB, ORB (1000), KAZE, and AKAZE were chosen as the most promising and interesting for normalization. Table 1 shows brief information about these algorithms.
However, the large number of existing algorithms and the lack or inconsistency of information comparing them, as well as of recommendations for their use under different conditions, make it challenging to understand the strengths and weaknesses of particular descriptors and to choose the best one for a specific task. Thus, further comparative research on descriptors is needed to obtain sound recommendations for their usage.
Table 1 — Brief information about the considered full-cycle descriptors (detector-descriptor algorithms)