### Vol. 3, №1, 2017

*Kulunchakov, A. S. * Creation of parametric rules to rewrite algebraic expressions in Symbolic Regression // Machine Learning and Data Analysis, 2017, 3(1):6-19. doi:10.21469/22233792.3.1.01 This paper investigates the problem of bloat in Symbolic Regression. It develops a procedure to simplify superpositions generated by SR. Our approach borrows ideas of equivalent decision simplification and apply them to create parametric rules of rewriting. Except eliminating excessive parts of superpositions, these rules reduce the dimensionality of parameter space of generated superpositions. The computational experiment is conducted on a dataset related to Brent Crude Oil options. There we approximate volatility of options prices by its strike prices and expiration date.

*Izmailov P.A., Kropotov D.P. * Faster variational inducing input Gaussian process classification // Machine Learning and Data Analysis, 2017, 3(1):20-35. doi:10.21469/22233792.3.1.02 Background: Gaussian processes (GP) provide an elegant and effective approach to learning in kernel machines. This approach leads to a highly interpretable model and allows using the Bayesian framework for model adaptation and incorporating the prior knowledge about the problem. The GP framework is successfully applied to regression, classification, and dimensionality reduction problems. Unfortunately, the standard methods for both GP-regression and GP-classification scale as O(n^3), where n is the size of the dataset, which makes them inapplicable to big data problems. A variety of methods have been proposed to overcome this limitation both for regression and classification problems. The most successful recent methods are based on the concept of inducing inputs. These methods reduce the computational complexity to O(nm^2) where m is the number of inducing inputs with m typically much less than n. The present authors focus on classification. The current state-of-the-art method for this problem is based on stochastic optimization of an evidence lower bound (ELBO) that depends on O(m^2) parameters. For complex problems, the required number of inducing points m is fairly big, making the optimization in this method challenging.

Methods: The structure of variational lower bound that appears in inducing input GP classification has been analyzed. First, it has been noted that using quadratic approximation of several terms in this bound, it is possible to obtain analytical expressions for optimal values of most of the optimization parameters, thus sufficiently reducing the dimension of optimization space. Then, two methods have been provided for constructing necessary quadratic approximations: one is based on Jaakkola--Jordan bound for logistic function and the other is derived using Taylor expansion.

Results: Two new variational lower bounds have been proposed for inducing input GP classification that depend on a number of parameters. Then, several methods have been suggested for optimization of these bounds and the resulting algorithms have been compared with the state-of-the-art approach based on stochastic optimization. Experiments on a bunch of classification datasets show that the new methods perform the same or better results than the existing one. However, new methods do not require any tunable parameters and can work in settings within a big range of n and m values, thus significantly simplifying training of GP classification models.

Mikheeva, A.V., I.I. Kalinnikov. 2017. The GIS-ENDDB algorithms and methods for geoinformation-expert data analysis 3(1):36 - 49. doi: 10.21469/22233792.3.1.03 The software of the geographical information system for studying the Earth's natural disasters (GIS-ENDDB) is focused on the research into cause-and-effect relations of catastrophic events in our planet's history. It contains data on the Earth's seismic activity, anomalies of heat flows (HF), gravitational field and tomography layers, detailed geographical relief, as well as data on cosmogenic structures distribution. To develop methods for analyzing these data, it has been added into the subsystems of information and mathematical software updates such as: the algorithm for building seismicity lineaments in terms of the Great circles (GC) of the Earth; the algorithms for constructing the contours of a maximum earthquake magnitudes and of the averaging earthquake mechanisms; the functions of geophysical fields visualization and the cross-section visualization of different seismicity characteristics; and tomography data. All these updates help to extend the capabilities of classical methods for geotectonic studies by a complex scientific-experimental approach allowing one to reveal tectonically active faults, to study the spatial relationship of seismicity and cosmogenic paleostructures (related to the historical past of the Earth), and, eventually, to interpret the data in terms of constructing seismic-geodynamic models of the lithosphere.

Dvoenko, S.D., D.O. Pshenichny. 2017. The conditionality of matrices of pairwise comparisons after metric corrections 3(1):50 - 60. doi:10.21469/22233792.3.1.04 In modern intelligent data analysis and data mining, results of investigations are usually represented by mutual pairwise comparisons of similarity or dissimilarity of objects. It needs to immerse results of pairwise comparisons into some metric space for correct using of machine learning algorithms. One of the conditions of the correct immersion is the nonnegative definite matrix of pairwise similarities of set elements. In this case, nonnegative similarities represent scalar products of vectors in the positive quadrant of an imaginary feature space and corresponding dissimilarities represent distances. Various similarity and dissimilarity measurements are used in practice. Nevertheless, not all of them are correct as metric functions. Therefore, it needs to use metric corrections of real experimental matrices of pairwise comparisons to reach the positive definiteness of the corresponding matrices of standard scalar products. Unfortunately, the natural best limit to minimize deviations of corrected values from initial ones leads to ill-conditioned matrices of scalar products with the large condition number. A way to improve the conditionality of matrices of pairwise comparisons is investigated.

Karkishchenko, A.N., V.B. Mnukhin. 2017. Gaussian rotations for graphic information protection 3(1):61 - 75. doi: 10.21469/22233792.3.1.05 Digital images over “finite complex planes” are considered jointly with transformations of Gaussian rotations. It is proved that under some special conditions, results of such transformations seem to be formed by several zoomed out copies of the rotated original, though all such “copies” are formed by different pixels of the original image. Based on Gaussian rotations, some methods for tamper resistent protection of graphic information are considered. A method for verification of protected information is also introduced.

Ganebnykh, S.N., M.M. Lange. 2017. On efficiency of fusion schemes for pattern recognition in ensemble of images 3(1):76 - 89. doi: 10.21469/22233792.3.1.06 In an ensemble of image sources of different modalities, some metric multiclass classifiers are studied. The classifiers make the collective decisions for the composite objects that are produced by collections of the images with one from each source. The discriminant functions of the multiclass classifiers are produced by the binary “class-vs-all” NN or SVM classifiers. Two original fusion schemes that use the discriminant functions based on the different compositions are suggested. The first scheme uses the compositions of the dissimilarity measures between the images within each source (General Measure, GM) whereas the second scheme uses the compositions of the soft decisions for the images in the submitted composite object (General Similarity, GS). In terms of error rates, the proposed GM and GS fusion schemes are compared with the known MV (Majority Vote) scheme which is based on majority voting the compositions of the hard decisions for the individual source images. The comparative efficiency of the above fusion schemes is supported by the error rates for recognition of the RGB face images given by the ensemble of their three decorrelated components. For the NN and SVM classifiers, the experimental estimations of the error rates show a profit of the GM and GS schemes in comparison with the MV scheme.

### Vol. 3, №2, 2017

Chukanov, S.N., S.V. Leykhter. 2017. Learning on affine groups for tracking images of objects 3(2):96 - 106. doi: 10.21469/22233792.3.2.01 Algorithms for tracking of objects and recognition of the behavior of objects based on the control of spatial and time changes of parameters using the learning methods are considered in the paper. Tracking algorithms, in which Lie groups are used for aﬃne transformations, are proposed. The parameters of the object’s motion determined by means of the exponentialmapping between the Lie group and its algebra are analyzed. The parameters are optimized on the manifold. Algorithms for joint learning and estimation with the help of the Luenberger observer for tracking problems on manifolds are presented.

Genrikhov, I.E., E.V. Djukova, and V.I. Zhuravlyov. 2017. Construction and investigation of full regression trees in regression restoration problem in the case of real-valued

information 3(2):107 - 118. doi: 10.21469/22233792.3.2.02 Background. The regression restoration problem with real data is considered. Typically, this type of information is most often encountered in practice (for example, in problems of medical diagnosis or banking scoring). The approach based on the construction of regression trees is highlighted among the existing approaches. The most known among algorithms of regression trees synthesis (for example, algorithms CART (classification and regression tree) and Random Forest) are based on use of the elementary trees, namely, binary regression trees. As a rule, in this case, in the synthesis of regression trees, the current values of the feature are splitted by only one threshold at each step of constructing the inner node of the tree. Sometimes, a splitting into several thresholds is performed (interval splitting), while searching for optimal splitting intervals computationally is a complex task. During the synthesis of such trees, only one feature and the corresponding set of thresholds are selected at each step, which satisfies the selected branch criterion, and branching is performed based on it. However, if several different pairs (a feature - a set of transcoding thresholds) satisfy the branching criterion in equal or almost equal measure when building a tree, then only one of them (in fact, randomly) is selected. Thus, depending on the selected feature and the set of thresholds, the constructed trees can differ significantly, both in the composition of the features used and in their recognizing qualities. Methods. An approach to the construction of regression trees based on the construction of the so-called full decision tree is applied. In addition, various ways of selecting a set of thresholds for feature at each step of tree synthesis have been investigated. Previously, this approach was investigated only on regression restoration problem with integer data and showed an improvement in the quality of the solution in comparison with the known methods of synthesis of regression trees. In a full regression tree, a so-called full node on each iteration for a problem with real features is constructed. A set of pairs of feature-threshold transcoding corresponds to it, in which each pair of feature-threshold satisfies the selected branching criterion. Further, a simple inner node is constructed for each pair from this set, from which branching is performed. Compared to the classical construction and the standard method of selecting only one threshold for the feature, full regression tree allows fuller use of the available information, while the description of the recognized object can be generated not only by one branch as in a classical tree, but by several branches. Results. Two synthesis algorithms of regression trees - DFRTree (defined full regression tree) and RFRTree (random full regression tree) - are developed. The RFRTree algorithm constructs a full regression tree in which the best set of thresholds for the feature is selected at each step of the synthesis by using a statistical criterion of estimating various random partitions. The DFRTree algorithm also constructs a full regression tree, but selects the set of thresholds for the feature for which approximately the same number of training objects fall into the resulting intervals. It is shown that the best results were obtained using the RFRTree algorithm. A comparison of 15 real problems of RFRTree and DFRTree algorithms with known regression trees synthesis algorithms, such as the Random Forest, Decision Stump, REPTree, and CART, is carried out. It is shown that quality of the RFRTree algorithm is higher than the quality of the Decision Stump, REPTree, and CART algorithms, and it is as good as the Random Forest algorithm and, in some cases, shows the best results. Concluding Remarks. It is shown that the developed algorithms for the synthesis of full regression trees for solving the regression restoration problem with real data are not inferior to the known algorithms for the synthesis of regression trees and can be successfully applied on a par with other modern approaches to constructing regression trees.

Kniaz, V.V., O.V. Vygolov, V.V. Fedorenko, and V.S. Sevrykov. 2017. Deep convolutional autoencoders: stereo matching for 3D model reconstruction of low-textured objects 3(2):119 - 134. doi: 10.21469/22233792.3.2.03 Methods: A new method for stereo matching based on deep neural convolutional autoencoders is presented. An autoencoder reduces the image dimensions and produces the code that could be used to perform an effective search of the corresponding image patch for low-textured object. Results: An architecture of a new autoencoder was developed. The autoencoder performs coding and decoding of color images with resolution 32 × 32 pixels. A comparison of the performance of the developed method and modern image patch descriptors is presented. The method was applied to process images and to reconstruct three-dimensional (3D) models of archaeological excavations organized by the Bosphorus expedition of the Russian State Historical Museum. Concluding Remarks: The analysis of an application of the developed method proves that it outperforms the existing image descriptors in the matching of image patches of low-textured objects.

Murashov, D.M. , F.D. Murashov. 2017. Method for localizing informative regions with texture of a special type 3(2):135 - 150. doi: 10.21469/22233792.3.2.04 The paper deals with a problem for localizing informative regions with a specific texture in digital images. This type of texture is characterized by uniformly oriented elongated elements and varying spatial frequency. Such a structure, in particular, can be generated by groups of brushstrokes in the images of paintings. Existing techniques, for example, based on the Haralick’s features, Laws energy features, and Gabor filters cannot completely solve the problem with the required quality. In this paper, the task of localization of informative areas will be addressed as a problem of segmentation of texture images. A method for solving the problem based on modified superpixel segmentation algorithm with a postprocessing procedure is proposed. Vector of image pixel description is expanded by texture features computed using components of the structure tensor. The selected features involve the peculiarities of the considered texture type. Application of superpixel algorithm with an extended feature description of images will permit to take into account spatial, color, and textural properties of image regions. To obtain an acceptable quality of segmentation, the condition of minimum information redundancy measure is used. A computational experiment has been carried out on textural test images. The results of segmentation of the image of the texture mosaic by the proposed method have been compared with the well-known method based on the Laws’s energy features. Comparison demonstrates the advantage of the proposed method. The developed technique has been used to localize informative areas in the images of paintings. The results of the experiment show the efficiency of the proposed method.

Nedel'ko, V.M.. 2017. Estimation of feature importance for quantile regression 3(2):151 - 159. doi: 10.21469/22233792.3.2.05 There are a large number of approaches to estimating the significance of variables in problems of constructing decision functions. One of the most important approaches is based on of the ROC (relative operating characteristics) curve (error curve). Initially, the ROC curves were introduced for classification models. The extension of ROC curves for regression problems has also been investigated. Notable examples are the so-called regression error characteristics (REC) curves and the Regression ROC (RROC) curves. However, these generalizations require the explicit specification of the predicted values of the target variable, while for constructing the ROC curve in the classification problem, one needs only the ordering of objects. There are also some other differences in essential properties of such regression ROC curves and classification ROC curves. The present author proposes some natural generalizations of the concept of the ROC curve for regression analysis, which reproduce more fully the properties of the ROC curve as compared to the known extensions. The most important of these properties is that the ROC curves move to a straight line, when built on random prediction. The deviations from the line allow one to estimate the importance of a variable. The proposed variants of the ROC curve for regression were found to be close to the construction of the empirical bridge.

Gasanov, E.E., A.P. Motrenko. 2017. Creation of approximating scalogram description in a problem of movement prediction 3(2):160 - 169. doi: 10.21469/22233792.3.2.06 The paper addresses the problem of a thumb movement prediction using electrocorticographic (ECoG) activity. The task is to predict thumb positions from the voltage time series of cortical activity. The scalograms are used as input features to this regression problem. Scalograms are generated by the spatio-spectro-temporal integration of voltage time series across multiple cortical areas. To reduce the dimension of a feature space, local approximation is used: every scalogram is approximated by parametric model. The predictions are obtained with partial least squares regression applied to local approximation parameters. Local approximation of scalograms does not significantly lower the quality of prediction while it efficiently reduces the dimension of feature space.

### Vol. 3, №3, 2017

Fedotov, N.G., A.A. Syemov, A.V. Moiseev. 2017. Perfomance investigation of 3D image recognition by stohastic geometry methods in dependent on the number of reference points on the sphere 3(3):176 - 192. doi: 10.21469/22233792.3.3.01 Background: A new developed approach to the three-dimensional (3D) images recognition, giving the object invariant description for any its spatial orientations is proposed. This method has many advantages and 3D images data mining capabilities. In particular, in parallel with the spatial object recognition, it is possible to analyze the original image. Due to building a rigorous mathematical model, it is possible to design analytically features with predetermined properties. Methods: The suggested approach is based on the modern methods of stochastic geometry and functional analysis. Hypertrace transform creates a 3D trace-image of the original spatial object due to scan of the parallel planes grid from different view angles. Created on this trace-image basis, hypertrace matrix is a convenient tool for analyzing 3D images in contrast to other mathematical methods. Results: Stochastic scan with random parameters is more efficient than the determinate scan in terms of the 3D images recognition “reliability-performance” relation. The conducted experiments results are shown. These results demonstrate both theoretical and practical significance and effectiveness of the proposed method. Concluding Remarks: The evaluation task of 3D image recognition performance independent on the number of reference points on the sphere with the use of various kinds of scanning are analyzed. Potential further ways to accelerate the recognition system are proposed.

Starozhilets V.M., U.V. Chehovich. 2017. About identification of a statistical model of traffic flows using vehicle groups 3(3):193 - 202. doi: 10.21469/22233792.3.3.02 A statistical model of traffic flows for modeling speed and number of cars on highways identified on data from heterogeneous sources is proposed. The model simulates movement of car groups along the highway using corresponding to the selected road segment fundamental diagram to calculate the car groups speed. Computational experiments are provided to confirm the adequateness of the model. Also, its behavior in situation of blocking one of the lanes of the highway is analyzed. The criterion of quality is the root-mean-square error between the predicted number of passed vehicles and the actual number of vehicles. Data from traffic detectors, provided by Traffic Management Center, and data obtained by video recording are used in this study.

Samsonov N.A., A.N. Gneushev. 2017. Textural descriptor in the Hough accumulator space of the gradient field for detecting pedestrians 3(3):203 - 215. doi: 10.21469/22233792.3.3.03 The problem of selecting features for recognizing pedestrians on an image is considered. The most popular and effective approach to selecting features is using descriptors based on Histograms of Oriented Gradients (HOG). In this paper, it is proposed to use the Hough accumulator space to generalize the HOG descriptor by obtaining projection not only of orientations, but also of the positions of the boundaries in the local area of the image Hough Accumulator Histograms (HAH). The Hough accumulator space is built on the basis of the beam Radon transform of the gradient field of the image. The proposed methods were tested together with linear support vector machine (SVM) classifier on the INRIA pedestrian database. The results of the experiment have shown the best separating ability of new descriptors and reduction of false detections in comparison with HOG..

### Vol. 3, №4, 2017

Sarmanova O.E., S.A. Burikov, S.A. Dolenko, I.V. Isaev, V.A. Svetlov, K.A. Laptinskiy, T.A. Dolenko. 2017. Estimation of the perspective of using machine learning methods for the purpose of monitoring of the excretion of theranostic fluorescent nanocomposites out of the organism 3(4):222 - 238. doi: 10.21469/22233792.3.4.01 Background: At present, development of new nanomaterials that can be used for diagnostics and medical treatment simultaneously is utterly relevant in biomedicine. While using such agents, one has to control their excretion out of the body. Methods: The results of the estimation of the perspective for application of machine learning methods for monitoring of the excreted theranostic nanocomposites (carbon dots, covered by copolymer and folic acid) and their components by their fluorescence spectra in urine are presented. The problem was solved as a clusterization problem (by k-means and by the algorithm of adaptive construction of hierarchical neural classifiers, developed by the authors), and as a classification problem (by neural networks). None of the clusterings revealed sensitivity to the types of nanoparticles contained in the suspension. Results: The best results of the solution of the classification problem were provided by a perceptron with 8 neurons in the single hidden layer, trained on the set of significant input features selected by cross-correlation. Recognition accuracy averaged over all five classes was 72.3%.

Djukova, E.V., G.O. Maslyakov, P.A. Prokofjev. 2017. About product over partially ordered sets 3(4):239 - 249. doi: 10.21469/22233792.3.4.02 One of the central, intractable problems of logical data analysis - the dualization over the product of partial orders - is considered. An important special case is investigated, when each order is a chain. The relevance of this case is determined by the large number of applications, of which, first of all, it is necessary to allocate such areas as machine learning and search for associative rules in databases. If the number of elements in each chain is two, then the dualization over the product of chains reduces in a well-known way to the construction of irreducible coverings of the Boolean matrix (the dualization of the Boolean matrix). In this paper, it is shown that in general case, when the number of elements in each chain is greater than two, the posed problem reduces to finding a subset of the set of irreducible coverings of a special Boolean matrix the size of which increases with increasing number of elements in the chain (the number of columns of the matrix grows linearly). The results of numerical experiments based on the effective “in the typical case” (for almost all variants of the problem) asymptotically optimal search for irreducible coverings of the Boolean matrix are presented. The algorithm of dualization of boolean matrix Runc-M, constructed by E. V. Djukova and P. A. Prokofjev earlier, is modified for the experiments. Runc-M is currently the world leader in counting speed. Previously, to solve the problem of dualization over the product of chains, an approach was proposed of interest mainly for the theory and is aimed at constructing incremental algorithms with quasi-polynomial time estimates “in the worst case” (E. Boros,K. Elbassioni, V. Gurvich, L. Khachiyan, and K. Makino, 2002).

Karyakina, A.A., A.V. Melnikov. 2017. Comparison of methods for predicting the customer churn in Internet service provider companies 3(4):250 - 256. doi: 10.21469/22233792.3.4.03 The possibility of forecasting the churn of customers based on the data of the Russian Internet service providers (ISP) has been considered. The basic approaches to preprocessing of archived data are defined. For comparison, classification algorithms are used: decision trees, random forest, naive Bayesian algorithm, gradient boosting, and the method of k-nearest neighbors for prediction. As the first sample, an experimental array of input data of size 6 × 400 000 was formed, which contains the fields from the calls (id, type of service, feature, reason, type of result, and leaving). As the second sample, an array of input data of size 13 × 400 000 was formed. For it, there have been selected the following features: id, count of calls for each type of service, for each type of result, total count of calls from the client, and leaving. The models for prediction with the best parameters have been constructed. In the tables, the results of the research with different data sets for various classifiers are shown.

Chigrinskiy V.V., I.A. Matveev. 2017. Iris structure motion analysis via optical flow method 3(4):257 - 266. doi: 10.21469/22233792.3.4.04 Nonlinear movements of elements of human iris during pupil size variations is studied. Tracking of iris elements is done with the help of optical flow methods. The aim is to estimate a radially symmetric function which describes positions of iris structural elements with respect to pupil size. The quality of the method is assessed by applying it to the synthetic data, which is built from preselected deformation model and after that, obtained function is matched against the model. To test the algorithm on real data, video of human’s eye reaction on flashlight is used, which was recorded by a special device.