Vol. 1, №11, 2015
Kuznetsov, M.P. and Ivkin, N.P. Time series classification algorithm using combined feature description // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1471 - 1483. We consider the problem of multiclass time series classification. We regard time series as complex-structured objects having no explicit feature description. In general, the classification of complex objects can be divided into two stages: forming a feature space and constructing a decision rule on this space. In this paper, we focus on the first stage, namely, constructing a feature space in which the points of different classes are separable, or close to separable. Having constructed such a space, we use a simple linear or polynomial decision rule to discriminate the classes. We investigate various methods of time series feature space construction. The first method is the expert definition of basic functions: for the time series data, we consider the mean value, deviation, absolute deviation, and empirical distribution of values. The second method involves a data generation hypothesis and uses optimal estimates of the generation parameters as features. Furthermore, we consider a combined feature description of a time series. The computations show that using the extended feature space significantly improves the classification quality. We apply the proposed approach to accelerometer time series classification. The problem is to assign each time series segment to one of six action classes: Jogging, Walking, Upstairs, Downstairs, Sitting, and Standing. We show that our combined approach achieves very good accuracy compared with the separate feature construction methods.
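As an illustration (not the authors' implementation), the expert-defined features named in the abstract above — mean, deviation, absolute deviation, and the empirical distribution of values — could be collected into a single feature vector per segment; the function name and number of histogram bins are arbitrary choices:

```python
import numpy as np

def expert_features(ts, n_bins=10):
    """Basic expert-defined features for one time series segment:
    mean, standard deviation, mean absolute deviation, and an
    empirical distribution (normalized histogram) of values."""
    ts = np.asarray(ts, dtype=float)
    hist, _ = np.histogram(ts, bins=n_bins)
    return np.concatenate([
        [ts.mean(), ts.std(), np.mean(np.abs(ts - ts.mean()))],
        hist / len(ts),  # empirical distribution of values
    ])

# Example: a synthetic accelerometer-like segment (periodic + noise)
rng = np.random.default_rng(0)
segment = np.sin(np.linspace(0, 20, 200)) + 0.1 * rng.standard_normal(200)
x = expert_features(segment)   # feature vector of length 3 + n_bins
```

Feature vectors of this kind from all segments would then feed a linear or polynomial classifier, as described in the paper.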
Bakhteev, O.Y. Handling missing values in mixed-scale datasets with a large amount of missing values // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1484 - 1499. The paper investigates the problem of handling missing values in datasets with a large amount of missing values. One problem with missing-value filling methods is their instability: the order in which missing values are filled can seriously change the efficiency of the method. The paper considers the case when the dataset has a significant number of features with discrete scales of low cardinality. Among the various methods of handling missing values, the paper focuses on filling missing values using the metric properties of the dataset. The paper proposes definitions and statements that formalize the problem of instability. A method using k-nearest neighbours is considered, including a variation that uses already filled missing values as values of nearest neighbours for new fills. Some theoretical aspects of implementing this method are also considered. To analyse the behaviour and efficiency of the considered method, two experiments were conducted. The results were compared with other missing-value filling techniques, such as filling with decision trees and filling with the average value of the scale. The proposed mathematical framework can be used for further research on missing-value filling methods.
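A minimal sketch of the k-nearest-neighbours filling idea discussed above (without the sequential-reuse variant): distances are measured only over features both rows observe, and the gap is filled with the mean of the k nearest rows that observe the missing feature. All names and parameter values here are illustrative, not the paper's:

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs: for each missing cell, find the k nearest rows that
    observe that feature (distance over mutually observed features)
    and take the mean of their values there."""
    X = np.array(X, dtype=float)
    filled = X.copy()
    for i in range(X.shape[0]):
        for c in np.where(np.isnan(X[i]))[0]:
            cands = []
            for j in range(X.shape[0]):
                if j == i or np.isnan(X[j, c]):
                    continue
                shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
                if shared.any():
                    d = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
                    cands.append((d, X[j, c]))
            if cands:
                cands.sort(key=lambda t: t[0])
                filled[i, c] = np.mean([v for _, v in cands[:k]])
    return filled

data = [[1.0, 2.0, np.nan],
        [1.1, 2.1, 3.0],
        [0.9, 1.9, 3.2],
        [5.0, 6.0, 7.0]]
result = knn_impute(data, k=2)   # gap filled from the two close rows
```

The instability studied in the paper appears precisely when such fills are done sequentially and earlier fills serve as neighbour values for later ones.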
Savchenko, A.V. Statistical pattern recognition based on segment homogeneity testing // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1500 - 1516. Background: This paper is focused on the small sample size problem in statistical recognition of audiovisual objects with the nearest neighbor method. Its accuracy depends on the applied similarity measure; moreover, the computing efficiency is insufficient when thousands of classes are available. Methods: We introduce an approach to designing classifiers of audiovisual objects by testing segment homogeneity, based on a probabilistic model of a composite object represented by a sequence of independent identically distributed segments. The asymptotic properties of this approach allow us to implement sequential hierarchical classification with approximate nearest neighbor search to speed up the decision process. Results: An experimental study of constrained face recognition with HOG features shows that the proposed approach increases accuracy by 1-10% in comparison with conventional image recognition techniques (k-NN, SVM, SIFT, histograms of local binary patterns, eigenfaces). Moreover, it is 2-3 times faster than the pyramid HOG hierarchical classifier. Conclusions: The described methodology with segment homogeneity testing achieves high accuracy with sufficient performance in the case of a small sample size and a medium-sized number of classes.
Nizhibitsky, E.A. Feature composition in video tracking using particle filters // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1517 - 1528. This work analyzes probability models based on similarity measures over features extracted from images, which are widely used in the field of video tracking with particle filters. New computationally optimal methods for extracting multiple features from several regions of the same image are proposed. The optimization is performed using integral images, first prominently used in computer vision within the Viola–Jones object detection framework for Haar rectangles, and extended here to the other studied features. It is demonstrated experimentally that feature compositions can be useful even in tasks where each feature is useless by itself. The performance achieved using the proposed compositions is greater than that in a similar study and comparable to the performance of more complicated models based on ensemble boosting.
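The integral-image trick referred to above reduces the sum over any rectangular region to four table lookups, regardless of region size — which is what makes repeated feature extraction from many candidate regions cheap. A minimal sketch (not the paper's code):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum(img[:y, :x])."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def region_sum(ii, y0, x0, y1, x1):
    """Sum over img[y0:y1, x0:x1] in O(1) via four lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
s = region_sum(ii, 1, 1, 3, 3)   # sum of img[1:3, 1:3] = 5+6+9+10 = 30
```

Building the table costs one pass over the image; after that, every particle's region feature is constant-time, which is the source of the speed-up in the tracking loop.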
Zyuzin, V.V., Porshnev, S.V., Bobkova, A.O., Mukhtarov, A.A., and Bobkov, V.V. The analysis of results of the left ventricle contouring using an automatic algorithm on ultrasound images // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1529 - 1538. The article discusses the features of automatic contouring of the left ventricle (LV) on echographic sequences. An automatic algorithm for contouring the LV of the heart on frames containing the apical four-chamber projection of the human heart is proposed. The algorithm is based on: selecting the LV area using morphological operations; finding the points of attachment of the mitral valve to the heart muscle (the points of the LV base); building a signature of the LV of the heart in a polar coordinate system centered at the midpoint of the segment connecting the points of the LV base; piecewise polynomial approximation of the signature built in the polar coordinate system; and calculating the coordinates of the contour points in the coordinate system of the source ultrasound frame by converting the signature from the polar to the Cartesian coordinate system. The quality of automatic contouring is assessed by comparing expert contours with automatically generated ones. It is shown that parameters such as precision and recall, traditionally used in assessing the quality of information retrieval, do not yield physically adequate assessments of the quality of the automatic algorithm when comparing expert and automatically generated contours. A study of the kinematics of the center of mass of the LV region of the heart allowed us to propose a criterion for automatically evaluating whether an LV contour is properly constructed on separate frames of the video sequence. Areas for further research aimed at improving the quality of contouring are identified.
Petrov, G.E. and Chehovich, Y.V. Identification of traffic flow simulation using dissimilar information sources // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1539 - 1554. In the last few years, automobilization has grown rapidly, and one of the most important problems is collecting information about traffic flows. Unfortunately, there are no completely comprehensive and reliable sources of this information. Existing approaches, such as traffic detectors, GPS technology, photo capture, and video recording, all have disadvantages. For example, traffic detectors determine traffic flow parameters only in a limited part of the road network, GPS technology has low spatial accuracy and penetration rate, and photo capture and video recording depend on daylight and weather conditions. In this paper, traffic flow simulation is used to understand the bounds on the data quality and data size needed to determine traffic flow parameters such as density, and the method of conducting the experiment. A single-lane circular highway with installed traffic detectors and running vehicles is considered. Many experiments are carried out with different values of GPS penetration rate and spatial accuracy. Traffic flow density is computed from vehicle speed using Tanaka's, Greenshields', and Greenberg's models. According to the experimental results, there is a clear bound of 7% of vehicles carrying traffic monitoring equipment, beyond which the quality of determining traffic flow parameters such as density cannot be improved significantly by a higher penetration rate. In addition, a GPS spatial accuracy of 100 m leads to a 2% relative measurement error.
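Two of the speed-density models named above invert to closed-form density estimates. A sketch with illustrative parameter values (free-flow speed, jam density, and the optimal speed v0 below are placeholders, not calibrated values from the paper):

```python
import math

def greenshields_density(v, v_free=100.0, k_jam=150.0):
    """Greenshields' linear model v = v_free * (1 - k / k_jam),
    inverted to estimate density k (veh/km) from mean speed v (km/h)."""
    return k_jam * (1.0 - v / v_free)

def greenberg_density(v, v0=40.0, k_jam=150.0):
    """Greenberg's logarithmic model v = v0 * ln(k_jam / k),
    inverted: k = k_jam * exp(-v / v0)."""
    return k_jam * math.exp(-v / v0)

k1 = greenshields_density(50.0)   # half the free-flow speed -> k_jam / 2
k2 = greenberg_density(40.0)      # v = v0 -> k_jam / e
```

In the simulation described, such inversions convert the (noisy, subsampled) GPS speed measurements into density estimates, which are then compared against the ground-truth density of the simulated highway.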
Djukova, E.V., Zhuravlev, Yu.I., and Prokofjev, P.A. Methods to improve the effectiveness of logical correctors // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1555 - 1583. Background: One of the key concepts used to build correct recognition procedures is that of the elementary classifier, an elementary conjunction defined on integer attributive descriptions of objects. An elementary classifier is correct if it selects only objects of the same class. Classical correct logical recognition procedures are based on constructing families of correct elementary classifiers. There are problems for which a sufficient number of correct informative elementary classifiers cannot be found. One way to address this is to build recognition procedures based on constructing families of correct sets of elementary classifiers (logical correctors); the elementary classifiers in the sets of these families are not necessarily correct. Methods: Some new results concerning the improvement of the recognition quality and learning rate of logical correctors are presented. A model of the logical corrector based on a more general concept of the correct set of elementary classifiers is built. Results: The new design describes the patterns in the object classes more succinctly. The new logical correctors achieve higher recognition quality on almost all test problems. The learning rate of the logical correctors increases due to the pre-selection of highly informative elementary classifiers (a local basis). Conclusions: The proposed methods make it possible to apply logical correctors to large-size problems, as the well-known logical classifiers are applied. Further refinement of the proposed models can be achieved by introducing partial orders on the sets of feature values.
Aysina R.M. Survey of visualization tools for topic models of text corpora // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1584 - 1618. Topic modeling is an important tool for statistical analysis of text collections. A visual representation of a topic model enables researchers to study cluster structure of the collection and estimate quality of the topic model. Visualization tools are especially important for graphical user interfaces as they facilitate search and navigation across documents of the collection. In this survey we describe visualization tools for topic models, including hierarchical, temporal and multimodal models. We give examples of graph and network visualization, and categorize visualization tools according to their functionality.
Dvoenko, S.D. and Pshenichny, D.O. On metric characteristics of the Kemeny median // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1619 - 1631. Background: To aggregate experts' opinions, one needs to find the final ranking that differs least from the others and represents the group opinion. The Kemeny median is a good notion of the average for scales of (quasi-)orderings and is free of some contradictions concerning the building of a group opinion based on majority rules (Arrow's paradox). The well-known locally optimal algorithm for finding the Kemeny median depends on pairwise distances between rankings and calculates the so-called loss matrix. Methods: It is assumed that the rankings, represented by the pairwise distances between them, are immersed as a set in some Euclidean metric space. Accordingly, we can define the average element as the center of this set. Such a central element is a ranking too and should be similar to the Kemeny median. For the Kemeny median to be the mathematically correct center of the set of rankings, its distances to the other elements must be those of the center. A procedure is developed to build a modified loss matrix and find the metric Kemeny median, which coincides with the average element of the given set. Results: In general, the center element differs by its distances from the corresponding distances of the Kemeny median to the other set elements. We find the metric Kemeny median, which coincides with the average element of the given set. This ranking either coincides with the classic Kemeny median, proving its metric property, or differs from it if the metric violations in the set configuration are significant. Conclusions: The metric Kemeny median is the correct center of the set of rankings and can be used for a correct version of the k-means algorithm and other algorithms on ordering scales.
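For concreteness, the pairwise distance between rankings is typically the Kendall tau distance (number of discordantly ordered object pairs), and the Kemeny median minimizes the total distance to the given set. A brute-force sketch for small examples only (the paper concerns locally optimal algorithms precisely because this search is exponential):

```python
from itertools import combinations, permutations

def kendall_distance(r1, r2):
    """Number of object pairs ordered differently by the two rankings
    (each ranking given as a tuple of objects, best first)."""
    pos1 = {o: i for i, o in enumerate(r1)}
    pos2 = {o: i for i, o in enumerate(r2)}
    return sum(
        (pos1[a] < pos1[b]) != (pos2[a] < pos2[b])
        for a, b in combinations(r1, 2)
    )

def kemeny_median(rankings):
    """Exhaustive Kemeny median: the ranking minimizing total distance
    to the given set. Feasible only for a handful of objects."""
    return min(
        permutations(rankings[0]),
        key=lambda r: sum(kendall_distance(r, s) for s in rankings),
    )

votes = [("a", "b", "c"), ("a", "b", "c"), ("b", "a", "c")]
median = kemeny_median(votes)
```

The loss matrix of the locally optimal algorithm accumulates, for each pair of objects, how many rankings order them one way versus the other; the paper's modification adjusts this matrix so that the resulting median also behaves as the metric center of the set.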
Borisova, I.A. and Kutnenko, O.A. Outliers detection in datasets with misclassified objects // Machine Learning and Data Analysis. 2015. V. 1, № 11. Pp. 1632 - 1641. Background: Outlier detection is one of the important problems in data mining. Outliers here are considered to be initially misclassified objects of the dataset; in small datasets, such objects can seriously disrupt the process of classification. This paper describes an algorithm for censoring such data, focusing only on the local characteristics of objects in the dataset. Methods: The censoring procedure in a fixed feature space consists of sequential removal of the objects that most strongly deteriorate the quality of the dataset description (a value of class separability). This value depends on the number of objects in the dataset and the similarity of objects to their class in competition with the rival class. To evaluate the similarity of an object z to class A in competition with class B, a ternary relative measure called the function of rival similarity (FRiS-function) is used. Results: The proposed algorithm was tested on a wide range of model problems. The accuracy of k-nearest-neighbors classification before and after outlier elimination from the datasets was used to estimate the efficiency of the censoring algorithm. In most tasks, classification accuracy improved after censoring. Analysis of the objects recognized as outliers showed 96% sensitivity and 99% specificity.
Vol. 1, №12, 2015
Florinsky, I.V. and Pankratov, A.N. Digital terrain modeling with orthogonal polynomials // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1647 - 1659. doi:10.21469/22233792.1.12.01 Background: Mathematical problems of digital terrain analysis include interpolation of digital elevation models (DEMs), DEM generalization and denoising, as well as computation of morphometric variables by calculation of partial derivatives of elevation. Traditionally, these procedures are based on numerical treatment of DEMs, two-dimensional discrete functions of elevation. Methods: We developed a spectral analytical method and algorithm based on high-order orthogonal expansions using Chebyshev polynomials of the first kind with subsequent Fejér summation. The method and algorithm are intended for analytical treatment of regularly spaced DEMs: global DEM approximation, generalization, and denoising, as well as computation of morphometric variables by analytical calculation of partial derivatives. Results: To test the method and algorithm, we used a DEM of the Northern Andes containing 230,880 points (an elevation matrix of 480×481). DEMs were reconstructed with 480, 240, 120, 60, and 30 expansion coefficients. The first and second partial derivatives of elevation were calculated analytically from the reconstructed DEMs. Models of 14 local morphometric variables were then computed with the derivatives. A set of maps of elevation and horizontal curvature (kh) for different numbers of expansion coefficients well illustrates the effects of data generalization, denoising, and removal of artifacts contained in the original DEM. Concluding Remarks: The test results demonstrated good performance of the developed method and algorithm. They can be utilized as a universal tool for analytical treatment in digital terrain modeling.
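A one-dimensional analogue of the procedure above, under stated simplifications (a synthetic profile instead of a real DEM, and plain truncation of the series in place of Fejér summation): fit a profile with Chebyshev polynomials of the first kind, reconstruct with fewer coefficients to generalize/denoise, and differentiate the expansion analytically rather than numerically.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

x = np.linspace(-1, 1, 481)
profile = np.sin(3 * x) + 0.5 * x**2        # synthetic "elevation" profile

coeffs = C.chebfit(x, profile, deg=30)      # high-order expansion
smooth = C.chebval(x, coeffs[:8])           # keep few coefficients: generalization
slope = C.chebval(x, C.chebder(coeffs))     # analytical first derivative
```

In the paper the same idea is applied in two dimensions, and the partial derivatives obtained analytically feed the computation of the 14 morphometric variables.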
Chochia, P.A. Two-dimensional variation as an image complexity assessment // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1660 - 1676. doi:10.21469/22233792.1.12.02 The questions of image complexity assessment and the use of two-dimensional variations are studied. Image complexity is interpreted as an attribute specified by the quantity, sizes, and visibility of image details. Different known two-dimensional variation methods are considered in their application to digital images. A modified assessment, named the component size index, is proposed. The changes of the variations under different image transformations are analyzed. Theoretical conclusions are confirmed by experimental explorations. The proposed combination of two-dimensional variations is demonstrated to reflect the morphological structure of an image and to assess its complexity.
Gracheva, I.A. and Kopylov, A.V. Fast image processing algorithms based on the gamma-normal model of a hidden field // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1677 - 1685. doi:10.21469/22233792.1.12.03 Within the Bayesian approach, the reconstruction problem can be expressed as finding the hidden Markov component of a two-component random field whose observed component is the analyzed image. However, for certain types of image processing problems, such as image haze removal, HDR image compression, and structure transferring, this formulation does not yield a solution. In this paper, we propose an extension of the Bayesian approach to these image processing problems.
Kornilov, F.A. Research of the impact of misregistration of input images on the accuracy of change detection // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1686 - 1695. doi:10.21469/22233792.1.12.04 Background: The paper studies the impact of misregistration of the input images on the performance of structural change detection algorithms for multitemporal satellite images of the Earth's surface. Here, structural changes mean ground objects that have appeared or disappeared. Methods: Algorithm performance is estimated using a pair of images, where the second image is a geometrically shifted copy of the first; this kind of testing makes it possible to measure the robustness of the methods to geometrical misregistration of the input data without the influence of structural changes or random noise. Results: A new method for comparing image structures is introduced; applied together with another structural change detection algorithm, it reduces the number of false alarms in the presence of misregistration of the input images. Concluding Remarks: The experiments on image pairs obtained by shifting originally aligned satellite images show that the considered modification of the algorithms is valuable for real applications.
Medvedeva, E.V., Karlushin, K.A., and Kurbatova, E.E. Method of detection of moving objects in a video stream on the basis of object boundary estimation // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1696 - 1705. doi:10.21469/22233792.1.12.05 The purpose of the research is to develop a new method of detecting moving objects in a frame sequence with a practically stationary background. The method is based on estimating moving object boundaries by calculating the amount of information, and requires less computational resources than the existing well-known methods. The method approximates the sequence of digital halftone images (DHI) by a three-dimensional Markov chain with several states and represents the DHI by g-digit binary images (DBI). To find contours, the amount of information in each DBI element is calculated for various combinations of neighborhood elements; the calculated value is then compared with a threshold to decide whether the pixel belongs to a contour. To define an object of interest from the obtained contour points, the DBSCAN density clustering algorithm is used. The proposed method of defining moving object contours requires little computation, since for each element only comparisons with three neighboring elements are carried out. The developed method is 2-5.6 times faster than the known method. Results of modeling the developed method are shown. The gain in root-mean-squared error of coordinate determination accuracy for the developed method in comparison with the known subtraction method is 1.5-2.5. The developed method requires little computation and thus can be applied to real-time data processing. The range of object dimensions in the video sequence can be wide, and the number of moving objects can be a priori unknown.
Shibzukhov, Z.M. and Cherednikov, D.Y. On models of neurons of aggregation type // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1706 - 1716. doi:10.21469/22233792.1.12.06 A new class of models of artificial neurons is described. These models are based on the following principles: (i) the contributions of synapses are summed with the help of a certain aggregation operation; and (ii) the contribution of a complex synapse or synaptic cluster is computed with the help of another aggregation operation over the set of simple synapses. These models include a large part of the known functional models of neurons. For a class of aggregating neurons generalizing the SigmaPi-neuron model, it is shown that they can be correctly trained on finite sets of precedents.
Trekin, A.N., Matveev, I.A., Murynin, A.B., and Bochkareva, V.G. A method for upsampling of remote sensing images using vector data for preserving edges // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1717 - 1730. doi:10.21469/22233792.1.12.07 A method for image upsampling was developed. The method makes use of vector data about the geometry of objects contained in the image. A priori information about high-contrast boundaries helps to preserve sharp illuminance changes from blurring during the upsampling procedure. The developed method was tested on a set of remote sensing images and a vector map of water bodies.
Cheprasov, D.N., Malenichev, A.A., Sulimova, V.V., Krasotkina, O.V., Mottl, V.V., and Markov, A.A. Recovering missing data on ultrasonic rail defectograms via semi-global warping // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1731 - 1751. doi:10.21469/22233792.1.12.08 Background: The paper deals with the relevant problem of automatically recovering missing data on ultrasonic rail defectograms, which occurs due to, e.g., bad weather conditions. Methods: The proposed approach is based on retrieving the data missing from the current ultrasonic inspection from the ultrasonic defectogram of the previous inspection. In this work, we update our previous method, making it more accurate and appreciably faster. We propose a special three-window model for fast localization of bolt-on joint areas, and a semi-global warping procedure based on a special dissimilarity measure of defectogram elements for more precise determination of bolt-on joint area positions. Results: Experiments show that the proposed approach finds the area of interest with an accuracy of about 3.5 cm, so it offers a good possibility to recover missing data from the previous defectogram.
Novikov, E.A., Vakoliuk, I.A., Akhapkin, R.D., Varchak, I.A., Shalanginova, I.G., Shvaiko, D.A., and Budenkova, E.A. Automation method of computer oculography for research of the central nervous system based on passive video analysis // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1752 - 1761. doi:10.21469/22233792.1.12.09 The record of the changing pupil center position over time is called an oculogram. The oculogram allows us to determine the functional condition of the brain divisions involved in programming and regulating eye movements. This article describes a new version of automating the registration and subsequent analysis of voluntary and provoked eye movements, which in general is called computer oculography. Computer oculography usually relies on active infrared eye tracking with rigid fixation of the head. Such methods are quite costly and therefore unpopular solutions. However, due to the development of digital image registration technologies and the general increase in the computing power of personal computers and portable devices, methods of passive image scanning are starting to gain popularity. The method proposed in this paper relies on standard digital cameras and can be applied to analyze a qualitative oculogram based on video recorded at no less than 30 frames per second.
Petrov, E.P., Kharina, N.L., and Sukhikh, P.N. Fast lossless image compression method // Machine Learning and Data Analysis. 2015. V. 1, № 12. Pp. 1762 - 1770. doi:10.21469/22233792.1.12.10 The suggested digital image compression method is characterized by simplicity of implementation and the absence of computing operations at the prediction stage. Background: Miniature space observation facilities (small satellites) cannot provide continuous data transmission because of severely constrained requirements on the efficiency of energy resource usage. This produces the need for new energy-efficient, low-cost digital image compression methods that would not be inferior to the known multidigit high-resolution digital image compression methods but surpass them. Methods: The algorithm consists of the following procedures: splitting the digital image into binary images, predicting each element of the binary images based on the theory of conditional Markov processes with discrete states, and coding with any known algorithm (here the Huffman method is used). Results: To prove the efficiency of the proposed method, the compression of space pictures of the Earth's surface (group A) and photos (group B) was performed, with 50 images of one type in each group. Known lossless compression algorithms, such as PNG, JPEG-LS, JPEG 2000, BMF, Qlic, and ImageZero, are used as analogs. The obtained results indicate that the performance and compression ratio of the proposed method are comparable with the analogs. Concluding Remarks: The method has the following advantages: the capability of simultaneous processing of the binary images, the capability of processing digital images of various bit depths, and the absence of computational operations at the prediction stage.
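The first step named in the abstract above, splitting a digital image into binary images, is bit-plane decomposition. A minimal sketch of that step only (the Markov-chain prediction and Huffman coding stages are omitted); function names are illustrative:

```python
import numpy as np

def to_bit_planes(img, bits=8):
    """Split a g-bit grayscale image into g binary images (bit planes),
    ordered from the most significant plane to the least significant."""
    img = np.asarray(img, dtype=np.uint8)
    return [(img >> b) & 1 for b in range(bits - 1, -1, -1)]

def from_bit_planes(planes):
    """Losslessly reassemble the image from its bit planes."""
    bits = len(planes)
    out = np.zeros_like(planes[0], dtype=np.uint8)
    for i, p in enumerate(planes):
        out |= p.astype(np.uint8) << (bits - 1 - i)
    return out

img = np.array([[0, 255], [128, 7]], dtype=np.uint8)
planes = to_bit_planes(img)        # 8 binary images
restored = from_bit_planes(planes) # identical to img (lossless)
```

Because each plane is binary, all planes can be predicted and coded independently and in parallel, which is the source of the simultaneous-processing advantage mentioned in the conclusions.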
Vol. 1, №13, 2015
Krymova, E.A. Aggregation of ordered smoothers in colored noise // Machine Learning and Data Analysis. 2015. V. 1, № 13. Pp. 1775 - 1785. doi:10.21469/22233792.1.13.01 The paper is devoted to the problem of recovering one-dimensional functions from a set of noisy observations. Suppose that, in addition, we are given a fixed finite set of function estimates. Based on this set, it is necessary to construct a new estimator whose risk is close to the risk of the “best” estimate (the so-called oracle) in the given set, where the “best” estimator is the minimizer of the risk over the given set of function estimators. We prove new oracle inequalities for the aggregation of regression function estimates under the assumption of heteroscedastic Gaussian noise, namely, correlated Gaussian noise with a different variance at each design point.
Medvedeva, E.V., Trubin, I.S., Ustyuzhanina, E.A., and Laletin, A.V. Multidimensional nonlinear filtering of multicomponent images // Machine Learning and Data Analysis. 2015. V. 1, № 13. Pp. 1786 - 1795. doi:10.21469/22233792.1.13.02 The goal of this paper is to develop a method of nonlinear multidimensional filtering of multicomponent images based on the mathematical apparatus of Markov chains. The method allows efficient use of the statistical redundancy of the image to improve the quality of images distorted by white Gaussian noise. Multidimensional signals of multicomponent images have much greater statistical redundancy than a single image, and this redundancy can be used to improve the quality of restoration of noisy images. A special case of a multicomponent image is an RGB image, each color component of which is a g-bit half-tone digital image (HTDI). The nature of the statistical relationships between elements within an HTDI and among the elements of the color components (RG, GB, BR) allows us to approximate three-dimensional color images by a Markov chain with several states, and the bit binary images (bit planes) of two color components by a three-dimensional Markov chain with two states. This approximation makes it possible to apply the theory of filtering of conditional Markov processes to the development of a filtering method for multicomponent images. Realistic images contain regions with varying degrees of detail and different statistical characteristics. To improve the quality of the reconstructed image, we propose improving the accuracy of calculating the statistical characteristics of each local region within the image and between the color components; a sliding window is used to estimate the local statistical characteristics of the image.
We present modeling results for the proposed algorithm of three-dimensional nonlinear filtering with the sliding window and for the earlier developed algorithm of two-dimensional filtering of color (RGB) images. The developed three-dimensional filter with the sliding window reduces the number of artifacts similar to impulse noise and extracts edges and small objects more precisely. The gain in MSE is from 30 to 70% over the range of signal-to-noise ratios ρ²_in = −9…−3 dB.
Fedotov, N.G., Syemov, A.A., and Moiseev, A.V. Feature space minimization of 3D image recognition based on stochastic geometry and functional analysis // Machine Learning and Data Analysis. 2015. V. 1, № 13. Pp. 1796 - 1814. doi:10.21469/22233792.1.13.03 Background: In recent decades, the emphasis in image analysis and pattern recognition has been shifting from 2D to 3D images, because three-dimensional design allows one to use more information about the object. 3D modeling makes it possible to view an object from different angles and, in particular, to analyze its spatial form. Methods: In this article, a new approach to 3D object recognition based on modern methods of stochastic geometry and functional analysis is proposed. This method has many advantages; in particular, it allows one to describe the metric properties of 3D objects. Thus, owing to a rigorous mathematical model, the analyst can construct analytical rather than intuitive features describing the object form and its characteristics (in particular, geometric features). Results: The hypertrace transform creates an invariant description of a spatial object that is more resistant to distortion and coordinate noise than a description obtained by an object normalization procedure. The reliability and efficiency of the proposed method are confirmed both by an adequate mathematical model, constructed using modern approaches to 3D image analysis and recognition, and by the results of practical experiments, as well as by the registration of the developed software package. Conclusions: The article provides a detailed description of the hypertrace transform scanning technique and its mathematical model. The main approaches to constructing and distinguishing informative features are analyzed. An original method to minimize the feature space and an appropriate decision procedure are proposed. The results of practical experiments comparing stochastic and deterministic scanning methods are presented.
Beklaryan, L.A. and Khachatryan N.K. Dynamic model of organization of cargo transportation // Machine Learning and Data Analysis. 2015. V. 1, № 13. Pp. 1815 - 1826. doi:10.21469/22233792.1.13.04 A model describing the process of cargo transportation realized through a number of technologies is investigated. Four versions of the model are considered. The first version describes transnational cargo transportation without a dedicated initial departure station or a dedicated final distribution station; that is, cargo for which neither the first nor the last station is a node. For such transportation, it is important to describe the rule of interaction between intermediate stations. The second version describes cargo transportation with a dedicated initial departure station: cargo on a long section of the route where the initial departure station is a node. Node stations play the most important role in the organization of cargo transportation and, therefore, have extra capacity. For such transportation, it is important to describe the rule of interaction between the first station and intermediate stations, as well as the rule of interaction between intermediate stations. The third version describes cargo transportation between a dedicated initial departure station and a final station, i.e., on a long section of the route between two node stations, both of which have additional capacity. For such transportation, it is important to describe the rules of interaction of node stations with intermediate stations and the rules of interaction between intermediate stations. The fourth version describes cargo transportation in a circular chain of stations.
For all versions of the model, freight traffic modes satisfying a given control system are studied. Such regimes are described by traveling-wave solutions of a nonlinear finite-difference analogue of a parabolic equation. The possible freight traffic modes are described, and the stability of stationary regimes is investigated.
Novikov, E. A. and Padalko M. A. The Use of Radon and Fourier Transformation of Raster Images for Description and Tracking of Predefined Objects // Machine Learning and Data Analysis. 2015. V. 1, № 13. Pp. 1827 - 1843. doi:10.21469/22233792.1.13.05 As a rule, currently existing algorithms for describing and identifying objects are aimed at solving the problem for a definite type of object under desired conditions. However, the search for an all-in-one or more general approach remains quite interesting both in the context of academic research and from the perspective of implementation. The approach proposed in this article allows object identification over a wide range of characteristics. The method is presented in the form of a general algorithm description together with the results of an experimental test of its efficiency. The main goal of the method's development is fast, high-quality processing of graphical data such as dynamic images or video streams. The methods available for comparison are mainly used for object search in freeze-frame images, while the authors' method is primarily aimed at working with video streams; as of this writing, there are no generally available videos or data on their processing with similar methods for a comparative study. The considered method provides a new way of obtaining a set of key features of an image and a function for their comparison. It is based on a combination of classical methods: the direct Radon transform of the image matrix, the one-dimensional Fourier transform of the corresponding projections, and statistical analysis of the integral Fourier coefficients considered as feature descriptions of the objects in the images.
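The Radon-then-Fourier feature pipeline described above can be sketched as follows; this is a toy illustration, not the authors' implementation — the nearest-neighbour rotation, the angle set, and the choice of three spectral statistics are all assumptions made for the example:

```python
import numpy as np

def radon_fourier_features(img, angles_deg=(0, 45, 90, 135)):
    """Sketch: Radon projections (via a crude nearest-neighbour rotation),
    a 1-D Fourier transform of each projection, and statistics of the
    Fourier magnitudes collected as the feature vector."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    feats = []
    for ang in angles_deg:
        t = np.deg2rad(ang)
        # backward-map each output pixel to its rotated source pixel
        ys = np.round(cy + (yy - cy) * np.cos(t) - (xx - cx) * np.sin(t)).astype(int)
        xs = np.round(cx + (yy - cy) * np.sin(t) + (xx - cx) * np.cos(t)).astype(int)
        ok = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
        rot = np.zeros((h, w))
        rot[yy[ok], xx[ok]] = img[ys[ok], xs[ok]]
        proj = rot.sum(axis=0)                  # Radon projection at this angle
        mag = np.abs(np.fft.rfft(proj))         # 1-D Fourier spectrum of the projection
        feats += [mag.mean(), mag.std(), mag.max()]  # integral spectral statistics
    return np.array(feats)

img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0                           # synthetic square "object"
f = radon_fourier_features(img)
```

With four angles and three statistics per projection, the sketch yields a 12-dimensional descriptor; comparing two images then reduces to comparing two short vectors.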
Petrov, E. P., Kharina N. L., and Rzhanikova E. D. Combined nonlinear filtration of digital halftone high bitness images // Machine Learning and Data Analysis. 2015. V. 1, № 13. Pp. 1844 - 1852. doi:10.21469/22233792.1.13.06 The need to transfer large volumes of information, such as multibit digital images (DI), more quickly is a topical task that demands the improvement of radio communication means. One way to reduce DI transfer time is the transition to multiphase frequency modulation (FM) signals. However, their application is limited by the loss of noise stability at each division of the phase in comparison with binary FM signals. When a DI is transferred by eight-phase FM signals, the time is reduced by a factor of four, but only with partial compensation of the noise stability loss. In this work, an algorithm for the restoration of a multibit DI distorted by white Gaussian noise (WGN) is developed. The statistical redundancy of the DI is efficiently used to compensate for the noise stability loss when transferring digital images by multiphase FM signals. For example, the time of DI transfer by four-phase signals was halved without any noise stability loss in comparison with transfer by binary FM signals. A combined algorithm for the filtering of multibit DI is constructed. It consists of two algorithms: nonlinear filtering of a DI distorted by WGN and a median filter for the restoration of a DI distorted by salt-and-pepper impulse noise. Owing to the separation of impulse noise and WGN, the impulse noise is efficiently suppressed by the median filter. Such a combination makes it possible to reduce the transfer time of a multibit DI and to successfully combat both WGN and impulse noise.
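The median-filter stage of such a combined scheme is standard and easy to illustrate; the sketch below is a minimal example, with the 3×3 window and edge padding chosen for the illustration rather than taken from the paper:

```python
import numpy as np

def median_filter_3x3(img):
    """Minimal 3x3 median filter of the kind used to suppress
    salt-and-pepper impulse noise: each pixel is replaced by the
    median of its neighbourhood, so isolated impulses are discarded."""
    padded = np.pad(img, 1, mode='edge')
    h, w = img.shape
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + 3, j:j + 3])
    return out

noisy = np.full((5, 5), 10.0)
noisy[2, 2] = 255.0                 # a single "salt" impulse
clean = median_filter_3x3(noisy)
```

Because an isolated impulse is always outvoted by its eight neighbours inside the window, the filter removes it without blurring constant regions, which is why it pairs well with a separate WGN filter.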
Djukova, E.V. and Nikiforov A.G. On efficient parallelizing of the algorithms for discrete enumeration problems // Machine Learning and Data Analysis. 2015. V. 1, № 13. Pp. 1853 - 1865. doi:10.21469/22233792.1.13.07 Background: An approach to the construction of efficient parallel algorithms for discrete enumeration problems was introduced in previous works of the authors. This approach is based on statistical estimation of computational task sizes. The approach is demonstrated on dualization, an intractable problem that consists in enumerating the irreducible coverings of a given Boolean matrix. The main disadvantage of the previously suggested parallel schemes for asymptotically optimal dualization algorithms is a time-costly task-size estimation method that considers only the problem size. Methods: In this paper, a new parallel scheme for asymptotically optimal dualization algorithms is developed that reduces the time spent on statistical data collection. The statistical data are obtained by processing submatrices of the given matrix. Results: Task distribution is performed according to a schedule calculated in advance. For this purpose, the distribution of the random variable used for task-size estimation is fitted, and the processor load level is optimized. The parallel scheme is applied to the asymptotically optimal algorithm RUNC-M. Conclusions: The new parallel scheme works no worse than the previously suggested ones, demonstrates an almost maximal speedup, and makes it possible to dualize matrices of large size. However, the scheme is efficient only if the number of processors is significantly smaller than the number of matrix columns.
Bakhmutova, I. V., Gusev V. D., Miroshnichenko L. A., and Titkova T. N. Parallel texts in the problem of deciphering of ancient Russian chant // Machine Learning and Data Analysis. 2015. V. 1, № 13. Pp. 1866 - 1876. doi:10.21469/22233792.1.13.08 The ancient Russian chants of the XII–XVII centuries are presented in neume notation. The problem of translating a chant into modern note writing is one of deciphering and, in the general case (chants without special marks explaining their singing value), is not yet solved. The number of "unreadable" ancient hymnals runs into the hundreds. The main difficulties of deciphering are connected with the polysemy of the neume–note correspondence. The known examples of deciphering are few in number, were made manually, and refer to separate hymns. The authors develop a new computer-oriented approach to this problem using the dvoyeznamenniks of the end of the XVII – beginning of the XVIII centuries, where the chants are written in three mutually synchronized parallel texts: in neumes, in notes, and in Old Slavonic verses. The emphasis is placed on revealing, in the texts of the dvoyeznamenniks, moderately long repeating chains of neumes that are interpreted either unambiguously (invariants) or with admissible deviations (quasi-invariants). On the basis of rather extensive learning material, electronic dictionaries of invariants and quasi-invariants were constructed, and an algorithm for deciphering neumatic notation using these dictionaries was developed. Experiments on control material have shown that at this stage (without appeal to the structural organization of neumatic hymnals), these dictionaries provide the deciphering of 60%–70% of a neumatic text. The main features of the presented approach are the use of dvoyeznamenniks of the golden age of Russian chant in different genres and the orientation, in general, toward neumatic notation without special marks from the XVI–XVII centuries.
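The dictionary-based deciphering step can be sketched with a greedy longest-match over chains of neumes; everything here — the dictionary contents, the maximal chain length, the '?' marker for undeciphered symbols — is a hypothetical illustration, not the authors' algorithm:

```python
def decipher(neumes, invariant_dict, max_len=4):
    """Sketch: greedily match the longest known chain of neumes against a
    dictionary of invariants and emit its note interpretation; unmatched
    neumes remain undeciphered. Also reports the fraction of the text
    covered, the quantity measured in the paper's experiments."""
    notes, i, covered = [], 0, 0
    while i < len(neumes):
        for k in range(min(max_len, len(neumes) - i), 0, -1):
            chain = tuple(neumes[i:i + k])
            if chain in invariant_dict:
                notes.append(invariant_dict[chain])
                covered += k
                i += k
                break
        else:                       # no chain matched at position i
            notes.append('?')
            i += 1
    return notes, covered / len(neumes)

dictionary = {('a', 'b'): 'C D', ('c',): 'E'}   # hypothetical invariant chains
notes, coverage = decipher(['a', 'b', 'c', 'x'], dictionary)
```

On this toy input, three of four neumes are covered, mirroring how the paper reports 60%–70% deciphering coverage on control material.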
Molchanov, D. A., Kondrashkin D. A., and Vetrov D. P. Relevance tagging machine // Machine Learning and Data Analysis. 2015. V. 1, № 13. Pp. 1877 - 1887. doi:10.21469/22233792.1.13.09 In many classification or regression problems, there may be many irrelevant features. Bayesian automatic relevance determination (ARD) is a popular approach to feature selection; however, its application area has been limited. In this paper, the approach is utilized in a more general setting and applied to a binary classification problem with binary features. A new binary classification model and a learning algorithm that can purge unwanted features from the model are also developed.
Vol. 1, №14, 2015
Chernykh, V. Y. and M.M. Stenina. 2015. Forecasting nonstationary time series under asymmetric loss. Machine Learning and Data Analysis 1(14): 1893-1909. doi:10.21469/22233792.1.14.01 The problem of forecasting time series under asymmetric loss functions is considered. A new two-step forecasting algorithm, ARIMA+Hist, is presented. At the first step, the autoregressive integrated moving average algorithm ARIMA with seasonal components is used; the parameters of the model are selected according to the Box–Jenkins methodology. At the second step, the residuals are analyzed and the optimal addition to the first-step forecast that minimizes the expected loss is found. The expected loss is estimated by the convolution of the loss function with the histogram of the regression residuals. The performance of the algorithm is demonstrated on time series with different types of nonstationarity (i.e., trend or seasonality) and for different symmetric and asymmetric loss functions. The experimental results show that the forecast quality of the two-step ARIMA+Hist exceeds that of plain ARIMA in the case of asymmetric loss functions.
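The second ("Hist") step admits a compact sketch: estimate the expected loss of each candidate correction by averaging the loss over the historical residuals, then pick the minimizer. The grid search and the particular asymmetric loss below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def hist_correction(residuals, loss, grid):
    """Sketch of the 'Hist' step: choose the additive correction d to the
    base ARIMA forecast that minimizes the expected loss, estimated by
    averaging loss(r - d) over the historical residuals r (a discrete
    convolution of the loss with the residual histogram)."""
    expected = [np.mean(loss(residuals - d)) for d in grid]
    return grid[int(np.argmin(expected))]

# asymmetric loss: under-forecasting costs three times more than over-forecasting
asym = lambda e: np.where(e > 0, 3.0 * e, -e)
residuals = np.array([-1.0, 0.0, 1.0, 2.0])
d = hist_correction(residuals, asym, np.linspace(-3.0, 3.0, 601))
```

For this pinball-type loss any 0.75-quantile of the residual sample is optimal, so the correction lands between 1 and 2 and shifts the forecast upward; under a symmetric absolute loss it would instead be the residual median.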
Ryazanov, V. V., A. P. Vinogradov, and Yu. P. Laptin. 2015. Using generalized precedents for big data sample compression at learning. Machine Learning and Data Analysis 1(14): 1910-1918. doi:10.21469/22233792.1.14.02 In this paper, the role of intrinsic and introduced data structures in constructing efficient recognition algorithms is analyzed. We investigate the concept of a generalized precedent as a representation of a stable local regularity in data, and methods of task dimensionality reduction based on its use. Two new approaches to the problem, based on positional data representation and on cluster means for elementary logical regularities, are proposed. Results of computational experiments with data compression in parametric spaces for several practical tasks are presented.
Solomatin, I. A. and I. A. Matveev. 2015. Detecting visible areas of iris by qualifier of local textural features. Machine Learning and Data Analysis 1(14): 1919-1929. doi:10.21469/22233792.1.14.03 Person recognition by iris images is a topical problem. To increase recognition accuracy, areas of occlusion are usually detected in addition to locating the iris as an annular region. The problem of occlusion detection can be stated as the classification of pixels from the annular region into two classes: "iris" and "occlusion." In the annular region, the segment with minimum brightness dispersion is selected, which usually contains no occlusion (in this article, this segment is not calculated and is supposed to be part of the input data). Then a classifier based on a multivariate Gaussian is built and trained on the training set, which is given by local textural features of the pixels from this sector. The parameters of the classifier are optimized using a genetic algorithm. Problems with noise and classification errors in particular areas of the image are solved by applying morphological post-processing. A computational experiment was carried out, which made it possible to obtain the distribution of the quality functional of the algorithm.
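The core of such a classifier can be sketched as a Gaussian model of the clean-sector feature distribution with a Mahalanobis-distance decision; the synthetic two-dimensional features and the fixed threshold below are assumptions for the illustration (in the paper the features are local textural ones and the parameters are tuned by a genetic algorithm):

```python
import numpy as np

class GaussianOcclusionClassifier:
    """Sketch: fit a multivariate Gaussian to feature vectors of pixels
    from the low-variance iris sector, then label pixels whose
    Mahalanobis distance exceeds a threshold as occlusion."""
    def fit(self, X):
        self.mu = X.mean(axis=0)
        self.inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
        return self
    def mahalanobis(self, X):
        d = X - self.mu
        return np.sqrt(np.einsum('ij,jk,ik->i', d, self.inv_cov, d))
    def predict(self, X, threshold=3.0):
        return np.where(self.mahalanobis(X) > threshold, 'occlusion', 'iris')

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2))             # synthetic features of clean iris pixels
clf = GaussianOcclusionClassifier().fit(train)
labels = clf.predict(np.array([[0.0, 0.0], [10.0, 10.0]]))
```

A pixel whose features sit near the fitted mean is kept as iris, while a far outlier is flagged as occlusion; morphological post-processing would then clean up isolated misclassified pixels.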
Yankovskaya, A. E., A. V. Yamshanov, and N. M. Krivdyuk. 2015. 2-simplex prism — a cognitive tool for decision-making and its justifications in intelligent dynamic systems. Machine Learning and Data Analysis 1(14): 1930-1938. doi:10.21469/22233792.1.14.04 The cognitive tool 2-simplex prism is proposed for the first time for decision-making and its justification in intelligent dynamic systems for different problem areas: medicine, biomedicine, ecogeology, education, road building, etc. The idea of applying the n-simplex and the theorem for decision-making and its justification in intelligent systems were proposed by A. Yankovskaya in 1990. The use of the 2-simplex prism for decision-making and its justification in intelligent dynamic systems based on test methods of pattern recognition and methods of fuzzy and threshold logics is described.
Bakhteev O. Y. 2015. Panel matrix and ranking model recovery using mixed-scale measured data. Machine Learning and Data Analysis 1(14): 1939-1960. doi:10.21469/22233792.1.14.05 We solve a decision-making problem in the field of operational research education. The paper presents a method for recovering changes in the ratings of student employees. These ratings are based on interviews at an IT training center. We consider a dataset consisting of expert estimates of assessments for different years and the overall rating of these students. The scales of the expert estimates vary from year to year, but the scale of the rating remains stable. One must recover a time-independent ranking model. The problem is stated as object–feature–year panel matrix recovery: a map from student descriptions (or their generalized portraits) to expected ratings for all years. We also investigate the stability of the ranking model produced from the panel matrix. We propose a new method of panel matrix recovery based on the solution of a multidimensional assignment problem. To construct a ranking model, we use an ordinal classification algorithm with partially ordered feature sets and an algorithm based on the support vector machine. The problem is illustrated by a dataset containing expert assessments of student interviews at the IT center.
Sologub, R. A. 2015. Methods of the nonlinear regression model transformation. Machine Learning and Data Analysis 1(14): 1961 - 1976. doi:10.21469/22233792.1.14.06 The problem of automatic construction and simplification of nonlinear regression models is addressed. The models describe the results of measurement and forecasting experiments and are designed for the approximation, analysis, and forecasting of experimental results. To generate the models, expert requirements in the subject field are taken into account; this approach yields interpretable models that adequately describe the given measurements. The goal of the paper is to investigate the problem of generation and simplification of nonlinear regression models, which are supposed to be superpositions of given parametric functions. A method of transforming function superpositions is suggested. The category of superpositions, defined over the set of directed acyclic graphs corresponding to the superpositions, is considered. The notion of isomorphic superpositions is introduced, and a method for their detection is developed, including an algorithm for finding the isomorphic subgraphs corresponding to the generated superpositions.
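One simple way to detect isomorphic superpositions is to map each superposition tree to a canonical string, so that duplicates generated in different argument orders collide on the same key. The tree encoding and the set of commutative primitives below are illustrative assumptions, not the paper's method:

```python
def canonical(node):
    """Sketch of a canonical form for a superposition tree: superpositions
    equal up to the argument order of commutative primitives map to the
    same string, so isomorphic duplicates are found by key comparison."""
    name, args = node
    if not args:
        return name
    keys = [canonical(a) for a in args]
    if name in {'add', 'mul'}:      # commutative primitives: order-free
        keys.sort()
    return name + '(' + ','.join(keys) + ')'

x, y, z = ('x', []), ('y', []), ('z', [])
t1 = ('add', [x, ('mul', [y, z])])
t2 = ('add', [('mul', [z, y]), x])   # the same superposition, arguments permuted
```

During model generation, keeping a set of canonical strings lets the search skip any superposition whose key has already been seen.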
Vlasova, K.V., V.A. Pachotin, D.M. Klionskiy, and D.I. Kaplun. 2015. Estimation of radio impulse parameters using the maximum likelihood method. Machine Learning and Data Analysis 1(14): 1977 - 1990. doi:10.21469/22233792.1.14.07 The paper is devoted to the development of an algorithm for the resolution and parameter estimation of radio impulses with partially overlapping spectra in the region of their nonorthogonality (the correlation coefficient varies from 0 to 0.9). The implementation of the suggested algorithm makes it possible to design filters for resolving frequency-dependent signals and, therefore, to increase the capacity of a communication channel. The maximum likelihood method has been used to obtain analytical expressions and to perform model investigations of the frequency resolution of nonorthogonal signals. The dynamic range of signal parameter estimates has been found as a function of the signal-to-noise ratio and the correlation coefficient. It has been shown that the value of the likelihood functional at its global minimum allows one to estimate the noise variance and the number of radio impulses in a received signal.
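For white Gaussian noise, the maximum likelihood estimate of impulse amplitudes at known frequencies reduces to least squares on the (generally non-orthogonal) impulse basis, with the residual variance estimating the noise level. The sketch below illustrates this reduction; the cosine impulse shapes and the particular close frequencies are toy assumptions:

```python
import numpy as np

def ml_amplitudes(t, signal, freqs):
    """Sketch of the ML step for white Gaussian noise: amplitudes of
    impulses with known frequencies are the least-squares solution over
    the impulse basis; the residual variance estimates the noise level."""
    B = np.stack([np.cos(2.0 * np.pi * f * t) for f in freqs], axis=1)  # design matrix
    amps, *_ = np.linalg.lstsq(B, signal, rcond=None)
    return amps, (signal - B @ amps).var()

t = np.linspace(0.0, 1.0, 200, endpoint=False)
# two overlapping impulses at close (non-orthogonal) frequencies
sig = 2.0 * np.cos(2.0 * np.pi * 5.0 * t) + 1.0 * np.cos(2.0 * np.pi * 5.3 * t)
amps, noise_var = ml_amplitudes(t, sig, [5.0, 5.3])
```

Even though the two basis columns are correlated, the least-squares solution separates the amplitudes exactly in the noiseless case; with noise added, the residual variance would rise toward the true noise variance.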
Efimov, Y. S. and I. A. Matveev. 2015. Iris border detection using a method of paired gradients. Machine Learning and Data Analysis 1(14): 1991 - 2002. doi:10.21469/22233792.1.14.08 Circular object detection is one of the challenging problems of modern computer vision systems. In this study, to search for circular representations of the inner and outer boundaries of the iris, a method of paired gradients is used, which is a modification of the Hough methodology. The image is processed with the Canny filter, and from the resulting boundaries, pairs of pixels are selected that have a high probability of belonging to one circle. Selection criteria and likelihood coefficients are introduced to reduce the number of such pairs. The Hough transform uses two accumulators: a two-dimensional one, isomorphic to the original image, in which the centers of the segments defined by the pixel pairs vote, and a one-dimensional histogram of diameters, in which the lengths of these segments are collected. A computational experiment is performed to check the efficiency of the algorithm on data from public iris image databases and to compare the proposed method of paired gradients with the resembling antigradient voting method, which is also based on the Hough methodology and is used for eye center search. Drawbacks of the algorithm that may cause incorrect handling of some input images are identified; further analysis of the proposed algorithm and improvement of its stability are required.
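The two-accumulator voting can be sketched as follows: a pair of edge pixels whose gradients are nearly antiparallel and aligned with the chord between them is likely diametrically opposite on a circle, so the chord midpoint votes for the centre and the chord length for the diameter. The O(n²) pair loop and the angular tolerance are simplifying assumptions; the paper prunes pairs with selection criteria and likelihood coefficients:

```python
import numpy as np

def paired_gradient_vote(points, grads, ang_tol=0.2):
    """Sketch of paired-gradient voting: accepted pixel pairs fill a 2-D
    centre accumulator (chord midpoints) and a 1-D diameter histogram
    (chord lengths)."""
    centres, diameters = [], []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            chord = points[j] - points[i]
            d = np.linalg.norm(chord)
            if d == 0.0:
                continue
            u = chord / d
            gi = grads[i] / np.linalg.norm(grads[i])
            gj = grads[j] / np.linalg.norm(grads[j])
            # diametrically opposite pixels: gradients along the chord, opposed
            if abs(gi @ u) > 1 - ang_tol and abs(gj @ u) > 1 - ang_tol and gi @ gj < 0:
                centres.append((points[i] + points[j]) / 2.0)
                diameters.append(d)
    return np.array(centres), np.array(diameters)

# edge pixels on a circle of radius 5 centred at (10, 10), with radial gradients
ang = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
pts = np.stack([10 + 5 * np.cos(ang), 10 + 5 * np.sin(ang)], axis=1)
grads = pts - np.array([10.0, 10.0])
centres, diams = paired_gradient_vote(pts, grads, ang_tol=0.05)
```

On this synthetic circle only the four diametrically opposite pairs pass the gradient test, and their votes concentrate exactly at the true centre and diameter; on real edge maps the peaks of the two accumulators play that role.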