Vol. 4, №4, 2018
Beklaryan L. A., Beklaryan A. L. On the existence of soliton solutions for systems with a polynomial potential and their numerical realization // Machine Learning and Data Analysis, 2018, 4(4):220-234. doi:10.21469/22233792.4.4.01 The existence of soliton solutions (traveling-wave solutions) for the Korteweg-de Vries equation with a polynomial potential is studied. The approach establishes a one-to-one correspondence between such solutions and the solutions of an induced functional differential equation of pointwise type. This yields conditions for the existence and uniqueness of traveling-wave solutions with growth restrictions in both time and space. Importantly, the existence conditions are formulated in terms of the right-hand side of the equation and the characteristics of the traveling wave. They use neither linearization nor the spectral properties of the corresponding equation in variations. Conditions for the existence of periodic soliton solutions are considered separately. It is also shown that one can pass from systems with a quasilinear potential to systems with a polynomial potential while preserving the corresponding existence theorems. A numerical realization of such solutions is given.
Murynin A. B., Richter A. A. Features of the application of methods and algorithms for the reconstruction of the three-dimensional shape of rigid objects according to the panoramic survey // Machine Learning and Data Analysis, 2018, 4(4):235-247. doi:10.21469/22233792.4.4.02 The paper discusses methods for reconstructing the shape of three-dimensional objects on the Earth's surface from periodic features of the surface structure of rigid objects. These methods apply to both space and panoramic images of the objects. Previously developed methods for reconstructing three-dimensional objects from a single image (based on metadata, on standards, and on coordinate grids) are briefly reviewed. The main features of panoramic imaging, its domain, and the limits of its applicability are given, and the possibility of jointly using space and panoramic imagery is considered. A technique based on selecting geometric periods on the surface of a rigid object and estimating their geometric parameters is described. A building is used as an example to show the main structural elements and the geometric parameters of the object estimated from a panoramic survey. An example of reconstructing a three-dimensional model of the object is also given.
Mandrikova O. V., Zalyaev T. L., Geppener V. V., Mandrikova B. S. Analysis of the neutron monitor data and allocation of the sporadic features on the basis of neural networks and wavelet transform // Machine Learning and Data Analysis, 2018, 4(4):248-265. doi:10.21469/22233792.4.4.03 Observations of galactic cosmic rays (GCR) are used in a number of fundamental and applied studies related to monitoring and forecasting space weather. The complex structure of cosmic ray (CR) data and incomplete a priori knowledge of processes in near-Earth space make it difficult to construct effective methods for their analysis. The traditional spectral and averaging methods currently in use can identify stable characteristics of CR dynamics, but they are ineffective for studying fine sporadic changes. Modern global methods, such as the global survey method, identify dynamic features in CR more accurately. However, they require laborious calculations and are very difficult to automate.
The present paper proposes a method and computational algorithms for analyzing cosmic ray data and detecting sporadic effects. The method is based on neural networks and the wavelet transform, specifically vector quantization networks and multilayer perceptrons. The effectiveness of vector quantization neural networks for automatic classification of neutron monitor data is shown. A method for approximating the cosmic ray time course is presented, based on a multilayer perceptron and the fast wavelet transform. A computational algorithm for detailed analysis of neutron monitor data and detection of multiscale sporadic effects is described.
The experimental results showed that the proposed methods are effective for analyzing GCR data and detecting sporadic effects. The proposed method can run in automatic mode to process recorded neutron monitor data and to provide an operational assessment of the GCR level, which gives it practical significance.
The results obtained with various neural network (NN) architectures show the promise of both feedforward multilayer NNs and vector quantization NNs. In the future, the authors plan to test the constructed NN architectures on more representative statistics by expanding the number of data recording stations analyzed.
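The paper does not specify the wavelet basis or the detection rule. The following minimal sketch therefore shows only the general idea of wavelet-based detection of sporadic changes, under several assumptions: an orthonormal Haar basis, and a threshold of k times a median-based noise scale at each level. The function names, k = 3, and the synthetic signal are hypothetical. Detail coefficients that stand out from the typical fluctuation level mark sudden changes at the corresponding scale.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar fast wavelet transform.

    Returns (approximation, detail); the signal length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Multilevel decomposition: (coarsest approximation, [details, finest first])."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def flag_sporadic(signal, levels=2, k=3.0):
    """Flag detail coefficients larger than k times a robust noise scale.

    The scale is median(|d|) / 0.6745 per level (an assumed rule).
    Returns (level, index) pairs; level 1 is the finest scale.
    """
    _, details = haar_decompose(signal, levels)
    flags = []
    for level, d in enumerate(details, start=1):
        mags = sorted(abs(x) for x in d)
        median = mags[len(mags) // 2]
        sigma = median / 0.6745 if median > 0 else 1e-12
        flags.extend((level, i) for i, x in enumerate(d) if abs(x) > k * sigma)
    return flags
```

In the test signal below, a single jump at sample 9 is flagged at both scales. The orthonormal Haar step also preserves signal energy, which is why the coefficients at different levels are directly comparable.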
Voznesenskaya T. V., Lednov D. A. Automatic text summarization system using a stochastic model // Machine Learning and Data Analysis, 2018, 4(4):266-279. doi:10.21469/22233792.4.4.04 This paper describes a system for automatic text summarization developed by «DC – Systems» in cooperation with the Faculty of Computer Science at HSE. A summary is a concise description of a text in terms of its content and meaning, i.e., from the point of view of its semantics. The purpose of summarization is to shorten the text as much as possible while preserving its main content. Here, a summary is built from syntactically related word combinations, and possible additional meanings of individual text fragments are neglected. The quality of a summary is evaluated by how well it matches the semantics of the source text.
The main problem is split into two parts: evaluating the semantics of the whole text without subdividing it, and transforming the text to derive an annotation. The architecture of the developed system and its main algorithm are described. An example summary produced by the system and an evaluation of its quality are provided. The current version of the system has the following restriction: it does not support formulas or special characters.
Ivanova A. S., Dvurechensky P. E., Gasnikov A. V. Composite optimization for the resource allocation problem // Machine Learning and Data Analysis, 2018, 4(4):280-290. doi:10.21469/22233792.4.4.05 In this paper, we consider the resource allocation problem stated as a convex minimization problem with linear constraints. To solve it, we apply the subgradient method and gradient descent to the dual problem and prove convergence rates for both the primal and the dual iterates. We also give an economic interpretation of these two methods. The iterations of the algorithms naturally correspond to a process of price and production adjustment aimed at reaching the desired production volume in the economy. Overall, we show how these actions of the economic agents lead the whole system to equilibrium.
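The price-adjustment interpretation can be illustrated with a minimal sketch of gradient ascent on the dual problem. Purely for illustration, it assumes that each producer i has a quadratic cost a_i x_i^2 / 2 and that total output must equal a given demand D. At price p, producer i's best response is x_i = p / a_i, and the dual gradient is the excess demand D - sum(x_i). The cost model, step size, and names are hypothetical. Only the smooth gradient variant is shown, not the subgradient method.

```python
def dual_price_adjustment(costs, demand, step, iters=200):
    """Gradient ascent on the dual of: minimize sum(a_i * x_i**2 / 2) s.t. sum(x_i) == demand.

    costs: positive curvature coefficients a_i of hypothetical quadratic producer costs.
    Returns the final price and the producers' outputs at that price.
    """
    price = 0.0
    for _ in range(iters):
        supply = [price / a for a in costs]  # each producer's best response to the price
        excess = demand - sum(supply)        # gradient of the concave dual function
        price += step * excess               # price rises while demand is unmet
    return price, [price / a for a in costs]


price, outputs = dual_price_adjustment([1.0, 2.0, 4.0], demand=7.0, step=0.5)
```

Here the equilibrium price has the closed form p* = D / sum(1 / a_i) = 7 / 1.75 = 4, giving outputs 4, 2 and 1. The iteration contracts toward p* whenever the step is below 2 / sum(1 / a_i).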
Vol. 4, №3, 2018
Kirilyuk I. L., Senko O. V. Studies of the relationship between non-stationary time series using the production functions // Machine Learning and Data Analysis, 2018, 4(3):142-151. doi:10.21469/22233792.4.3.01 False (spurious) regression occurs when standard means of detecting patterns in models, such as the coefficient of determination, indicate a relationship between variables that is absent in reality.
The phenomenon of false regression was studied using the Cobb-Douglas production function model on time series data for the regions of the Russian Federation over the period 1996-2014.
Classical methods of linear regression analysis, such as the F-test and Student's t-test, were used together with methods for assessing the stationarity and cointegration of variables in multidimensional time series. Additional analysis used Monte Carlo techniques to simulate stationary time series as sequences of independent normally distributed random variables. Non-stationary time series corresponding to a unit-root autoregressive process were generated from independent identically distributed increments between successive elements of the series. The regression models fitted to the real data sets were then statistically validated. Their quality was compared with that of models fitted to stationary or non-stationary multivariate time series generated with mutually independent variables.
It is shown that the dependence expressed by the Cobb-Douglas function proves reliable not only against simulated Gaussian white noise but also against simulated non-stationary unit-root autoregressive processes.
However, the described method was also applied to data with the time trends subtracted, not only to the original data. This showed that the effect has no significant cause other than the presence of similar linear trends in the time series associated with each factor.
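The effect behind false regression can be reproduced with a small Monte Carlo sketch: regress one series on another, independently generated series, and average the coefficient of determination. The sketch is a simplification under stated assumptions. It uses a single regressor instead of the Cobb-Douglas model, n = 19 points to mirror the annual 1996-2014 span, and standard normal noise. All names are hypothetical. Independent random walks (unit-root processes) produce much larger R^2 values than independent white noise, even though no relationship exists.

```python
import random


def r_squared(x, y):
    """Coefficient of determination of the OLS fit y ~ a + b*x (equals r**2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)


def random_walk(n, rng):
    """Unit-root process: cumulative sum of i.i.d. N(0, 1) increments."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def mean_r2(n, trials, walk, seed=0):
    """Average R^2 between two independently generated series."""
    rng = random.Random(seed)
    if walk:
        make = lambda: random_walk(n, rng)
    else:
        make = lambda: [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(r_squared(make(), make()) for _ in range(trials)) / trials
```

For white noise, the expected R^2 of a simple regression is 1 / (n - 1), about 0.056 for n = 19. Independent random walks typically give several times more, which is the false regression the paper guards against.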
Naumov V. A., Nelyubina E. A., Ryazanov V. V., Vinogradov A. P. Analysis and prediction of hydrological series based on generalized precedents // Machine Learning and Data Analysis, 2018, 4(3):152-164. doi:10.21469/22233792.4.3.02 The paper presents a new approach to using the apparatus of generalized precedents in the analysis and prediction of hydrological series. Generalized precedents are computational tools that make it possible to use various local regularities in the data on a unified basis, whether these regularities are known a priori, directly observed, or preferred for one reason or another. The main stages of the scheme for applying generalized precedents are presented, and the scheme's close relationship with the Hough transform scheme is shown. The possibilities of comparing and jointly analyzing meteorological data and actual river flow volume data are investigated. In this case, the generalized precedents are typical nonlinear relationships between certain hydrological parameters. The goal is to differentiate the regions of the river basin by their accumulating capacity. We show how this can be done by analyzing recent statistics covering a limited time span. The flow characteristics obtained for the regions can then be used for short-term forecasting of river level variations and other hydrological processes and phenomena, including floods and droughts. These characteristics can also serve as an important factor in studying the ecosystems and geology of the region and for other similar purposes.
Lange M. M., Lange A. M. On Information Theoretical Model for Data Classification // Machine Learning and Data Analysis, 2018, 4(3):165-179. doi:10.21469/22233792.4.3.03 A data classification model is developed based on the average mutual information between a set of objects under classification and a set of decisions about the classes of these objects. The model is optimized by minimizing the average mutual information over the conditional distributions of decisions, subject to a given error rate. Finding this minimum is equivalent to calculating the rate-distortion function in a scheme that encodes random discrete class labels. A continuous observation channel with known class-conditional probability densities transforms these labels into the corresponding objects. Lower bounds on the rate-distortion functions are calculated for classification schemes whose decision rules have no reject option and for those that have one. These bounds make it possible to compare the potentially attainable error rates for different sets of submitted objects and different observation channels. The theoretical results are supported by experimental error rates for face recognition using decorrelated components of RGB images.
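The following sketch makes these quantities concrete for a discrete classifier described by a class prior and a confusion matrix. It computes the average mutual information between class labels and decisions. It then compares that value with the textbook rate-distortion function for M equiprobable classes under error-rate (Hamming) distortion, R(eps) = log2 M - h(eps) - eps log2(M - 1) for 0 <= eps <= 1 - 1/M, where h is the binary entropy. This discrete case is assumed for illustration and is not the paper's lower bound for a continuous observation channel. It does show the key property: a classifier with error rate eps must carry at least R(eps) bits about the class.

```python
import math


def mutual_information(prior, confusion):
    """I(C; D) in bits for class prior P(c) and decision channel P(d | c) (rows sum to 1)."""
    m = len(prior)
    p_d = [sum(prior[c] * confusion[c][d] for c in range(m)) for d in range(len(confusion[0]))]
    info = 0.0
    for c in range(m):
        for d, p_dc in enumerate(confusion[c]):
            if prior[c] > 0 and p_dc > 0:
                info += prior[c] * p_dc * math.log2(p_dc / p_d[d])
    return info


def error_rate(prior, confusion):
    """Probability that the decision differs from the true class."""
    return sum(prior[c] * (1 - confusion[c][c]) for c in range(len(prior)))


def hamming_rate_distortion(eps, m):
    """R(eps) in bits for m equiprobable classes under error-rate distortion."""
    if eps >= 1 - 1 / m:
        return 0.0
    h = 0.0 if eps == 0 else -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)
    return math.log2(m) - h - eps * math.log2(m - 1)
```

A symmetric confusion matrix (equal off-diagonal errors) attains R(eps) exactly. Any asymmetric matrix with the same error rate needs more mutual information.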
Nosova S. A., Turlapov V. E. GLCM, kNN and Meanshift for neuron detection on Nissl-stained brain slice images // Machine Learning and Data Analysis, 2018, 4(3):180-191. doi:10.21469/22233792.4.3.04 A method for neuron detection on Nissl-stained brain slice images is proposed. The method uses textural features of neurons extracted from four gray-level co-occurrence matrices (GLCMs). It includes the following steps: image preprocessing, kNN classification by the textural features, and Meanshift clustering of neuron pixels.
Preprocessing includes grayscale conversion, histogram equalization, and histogram quantization. Grayscale conversion using the blue component gives the best result. It is shown that 2- and 4-bin histograms give detection quality close to that of an 8-bin histogram (F1 = 0.83 to 0.85).
The kNN algorithm is used for pixel classification. The results show that kNN is a better choice for this task than the naive Bayes classifier (NBC).
The detection quality achieved with this approach is precision = 0.82, recall = 0.92, F1 = 0.86. It is shown that these results are close to those of other neuron detection methods and somewhat better in recall.
In future work, we will extend this investigation to a larger dataset and to special datasets for important diseases.
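The texture-feature and classification steps can be sketched as follows, under assumptions the abstract does not fix. The four GLCMs are taken at distance 1 in the 0, 45, 90, and 135 degree directions. Each GLCM yields three common Haralick-style features (contrast, energy, homogeneity), and kNN uses Euclidean distance with k = 3. Preprocessing and the Meanshift clustering step are omitted, and all names are hypothetical.

```python
import math
from collections import Counter

# Assumed offsets (row, col): 0, 45, 90 and 135 degrees at distance 1.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def quantize(gray, bins):
    """Map 8-bit gray levels (0..255) to `bins` equal-width levels."""
    return [[min(v * bins // 256, bins - 1) for v in row] for row in gray]


def glcm(img, levels, offset):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    dr, dc = offset
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = img[r][c], img[r2][c2]
                m[a][b] += 1
                m[b][a] += 1
    total = sum(map(sum, m))
    return [[x / total for x in row] for row in m] if total else m


def texture_features(img, levels):
    """Contrast, energy and homogeneity from each of the four GLCMs (12 values)."""
    feats = []
    for off in OFFSETS:
        p = glcm(img, levels, off)
        cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
        feats.append(sum((i - j) ** 2 * v for i, j, v in cells))     # contrast
        feats.append(sum(v * v for _, _, v in cells))                 # energy
        feats.append(sum(v / (1 + (i - j) ** 2) for i, j, v in cells))  # homogeneity
    return feats


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); majority vote among the k nearest."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

For a checkerboard patch, horizontal and vertical neighbors always differ, so contrast is high in those directions and zero along the diagonals. This kind of directional texture signal is what the GLCM features feed to the classifier.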
Tleubaev A. T., Stupnikov S. A. Application of machine learning methods for subject classification of the internet domains // Machine Learning and Data Analysis, 2018, 4(3):192-214. doi:10.21469/22233792.4.3.05 The paper is devoted to applying machine learning methods to automate the subject classification of Internet domains. The specific task is to automatically assign an Internet domain to a category from a predefined hierarchical category tree. The work uses various classifiers that have proved effective on high-dimensional, highly sparse feature spaces. The feature spaces were formed from the texts of the domains' main pages using the TF-IDF and N-gram approaches. Two approaches to applying classification methods to the problem are developed: direct and multilevel. In the direct approach, a single classifier predicts the category of each domain, and the category can be at any level of the category tree. In the multilevel approach, a set of classifiers is used: each set of categories sharing one parent has its own classifier. The classifiers are applied hierarchically, from root to leaf categories. A combination of the proposed approaches is also used. One practical application of the work is profiling users based on the sites they visit and then offering them personalized advertising.
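The multilevel approach can be sketched as follows, under clearly stated assumptions. Features are plain word-level TF-IDF with smoothed idf (no N-grams). Each per-parent "classifier" is a nearest-centroid rule using cosine similarity, which stands in for the classifiers actually evaluated in the paper. The category tree, the training texts, and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def normalize(vec):
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def vectorize(doc, idf):
    """Sparse L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return normalize({t: c * idf[t] for t, c in tf.items()})


def dot(a, b):
    """Cosine similarity when both vectors are L2-normalized."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vecs):
    acc = defaultdict(float)
    for v in vecs:
        for t, w in v.items():
            acc[t] += w
    return normalize(acc)


def train_hierarchy(children, leaf_docs, idf):
    """One nearest-centroid classifier per parent node.

    children: parent -> child categories (the root is "root").
    leaf_docs: leaf category -> tokenized training documents.
    """
    def subtree_vecs(node):
        if node in leaf_docs:
            return [vectorize(d, idf) for d in leaf_docs[node]]
        return [v for child in children[node] for v in subtree_vecs(child)]
    return {parent: {child: centroid(subtree_vecs(child)) for child in kids}
            for parent, kids in children.items()}


def classify(doc, model, idf, root="root"):
    """Descend from the root, choosing the most similar child at each level."""
    vec, node = vectorize(doc, idf), root
    while node in model:
        node = max(model[node], key=lambda child: dot(vec, model[node][child]))
    return node
```

The top-down descent means an early mistake cannot be corrected deeper in the tree. This trade-off is one reason a combination of the direct and multilevel approaches can be useful.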
Vol. 4, №2, 2018
Mandrikova O. V., Fetisova N. V., Polozov Yu. A. Modeling and analysis of natural time series on the basis of general multicomponent model // Machine Learning and Data Analysis, 2018, 4(2):74-88. doi:10.21469/22233792.4.2.01 The work focuses on developing methods for modeling and analyzing natural time series and on building automated systems based on them. The present paper proposes a general multicomponent model (GMCM) of complex time series that can describe irregular variations in the data. The recurrent component of the GMCM has a parametric form and describes the regular time course of the data. The anomalous components of the GMCM are represented by nonlinear approximating schemes and describe irregular variations. The implementation of the model and the results of its application are presented using the time series of the critical frequency of the ionospheric F2 layer as an example, with data from the worldwide network of ionospheric stations. A comparison with the IRI international empirical model and the median method confirmed the efficiency of the GMCM. Unlike its analogs, the GMCM can detect anomalous changes in the data and estimate their characteristics automatically. The model is implemented numerically and is available online (http://aurorasa.ikir.ru:8580). The results of the research are important for geophysical monitoring and operational space weather forecasting.
Vasiliev E., Komuro T., Turlapov V., Nikolsky A. One hand aerial gesture control for AR-based cardiac interventions // Machine Learning and Data Analysis, 2018, 4(2):89-96. doi:10.21469/22233792.4.2.02 We consider how an operating surgeon can interact with medical software during surgery.
We propose an interface that allows the user to interact with a computer graphics (CG) model using one-hand gestures. The operations that can be performed with one hand are moving, zooming, and rotating.
Using only one hand to interact with the model is more convenient for a person than using two hands.
We implemented three types of cursor: no pointer (suitable for touch panels, because the user's finger acts as the pointer); a simple dot (easy to create, but not very descriptive, especially when moving in three dimensions); and a virtual hand (very descriptive in 3D space, but very difficult to create, with no off-the-shelf implementation available).
We created a demo application to show the advantages of this approach.
Chukanov S. N., Leykhter S. V. The matching of diffeomorphic images based on topological data analysis // Machine Learning and Data Analysis, 2018, 4(2):97-107. doi:10.21469/22233792.4.2.03 This paper considers the problem of matching an initial image with a terminal image. The problem is solved by constructing a functional to be minimized. The functional characterizes the evolution of the diffeomorphic transformation from the initial to the terminal image and penalizes deviation of the image path from the required trajectory. When object images are recognized, the shape of the object is analyzed using persistent homology methods. Shape characteristics determined by topological methods do not depend on the coordinate representation of the shape and are invariant under diffeomorphic transformations. A distinctive feature of persistent homology, compared with other methods of algebraic topology, is that it yields more information about the shape of the object.
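Persistent homology in general is beyond a short example. Its simplest part, however, fits in a few lines: the 0-dimensional barcode of a point cloud under the Vietoris-Rips filtration, where components are born at scale 0 and die at the lengths of minimum-spanning-tree edges. The sketch below (function name hypothetical) shows how such descriptors ignore coordinates, since rotating the cloud leaves the barcode unchanged. It demonstrates invariance to rigid motions only and is not the paper's image-matching method.

```python
import math


def h0_barcode(points):
    """Finite death scales of 0-dimensional Vietoris-Rips persistence.

    Every point is born at scale 0; when an edge of length r first joins two
    components, one of them dies at r (Kruskal's algorithm with union-find).
    The component that never dies is omitted. Returns deaths in ascending order.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    deaths = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(r)
    return deaths
```

For two tight pairs of points 10 units apart, the barcode has two short bars (length 1) and one long bar (length 10). The long bar is the topological signature of "two clusters", whatever coordinate frame the points are given in.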
Pyt'ev Yu. P., Falomkina O. V., Shishkin S. A., Chulichkov A. I. Mathematical formalism for subjective modeling // Machine Learning and Data Analysis, 2018, 4(2):108-121. doi:10.21469/22233792.4.2.04 A mathematical formalism for subjective modeling (MFSM) of uncertainty is created. The uncertainty reflects the unreliability of subjective information and its inherent fuzziness. The MFSM allows the researcher-modeler (r-m) to construct models from unformalized, incomplete, and inconsistent data, ranging from "absolute ignorance" to "complete knowledge" of the model of the research object (RO). "Complete knowledge" of the model is equivalent to the applicability condition of "standard" modeling, so the proposed MFSM significantly generalizes "standard" mathematical modeling. If data related to the RO are available, the MFSM allows the r-m to use them in three ways: to test the adequacy of the subjective model to the RO, to correct the subjective model, and, under certain conditions, to reconstruct the RO model empirically.
Murashov D. M., Berezin A. V., Ivanova E. Yu. Painting canvas thread counting from images obtained in raking light // Machine Learning and Data Analysis, 2018, 4(2):122-135. doi:10.21469/22233792.4.2.05 This paper deals with counting the threads of a painting's canvas from images. Thread counts are needed to determine characteristics that art historians use for dating works of art. In the last few years, automated algorithms for calculating canvas characteristics from X-ray and high-quality terahertz images have been developed. In the textile industry, fabric density is controlled using microscopic photographs taken with the fabric sample illuminated by transmitted light. A peculiarity of our research is that canvas images are acquired in raking light, which emphasizes the texture of the canvas in a specified direction. To analyze the canvas sample images, we propose modifications of a known algorithm based on filtering in the Fourier domain and thresholding. We also propose a new algorithm based on localizing grayscale image ridges.
In existing work, the number of threads is determined from the peaks of the Fourier spectrum or from the baselines in the canvas image. In this paper, threads are counted over all rows or columns of the image matrix, and a histogram is built from the results. The desired number of threads is determined by the maximum of this histogram. Using histograms reduces the inaccuracy caused by artifacts introduced during image processing. For thresholding, the Otsu and Niblack methods are applied. A computational experiment was carried out on the canvases of five portraits by 18th-century Russian artists, with the following results.
The algorithm based on the Otsu method requires no parameters and has acceptable accuracy and high speed. However, it gave unacceptable results on several images. The algorithm based on the Niblack method requires setting two parameters and is computationally more expensive than the global-threshold algorithm, but on average it showed higher density measurement accuracy. The algorithm based on localizing grayscale image ridges requires setting more parameters and has significantly higher computational costs than the other algorithms. Nevertheless, it showed the best measurement accuracy within the error limit acceptable for expert examination and attribution of paintings. The algorithms studied measure canvas density with an error of at most one thread per centimeter for 70 to 97 percent of the sample images.
The results of the computational experiment agree with those of known algorithms for measuring canvas density from X-ray images of paintings. To improve the reliability of canvas density measurements in painting analysis, it is preferable to use several algorithms. Further research will aim at improving the accuracy and speed of the algorithms.
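The histogram-based counting step can be sketched as follows. The sketch assumes a grayscale image stored as rows of 8-bit values, with threads appearing as bright bands that cross every row, and a known physical row width in centimeters. Only the global Otsu variant is shown. The Fourier-domain filtering, the Niblack alternative, and the ridge-based algorithm are omitted. The function names and the synthetic test image are hypothetical. Taking the mode of the per-row counts is what makes a few corrupted rows harmless.

```python
from collections import Counter


def otsu_threshold(pixels):
    """Global Otsu threshold for 8-bit values; pixels > threshold are foreground."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def count_runs(bits):
    """Number of maximal runs of truthy values (one run = one thread crossing)."""
    return sum(1 for i, b in enumerate(bits) if b and (i == 0 or not bits[i - 1]))


def thread_density(image, width_cm):
    """Threads per cm: binarize with Otsu, count runs per row, take the histogram mode."""
    t = otsu_threshold([p for row in image for p in row])
    counts = Counter(count_runs([p > t for p in row]) for row in image)
    mode_count, _ = counts.most_common(1)[0]
    return mode_count / width_cm
```

In the test image, nine rows show ten threads and one artifact row shows a single bright band. The histogram mode still returns ten threads per centimeter, while a plain average over rows would be pulled down by the artifact.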
Vol. 4, №1, 2018
Voronov A. D., Gromov A. N., Inyakin A. S., Zamkovoy A. A. Verification of the expert assessments in revealing of the relevant exogenous factors affecting cargo transportation demand // Machine Learning and Data Analysis, 2018, 4(1):6-15. doi:10.21469/22233792.4.1.04 This research identifies exogenous factors that affect forecasts of railroad transportation volumes, in order to make these forecasts more relevant. The authors propose including the influence of exogenous factors in the forecast model. Expert assessments determine the relevance of the factors. The research proposes methods for assessing the reliability of the expert assessments and for revealing the structure and type of influence of exogenous factors on cargo transportation volumes. The results systematize the expert review of how exogenous factors influence forecast cargo transportation volumes. The technique for conducting an expert analysis of the importance and type of influence of factors on cargo transportation demand is described.
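The abstract does not name the reliability measure used. One standard way to check whether experts agree on how to rank factors by importance is Kendall's coefficient of concordance, W = 12 S / (m^2 (n^3 - n)). Here m experts rank n factors, and S is the sum of squared deviations of the factors' rank sums from their mean m (n + 1) / 2. The sketch below assumes rankings without ties. Its use here is an illustrative assumption, not the authors' procedure.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m experts ranking n factors (no ties).

    rankings[e][f] is the rank (1..n) that expert e assigns to factor f.
    Returns a value in [0, 1]; 1 means complete agreement.
    """
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[f] for r in rankings) for f in range(n)]
    mean = m * (n + 1) / 2
    s = sum((t - mean) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

Identical rankings from all experts give W = 1. Two experts with exactly reversed rankings give W = 0, a signal that the assessments need review before they are used to select factors.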
Voronov A. D., Gromov A. N., Inyakin A. S., Zamkovoy A. A. Forecasting amount of demand for cargo transportation for stationar time series // Machine Learning and Data Analysis, 2018, 4(1):16-35. doi:10.21469/22233792.4.1.05 The properties of models for forecasting the demand for freight rail transportation are investigated, with the aim of structuring management and planning processes in freight rail transportation. The paper proposes four models for forecasting freight rail transportation demand. The models take into account the specifics of the measured data, the business processes, and the standards of the industrial partner. They are constructed using multivariate statistical analysis and forecasting of interdependent time series, and their properties are analyzed. Forecasts are made at daily, weekly, and monthly resolution for stations and regions. The proposed forecasting models are compared by mean absolute error and mean percentage error.
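The two comparison criteria can be written out directly. This sketch assumes that the percentage criterion is the mean absolute percentage error (MAPE). MAPE is undefined when an actual value is zero, as can happen for a station with no shipments on a given day. The function names and numbers are illustrative.

```python
def mae(actual, forecast):
    """Mean absolute error, in the units of the data (e.g. tonnes or wagons)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


actual = [100.0, 200.0, 400.0]
model_a = [110.0, 180.0, 400.0]
model_b = [90.0, 230.0, 380.0]
```

MAE weights large stations more heavily because their errors are larger in absolute terms, while MAPE treats each relative miss equally. Reporting both therefore gives complementary views of the same forecasts.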
Dulin S. K., Yakushev D. A. Developing of maps, based on mobile laser scanning data, for security locomotive devices and control systems to manage electric trains traffic // Machine Learning and Data Analysis, 2018, 4(1):36-43. doi:10.21469/22233792.4.1.01 Forming common electronic maps for various locomotive safety devices and electric train traffic control systems is an extremely urgent task, and solving it is intended to improve traffic safety. New possibilities for map formation are provided by the complex system of spatial data of the railway transport infrastructure (CSSD RTI). The coordinates of all technogenic objects are entered into this system based on processing mobile laser scanning data obtained in a high-precision coordinate system. The reflection points from all objects are measured with sub-centimeter accuracy, georeferenced in three-dimensional space, and annotated with photographs. This makes it possible to identify all technogenic objects that are significant for traffic safety in railway transport.
Yakushev D. A. 3D-modeling of technical condition of railway engineering objects with Bentley Systems software // Machine Learning and Data Analysis, 2018, 4(1):44-51. doi:10.21469/22233792.4.1.02 Several factors leave not even a theoretical possibility of implementing design solutions during construction or keeping the infrastructure in its design position during operation. These factors are the lack of a unified measurement system, the low accuracy of design documentation (which sets requirements only for minimum dimensions and relative positions), and the current system for assessing the condition of technogenic infrastructure facilities (which determines only a generic condition indicator not related to spatial position). Information modeling of technogenic railway infrastructure objects in three-dimensional coordinate space is intended to change this situation. For example, three-dimensional models of MCC sites created in 2016 revealed serious discrepancies between the constructed object and the design documentation.
Koltsov P. P., Osipov A. S., Sotnezov R. M., Chekhovich Yu. V., Yakushev D. A. Fundamental problems of empirical estimations for computer vision // Machine Learning and Data Analysis, 2018, 4(1):52-68. doi:10.21469/22233792.4.1.03 The paper deals with a comparative study of image processing and analysis algorithms implemented in software- and hardware-based security systems. The main principles of the EDEM methodology developed for this purpose are considered, with a focus on the elements of fuzzy set theory used for comparative evaluation. In particular, the concepts of fuzzy ground truth images and fuzzy similarity measures are considered. Some examples of applying the EDEM methodology are given, including the evaluation of algorithms used for solving some rail security tasks.
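The abstract does not give the exact form of the fuzzy similarity measures used in EDEM. A minimal sketch of the idea might look as follows. It assumes that a fuzzy ground truth is obtained by averaging several experts' binary masks and uses the common fuzzy Jaccard measure (sum of minima over sum of maxima). Both constructions and all names are illustrative assumptions.

```python
def fuzzy_ground_truth(masks):
    """Membership of each pixel = share of annotators who marked it (flattened masks)."""
    return [sum(px) / len(masks) for px in zip(*masks)]


def fuzzy_jaccard(truth, result):
    """Similarity of two fuzzy images given as membership grades in [0, 1]."""
    num = sum(min(a, b) for a, b in zip(truth, result))
    den = sum(max(a, b) for a, b in zip(truth, result))
    return 1.0 if den == 0 else num / den
```

Unlike a crisp overlap score, this measure penalizes an algorithm only partially for missing pixels on which the experts themselves disagreed. That property is the main motivation for fuzzy ground truth.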