Vol. 2, №1, 2016
Lange, M. M., S. N. Ganebnykh, and A. M. Lange. 2016. Algorithm of approximate search for the nearest digital array in a hierarchical data set. Machine Learning and Data Analysis 2(1):6-16. doi:10.21469/22233792.2.1.01 An algorithm for fast approximate search of the nearest neighbor of a query array in a given set of multidimensional digital arrays is suggested. The search error is defined as the difference between the distance from the query array to the array actually found and the distance to the true nearest neighbor, taken relative to the distance to the nearest neighbor. The proposed algorithm uses pyramid-based multiresolution representations of the arrays and a hierarchical search strategy. For a large linear size of the arrays and a large cardinality of the data set, an asymptotic computational gain of the approximate search algorithm with respect to the exact search algorithm is estimated. For a data set of grayscale handwritten digit images taken from the MNIST database, the mean search error, the standard deviation of the search errors, and the computational complexity of the algorithm are experimentally estimated as functions of the search parameter. Using these estimates, the dependence of the mean search error on the computational complexity is calculated.
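The coarse-to-fine idea behind such pyramid-based search can be pictured with a short sketch. The snippet below is only a minimal illustration under assumed details (2D arrays, a pyramid built by 2x2 averaging, and pruning of candidates whose coarse-level distance exceeds the current best by a slack factor); it is not the authors' algorithm.

```python
import numpy as np

def pyramid(a, levels):
    """Build a multiresolution pyramid by 2x2 block averaging (assumed scheme)."""
    pyr = [a]
    for _ in range(levels - 1):
        a = 0.25 * (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2])
        pyr.append(a)
    return pyr[::-1]  # coarsest level first

def approx_nn(query, data, levels=3, slack=1.2):
    """Coarse-to-fine approximate nearest-neighbor search.

    At each resolution level, keep only the candidates whose distance to the
    query is within `slack` times the current best distance at that level.
    """
    q_pyr = pyramid(query, levels)
    d_pyrs = [pyramid(x, levels) for x in data]
    candidates = list(range(len(data)))
    for lvl in range(levels):
        dists = {i: np.linalg.norm(q_pyr[lvl] - d_pyrs[i][lvl]) for i in candidates}
        best = min(dists.values())
        candidates = [i for i in candidates if dists[i] <= slack * best + 1e-12]
    # final comparison among the survivors at the finest (original) resolution
    return min(candidates, key=lambda i: np.linalg.norm(query - data[i]))
```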
Isachenko, R. V. and A. M. Katrutsa. 2016. Metric learning and dimensionality reduction in clustering. Machine Learning and Data Analysis 2(1):17-25. doi:10.21469/22233792.2.1.02 This paper investigates the incorporation of a metric learning approach into the clustering problem. The distance metric is a key issue in many machine learning algorithms, especially in unsupervised learning, where distances between objects are the only known information. The metric learning procedure modifies distances between objects so that objects from the same cluster become closer and objects from different clusters become more distant. In this paper, the Mahalanobis distance is used as the distance between objects. The goal of the paper is to learn a Mahalanobis metric by optimizing the covariance matrix of the objects according to their cluster labels; the metric learning procedure is thus formulated as an optimization problem. For clustering, k-means is used as the baseline algorithm alongside the Adaptive Metric Learning (AML) algorithm. To solve the optimization problem, the AML algorithm uses an iterative EM (expectation-maximization) procedure. To compare these algorithms, a computational experiment was carried out in MATLAB on synthetic data and on real data from the UCI repository, and conclusions about the performance of the algorithms are drawn.
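The interplay of clustering and a Mahalanobis metric can be sketched with a simple alternating scheme: cluster, estimate the pooled within-cluster covariance, and re-cluster in the whitened space. This is only an illustration of the idea under assumed details; it is not the AML algorithm from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_with_learned_metric(X, n_clusters=3, n_iter=5, reg=1e-3):
    """Toy alternating scheme: k-means clustering, then whitening by the
    pooled within-cluster covariance (a Mahalanobis metric), repeated."""
    Z = X.copy()
    L = np.eye(X.shape[1])
    for _ in range(n_iter):
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
        # pooled within-cluster covariance of the original features
        S = sum((labels == k).sum() * np.cov(X[labels == k].T, bias=True)
                for k in range(n_clusters)) / len(X)
        S += reg * np.eye(X.shape[1])
        # Mahalanobis distance with matrix S^{-1} equals the Euclidean distance
        # after the linear map L = S^{-1/2}
        w, V = np.linalg.eigh(S)
        L = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
        Z = X @ L
    return labels, L
```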
Senko, O.V., A.M. Morozov, A.V. Kuznetsova, and L. L. Klimenko. 2016. Evaluating of multiple testing effect in method of optimal valid partitioning. Machine Learning and Data Analysis 2(1):26-38. doi:10.21469/22233792.2.1.03 The development of methods for discovering statistically valid regularities is one of the most important data mining problems. One possible technique for regularity search is the method of optimal valid partitioning (OVP), which uses a permutation test for statistical verification. In high-dimensional tasks, verification becomes more difficult because of the multiple testing problem. The standard Bonferroni correction imposes very strict validity thresholds that are rarely achievable in practice when the dimension exceeds 100. A set of Monte-Carlo experiments was conducted to evaluate the true validity of regularities found in the following biomedical task: the study of the relationship between vascular endothelial growth factor (VEGF) levels and a wide set of biological indicators. The set of regularities found in the initial data set was compared with the sets of regularities found in 50 random data sets. The random data sets were generated from the initial data set by random permutations of the target variable positions while keeping the positions of the explanatory variable vectors fixed. The experiments showed that the fraction of two-dimensional regularities that are valid at the uncorrected significance level is 10-30 times less than Np, where Np is the number of enumerated pairs of explanatory variables. Some ways to soften the validity thresholds are discussed.
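The permutation scheme described above (shuffling only the target variable while keeping the explanatory vectors fixed) is easy to frame in code. The sketch below assumes a hypothetical callback count_valid_regularities(X, y) that returns the number of regularities significant at the uncorrected level; it only illustrates the Monte-Carlo comparison, not the OVP method itself.

```python
import numpy as np

def permutation_excess(X, y, count_valid_regularities, n_perm=50, rng=None):
    """Compare the number of regularities found in the real data with the
    numbers found after random permutations of the target variable."""
    rng = np.random.default_rng(rng)
    observed = count_valid_regularities(X, y)
    permuted = [count_valid_regularities(X, rng.permutation(y))
                for _ in range(n_perm)]
    mean_chance = float(np.mean(permuted))
    # how many more regularities the real data yields than chance alone
    return observed, mean_chance, observed / (mean_chance + 1e-12)
```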
Bernshtein, J.D., O. S. Brusov, and I.A. Matveev. 2016. Methods for in vitro determination of coagulation and fibrinolysis characteristics using the blood plasma images sequence. Machine Learning and Data Analysis 2(1):39-48. doi:10.21469/22233792.2.1.04 The problem of quantifying the characteristics of a fibrin clot in the thrombodynamics method is solved. The initial data of the method are sequences of digital images, taken at regular intervals, of a cell filled with blood plasma and placed in the thrombodynamics registrar, in which the clot grows from the activator and later resorbs. The activator's boundaries, the speed of growth and resorption of the fibrin clot, the time changes of its size and density, and the moment of the clot's separation from the activator are determined. Methods of binarization, mathematical morphology, and image projections are used to locate the clot in the image. The set of measured parameters and their temporal dynamics may be used for medical diagnostics of fibrinolysis and coagulation.
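The kind of quantities listed above (clot size, density, growth speed) can be extracted from an image sequence with a very simple threshold-based sketch. The snippet below is only an illustration under assumed details (clot brighter than plasma, a fixed threshold, growth speed as the slope of a linear fit of area over time); it is not the processing pipeline of the paper.

```python
import numpy as np

def clot_dynamics(frames, threshold=0.5, dt=1.0):
    """Per-frame clot area and mean density from grayscale frames in [0, 1]."""
    area, density = [], []
    for f in frames:
        mask = f > threshold            # assumed: clot is brighter than plasma
        area.append(mask.sum())
        density.append(f[mask].mean() if mask.any() else 0.0)
    t = np.arange(len(frames)) * dt
    growth_speed = np.polyfit(t, np.asarray(area, float), 1)[0]  # area units per time unit
    return np.array(area), np.array(density), growth_speed
```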
Genrikhov, I. E. 2016. Synthesis of full decision tree with using heterogeneous systems on the basis of CUDA technology. Machine Learning and Data Analysis 2(1):49-69. doi:10.21469/22233792.2.1.05 The article is devoted to classification algorithms based on full decision trees. In the decision tree construction under consideration, all the features satisfying a branching criterion are taken into account at each special vertex. The main drawback of the full decision tree is that its synthesis takes significantly more time than that of a classical decision tree. The article considers ways of reducing the synthesis time of a full decision tree using CUDA technology, which makes it possible to use a large number of GPU cores to speed up complex computations. Testing results on model and real tasks are given. It is shown that the use of CUDA technology significantly reduces the synthesis time of the full decision tree (by more than 10 times), but only when the training set contains a large number of attributes and/or training objects and the feature values are real-valued or integer-valued of large complexity.
Lange, M.M., S.N. Ganebnykh, and A.M. Lange. 2016. Multiclass Pattern Recognition in a Space of Multiresolution Representations. Machine Learning and Data Analysis 2(1):70-88. doi:10.21469/22233792.2.1.06 For a multiclass source of patterns given by images, a metric classification scheme in a space of tree-structured pattern representations is suggested. On the set of pattern representations, a family of dissimilarity measures at the successive resolution levels and discriminant functions (class likelihoods) based on these measures are defined. The decision of the multiclass classifier is made by voting over the values of the discriminant functions; a reject option is also available. A learning procedure that includes the selection of template patterns within classes as well as the optimization of the classifier parameters is developed. A parametric decision algorithm that combines hierarchical and exhaustive search strategies for the decision in the multilevel network of templates is constructed. An analytical estimate of the computational complexity of the algorithm is obtained. For a composite source of patterns given by signatures, hand gestures, and faces, the efficiency of classifiers with different parameters is shown by the corresponding ROC curves as well as by empirical dependencies of the error rate on the computational complexity of the decision algorithm.
Mandrikova, O.V., T. L. Zalyaev, Yu.A. Polozov, and I. S. Solovev. 2016. Modeling and analysis of cosmic ray variations during periods of increased solar and geomagnetic activity. Machine Learning and Data Analysis 2(1):89-103. doi:10.21469/22233792.2.1.07 The paper describes a method for the analysis of cosmic ray variations, which allows allocating anomalous changes and obtaining quantitative estimates of their occurrence time, duration, and intensity. The method includes the decomposition of neutron monitor data based on the wavelet transform and their approximation by adaptive variable-structure neural networks. Using this method, the authors analyzed cosmic ray variations during periods of increased solar and geomagnetic activity and allocated anomalous changes that occurred a few hours before geomagnetic storms. Long and deep Forbush decreases took place during the storms (neutron monitor data from the Apatity and Cape Schmidt stations were analyzed). The cosmic ray data were analyzed together with geomagnetic field variations and ionospheric parameters, the processing of which was performed on the basis of methods proposed by the authors.
Beklaryan, L.A., A. S. Akopov, A. L. Beklaryan, and A.K. Saghatelyan. 2016. Agent-based simulation modelling for regional ecological-economic systems. A case study of the Republic of Armenia. Machine Learning and Data Analysis 2(1):104-115. doi:10.21469/22233792.2.1.08 The article considers topical problems of modeling ecological-economic systems using the example of the Republic of Armenia (RA). Based on methods of agent-based modeling and system dynamics, a simulation model of the ecological-economic system was created, which made it possible to construct the RA Ecological Map. An important purpose of the proposed approach is the search for scenarios of rational modernization of the agent-enterprises, which are the main sources of emissions, with the simultaneous determination of an effective strategy of government regulation. A bi-criteria optimization problem for the ecological-economic system of the RA is formulated and solved with the help of the developed genetic algorithm.
Genrikhov, I.E., E.V. Djukova, and V. I. Zhuravlyov. 2016. About full regression decision trees. Machine Learning and Data Analysis 2(1):116-126. doi:10.21469/22233792.2.1.09 Background: The regression restoration problem is considered. Among the existing approaches, the one based on the construction of regression trees is highlighted. The best-known algorithms of regression tree synthesis (e.g., CART and Random Forest) are based on elementary trees, namely binary regression trees; k-ary regression trees are used rarely. In the synthesis of such trees, only one feature meeting the selected branching criterion is chosen, and branching is carried out on this feature. However, when several features meet the selected criterion equally or almost equally well, only one of them is chosen (arbitrarily). Thus, depending on the selected feature, the constructed trees can vary significantly both in the sets of features used and in their recognition quality. Methods: A new approach to the construction of regression trees based on the so-called full decision tree is applied. Originally, the approach to the synthesis of full decision trees was investigated only for precedent-based classification problems, where it showed improved quality in comparison with the known methods of decision tree synthesis. In a full decision tree, a so-called full node is built at every iteration. A set of features corresponds to the full node, and each feature of this set meets the selected branching criterion. Then, for each feature of this set, a simple internal node from which the branching is carried out is built. In comparison with the classical construction, the full decision tree makes fuller use of the available information: the description of a recognizable object may be generated not by a single branch, as in a classical tree, but by several branches. Results: Two regression tree synthesis algorithms, NBFRTree and NBRTree, are developed. The NBRTree algorithm builds a classical k-ary regression tree using a statistical feature selection criterion. The NBFRTree algorithm improves NBRTree with the full decision tree synthesis approach: at each step, the set of features that meet the statistical criterion equally or nearly equally well is selected, and branching is carried out on all of them. The best results were obtained with the NBFRTree algorithm. The NBFRTree and NBRTree algorithms were compared on 18 real problems with known regression tree synthesis algorithms such as Random Forest, Decision Stump, REPTree, and CART. It is shown that the quality of the NBFRTree algorithm is higher than that of Decision Stump, REPTree, and CART; it is not inferior to Random Forest and in some cases shows the best results. Concluding Remarks: The approach applied in this work to regression tree synthesis for solving the regression restoration problem, full regression trees, can be used successfully on an equal basis with other known approaches to regression tree synthesis.
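The full-node idea can be sketched as follows: instead of branching only on the single best feature, branching is done on every feature whose criterion value is close to the best one. The snippet below uses variance reduction as an assumed branching criterion and only illustrates the selection of the feature set for one full node; it is not the NBFRTree algorithm.

```python
import numpy as np

def variance_reduction(x, y, threshold):
    """Assumed branching criterion: reduction of target variance after a binary split."""
    left, right = y[x <= threshold], y[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    return y.var() - (len(left) * left.var() + len(right) * right.var()) / len(y)

def full_node_features(X, y, rel_tol=0.05):
    """Indices of all features whose best split criterion is within rel_tol of
    the overall best; together they form one 'full node'."""
    scores = []
    for j in range(X.shape[1]):
        thresholds = np.unique(X[:, j])[:-1]
        best = max((variance_reduction(X[:, j], y, t) for t in thresholds), default=0.0)
        scores.append(best)
    best_score = max(scores)
    return [j for j, s in enumerate(scores) if s >= (1 - rel_tol) * best_score]
```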
Vol. 2, №2, 2016
Dvoenko, S.D. and D.O. Pshenichny. 2016. Feature Grouping Based on the Optimal Sequence of Correlation Matrix Minors. Machine Learning and Data Analysis 2(2):?? - ??. doi:10.21469/22233792.2.2.01 Background: Data analysis problems usually arise in the early stages of an investigation, when a model of the phenomenon under study has not yet been developed, so it is too early to pose the problem of model identification. In this case, one needs to collect and study a lot of miscellaneous information about the most significant characteristics of the phenomenon under investigation. Such a situation forces the use of an inconsistent approach, since we do not know which characteristics are important and what knowledge needs to be collected. Therefore, data analysis methods must resolve this contradiction and focus on the correct description of the phenomenon. In the grouping problem, the problem of informal interpretation of factors and groups arises. Factors are synthetic features, and their informal interpretation can be difficult. Therefore, after groups and the corresponding factors have been built, a representative is usually defined for each group as the feature most correlated with the group factor; as a result, groups can be informally named after such initial features. Methods: A new approach to specifying a feature subset is proposed to represent hidden factors correctly. In this approach, there is no need to compute eigenvectors or centroids as intermediate transformations. It is based on the optimal sequence of correlation matrix minors, in which the less correlated features are placed at the beginning of the sequence and the more correlated ones closer to its end. Results: It is shown that the proposed approach can produce an initial partition for other grouping algorithms and, additionally, can be used to estimate the number of groups and to obtain informal partitions. Concluding Remarks: The natural hidden regularity in the phenomenon under investigation reveals itself through processing the data by different techniques and algorithms targeted to uncover it, and all such results as a whole support the correct conclusion. Therefore, the diversity of intelligent data processing methods needs to be supported and developed, which this paper attempts to do. The attempt is relevant since a large volume of experimental data has been collected and methods for pairwise comparisons have been developed.
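One way to picture a "sequence of correlation matrix minors" is a greedy ordering in which each next feature maximizes the determinant (minor) of the correlation submatrix of the features chosen so far, so that weakly correlated features naturally appear first. This is only an illustrative sketch of that intuition, not the optimal sequence construction from the paper.

```python
import numpy as np

def greedy_minor_order(R):
    """Greedily order features so that each added feature keeps the determinant
    of the selected correlation submatrix as large as possible."""
    n = R.shape[0]
    # start with the feature that has the smallest total absolute correlation
    order = [int(np.argmin(np.abs(R - np.eye(n)).sum(axis=1)))]
    remaining = set(range(n)) - set(order)
    while remaining:
        best = max(remaining,
                   key=lambda j: np.linalg.det(R[np.ix_(order + [j], order + [j])]))
        order.append(best)
        remaining.remove(best)
    return order
```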
Talipov, K. I. and I.A. Matveev. 2016. Eyelids and eyelash detection based on clusterization of vector of local features. Machine Learning and Data Analysis 2(2):?? - ??. doi:10.21469/22233792.2.2.02 The problem of extracting the areas where the iris is occluded by various objects is addressed. The initial data consist of an image of the iris and a circle approximating the boundary between the sclera, the iris, and the pupil. The proposed solution is to calculate local texture features and to cluster the data based on the extracted information. The two main goals of this work are to introduce an effective algorithm for detecting occluded points and to study the possibility of their segmentation without a preset texture model. The algorithm's performance is illustrated with results on various iris image datasets.
Chigrinskiy, V.V., Y. S. Efimov, and I. A. Matveev. 2016. Fast algorithm for determining pupil and iris boundaries. Machine Learning and Data Analysis 2(2):?? - ??. doi:10.21469/22233792.2.2.03 The paper presents a method for determining pupil and iris boundaries in eye images. The purpose is to find the parameters of the approximating circles, namely the coordinates of the centers and the radii. To solve the problem, several steps are implemented: morphological processing and binarization of the input image, determining the pupil parameters, detecting image edges with the Canny edge detector, and determining the iris parameters using the density of the distribution of edge points over their distances to the pupil center found at the previous step. A mixed set of 2331 different iris images is used to test the algorithm.
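A rough sketch of this processing chain with OpenCV is given below, under assumed details (a fixed binarization threshold, the pupil taken as the largest dark connected component, and the iris radius taken as the peak of the radial histogram of Canny edge points); the actual algorithm in the paper is certainly more elaborate.

```python
import cv2
import numpy as np

def pupil_and_iris(gray):
    """Estimate the pupil circle and iris radius from a grayscale eye image."""
    # binarize and clean up: the pupil is assumed to be the darkest blob
    _, binary = cv2.threshold(gray, 60, 255, cv2.THRESH_BINARY_INV)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    # pupil center and radius from the largest connected component
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    cx, cy = centroids[largest]
    r_pupil = np.sqrt(stats[largest, cv2.CC_STAT_AREA] / np.pi)
    # iris radius: peak of the histogram of edge-point distances to the pupil center
    edges = cv2.Canny(gray, 50, 150)
    ys, xs = np.nonzero(edges)
    dist = np.hypot(xs - cx, ys - cy)
    dist = dist[(dist > 1.5 * r_pupil) & (dist < 6 * r_pupil)]
    hist, bins = np.histogram(dist, bins=100)
    r_iris = 0.5 * (bins[np.argmax(hist)] + bins[np.argmax(hist) + 1])
    return (cx, cy, r_pupil), r_iris
```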
Ianina, A.O. and K.V. Vorontsov. 2016. Multimodal topic modeling for exploratory search in collective blog. Machine Learning and Data Analysis 2(2):?? - ??. doi:10.21469/22233792.2.2.04 Exploratory search is a new paradigm in information retrieval focused on the acquisition and systematization of knowledge by professionals, unlike major Web search engines that answer short text queries of mass users. We develop an exploratory search engine based on probabilistic topic modeling for seeking information thematically relevant to long text queries. We use Additive Regularization of Topic Models (ARTM) in order to combine many requirements, such as sparsity, diversity, and interpretability of topics, and to incorporate heterogeneous modalities such as authors, tags, and categories into the model. We use the parallelized online implementation of ARTM in the open-source library BigARTM (bigartm.org). The thematic search is implemented by maximizing the cosine similarity between the query and a document, both represented by their sparse distributions over topics. We evaluate the precision and recall of the thematic search by a two-step procedure. First, human assessors perform exploratory search tasks manually using any available search utilities (it takes them about 30 minutes per task on average). Second, they evaluate the relevance of the search results found by our thematic search engine for the same tasks. The experiments on the collection of 132K articles from the habrahabr.ru collective blog showed that thematic search provides comparable precision and better recall, while also reducing search time from half an hour to seconds. Using the data labeled by assessors, we determine the optimal number of topics and show that the joint use of all modalities (authors of articles, authors of comments, tags, and hub categories) significantly improves the search quality.
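The retrieval step itself is simple once every document and the query are represented by topic distributions; a minimal sketch with numpy vectors standing in for those distributions is shown below (the real system uses BigARTM-inferred topic vectors, which are not reproduced here).

```python
import numpy as np

def thematic_search(query_topics, doc_topics, top_k=10):
    """Rank documents by cosine similarity of topic distributions.

    query_topics : (T,) topic distribution of the query
    doc_topics   : (N, T) topic distributions of the documents
    """
    q = query_topics / (np.linalg.norm(query_topics) + 1e-12)
    D = doc_topics / (np.linalg.norm(doc_topics, axis=1, keepdims=True) + 1e-12)
    scores = D @ q
    top = np.argsort(-scores)[:top_k]
    return top, scores[top]
```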
Chirkova, N. A. and K.V. Vorontsov. 2016. Additive Regularization for Hierarchical Multimodal Topic Modeling. Machine Learning and Data Analysis 2(2):?? - ??. doi:10.21469/22233792.2.2.05 Probabilistic topic models uncover the latent semantics of text collections and represent each document by a multinomial distribution over topics. Hierarchical models divide topics into subtopics recursively, thus simplifying information retrieval, browsing, and understanding of large multidisciplinary collections. Most of the existing approaches to hierarchy learning rely on Bayesian inference, which makes it difficult to incorporate topical hierarchies into other types of topic models. We use the non-Bayesian multi-criteria approach called Additive Regularization of Topic Models (ARTM), which makes it possible to combine any topic models formalized via log-likelihood maximization with additive regularization criteria (the general form of the ARTM criterion is recalled after this abstract).
In this work, we propose such a formalization for topical hierarchies. Hence, hierarchical ARTM can easily be adapted to a wide class of text mining problems, e.g., learning topical hierarchies from multimodal and multilingual heterogeneous data of scientific digital libraries or social media.
We focus on topical hierarchies that allow a topic to have several parent topics, which is important for multidisciplinary collections of scientific papers. The regularization approach allows us to control the sparsity of the parent-child relation and to determine the number of subtopics for each topic automatically; only the number of topics at each layer has to be fixed before learning the hierarchy. The additive regularization does not complicate the learning algorithm, so the approach scales well to large text collections.
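For reference, the general ARTM criterion combines the collection log-likelihood with a weighted sum of regularizers, as in Vorontsov's ARTM framework; the hierarchical variant described above adds regularizers that tie each layer's topics to the parent layer (their exact form is not reproduced here):

```latex
% Additive Regularization of Topic Models: log-likelihood plus weighted regularizers
\sum_{d \in D} \sum_{w \in d} n_{dw}
   \ln \sum_{t \in T} \phi_{wt}\,\theta_{td}
\;+\; \sum_{i} \tau_i\, R_i(\Phi, \Theta)
\;\longrightarrow\; \max_{\Phi,\,\Theta}
```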
Chuvilin, K.V. 2016. Parametric approach to the construction of syntax trees for partially formalized text documents. Machine Learning and Data Analysis 2(2):?? - ??. doi:10.21469/22233792.2.2.06 This article investigates the possibility of automatically constructing the logical structure (abstract syntax tree) of text documents whose format is not fully defined by standards or other rules common to all the documents. In contrast to a syntax described by formal grammars, in such cases there is no way to build the parser automatically. Text files in LaTeX format are typical examples of such formatted documents with not completely formalized syntax markup; they are used as the resources for the implementation of the algorithms developed in this work. The relevance of LaTeX document analysis is due to the fact that many scientific publishers and conferences use the LaTeX typesetting system, which gives rise to important applied tasks of automating categorization, correction, comparison, statistics collection, rendering for the Web, etc.
Parsing documents in LaTeX format requires additional information about styles: symbols, commands, and environments. A method to describe them in JSON format is proposed in this work. It makes it possible to specify not only the information necessary for parsing, but also meta-information that facilitates further data mining. This approach is used for the first time. The developed algorithms for constructing a syntax tree of a document in LaTeX format that use such information as an external parameter are described.
The results are successfully applied in the tasks of comparison, auto-correction, and categorization of scientific papers. The implementation of the developed algorithms is available as a set of libraries released under the LGPLv3. The key features of the proposed approach are flexibility (within the framework of the problem) and simplicity of the parameter descriptions.
The proposed approach makes it possible to parse documents in LaTeX format. However, a base of style element descriptions has to be built up before the developed algorithms can be widely used in practice.
Bondur, V.G., A.B. Murynin, and V.Yu. Ignatiev. 2016. Parameters optimization in the problem of sea-wave spectra recovery by airspace images. Machine Learning and Data Analysis 2(2):?? - ??. doi:10.21469/22233792.2.2.07 The problem of reconstructing sea surface spectra from aerospace images over a wide wavelength range is considered. Within the described nonlinear model of the brightness field registered by remote sensing equipment, a modification of the recovery operator that acts in the whole spatio-spectral domain is proposed. An iterative process of selecting the optimal values of the modified operator parameters, using ground truth measurements for validation, is presented. Test results for the constructed operator under different registration conditions of the sea surface images are presented.
Murashov, D.M. 2016. Application of information-theoretical approach for image segmentation. Machine Learning and Data Analysis 2(2):?? - ??. doi:10.21469/22233792.2.2.08 In this paper, the problem of the segmentation quality of digital images is considered. The developed technique is based on the information-theoretical approach and is applied to a modified superpixel segmentation algorithm.
In one of the conventional techniques, a weighted uncertainty index is used for measuring segmentation quality. The index is calculated using the normalized mutual information of the color channels in the given and segmented images. The uncertainty index varies monotonically with the parameter of the segmentation algorithm, which necessitates a learning technique and an iterative procedure for choosing the parameter value.
In this work, an information redundancy measure is proposed as a criterion for optimizing segmentation quality. This criterion provides the best result in terms of visual perception. It is shown that the proposed method of constructing the redundancy measure provides it with extremal properties. An experiment was conducted using images from the Berkeley Segmentation Dataset. The experiment confirmed that the segmented image corresponding to a minimum of the redundancy measure produces the minimum difference in the information-theoretical dissimilarity measure when compared with the original image. In addition, the segmented image selected using the proposed criterion gives the highest similarity to the ground-truth segmentations available in the database.
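The building block behind both the uncertainty index and the redundancy measure is mutual information between an original channel and its segmented counterpart. A small sketch of how such a quantity can be estimated from quantized channels is shown below; the exact index and redundancy definitions from the paper are not reproduced.

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Estimate mutual information (in nats) between two equally shaped images
    by histogramming their quantized intensities."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def normalized_mi(a, b, bins=64):
    """Normalized mutual information: MI divided by the mean of the two entropies."""
    ha = mutual_information(a, a, bins)   # self-MI equals the entropy
    hb = mutual_information(b, b, bins)
    return 2.0 * mutual_information(a, b, bins) / (ha + hb + 1e-12)
```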
Efimova, V.A., A.A. Filchenkov, and A.A. Shalyto. 2016. Reinforcement-based Simultaneous Classification Model and its Hyperparameters Selection. Machine Learning and Data Analysis 2(2):?? - ??. doi:10.21469/22233792.2.2.09 Many algorithms exist for data analysis, especially for the classification problem. To solve a data analysis problem, a proper algorithm should be chosen, and its hyperparameters should be selected. These two problems, algorithm selection and hyperparameter optimization, are commonly solved independently, and the resulting full-model selection process requires unacceptable time budgets. This is one of the factors preventing the spread of automated model selection methods.
The goal of this work is to suggest a method for simultaneous selection of an algorithm and its hyperparameters that reduces the full-model selection time. To do so, we reduce this problem to a multi-armed bandit problem: each algorithm is considered an arm, and a hyperparameter search for that algorithm during a fixed time window corresponds to playing that arm (a sketch of this formulation follows the abstract). We also describe several reward functions.
Experiments were carried out on ten popular labeled datasets from the UCI repository. For the comparison, we used several well-known classification algorithms from the WEKA library and the hyperparameter optimization algorithm from the Auto-WEKA library. We compared the proposed method with the brute-force search implemented in the WEKA library and with a random time budget assignment policy. The results show a significant reduction of the time needed to select a proper algorithm and its hyperparameters for a given dataset. The proposed method often produces classification results much better than Auto-WEKA, a state-of-the-art tool for automatic algorithm selection and hyperparameter optimization.
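The bandit formulation above can be sketched with a standard UCB1 policy allocating fixed-length time slots among algorithms. The reward below (the score returned by one tuning slot, e.g., the best validation accuracy found so far) is only one plausible choice and is not claimed to be the one used in the paper.

```python
import math

def ucb1_allocate(arms, n_rounds, tune_one_slot):
    """Allocate time slots among algorithms with the UCB1 policy.

    arms          : list of algorithm identifiers
    tune_one_slot : callable(arm) -> reward in [0, 1], e.g. best validation
                    accuracy reached during one fixed-length tuning slot
    """
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    # play every arm once first
    for a in arms:
        sums[a] += tune_one_slot(a)
        counts[a] += 1
    for t in range(len(arms), n_rounds):
        ucb = {a: sums[a] / counts[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
               for a in arms}
        best = max(ucb, key=ucb.get)
        sums[best] += tune_one_slot(best)
        counts[best] += 1
    # return the algorithm with the best average reward
    return max(arms, key=lambda a: sums[a] / counts[a])
```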
Vol. 2, №3, 2016
Starozhilets, V.M. and Yu.V. Chekhovich. 2016. Aggregation of data from different sources in traffic flow tasks. Machine Learning and Data Analysis 2(3):?? - ??. doi:10.21469/22233792.2.3.01 In this paper, we study the problem of aggregating data taken from GPS tracks and traffic detectors. The aggregated data are used to state and solve the finite-difference equation corresponding to the chosen mathematical model of traffic flow. We split the problem into two: the first concerns highway data, and the second concerns entrance and exit data.
We propose a linear model to estimate the speed and number of cars using highway data taken from GPS tracks and traffic detectors (a sketch follows the abstract). The quality criteria are the mean squared error and the correlation coefficient. Note that the constructed model can also be used on highway data that have no traffic detector measurements and only GPS tracks. For entrance and exit data, we develop a method to recover the total flow, based on the conservation of the number of cars in the transport network.
We carry out computational experiments for both problems on real data and demonstrate the performance of the proposed approaches. The GPS-track data were provided by Yandex.Traffic, and the traffic detector data were provided by the Moscow traffic management centre. The Moscow Ring Road was used as the highway.
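A minimal sketch of the kind of linear model mentioned above, with assumed feature columns (aggregated GPS speeds and detector counts per road segment and time interval); the actual features and preprocessing in the paper are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def fit_speed_model(X_train, y_train, X_test, y_test):
    """Fit a linear model predicting detector-measured speed from GPS-derived
    features and report the two quality criteria named in the abstract."""
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    corr = np.corrcoef(y_test, pred)[0, 1]
    return model, mse, corr
```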
Sulimova, V.V., O. S. Seredin, and V.V. Mottl. 2016. Metrics on the basis of optimal alignment of biomolecular sequences. Machine Learning and Data Analysis 2(3):?? - ??. doi:10.21469/22233792.2.3.03 Background: For biomolecular sequence analysis, it is important to have an appropriate way of comparing sequences. From the point of view of advanced data analysis methods, the most convenient way of comparing objects is a dissimilarity measure possessing the properties of a metric. On the other hand, from the point of view of molecular biology, it is important to take into account the biological features of the compared objects. Computational effectiveness and the possibility of further using convenient data analysis instruments are also important. A number of ways of comparing biomolecular sequences exist; however, none of them possesses all the required properties.
Methods: This paper proposes a fairly simple way of computing metrics for biomolecular sequences. The proposed approach, following the traditional ways of comparing biomolecular sequences, is based on finding an optimal pairwise alignment and on a model of mutual substitutions of amino acids in the process of evolution (a generic alignment-cost sketch follows the abstract).
Concluding Remarks: It is proved that the proposed dissimilarity measure is a metric, so it can be used in advanced data analysis methods, preserving the computational advantages of SVM without introducing object features and/or an inner product. The experimental results confirm the usability of the proposed metric for membrane glycoprotein classification.
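The paper's metric construction is not reproduced here; the sketch below only shows the kind of optimal pairwise alignment cost such measures are built on, namely Needleman-Wunsch dynamic programming with an assumed substitution cost and a linear gap penalty.

```python
import numpy as np

def alignment_cost(s, t, subst, gap=1.0):
    """Cost of an optimal global alignment (Needleman-Wunsch dynamic programming).

    subst(a, b) is an assumed substitution cost (e.g. derived from a PAM-like
    amino acid substitution model); gap is a linear gap penalty.
    """
    n, m = len(s), len(t)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = gap * np.arange(n + 1)
    D[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j - 1] + subst(s[i - 1], t[j - 1]),
                          D[i - 1, j] + gap,
                          D[i, j - 1] + gap)
    return D[n, m]

# toy substitution cost: 0 for a match, 1 for a mismatch
print(alignment_cost("HEAGAWGHEE", "PAWHEAE", lambda a, b: 0.0 if a == b else 1.0))
```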
Nedel’ko, V.M. 2016. Investigation of effectiveness of several linear classifiers by using synthetic distributions. Machine Learning and Data Analysis 2(3):?? - ??. doi:10.21469/22233792.2.3.04 The most common way to compare the effectiveness of data analysis methods is testing on tasks from the UCI repository. However, this approach has several disadvantages, in particular, the incompleteness of the set of tasks and limited sample sizes. In this paper, we consider the possibility of building a repository of probabilistic distributions. The distributions are constructed purposefully in such a way as to reveal the properties of the studied methods; we call such distributions probabilistic models. Several linear classification methods are chosen for the study: logistic regression, the Fisher discriminant, and SVM. We construct several probabilistic models to investigate the properties of these methods; in particular, for each method we build a model on which this method outperforms the others. In addition, these models allow us to explain why a particular method was best.
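A sketch of how such a synthetic comparison can be set up with common implementations is given below; two Gaussian classes stand in for one possible "probabilistic model", and the distributions actually designed in the paper are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import LinearSVC

def compare_on_model(sample, n_train=200, n_test=20000, seed=0):
    """Estimate test accuracy of three linear classifiers on data drawn from a
    given probabilistic model; `sample(n, rng)` returns (X, y)."""
    rng = np.random.default_rng(seed)
    X_tr, y_tr = sample(n_train, rng)
    X_te, y_te = sample(n_test, rng)
    clfs = {"logistic": LogisticRegression(max_iter=1000),
            "fisher": LinearDiscriminantAnalysis(),
            "svm": LinearSVC()}
    return {name: clf.fit(X_tr, y_tr).score(X_te, y_te) for name, clf in clfs.items()}

def gaussian_model(n, rng):
    """Toy model: two Gaussian classes with different scales and shifted means."""
    y = rng.integers(0, 2, n)
    X = rng.normal(0, 1, (n, 2)) * np.where(y[:, None] == 1, 2.0, 1.0) + y[:, None] * 1.5
    return X, y
```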
Fedotov, N.G., A.A. Syemov, and A.V. Moiseev. 2016. New method for 3D images intelligent analysis and recognition: description and examples. Machine Learning and Data Analysis 2(3):?? - ??. doi:10.21469/22233792.2.3.05 Background: In this article, a new approach to 3D object recognition is proposed. A detailed mathematical description of the method developed on the basis of this approach is given. The scanning technique of the hypertrace transform is described, and the choice of the scanning element is substantiated. The principles of intelligent analysis and recognition of 3D images built on its basis are analyzed.
Methods: The suggested method is based on elements of the theories of stochastic geometry and functional analysis. The hypertrace transform has many advantages and data mining capabilities. For example, one of the intelligent capabilities of the suggested method is the construction of hypertriplet features of different structure ("long" and "short" features). The different types of features are reflected in the principles of intelligent analysis and recognition of 3D images (verifiability and falsifiability of images).
Results: Since the article is purely theoretical and conceptual in orientation, practical results are not reported. Theoretical examples of image verification by "long" features and image falsification by "short" features are described, and their differences and the specifics of their practical application are substantiated.
Concluding Remarks: The hypertrace transform has a unique ability similar to that of the human visual system: at a sufficiently brief glance, a person can quickly distinguish two spatial objects from each other. This fact increases the speed of the scanning system and the reliability of the image recognition system in general, improving the intelligent capabilities of the hypertrace transform.
Vladimirova, M. R. and M. S. Popova. 2016. Bagging of Neural Networks for Analysis of Nuclear Receptor Biological Activity. Machine Learning and Data Analysis 2(3):?? - ??. doi:10.21469/22233792.2.3.06 The paper is devoted to the multitask classification problem. The main purpose is to build an adequate model that predicts whether an object belongs to a particular class, namely, whether a ligand binds to a specific nuclear receptor. Nuclear receptors are a class of proteins found within cells. These receptors work with other proteins to regulate the expression of specific genes, thereby controlling the development, homeostasis, and metabolism of the organism. The regulation of gene expression generally only happens when a ligand, a molecule that affects the receptor's behavior, binds to a nuclear receptor. A two-layer neural network is used as the classification model. The paper considers the problems of linear and logistic regression with squared and cross-entropy loss functions. To analyze the classification result, we propose to decompose the error into bias and variance terms. To improve the quality of classification by reducing the error variance, we suggest a composition of neural networks: the bagging procedure. The proposed method improves the classification quality on the investigated sample.
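A compact way to reproduce the variance-reduction effect described above is standard bagging of small neural networks. The sketch below uses scikit-learn components and arbitrary hyperparameters; it is not the configuration from the paper.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier

def make_bagged_mlp(n_estimators=10, hidden=32):
    """Bagging of two-layer (one hidden layer) neural networks: each base
    network is trained on a bootstrap sample and predictions are averaged,
    which mainly reduces the variance term of the error."""
    base = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=500)
    return BaggingClassifier(base, n_estimators=n_estimators)

# usage: make_bagged_mlp().fit(X_train, y_train).predict(X_test)
```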
Odinokikh, G.A., V. S. Gnatyuk, M.V. Korobkin, and V.A. Eremeev. 2016. Eyelid Position Detection Method for Mobile Iris Recognition. Machine Learning and Data Analysis 2(3):?? - ??. doi:10.21469/22233792.2.3.07 Information about the eyelid position in an image is used during iris recognition for eyelid and eyelash noise removal, iris image quality estimation, and other purposes. Eyelid detection is usually performed after iris-sclera boundary localization, which is a fairly complex operation itself. If the authentication runs on a hand-held device, this order is not always justified, mainly because of the device's limited performance, user interaction difficulties, and highly variable environmental conditions. In this case, the eyelid position information can be used to determine whether the image should be passed on for further, more complex processing operations. This paper proposes a method of eyelid position detection for iris image quality estimation and subsequent complete eyelid border localization, and compares its performance with several similar existing methods on four open datasets.
