1.
Alexandre Varnek and Igor Baskin
Journal of Chemical Information and Modeling, 2012
This paper is focused on modern approaches to machine learning, most of which are as yet used infrequently or not at all in chemoinformatics. Machine learning methods are characterized in terms of the “modes of statistical inference” and “modeling levels” nomenclature and by considering different facets of the modeling with respect to input/output matching, data types, models duality, and models inference. Particular attention is paid to new approaches and concepts that may provide efficient solutions of common problems in chemoinformatics: improvement of predictive performance of structure–property (activity) models, generation of structures possessing desirable properties, model applicability domain, modeling of properties with functional endpoints (e.g., phase diagrams and dose–response curves), and accounting for multiple molecular species (e.g., conformers or tautomers). J. Chem. Inf. Model., 2012, 52 (6), pp 1413–1437
2.
Dmitry I. Osolodkin, Vladimir A. Palyulin, Nikolay S. Zefirov
Chemical Biology & Drug Design, 2011
A comparative assessment of nine different scoring functions (OpenEye and Tripos implementations) applied to structure-based virtual screening based on rigid docking of the pregenerated conformations library of glycogen synthase kinase 3β (GSK3β) inhibitors has been carried out. The functions studied belong to the following types: Gaussian (Chemgauss3, Shapegauss), empirical (Chemscore, OEChemscore, Piecewise Linear Potential, Screenscore), force field-based (D_score and G_score), and potential of mean force (PMF_score). Overall enrichment of the large true inhibitors set against the set of true noninhibitors, Directory of Useful Decoys (DUD), cyclin-dependent kinase 2 subset, and NCI Diversity Set was evaluated by means of the ROC (receiver operating characteristic) method. According to this analysis, the scoring function Chemscore leads to the best enrichment of the inhibitors, whereas the best early enrichment of the actives may be obtained with the Chemgauss3 function, as estimated by the BEDROC (Boltzmann-enhanced discrimination of ROC) metric. Volume 78, Issue 3, pages 378–390, September 2011
3.
I.I.Baskin, M.I.Skvortsova, I.V.Stankevich, N.S.Zefirov
Journal of Chemical Information and Computer Sciences, 1995
It is proved that any molecular graph invariant (that is any topological index) can be uniquely represented as (1) a linear combination of occurrence numbers of some substructures (fragments), both connected and disconnected, or (2) a polynomial on occurrence numbers of connected substructures of corresponding molecular graph. Besides, any (0,1)-valued molecular graph invariant can be uniquely represented as a linear combination (in the terms of logic operations) of some basic (0,1)-valued invariants indicating the presence of some substructures in the chemical structure. Thus, the occurrence numbers of substructures in a structure (or numbers indicating the presence or absence of substructures in a structure for the case of (0,1)-valued invariants) are shown to constitute the basis of invariants of labeled molecular graphs. A possibility to use these results for the mathematical justification of substructures-based methods in the “structure-property” problem is also discussed. J. Chem. Inf. Comput. Sci., 1995, V. 35, No. 3, P. 527–531; DOI: 10.1021/ci00025a021
4.
A.Varnek, C.Gaudin, G.Marcou, I. Baskin, A.K.Pandey, I.V.Tetko
Journal of Chemical Information and Modeling, 2009
Two inductive knowledge transfer approaches, multitask learning (MTL) and Feature Net (FN), have been used to build predictive neural network (ASNN) and PLS models for 11 types of tissue-air partition coefficients (TAPC). Unlike conventional single-task learning (STL) modeling, which focuses on a single target property without any relation to other properties, the inductive transfer approach views the individual models as nodes in a network of interrelated models built in parallel (MTL) or sequentially (FN). It has been demonstrated that MTL and FN techniques are extremely useful in structure-property modeling on small and structurally diverse data sets, when conventional STL modeling is unable to produce any predictive model. Predictive individual STL models were obtained for 4 out of 11 TAPC, whereas application of inductive knowledge transfer techniques resulted in models for 9 TAPC. Differences in the prediction performance of the models as a function of the machine-learning method and of the number of properties simultaneously involved in the learning are discussed. J. Chem. Inf. Model., 2009, V. 49, No. 1, P. 133–144. DOI: 10.1021/ci8002914
