Der-Chen Chang, Ophir Frieder, Chi-Feng Hung, Hao-Ren Yao, The analysis from nonlinear distance metric to kernel-based prescription prediction system

DOI: 10.23952/jnva.5.2021.2.01

Volume 5, Issue 2, 1 April 2021, Pages 179-199

Abstract. The distance metric and its nonlinear variant play a substantial role in machine learning, particularly so in building kernel functions. Often, the Euclidean distance with a radial basis function (RBF) is used to construct an RBF kernel for nonlinear classification. However, domain implications periodically constrain the distance metrics. Specifically, within the domain of drug efficacy prediction, distance measures must account for time that varies based on disease duration, short to chronic. Recently, a distance-derived graph kernel approach was commercially licensed for drug prescription efficacy prediction. The analysis of the distance functions used therein, namely the Euclidean and cosine distance measures and their respective derived graph kernels, is provided. Theoretically, we provide a formulation of our efforts and demonstrate how both the Euclidean and cosine distance induce space and discuss the difference from geometric perspectives. The aforementioned approach is likewise empirically evaluated using a million-plus patient subset of a life-spanning, real-world, electronic health record database. Diseases are characterized as either short in duration or chronic and either common, hence balanced data, or relatively rare, hence imbalanced. Empirically, the system accurately predicted the efficacy of prescriptions for both balanced and imbalanced and short-term and chronic diseases, with at least one of the measures used being statistically significantly superior to conventional prediction methods. Succinctly, for short-term, balanced diseases, the Euclidean and cosine measures were generally statistically equivalent. For short-term, imbalanced diseases, however, the Euclidean measure was superior to the cosine measure, at times and not infrequently, statistically significantly so. For chronic, balanced diseases, Euclidean was slightly superior to the cosine measure, but they were statistically equivalent. In contrast, for chronic, imbalanced diseases, the cosine measure was consistently statistically significantly superior to the Euclidean measure. These findings indicate the need for both measures depending on the use case. Our empirical findings match our theoretical underpinnings.
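
As a rough illustration of the two distance measures named in the abstract, the following minimal sketch computes the Euclidean and cosine distances between two vectors and passes each through a radial basis function. The function names, the `gamma` parameter, and the form exp(-gamma * d^2) are assumptions made for illustration only; the abstract does not state the exact kernel construction, and this is not the graph kernel the paper analyzes.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance; sensitive to both direction and magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_distance(x, y):
    """One minus cosine similarity; depends only on the angle between vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def rbf_kernel(distance, gamma=1.0):
    """Assumed RBF form exp(-gamma * d^2); gamma is a hypothetical bandwidth."""
    return math.exp(-gamma * distance ** 2)


# Two vectors pointing in the same direction but with different lengths.
x = [1.0, 0.0]
y = [2.0, 0.0]

k_euclid = rbf_kernel(euclidean_distance(x, y))  # less than 1: lengths differ
k_cosine = rbf_kernel(cosine_distance(x, y))     # equals 1: same direction
```

For these two vectors the Euclidean-based value falls below 1 while the cosine-based value stays at 1, a small instance of the geometric difference between the two measures: the former reacts to magnitude, the latter only to direction.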

How to Cite this Article:
Der-Chen Chang, Ophir Frieder, Chi-Feng Hung, Hao-Ren Yao, The analysis from nonlinear distance metric to kernel-based prescription prediction system, J. Nonlinear Var. Anal. 5 (2021), 179-199.