Vu Thi Huong, Ida Litzel, Thorsten Koch, Similarity-based fuzzy clustering scientific articles: Potentials and challenges from mathematical and computational perspectives

DOI: 10.23952/jnva.10.2026.2.08

Volume 10, Issue 2, 1 April 2026, Pages 381-401

Abstract. Fuzzy clustering, which allows an article to belong to multiple clusters with soft membership degrees, plays a vital role in analyzing publication data. This problem can be formulated as a constrained optimization model in which the goal is to minimize the discrepancy between the similarity observed from data and the similarity derived from a predicted distribution. While this approach benefits from state-of-the-art optimization algorithms, tailoring them to real, massive databases such as OpenAlex or Web of Science, which contain approximately 70 million articles and a billion citations, poses significant challenges. We analyze the potential and challenges of the approach from both mathematical and computational perspectives. Among other results, we establish second-order optimality conditions, which provide new theoretical insights, and we propose practical solution methods that exploit the problem’s structure. Specifically, we accelerate the gradient projection method using GPU-based parallel computing to handle large-scale data efficiently.
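
To make the idea of gradient projection under soft-membership constraints concrete, the following is a minimal CPU sketch in NumPy, not the authors' implementation. The abstract does not state the objective, so the sketch assumes a simple stand-in: minimize 0.5 * ||S - X X^T||_F^2 over membership matrices X whose rows are non-negative and sum to one, where S is a symmetric observed similarity matrix. The function names, the fixed step size, and the toy data are all hypothetical, and the GPU acceleration described in the paper is not reproduced here.

```python
import numpy as np


def project_rows_to_simplex(X):
    """Euclidean projection of each row of X onto the probability simplex."""
    n, k = X.shape
    U = -np.sort(-X, axis=1)                 # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0         # shifted cumulative sums
    idx = np.arange(1, k + 1)
    cond = U - css / idx > 0                 # true on a prefix of each row
    rho = cond.sum(axis=1)                   # size of that prefix (always >= 1)
    theta = css[np.arange(n), rho - 1] / rho # per-row threshold
    return np.maximum(X - theta[:, None], 0.0)


def fit_memberships(S, k, steps=200, lr=0.01, seed=0):
    """Projected gradient on the assumed objective 0.5 * ||S - X X^T||_F^2."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    X = project_rows_to_simplex(rng.random((n, k)))
    for _ in range(steps):
        R = S - X @ X.T                      # residual (symmetric when S is)
        grad = -2.0 * R @ X                  # gradient of the assumed objective
        X = project_rows_to_simplex(X - lr * grad)
    return X


# Toy data (hypothetical): similarities generated from known soft memberships.
X_true = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.0, 1.0]])
S = X_true @ X_true.T
X_fit = fit_memberships(S, k=2)
```

Each row of `X_fit` can be read as the soft membership degrees of one article across the clusters. The fixed step size `lr` is a simplification chosen for this toy problem; a line search or an adaptive step would be a more robust choice at scale.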

How to Cite this Article:
V.T. Huong, I. Litzel, T. Koch, Similarity-based fuzzy clustering scientific articles: Potentials and challenges from mathematical and computational perspectives, J. Nonlinear Var. Anal. 10 (2026), 381-401.