Weak Label Prior Enhances Constrained Clustering Efficiency

Higher Education Press

Clustering is widely used in data mining. It has been shown that embedding weak label priors into clustering effectively improves its performance. Previous research has mainly focused on a single type of prior. However, in many real scenarios, two kinds of weak label prior information, namely pairwise constraints and cluster ratios, are easily obtained or already available. How to incorporate both to improve clustering performance is an important but rarely studied question.

To address this problem, a research team led by Chenping Hou published new research on 15 June 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.

The team proposed a constrained clustering method, Clustering with Weak Label Prior (CWLP), which considers compound weak label priors in an integrated framework. Within a unified spectral clustering model, the pairwise constraints are employed as a regularizer in spectral embedding, and the label proportion is added as a constraint in spectral rotation. In addition to theoretical convergence and computational complexity analyses, the experimental evaluation demonstrates the superiority of the proposed approach.
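The Python sketch below illustrates the general idea of using pairwise constraints as a regularizer in spectral embedding. It is not the CWLP formulation from the paper: the Gaussian similarity with a fixed bandwidth, the regularizer gamma * Tr(F^T (L_ML - L_CL) F), and the function names are assumptions made purely for illustration. The must-link term penalizes embedding distance between must-linked points, the cannot-link term rewards it, and the embedding F is taken as the eigenvectors of the regularized matrix with the smallest eigenvalues, which satisfies F^T F = I.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def pair_laplacian(n, pairs):
    """Laplacian of a graph whose only edges are the given (i, j) index pairs."""
    A = np.zeros((n, n))
    for i, j in pairs:
        A[i, j] = A[j, i] = 1.0
    return laplacian(A)

def constrained_spectral_embedding(X, k, must_link, cannot_link, gamma=1.0, sigma=1.0):
    """Hypothetical sketch: spectral embedding regularized by pairwise constraints."""
    n = X.shape[0]
    # Assumption: fixed Gaussian similarity (CWLP itself learns a similarity matrix).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # f^T L_ML f sums squared gaps over must-link pairs (penalized);
    # the cannot-link term enters with a minus sign (rewarded).
    M = laplacian(W) + gamma * (pair_laplacian(n, must_link) - pair_laplacian(n, cannot_link))
    vals, vecs = np.linalg.eigh(M)  # symmetric matrix: eigenvalues in ascending order
    return vecs[:, :k]              # minimizes Tr(F^T M F) subject to F^T F = I
```

For example, `constrained_spectral_embedding(X, 2, must_link=[(0, 1)], cannot_link=[(0, 9)])` returns a 10 x 2 embedding for a 10-point `X`. A larger `gamma` lets the constraints dominate the similarity graph.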

In this research, both pairwise constraint information and cluster ratio information help increase confidence in the clustering results. The team therefore established a unified model that integrates the two simultaneously, which can effectively improve clustering performance.

Specifically, the pairwise constraint information is used as a regularization term in the spectral clustering model, and the cluster ratio is added as a constraint on the indicator matrix. To approximate a variant of the embedding matrix more precisely, the team replaces the cluster indicator matrix with a scaled cluster indicator matrix. Instead of fixing an initial similarity matrix in the integrated model, they learn a new similarity matrix that is better suited to deriving the final clustering results. These ideas can help reduce information loss and obtain a globally optimized clustering result. Extensive experiments on ten benchmark data sets validate the effectiveness of the proposed method for constrained clustering with weak label prior.
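The scaled indicator matrix and the cluster-ratio constraint can be illustrated with a simplified spectral rotation loop. In this hypothetical sketch, the scaled indicator is G = Y (Y^T Y)^{-1/2}, where Y is the one-hot cluster indicator; its columns are orthonormal, so it can be compared directly with an orthonormal embedding F. Because the given ratios fix every cluster size n_j, Y^T Y is known in advance, and minimizing ||G - F R||^2 reduces to scoring point i in cluster j by (F R)_ij / sqrt(n_j). The discrete step uses a greedy, size-respecting assignment (a heuristic, not guaranteed optimal), and the rotation step is the standard orthogonal Procrustes update. The function names, the greedy step, and the rounding of ratios to sizes are assumptions, not the paper's algorithm, and the learned similarity matrix is not modeled.

```python
import numpy as np

def scaled_indicator(labels, k):
    """G = Y (Y^T Y)^{-1/2}: one-hot columns scaled by 1/sqrt(cluster size)."""
    Y = np.eye(k)[labels]
    return Y / np.sqrt(Y.sum(axis=0))

def assign_with_ratio(S, sizes):
    """Greedy heuristic: fill cluster j with exactly sizes[j] points, best scores first."""
    n, k = S.shape
    labels = -np.ones(n, dtype=int)
    capacity = np.array(sizes, dtype=int)
    for flat in np.argsort(-S, axis=None):  # flattened (point, cluster) pairs, best first
        i, j = divmod(int(flat), k)
        if labels[i] == -1 and capacity[j] > 0:
            labels[i] = j
            capacity[j] -= 1
    return labels

def ratio_constrained_rotation(F, ratio, n_iter=20):
    """Hypothetical sketch; assumes F^T F = I and every ratio entry is positive."""
    n, k = F.shape
    sizes = np.floor(np.asarray(ratio) * n).astype(int)
    sizes[: n - sizes.sum()] += 1  # hand out the rounding remainder
    R = np.eye(k)
    for _ in range(n_iter):
        # Fix R: since G = Y diag(sizes)^{-1/2}, score (F R)_ij / sqrt(n_j).
        labels = assign_with_ratio((F @ R) / np.sqrt(sizes), sizes)
        G = scaled_indicator(labels, k)
        # Fix G: orthogonal Procrustes, R = U V^T from the SVD of F^T G.
        U, _, Vt = np.linalg.svd(F.T @ G)
        R = U @ Vt
    return labels
```

With `ratio=[0.5, 0.25, 0.25]` and 40 points, every returned labeling has cluster sizes 20, 10 and 10, so the ratio prior is satisfied exactly, while the rotation aligns the embedding with the discrete solution.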

In future work, methods to reduce the computational complexity of the proposed method are worth studying, so that its efficiency can be further increased and the improved method can be applied to large-scale datasets.

DOI: 10.1007/s11704-023-3355-7
