Leveraging LRFM Analysis and Synthetic Data for Customer Segmentation Using K-Means Clustering
DOI:
https://doi.org/10.59188/eduvest.v5i2.50263
Keywords:
LRFM, TimeGAN, K-Means, clustering, segmentation
Abstract
This research explores the use of synthetic data in Length, Recency, Frequency, Monetary (LRFM) analysis and K-Means clustering for customer segmentation. Because accurate and comprehensive customer data are difficult to access, this study generates synthetic data using Time-series Generative Adversarial Networks (TimeGAN) to supplement or replace the original data. LRFM analysis measures customer characteristics along the dimensions of Length, Recency, Frequency, and Monetary, and the resulting features are clustered using the K-Means algorithm. Clustering quality is evaluated using the Silhouette Coefficient (higher is better) and the Davies-Bouldin Index (lower is better). The Silhouette Coefficient is 0.42 for the synthetic data, slightly higher than the 0.41 obtained for the original data. The Davies-Bouldin Index is 0.90 for the synthetic data, slightly higher than the 0.89 obtained for the original data. Both metrics differ by only 0.01 between the two datasets, which indicates that synthetic data can mimic the characteristics of real data without compromising the accuracy and quality of clustering. By combining synthetic data, LRFM analysis, and K-Means clustering, this research provides in-depth insights into customer segmentation. The findings are expected to help companies develop more effective marketing strategies, enhance customer retention, and optimize the overall customer experience. This study asserts that synthetic data is a valid alternative to real data in customer analysis.
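As a rough illustration of the LRFM step described in the abstract, the sketch below derives the four dimensions from a toy transaction list. The definitions used here are common conventions and an assumption, not necessarily the ones used in the study: Length is the number of days between a customer's first and last purchase, Recency is the number of days from the last purchase to a reference date, Frequency is the number of transactions, and Monetary is total spend. The data, the `lrfm` function name, and the reference date are all hypothetical.

```python
from datetime import date

# Hypothetical toy transactions: (customer_id, purchase_date, amount).
transactions = [
    ("A", date(2024, 1, 5), 120.0),
    ("A", date(2024, 3, 20), 80.0),
    ("A", date(2024, 6, 1), 50.0),
    ("B", date(2024, 5, 10), 300.0),
    ("C", date(2024, 2, 1), 40.0),
    ("C", date(2024, 2, 15), 60.0),
]


def lrfm(transactions, reference_date):
    """Return {customer_id: (length, recency, frequency, monetary)}.

    Assumed definitions (not confirmed by the source):
      length    = days between first and last purchase
      recency   = days between last purchase and reference_date
      frequency = number of transactions
      monetary  = total amount spent
    """
    # Group purchase rows by customer.
    grouped = {}
    for customer, day, amount in transactions:
        grouped.setdefault(customer, []).append((day, amount))

    features = {}
    for customer, rows in grouped.items():
        days = [d for d, _ in rows]
        first, last = min(days), max(days)
        length = (last - first).days
        recency = (reference_date - last).days
        frequency = len(rows)
        monetary = sum(a for _, a in rows)
        features[customer] = (length, recency, frequency, monetary)
    return features


features = lrfm(transactions, reference_date=date(2024, 6, 30))
for customer, values in sorted(features.items()):
    print(customer, values)
```

In a pipeline like the one the abstract outlines, these per-customer tuples would typically be scaled and then passed to a K-Means implementation, with the Silhouette Coefficient and Davies-Bouldin Index computed on the resulting cluster labels.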
License
Copyright (c) 2025 Muhibuddin, Erna Budhiarti Nababan, Fahmi Fahmi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.