Leveraging LRFM Analysis and Synthetic Data for Customer Segmentation Using K-Means Clustering

Muhibuddin Muhibuddin; Erna Budhiarti Nababan; Fahmi Fahmi

doi:10.59188/eduvest.v5i2.50263

Authors

Muhibuddin, Data Science and Artificial Intelligence Program, Universitas Sumatera Utara, Medan, Indonesia
Erna Budhiarti Nababan, Data Science and Artificial Intelligence Program, Universitas Sumatera Utara, Medan, Indonesia
Fahmi Fahmi, Faculty of Electrical Engineering, Universitas Sumatera Utara, Medan, Indonesia

DOI:

https://doi.org/10.59188/eduvest.v5i2.50263

Keywords:

LRFM, TimeGAN, K-Means, clustering, segmentation

Abstract

This research explores the use of synthetic data in Length, Recency, Frequency, Monetary (LRFM) analysis and K-Means clustering for customer segmentation. Because accurate and comprehensive customer data is difficult to access, the study generates synthetic data with Time-series Generative Adversarial Networks (TimeGAN) to supplement or replace the original data. LRFM analysis measures customer characteristics along the dimensions of Length, Recency, Frequency, and Monetary, and the resulting features are clustered with the K-Means algorithm. Clustering quality is evaluated using the Silhouette Coefficient (higher is better) and the Davies-Bouldin Index (lower is better). The Silhouette Coefficient is 0.42 for the synthetic data, slightly higher than the 0.41 obtained for the original data. The Davies-Bouldin Index is 0.90 for the synthetic data, slightly higher than the 0.89 obtained for the original data. Each metric differs by only 0.01 between the two datasets, which indicates that synthetic data can mimic the characteristics of real data without materially compromising clustering quality. By combining synthetic data, LRFM analysis, and K-Means clustering, this research provides in-depth insights into customer segmentation. The findings are expected to help companies develop more effective marketing strategies, enhance customer retention, and optimize the overall customer experience. The study concludes that synthetic data is a valid alternative to real data in customer analysis.
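As an illustration of the LRFM step described in the abstract, the following is a minimal sketch of how the four dimensions might be derived from a transaction log. The abstract does not give the authors' exact definitions, so this assumes the commonly used ones: Length is the number of days between a customer's first and last purchase, Recency is the number of days between the last purchase and a reference date, Frequency is the number of transactions, and Monetary is the total amount spent. The function name `lrfm_features`, the tuple layout, and the toy transactions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from datetime import date


def lrfm_features(transactions, reference_date):
    """Return {customer_id: (length, recency, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    The definitions below are assumed, commonly used ones, not the paper's.
    """
    # Group purchases by customer.
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in transactions:
        by_customer[customer_id].append((purchase_date, amount))

    features = {}
    for customer_id, purchases in by_customer.items():
        dates = [d for d, _ in purchases]
        first, last = min(dates), max(dates)
        length = (last - first).days             # days between first and last purchase
        recency = (reference_date - last).days   # days since the last purchase
        frequency = len(purchases)               # number of transactions
        monetary = sum(a for _, a in purchases)  # total amount spent
        features[customer_id] = (length, recency, frequency, monetary)
    return features


# Hypothetical toy data, for illustration only.
transactions = [
    ("A", date(2024, 1, 5), 120.0),
    ("A", date(2024, 3, 20), 80.0),
    ("A", date(2024, 6, 1), 50.0),
    ("B", date(2024, 5, 10), 300.0),
]
features = lrfm_features(transactions, reference_date=date(2024, 6, 30))
for customer_id, values in sorted(features.items()):
    print(customer_id, values)
```

In a full pipeline, the four values per customer would typically be rescaled to a common range before being passed to K-Means, since Monetary is usually orders of magnitude larger than the other dimensions and would otherwise dominate the Euclidean distances.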
