Sequence Clustering in Process Mining for Business Process Analysis Using K-Means

NUR FITRIANTI FAHRUDIN

doi:10.26760/mindjournal.v6i1.16-30

ABSTRACT

Process discovery, the main technique in process mining, aims to produce a model from an event log. In practice, however, an event log often contains many process variants, which makes the discovered model difficult to understand. This study first groups the event log using the K-Means method as a pre-processing stage, and the resulting clusters are then modeled using process mining techniques. When K-Means is applied, determining the optimal number of clusters is crucial: a poorly chosen value of K can reduce the fitness and precision of the resulting model. In tests on an issue-tracking data set with 1,091 cases and 7,924 events, divided into four clusters, precision increased from 0.49 to 1 and fitness increased from 0.34 to a range of 0.61-1 in clusters 2, 3, and 4.

Keywords: K-Means, process mining, event log, clustering, sequence clustering
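
The pre-processing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy issue-tracking traces, the activity-count encoding of each trace, and the helper names `encode_traces` and `kmeans` are all assumptions made for illustration, and the abstract does not state which trace features or which K-selection criterion the study used. The sketch runs plain K-Means (Lloyd's algorithm) for several values of K and records the within-cluster sum of squared distances, one common heuristic for comparing candidate values of K.

```python
import random
from collections import Counter

# Hypothetical toy event log: one list of activity names per case.
traces = [
    ["open", "assign", "resolve", "close"],
    ["open", "assign", "resolve", "close"],
    ["open", "assign", "resolve", "reopen", "assign", "resolve", "close"],
    ["open", "assign", "resolve", "reopen", "assign", "resolve", "close"],
    ["open", "reject"],
    ["open", "reject"],
]

def encode_traces(traces):
    """Encode each trace as a vector of activity counts (an assumed encoding)."""
    activities = sorted({a for t in traces for a in t})
    vectors = [[float(Counter(t)[a]) for a in activities] for t in traces]
    return activities, vectors

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(vectors, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    # Initialise from distinct vectors so no two centroids start identical.
    distinct = sorted(set(tuple(v) for v in vectors))
    centroids = [list(v) for v in rng.sample(distinct, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each trace joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, new_labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    inertia = sum(sq_dist(v, centroids[l]) for v, l in zip(vectors, labels))
    return labels, centroids, inertia

activities, vectors = encode_traces(traces)

# Scan candidate values of K and keep the labels and inertia for each.
results = {}
for k in (1, 2, 3):
    labels, _, inertia = kmeans(vectors, k)
    results[k] = (labels, inertia)
    print(k, labels, round(inertia, 3))
```

In a full pipeline, each cluster of cases would then be exported as its own sub-log and passed to a process discovery algorithm, with fitness and precision measured per cluster. Inertia always shrinks as K grows, so in practice it is read for the point of diminishing returns rather than minimised outright.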
