Sequence Clustering in Process Mining for Business Process Analysis Using K-Means
ABSTRACT
Process discovery is the main technique in process mining and aims to produce a model from an event log. In practice, however, an event log often contains many process variants, which makes the discovered model difficult to understand. This research first groups the event log using the K-Means method as a pre-processing stage; the results of this stage are then modeled using process mining techniques. When K-Means is applied, determining the optimal number of clusters is crucial, because a poorly chosen K can reduce the fitness and precision of the resulting model. In tests on an issue tracking data set with 1091 cases and 7924 events, divided into four clusters, precision increased from 0.49 to 1 and fitness increased from 0.34 to between 0.61 and 1 in clusters 2, 3 and 4.
Keywords: K-Means, process mining, event log, clustering, sequence clustering
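The pre-processing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how traces are encoded or how K is chosen, so the activity-count encoding, the deterministic farthest-point initialisation, the sum-of-squared-errors comparison across K values, and the toy issue-tracking traces below are all assumptions made for illustration.

```python
from collections import Counter

def encode_traces(traces):
    """Turn each trace (a list of activity names) into an activity-count vector.
    This encoding is an assumption; the abstract does not specify one."""
    activities = sorted({a for t in traces for a in t})
    vectors = [[float(Counter(t)[a]) for a in activities] for t in traces]
    return activities, vectors

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(vectors, k, max_iter=100):
    """Plain K-Means with a deterministic farthest-point initialisation
    (chosen here only so that the example is reproducible)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each trace goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def sse(vectors, labels, centroids):
    """Sum of squared distances of each trace to its assigned centroid."""
    return sum(sq_dist(v, centroids[l]) for v, l in zip(vectors, labels))

# Hypothetical toy event log: one list of activities per case.
traces = [
    ["open", "assign", "resolve", "close"],
    ["open", "assign", "resolve", "close"],
    ["open", "assign", "resolve", "reopen", "assign", "resolve", "close"],
    ["open", "assign", "resolve", "reopen", "assign", "resolve", "close"],
    ["open", "reject"],
    ["open", "reject"],
]

activities, vectors = encode_traces(traces)

# Compare candidate K values by SSE (an elbow-style check, assumed here).
sse_by_k = {}
for k in (1, 2, 3):
    labels, centroids = kmeans(vectors, k)
    sse_by_k[k] = sse(vectors, labels, centroids)

# Split the log into one sub-log per cluster for the chosen K.
labels3, centroids3 = kmeans(vectors, 3)
sublogs = {}
for trace, label in zip(traces, labels3):
    sublogs.setdefault(label, []).append(trace)
```

Each entry of `sublogs` would then be passed separately to a process discovery algorithm, and the fitness and precision of each per-cluster model compared against those of the model discovered from the full log.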
DOI: https://doi.org/10.26760/mindjournal.v6i1.16-30
____________________________________________________________
ISSN (print): 2338-8323 | ISSN (online): 2528-0902
Published by:
Informatika Institut Teknologi Nasional Bandung
Address: Gedung 2 Jl. PHH. Mustofa 23 Bandung 40124
Contact: Tel. 7272215 (ext. 181) Fax. 7202892
Email: mind.journal@itenas.ac.id
____________________________________________________________
This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.