The Impact of Chunking Granularity on Hybrid GraphRAG Architecture Performance in Mitigating Hallucinations
Abstract
The rapid growth of herbal medicine literature has created an information overload that hinders manual data extraction. Although Large Language Models (LLMs) help automate extraction, the risk of factual hallucination in the medical domain remains high, and conventional Retrieval-Augmented Generation (RAG) frequently fails to capture relationships between entities. To address these limitations, this study implements a Hybrid GraphRAG architecture that integrates vector search with a Knowledge Graph. The primary focus is to evaluate the impact of chunking granularity (character, word, and sentence level) on knowledge representation, given that text fragmentation risks breaking semantic context. Experimental results show that sentence-based chunking yields the best performance, doubling the Correctness and Recall scores from 0.28 to 0.56. These findings emphasize the importance of preserving sentence integrity for data accuracy and interconnectivity in medical information systems.
Keywords: Hybrid GraphRAG, Knowledge Graph, Chunking, Herbal Plants
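The three chunking granularities compared in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the window sizes (40 characters, 8 words), the regex-based sentence splitter, and the sample text are all assumptions chosen for illustration only.

```python
import re


def chunk_by_characters(text: str, size: int = 40) -> list[str]:
    """Fixed-width character windows; may cut words and sentences in half."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_words(text: str, size: int = 8) -> list[str]:
    """Fixed-count word windows; keeps words whole but may split sentences."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def chunk_by_sentences(text: str) -> list[str]:
    """One chunk per sentence, using a naive split after '.', '!' or '?'.

    Assumption: a real pipeline would likely use a proper sentence
    tokenizer that handles abbreviations and decimal numbers.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Illustrative sample text (not taken from the study's corpus).
sample = (
    "Ginger is traditionally used to relieve nausea. "
    "Turmeric contains curcumin, which is studied for anti-inflammatory effects."
)

for name, chunks in [
    ("character", chunk_by_characters(sample)),
    ("word", chunk_by_words(sample)),
    ("sentence", chunk_by_sentences(sample)),
]:
    print(f"{name}-level: {len(chunks)} chunks")
    for chunk in chunks:
        print(f"  {chunk!r}")
```

In this toy example, the character and word windows cut across the sentence boundary, so a statement about one plant can end up sharing a chunk with the start of a statement about another. Sentence-level chunks keep each plant and its stated property together, which is the kind of context preservation the abstract credits for the better scores.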
DOI: https://doi.org/10.26760/mindjournal.v11i1.88-101
____________________________________________________________
ISSN (Print): 2338-8323 | ISSN (Online): 2528-0902
Published by:
Program Studi Informatika, Institut Teknologi Nasional Bandung
Address:
Gedung 2 Informatika, Jl. PHH Mustofa No. 23, Bandung 40124, Indonesia
Contact:
Tel: +62-22-7272215 (ext. 181) Fax: +62-22-7202892
Email: mind.journal@itenas.ac.id
This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.