The Impact of Chunking Granularity on Hybrid GraphRAG Architecture Performance in Mitigating Hallucinations

YUSUP MIFTAHUDDIN, AFIN MAULANA, DIASH FIRDAUS

Abstract

The rapid growth of herbal medicine literature has created an information overload that hinders manual data extraction. Although Large Language Models (LLMs) can help automate this work, the risk of factual hallucination in the medical domain remains high, and conventional Retrieval-Augmented Generation (RAG) frequently fails to capture relationships between entities. To address these limitations, this study implements a Hybrid GraphRAG architecture that combines vector search with a Knowledge Graph. The primary focus is to evaluate how chunking granularity (character-, word-, and sentence-level) affects knowledge representation, since fragmenting text risks breaking its semantic context. Experimental results show that sentence-based chunking yields the best performance, doubling both the Correctness and Recall scores from 0.28 to 0.56. These findings underline the importance of preserving sentence integrity for data accuracy and interconnectivity in medical information systems.

Keywords: Hybrid GraphRAG, Knowledge Graph, Chunking, Herbal Plants
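The abstract compares three chunking granularities: character, word, and sentence. As a minimal sketch, assuming simple non-overlapping fixed-size windows and a naive regex sentence splitter, the three strategies might look like the code below. The study's actual chunk sizes, overlap, and sentence segmenter are not stated on this page. All function names and the sample text are hypothetical.

```python
import re
from typing import List

# Hypothetical helpers: names, window sizes, and the regex sentence splitter are
# illustrative assumptions, not the configuration used in the study.

_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def chunk_by_characters(text: str, size: int) -> List[str]:
    """Fixed-size character windows; boundaries can fall inside words."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_words(text: str, size: int) -> List[str]:
    """Groups of `size` whitespace-separated words; boundaries can fall inside sentences."""
    if size <= 0:
        raise ValueError("size must be positive")
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: breaks after '.', '!' or '?' when followed by whitespace."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def chunk_by_sentences(text: str, size: int) -> List[str]:
    """Groups of `size` whole sentences, so no sentence is ever cut."""
    if size <= 0:
        raise ValueError("size must be positive")
    sentences = split_sentences(text)
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


if __name__ == "__main__":
    # Hypothetical sample text with placeholder entities.
    sample = "Plant A contains compound X. Compound X is linked to effect Y. Plant B is used in jamu."
    for name, fn, size in [
        ("character", chunk_by_characters, 20),
        ("word", chunk_by_words, 4),
        ("sentence", chunk_by_sentences, 1),
    ]:
        print(name, fn(sample, size))
```

On this sample, 20-character windows cut "compound" after "com". 4-word windows separate "compound" from "X". One-sentence chunks keep each fact whole. A regex splitter like this one also breaks after abbreviations such as the botanical author abbreviation "L.", so herbal texts would likely need a more careful sentence segmenter.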
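The abstract describes Hybrid GraphRAG as combining vector search with a Knowledge Graph. The sketch below shows one common way such a combination can work:

1. Rank text chunks by cosine similarity to the query embedding.
2. Expand the entities named in the question over knowledge-graph triples for a fixed number of hops.
3. Concatenate both results as context for the LLM.

The 2-D toy embeddings, the substring-based entity linking, the `hops` limit, and all function names are hypothetical assumptions. The paper's actual retrieval and fusion steps may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch only: toy embeddings, substring entity matching and plain
# concatenation of both contexts are assumptions, not the authors' pipeline.

Triple = Tuple[str, str, str]  # (subject, relation, object)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec: Sequence[float], chunk_vecs: Dict[str, List[float]], k: int) -> List[str]:
    """Top-k chunks by cosine similarity to the query embedding."""
    ranked = sorted(chunk_vecs, key=lambda c: cosine(query_vec, chunk_vecs[c]), reverse=True)
    return ranked[:k]


def graph_search(query: str, triples: List[Triple], hops: int = 2) -> List[Triple]:
    """Follow outgoing edges from entities named in the query for up to `hops` steps."""
    q = query.lower()
    entities = {e for s, _, o in triples for e in (s, o)}
    frontier = {e for e in entities if e.lower() in q}  # naive entity linking
    found: List[Triple] = []
    for _ in range(hops):
        next_frontier = set()
        for triple in triples:
            subject, _, obj = triple
            if subject in frontier and triple not in found:
                found.append(triple)
                next_frontier.add(obj)
        frontier = next_frontier
    return found


def hybrid_context(query: str, query_vec: Sequence[float],
                   chunk_vecs: Dict[str, List[float]], triples: List[Triple], k: int = 2) -> str:
    """Concatenate vector-retrieved chunks and graph facts into one LLM context."""
    chunks = vector_search(query_vec, chunk_vecs, k)
    facts = [f"{s} -{r}-> {o}" for s, r, o in graph_search(query, triples)]
    return "\n".join(["[vector]", *chunks, "[graph]", *facts])


if __name__ == "__main__":
    # Hypothetical toy data with placeholder entities.
    chunk_vecs = {
        "Plant A contains compound X.": [1.0, 0.0],
        "Compound X is linked to effect Y.": [0.7, 0.7],
        "Plant B is used in jamu.": [0.0, 1.0],
    }
    triples = [
        ("Plant A", "contains", "Compound X"),
        ("Compound X", "linked_to", "Effect Y"),
        ("Plant B", "used_in", "Jamu"),
    ]
    print(hybrid_context("What effect is associated with Plant A?", [0.9, 0.1], chunk_vecs, triples))
```

The graph step returns the path Plant A -> Compound X -> Effect Y, even though no single chunk links Plant A to effect Y. Plain vector retrieval can miss this kind of multi-hop relation.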


DOI: https://doi.org/10.26760/mindjournal.v11i1.88-101


____________________________________________________________

ISSN (Print): 2338-8323 | ISSN (Online): 2528-0902

Published by:
Informatics Study Program, Institut Teknologi Nasional Bandung

Address:
Gedung 2 Informatika, Jl. PHH Mustofa No. 23, Bandung 40124, Indonesia

Contact:
Tel: +62-22-7272215 (ext. 181) | Fax: +62-22-7202892

Email: mind.journal@itenas.ac.id

This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
