Development and Evaluation of a Virtual Agent with a Simple Rule-Based Gesture Generation Model
ABSTRACT
While previous gesture generation studies have highlighted the benefits of deep learning-based approaches for generating human-like gestures, these often require large datasets and intensive computation. Our model differentiates between short and long dialogues, generating context-specific gestures for short exchanges (e.g., greetings, farewells, agreement/disagreement) and emotion-based gestures for longer dialogues (neutral, happy, aggressive). We compare the system's performance against ground truth gestures, random gestures, and idling gestures using metrics from the GENEA Challenge. This approach aims to provide a more efficient alternative to deep learning models. Our findings are expected to contribute to the development of more engaging, responsive virtual assistants, improving user comprehension in human-computer interaction.
Keywords: Gesture Generation, Human-Computer Interaction, Dialogue Size
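The rule-based strategy summarized in the abstract (context-specific gestures for short exchanges, emotion-based gestures for longer dialogue) can be sketched in a few lines. A minimal sketch follows; the word-count threshold, intent labels, and gesture names are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of rule-based gesture selection as described in the
# abstract. The threshold and all category/gesture names are assumptions.

SHORT_DIALOGUE_MAX_WORDS = 6  # assumed cutoff between short and long dialogue

# Short exchanges map directly from conversational intent to a gesture.
CONTEXT_GESTURES = {
    "greeting": "wave",
    "farewell": "wave_goodbye",
    "agreement": "nod",
    "disagreement": "head_shake",
}

# Longer dialogue falls back to an emotion-driven gesture style.
EMOTION_GESTURES = {
    "neutral": "idle_beat",
    "happy": "open_arms",
    "aggressive": "sharp_beat",
}

def select_gesture(text: str, intent: str, emotion: str) -> str:
    """Pick a gesture: context-specific for short exchanges,
    emotion-based for longer dialogue."""
    if len(text.split()) <= SHORT_DIALOGUE_MAX_WORDS:
        return CONTEXT_GESTURES.get(intent, "idle_beat")
    return EMOTION_GESTURES.get(emotion, "idle_beat")

print(select_gesture("Hello there!", "greeting", "neutral"))  # -> wave
print(select_gesture("I think we should reconsider the plan before "
                     "the next milestone.", "other", "happy"))  # -> open_arms
```

Because selection is a pair of dictionary lookups rather than a neural inference pass, it runs in constant time per utterance, which is the efficiency advantage the abstract claims over deep learning models.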
References
Agarwal, A. (2023, April 12). Unreal Engine and its Evolution. Extern Labs Blog.
Arnheim, R. (1994). [Review of the book Hand and Mind: What Gestures Reveal about Thought, by D. McNeill]. Leonardo, 27(4), 358. https://doi.org/10.2307/1576015
Atmaja, B. T., & Sasou, A. (2022). Sentiment Analysis and Emotion Recognition from Speech Using Universal Speech Representations. Sensors, 22(17), 6369. https://doi.org/10.3390/s22176369
Calvaresi, D., Eggenschwiler, S., Mualla, Y., Schumacher, M., & Calbimonte, J.-P. (2023). Exploring agent-based chatbots: a systematic literature review. Journal of Ambient Intelligence and Humanized Computing, 14(8), 11207–11226. https://doi.org/10.1007/s12652-023-04626-5
Cassell, J. (2001). Embodied Conversational Agents: Representation and Intelligence in User Interfaces. AI Magazine, 22(4), 67. https://doi.org/10.1609/aimag.v22i4.1593
Cassell, J., & Vilhjálmsson, H. (1999). Fully Embodied Conversational Avatars: Making Communicative Behaviors Autonomous. Autonomous Agents and Multi-Agent Systems, 2(1), 45–64. https://doi.org/10.1023/A:1010027123541
Ferstl, Y., & McDonnell, R. (2018). Investigating the use of recurrent motion modelling for speech gesture generation. Proceedings of the 18th International Conference on Intelligent Virtual Agents (pp. 93–98). https://doi.org/10.1145/3267851.3267898
Ferstl, Y., Neff, M., & McDonnell, R. (2019). Multi-objective adversarial gesture generation. Motion, Interaction and Games (pp. 1–10). https://doi.org/10.1145/3359566.3360053
Ginosar, S., Bar, A., Kohavi, G., Chan, C., Owens, A., & Malik, J. (2019). Learning Individual Styles of Conversational Gesture. CoRR, abs/1906.04160. http://arxiv.org/abs/1906.04160
Gu, X., Yu, T., Huang, J., Wang, F., Zheng, X., Sun, M., Ye, Z., & Li, Q. (2023). Virtual-Agent-Based Language Learning: A Scoping Review of Journal Publications from 2012 to 2022. Sustainability, 15(18), 13479. https://doi.org/10.3390/su151813479
Hostetter, A. B., & Alibali, M. W. (2008). Visible embodiment: Gestures as simulated action. Psychonomic Bulletin & Review, 15(3), 495–514. https://doi.org/10.3758/PBR.15.3.495
Jonell, P., Yoon, Y., Wolfert, P., Kucherenko, T., & Henter, G. E. (2021). HEMVIP: Human Evaluation of Multiple Videos in Parallel. Proceedings of the 2021 International Conference on Multimodal Interaction (pp. 707–711). https://doi.org/10.1145/3462244.3479957
Kendon, A. (1980). Gesticulation and Speech: Two Aspects of the Process of Utterance. In The Relationship of Verbal and Nonverbal Communication (pp. 207–228). De Gruyter Mouton. https://doi.org/10.1515/9783110813098.207
Kim, Y., & Baylor, A. L. (2016). Research-Based Design of Pedagogical Agent Roles: a Review, Progress, and Recommendations. International Journal of Artificial Intelligence in Education, 26(1), 160–169. https://doi.org/10.1007/s40593-015-0055-y
Kipp, M. (2004). Gesture Generation by Imitation: From Human Behavior to Computer Character Animation.
Kopp, S., Krenn, B., Marsella, S., Marshall, A. N., Pelachaud, C., Pirker, H., Thorisson, K. R., & Vilhjálmsson, H. (2006). Towards a Common Framework for Multimodal Generation: The Behavior Markup Language (pp. 205–217). https://doi.org/10.1007/11821830_17
Kopp, S., & Wachsmuth, I. (2004). Synthesizing multimodal utterances for conversational agents. Computer Animation and Virtual Worlds, 15(1), 39–52. https://doi.org/10.1002/cav.6
Krämer, N. C., Rosenthal-von der Pütten, A. M., & Hoffmann, L. (2015). Social Effects of Virtual and Robot Companions. In The Handbook of the Psychology of Communication Technology (pp. 137–159). Wiley. https://doi.org/10.1002/9781118426456.ch6
Kucherenko, T., Hasegawa, D., Kaneko, N., Henter, G. E., & Kjellstrom, H. (2021). Moving Fast and Slow: Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation. International Journal of Human–Computer Interaction, 37(14), 1300–1316. https://doi.org/10.1080/10447318.2021.1883883
Kucherenko, T., Wolfert, P., Yoon, Y., Viegas, C., Nikolov, T., Tsakov, M., & Henter, G. E. (2024). Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022. ACM Transactions on Graphics, 43(3), 1–28. https://doi.org/10.1145/3656374
Martin, A. (2024). ElevenLabs Review 2024 — Pricing, Features, and Alternatives. Techopedia. https://www.techopedia.com/ai/elevenlabs-review
Merdivan, E., Singh, D., Hanke, S., Kropf, J., Holzinger, A., & Geist, M. (2020). Human Annotated Dialogues Dataset for Natural Conversational Agents. Applied Sciences, 10(3), 762. https://doi.org/10.3390/app10030762
Sadoughi, N., & Busso, C. (2018). Novel Realizations of Speech-Driven Head Movements with Generative Adversarial Networks. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6169–6173). https://doi.org/10.1109/ICASSP.2018.8461967
Tipper, C. M., Signorini, G., & Grafton, S. T. (2015). Body language in the brain: constructing meaning from expressive movement. Frontiers in Human Neuroscience, 9. https://doi.org/10.3389/fnhum.2015.00450
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. CoRR, abs/1706.03762. http://arxiv.org/abs/1706.03762
DOI: https://doi.org/10.26760/elkomika.v12i4.953
ISSN (print): 2338-8323 | ISSN (electronic): 2459-9638
Published by:
Teknik Elektro, Institut Teknologi Nasional Bandung
Address: Gedung 20, Jl. PHH. Mustofa 23, Bandung 40124
Contact: Tel. 7272215 (ext. 206), Fax. 7202892
Email: jte.itenas@itenas.ac.id
This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.