Development and Evaluation of a Virtual Agent with a Simple Rule-Based Gesture Generation Model

MUHAMMAD FIRDAUS SYAWALUDIN LUBIS, MIKAIL FAUZAN ATHALLAH, CATUR APRIONO


ABSTRACT

While previous gesture generation studies have highlighted the benefits of deep learning-based approaches for generating human-like gestures, these often require large datasets and intensive computation. We propose a simple rule-based gesture generation model for virtual agents. Our model differentiates between short and long dialogues, generating context-specific gestures for short exchanges (e.g., greetings, farewells, agreement/disagreement) and emotion-based gestures for longer dialogues (neutral, happy, aggressive). We compare the system's performance against ground truth gestures, random gestures, and idling gestures using metrics from the GENEA Challenge. This approach aims to provide a more efficient alternative to deep learning models. Our findings are expected to contribute to the development of more engaging, responsive virtual assistants, improving user comprehension in human-computer interaction.

Keywords: Gesture Generation, Human-Computer Interaction, Dialogue Size
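The rule-based selection described in the abstract (dialogue length decides between context-specific and emotion-based gestures) can be sketched in a few lines. The word-count threshold, context labels, and gesture names below are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of rule-based gesture selection: short dialogues map to
# context-specific gestures, longer ones to emotion-based gestures.
# Threshold and clip names are assumptions for illustration only.

SHORT_DIALOGUE_MAX_WORDS = 5  # assumed cut-off between short and long dialogue

CONTEXT_GESTURES = {
    "greeting": "wave",
    "farewell": "wave_goodbye",
    "agreement": "nod",
    "disagreement": "head_shake",
}

EMOTION_GESTURES = {
    "neutral": "idle_talk",
    "happy": "open_arms",
    "aggressive": "sharp_beat",
}

def select_gesture(text: str, context: str, emotion: str) -> str:
    """Pick a gesture clip name from dialogue length, context, and emotion."""
    if len(text.split()) <= SHORT_DIALOGUE_MAX_WORDS:
        # Short exchange: use the conversational context if we recognise it.
        return CONTEXT_GESTURES.get(context, EMOTION_GESTURES["neutral"])
    # Longer dialogue: fall back to the speaker's emotion category.
    return EMOTION_GESTURES.get(emotion, EMOTION_GESTURES["neutral"])

print(select_gesture("Hello there!", "greeting", "neutral"))              # wave
print(select_gesture("Let me explain the plan in detail today", "", "happy"))  # open_arms
```

A real system would also need to classify the dialogue's context and emotion (e.g., from the text or speech signal) before this lookup; the sketch only covers the final selection step.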




References


Agarwal, A. (2023, April 12). Unreal Engine and its Evolution. Extern Labs Blog. Extern Labs.

Arnheim, R., & McNeill, D. (1994). Hand and Mind: What Gestures Reveal about Thought. Leonardo, 27(4), 358. https://doi.org/10.2307/1576015

Atmaja, B. T., & Sasou, A. (2022). Sentiment Analysis and Emotion Recognition from Speech Using Universal Speech Representations. Sensors, 22(17), 6369. https://doi.org/10.3390/s22176369

Calvaresi, D., Eggenschwiler, S., Mualla, Y., Schumacher, M., & Calbimonte, J.-P. (2023). Exploring agent-based chatbots: a systematic literature review. Journal of Ambient Intelligence and Humanized Computing, 14(8), 11207–11226. https://doi.org/10.1007/s12652-023-04626-5

Cassell, J. (2001). Embodied Conversational Agents: Representation and Intelligence in User Interfaces. AI Magazine, 22(4), 67. https://doi.org/10.1609/aimag.v22i4.1593

Cassell, J., & Vilhjálmsson, H. (1999). Fully Embodied Conversational Avatars: Making Communicative Behaviors Autonomous. Autonomous Agents and Multi-Agent Systems, 2(1), 45–64. https://doi.org/10.1023/A:1010027123541

Ferstl, Y., & McDonnell, R. (2018). Investigating the use of recurrent motion modelling for speech gesture generation. Proceedings of the 18th International Conference on Intelligent Virtual Agents, (pp. 93–98). https://doi.org/10.1145/3267851.3267898

Ferstl, Y., Neff, M., & McDonnell, R. (2019). Multi-objective adversarial gesture generation. Motion, Interaction and Games, (pp. 1–10). https://doi.org/10.1145/3359566.3360053

Ginosar, S., Bar, A., Kohavi, G., Chan, C., Owens, A., & Malik, J. (2019). Learning Individual Styles of Conversational Gesture. CoRR, abs/1906.04160. http://arxiv.org/abs/1906.04160

Gu, X., Yu, T., Huang, J., Wang, F., Zheng, X., Sun, M., Ye, Z., & Li, Q. (2023). Virtual-Agent-Based Language Learning: A Scoping Review of Journal Publications from 2012 to 2022. Sustainability, 15(18), 13479. https://doi.org/10.3390/su151813479

Hostetter, A. B., & Alibali, M. W. (2008). Visible embodiment: Gestures as simulated action. Psychonomic Bulletin & Review, 15(3), 495–514. https://doi.org/10.3758/PBR.15.3.495

Jonell, P., Yoon, Y., Wolfert, P., Kucherenko, T., & Henter, G. E. (2021). HEMVIP: Human Evaluation of Multiple Videos in Parallel. Proceedings of the 2021 International Conference on Multimodal Interaction, (pp. 707–711). https://doi.org/10.1145/3462244.3479957

Kendon, A. (1980). Gesticulation and Speech: Two Aspects of the Process of Utterance. In The Relationship of Verbal and Nonverbal Communication, (pp. 207–228). De Gruyter Mouton. https://doi.org/10.1515/9783110813098.207

Kim, Y., & Baylor, A. L. (2016). Research-Based Design of Pedagogical Agent Roles: a Review, Progress, and Recommendations. International Journal of Artificial Intelligence in Education, 26(1), 160–169. https://doi.org/10.1007/s40593-015-0055-y

Kipp, M. (2004). Gesture Generation by Imitation: From Human Behavior to Computer Character Animation.

Kopp, S., Krenn, B., Marsella, S., Marshall, A. N., Pelachaud, C., Pirker, H., Thorisson, K. R., & Vilhjálmsson, H. (2006). Towards a Common Framework for Multimodal Generation: The Behavior Markup Language. In Intelligent Virtual Agents, (pp. 205–217). Springer. https://doi.org/10.1007/11821830_17

Kopp, S., & Wachsmuth, I. (2004). Synthesizing multimodal utterances for conversational agents. Computer Animation and Virtual Worlds, 15(1), 39–52. https://doi.org/10.1002/cav.6

Krämer, N. C., Rosenthal-von der Pütten, A. M., & Hoffmann, L. (2015). Social Effects of Virtual and Robot Companions. In The Handbook of the Psychology of Communication Technology, (pp. 137–159). Wiley. https://doi.org/10.1002/9781118426456.ch6

Kucherenko, T., Hasegawa, D., Kaneko, N., Henter, G. E., & Kjellström, H. (2021). Moving Fast and Slow: Analysis of Representations and Post-Processing in Speech-Driven Automatic Gesture Generation. International Journal of Human–Computer Interaction, 37(14), 1300–1316. https://doi.org/10.1080/10447318.2021.1883883

Kucherenko, T., Wolfert, P., Yoon, Y., Viegas, C., Nikolov, T., Tsakov, M., & Henter, G. E. (2024). Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022. ACM Transactions on Graphics, 43(3), 1–28. https://doi.org/10.1145/3656374

Martin, A. (2024). ElevenLabs Review 2024 — Pricing, Features, and Alternatives. Techopedia. https://www.techopedia.com/ai/elevenlabs-review

Merdivan, E., Singh, D., Hanke, S., Kropf, J., Holzinger, A., & Geist, M. (2020). Human Annotated Dialogues Dataset for Natural Conversational Agents. Applied Sciences, 10(3), 762. https://doi.org/10.3390/app10030762

Sadoughi, N., & Busso, C. (2018). Novel Realizations of Speech-Driven Head Movements with Generative Adversarial Networks. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (pp. 6169–6173). https://doi.org/10.1109/ICASSP.2018.8461967

Tipper, C. M., Signorini, G., & Grafton, S. T. (2015). Body language in the brain: constructing meaning from expressive movement. Frontiers in Human Neuroscience, 9. https://doi.org/10.3389/fnhum.2015.00450

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. CoRR, abs/1706.03762. http://arxiv.org/abs/1706.03762




DOI: https://doi.org/10.26760/elkomika.v12i4.953



_______________________________________________________________________________________________________________________

ISSN (print): 2338-8323 | ISSN (electronic): 2459-9638

Published by:

Department of Electrical Engineering, Institut Teknologi Nasional Bandung

Address: Gedung 20, Jl. PHH. Mustofa 23, Bandung 40124

Contact: Tel. 7272215 (ext. 206), Fax 7202892

Email: jte.itenas@itenas.ac.id


This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.