REVOLUTIONIZING DIGITAL INCLUSION: A GPT-4 POWERED HYBRID EVALUATION OF INTERFACES
A hybrid evaluation of interfaces powered by GPT-4
DOI:
https://doi.org/10.21728/p2p.2025v12n1e-7624
Keywords:
GPT-4, GenderMag, Inclusive Interfaces
Abstract
This article presents a hybrid approach to evaluating inclusive digital interfaces that combines the GenderMag method with the GPT-4 language model. The research explores Prompt Engineering techniques for identifying inclusion barriers, such as confusing labels and lack of visual feedback, while maintaining the critical perspective of a human inspector. Applied to the Kahoot platform with the “Abby” persona as a reference, the method showed moderate convergence, measured by Cohen's Kappa, between the automated and traditional analyses, which reinforces GPT-4’s potential to offer useful insights and to reduce evaluation time significantly. Even so, limitations such as the difficulty of capturing all aspects of direct navigation and the cap on the number of images that can be sent to the model highlight the importance of preserving the human specialist’s role in the inspection process.
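The agreement measure named in the abstract can be illustrated with a short sketch. Cohen's Kappa compares the observed agreement between two raters with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The code below is a minimal illustration only, not the authors' analysis. The function name `cohen_kappa`, the labels "barrier" and "ok", and both rating lists are hypothetical and do not come from the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters assigning nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: fraction of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters' marginal rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    # Degenerate case: both raters used one identical label for every item.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical verdicts for ten evaluation steps (NOT data from the article).
human_inspector = ["barrier"] * 5 + ["ok"] * 5
gpt4_analysis = ["barrier"] * 4 + ["ok", "barrier"] + ["ok"] * 4

kappa = cohen_kappa(human_inspector, gpt4_analysis)
print(f"Cohen's Kappa: {kappa:.2f}")
```

One commonly used reading (Landis and Koch) treats values from 0.41 to 0.60 as moderate agreement; whether the article applies that scale is not stated in the abstract.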
License
Copyright (c) 2025 Christian Alex de Souza da Silva, Tayana Uchôa Conte, Wilson Silva Prata

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The journal is published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Brazil license.
Published work is considered a collaboration; authors therefore receive no remuneration, and no fee is charged for publication.
All texts are the responsibility of their authors.
Partial or total reproduction of the journal's texts is permitted provided the source is cited.







