Comparison of Conversational Corpus and News Corpus on Gender Bias in Indonesian-English Transformer Model Translation
DOI:
https://doi.org/10.51519/journalisi.v6i4.918
Keywords:
Machine Translation, Gender Bias, Transformer, Indonesia-English
Abstract
Gender bias in machine translation is a significant issue that affects both the translated text and readers' perception of gender, often leading to misunderstandings such as a tendency to default to male pronouns. For example, the Indonesian word "dia" is often translated as "he" rather than "she," even when the context indicates a woman, as seen in the case of President Megawati. Reducing this bias requires ongoing research, particularly into how different corpora affect translation accuracy. Studies have shown that formal news corpora, which contain less gender bias, produce different results from conversational corpora, which are more informal and exhibit gender bias. This research uses the Indonesian-English conversational parallel corpus from OpenSubtitles, which contains many gendered pronouns, as one training dataset, and a news corpus from Tanzil, which contains fewer gendered words, as the other. Both corpora were sourced from OPUS, which has been widely used by previous researchers. The test dataset consists of biographies of female presidents, which popular machine translation systems often translate as masculine by default. A Transformer model was trained on each corpus, yielding one translation model per corpus. The gender of each translated sentence was then detected and compared with the gender of the corresponding test sentence to evaluate accuracy. The gender translation accuracy was 84% for the model trained on the conversational corpus and 8% for the model trained on the news corpus.
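The evaluation step described in the abstract (detect the gender of each translated sentence, then compare it with the gender of the test sentence) can be illustrated with a minimal sketch. The abstract does not say how gender was detected, so the pronoun-counting rule below is an assumption made for illustration only, and the names `detect_gender` and `gender_accuracy` are hypothetical rather than taken from the paper.

```python
import re

# Assumption: gender is inferred from English third-person pronouns only.
# The paper's actual detection method is not described in the abstract.
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}


def detect_gender(sentence: str) -> str:
    """Label a sentence 'female', 'male', or 'unknown' by pronoun counts."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    feminine = sum(token in FEMININE for token in tokens)
    masculine = sum(token in MASCULINE for token in tokens)
    if feminine > masculine:
        return "female"
    if masculine > feminine:
        return "male"
    return "unknown"


def gender_accuracy(translations: list[str], expected: list[str]) -> float:
    """Fraction of translations whose detected gender matches the expected label."""
    if len(translations) != len(expected):
        raise ValueError("translations and expected labels must align")
    if not translations:
        return 0.0
    correct = sum(
        detect_gender(text) == label
        for text, label in zip(translations, expected)
    )
    return correct / len(translations)


# Hypothetical model outputs for three test sentences about a female president.
sample_translations = [
    "She was the fifth president of Indonesia.",
    "He was born in Yogyakarta.",
    "The party won the election.",
]
sample_expected = ["female", "female", "female"]
sample_score = gender_accuracy(sample_translations, sample_expected)
```

In this sketch a sentence with no gendered pronoun is labeled "unknown" and counts as a miss when a gendered label is expected. Whether the study treated such sentences the same way is not stated.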