Programme

Monday, May 20, 2024

9:00-9:10 Welcome and Introduction

9:10-10:30 Session 1: Corpus Building

  • 9:10-9:30 (online) Italian-Ligurian Machine Translation in its Cultural Context (paper | slides)
    • Christopher R. Haberland, Stefano Lusito and Jean Maillard
  • 9:30-9:50 Labadain-30k+: A Monolingual Tetun and Document-Level Audited Dataset (paper | slides)
    • Gabriel de Jesus and Sérgio Nunes
  • 9:50-10:10 (online) Philippine Languages Database: A Multilingual Speech Corpora for Developing Systems for Low-Resource Languages (paper | slides)
    • Rhandley D. Cajote, Rowena Cristina L. Guevara, Michael Gringo Angelo R. Bayona and Crisron Rudolf G. Lucas
  • 10:10-10:30 (online) A Novel Corpus for Automated Sexism Identification on Social Media (paper | slides)
    • Lutfiye Seda Mut Altin and Horacio Saggion 

10:30-11:00 Coffee Break

11:00-12:00 Session 2: Language Tools

  • 11:00-11:20 TELP — Text Extraction with Linguistic Patterns (paper | slides)
    • João Cordeiro, Purificação Moura Silvano, António Leal and Sebastião Pais
  • 11:20-11:40 (online) Improving Language Coverage on HeLI-OTS (paper | slides)
    • Tommi Jauhiainen and Krister Lindén
  • 11:40-12:00 (online) Nepal Script Text Recognition using CRNN CTC Architecture (paper | slides)
    • Swornim Nakarmi, Sarin Sthapit, Arya Shakya, Rajani Chulyadyo and Bal Krishna Bal

12:00-13:00 Session 3: Regional Languages of Europe 

  • 12:00-12:20 Managing Fine-grained Metadata for Text Bases in Extremely Low Resource Languages: the Cases of Two Regional Languages of France (paper | slides)
    • Marianne Vergez-Couret, Delphine Bernhard, Michael Nauge, Myriam Bras, Pablo Ruiz Fabo and Carole Werner
  • 12:20-12:40 NLP for Arbresh: How an Endangered Language Learns to Write in the 21st Century (paper | slides)
    • Giulio Cusenza and Çağrı Çöltekin
  • 12:40-13:00 CorpusArija: Building an Annotated Corpus with Variation in Occitan (paper | slides)
    • Clamença Poujade, Myriam Bras and Assaf Urieli

13:00-13:10 Online Poster Session I

  • Bidirectional English-Nepali Machine Translation(MT) System for Legal Domain (paper | poster)
    • Shabdapurush Poudel, Bal Krishna Bal and Praveen Acharya
  • Multilingual Self-supervised Visually Grounded Speech Models (paper | poster)
    • Huynh Phuong Thanh Nguyen and Sakriani Sakti

13:10-14:10 Lunch Break & Poster Session I

  • A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages (paper | poster)
    • Catherine Arnett, Tyler A. Chang and Benjamin Bergen
  • Beyond Error Categories: A Contextual Approach of Evaluating Emerging Spell and Grammar Checkers (paper | poster)
    • Þórunn Arnardóttir, Svanhvít Lilja Ingólfsdóttir, Haukur Barri Símonarson, Hafsteinn Einarsson, Anton Karl Ingason and Vilhjálmur Þorsteinsson
  • Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish (paper | poster)
    • Fred Philippy, Shohreh Haddadan and Siwen Guo
  • Mixat: A Data Set of Bilingual Emirati-English Speech (paper | poster)
    • Maryam Khalifa Al Ali and Hanan Aldarmaki
  • Resource Acquisition for Understudied Languages: Extracting Wordlists from Dictionaries for Computer-assisted Language Comparison (paper | poster)
    • Frederic Blum, Johannes Englisch, Alba Hermida Rodriguez, Rik van Gijn and Johann-Mattis List
  • Tracing Linguistic Heritage: Constructing a Somali-Italian Terminological Resource Through Explorers’ Notebooks and Contemporary Corpus Analysis (paper | poster)
    • Silvia Piccini, Giuliana Elizabeth Vilela Ruiz, Andrea Bellandi and Enrico Carniani
  • ViHealthNLI: A Dataset for Vietnamese Natural Language Inference in Healthcare (paper | poster)
    • Huyen Nguyen, Quyen The Ngo, Thanh-Ha Do and Tuan-Anh Hoang

14:10-14:50 Keynote Speech (online)

  • Co-creating a Road Map for Indigenous Language Digital Activism
    • Eddie Avila (Director of Global Voices)

14:50-15:50 Session 4: Machine Translation 

  • 14:50-15:10 Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study (paper | slides)
    • Wan-hua Her and Udo Kruschwitz
  • 15:10-15:30 The First Parallel Corpus and Neural Machine Translation Model of Western Armenian and English (paper | slides)
    • Ari Nubar Boyacıoğlu and Jan Niehues
  • 15:30-15:50 (online) Robust Guidance for Unsupervised Data Selection: Capturing Perplexing Named Entities for Domain-Specific Machine Translation (paper | slides)
    • Seunghyun Ji, Hagai Raja Sinulingga and Darongsae Kwon

15:50-16:30 Coffee Break

16:30-17:50 Session 5: Large Language Models

  • 16:30-16:50 Advancing Generative AI for Portuguese with Open Decoder Gervásio PT* (paper | slides)
    • Rodrigo Santos, João Ricardo Silva, Luís Gomes, João Rodrigues and António Branco
  • 16:50-17:10 Fostering the Ecosystem of Open Neural Encoders for Portuguese with Albertina PT* Family (paper | slides)
    • Rodrigo Santos, João Rodrigues, Luís Gomes, João Ricardo Silva, António Branco, Henrique Lopes Cardoso, Tomás Freitas Osório and Bernardo Leite
  • 17:10-17:30 (online) BERTbek: A Pretrained Language Model for Uzbek (paper | slides)
    • Elmurod Kuriyozov, David Vilares and Carlos Gómez-Rodríguez
  • 17:30-17:50 (online) Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining (paper | slides)
    • Nikola Ljubešič, Vít Suchomel, Peter Rupnik, Taja Kuzman and Rik van Noord

Tuesday, May 21, 2024 

9:00-10:00 Session 6: Quality and Evaluation

  • 9:00-9:20 Man or Machine: Evaluating Spelling Error Detection in Danish Newspaper Corpora (paper | slides)
    • Eckhard Bick, Jonas Nygaard Blom, Marianne Rathje and Jørgen Schack
  • 9:20-9:40 (online) Unsupervised Outlier Detection for Language-Independent Text Quality Filtering (paper | slides)
    • Jón Daðason and Hrafn Loftsson
  • 9:40-10:00 (online) Evaluating Icelandic Sentiment Analysis Models Trained on Translated Data (paper | slides)
    • Ólafur A. Jóhannsson, Birkir H. Arndal, Eysteinn Ö. Jónsson, Stefán Ólafsson and Hrafn Loftsson

10:00-10:30 Session 7: Position Papers 

  • 10:00-10:15 Seeding Alignment Between Language Technology and Indigenous Methodologies: A Decolonizing Framework for Endangered Language Revitalization (paper | slides)
    • Craig John Carpenter, John Lyon, Miles Thorogood and Jeannette C. Armstrong
  • 10:15-10:30 Solving Failure Modes in the Creation of Trustworthy Language Technologies (paper | slides)
    • Gianna Leoni, Lee Steven, Tūreiti Keith, Keoni Mahelona, Peter-Lucas Jones and Suzanne Duncan

10:30-11:00 Coffee Break

11:00-11:40 Keynote Speech

  • Massively Multilingual Language Technologies
    • Jean Maillard (AI Researcher, FAIR team, Meta)

11:40-12:40 Session 8: Language Resources 

  • 11:40-12:00 Uncovering Social Changes of the Basque Speaking Twitter Community During COVID-19 Pandemic (paper | slides)
    • Joseba Fernandez de Landa, Iker García-Ferrero, Ander Salaberria and Jon Ander Campos
  • 12:00-12:20 (online) BK3AT: Bangsamoro K-3 Children’s Speech Corpus for Developing Assessment Tools in the Bangsamoro Languages (paper | slides)
    • Kiel Gonzales, Jazzmin Maranan, Nissan Macale, Edsel Jedd Renovalles, Nicole Anne Palafox, Francis Paolo Santelices and Jose Marie Mendoza
  • 12:20-12:40 UzABSA: Aspect-Based Sentiment Analysis for the Uzbek Language (paper | slides)
    • Sanatbek Matlatipov, Jaloliddin Rajabov, Elmurod Kuriyozov and Mersaid Aripov 

12:40-14:00 Lunch Break & Poster Session II

  • Assessing Pre-Built Speaker Recognition Models for Endangered Language Data (paper | poster)
    • Gina-Anne Levow
  • Improving Legal Judgement Prediction in Romanian with Long Text Encoders (paper | poster)
    • Mihai Masala, Traian Rebedea and Horia Velicu
  • Inter-language Transfer Learning for Visual Speech Recognition toward Under-resourced Environments (paper | poster)
    • Fumiya Kondo and Satoshi Tamura
  • Residual Dropout: A Simple Approach to Improve Transformer’s Data Efficiency (paper | poster)
    • Carlos Escolano, Francesca De Luca Fornaciari and Maite Melero
  • Exploring Text Classification for Enhancing Digital Game-Based Language Learning for Irish (paper | poster)
    • Leona Mc Cahill, Thomas Baltazar, Sally Bruen, Liang Xu, Monica Ward, Elaine Uí Dhonnchadha and Jennifer Foster
  • UniDive: A COST Action on Universality, Diversity and Idiosyncrasy in Language Technology (paper | poster)
    • Agata Savary, Daniel Zeman, Verginica Barbu Mititelu, Anabela Barreiro, Olesea Caftanatov, Marie-Catherine de Marneffe, Kaja Dobrovoljc, Gülşen Eryiğit, Voula Giouli, Bruno Guillaume, Stella Markantonatou, Nurit Melnik, Joakim Nivre, Atul Kr. Ojha, Carlos Ramisch, Abigail Walsh, Beata Wójtowicz and Alina Wróblewska
  • Work in Progress: Text-to-speech on Edge Devices for te Reo Māori and ‘Ōlelo Hawai’i (paper | poster)
    • Tūreiti Keith, Gianna Leoni, Keoni Mahelona, Hina Puamohala Kneubuhl, Stephanie Huriana Fong and Peter-Lucas Jones
  • Developing Infrastructure for Low-Resource Language Corpus Building (paper | poster)
    • Hedwig Sekeres, Wilbert Heeringa, Wietse de Vries, Oscar Yde Zwagers, Martijn Wieling and Goffe Th. Jensma

14:00-15:20 Session 9: Speech Technologies

  • 14:00-14:20 Tandem Long-Short Duration-based Modeling for Automatic Speech Recognition (paper | slides)
    • Dalai Mengke, Yan Meng and Péter Mihajlik
  • 14:20-14:40 Bi-dialectal ASR of Armenian from Naturalistic and Read Speech (paper | slides)
    • Arthur Malajyan, Victoria Khurshudyan, Karen Avetisyan, Hossep Dolatian and Damien Nouvel
  • 14:40-15:00 (online) Indonesian-English Code-Switching Speech Recognition Using the Machine Speech Chain Based Semi-Supervised Learning (paper | slides)
    • Rais Vaza Man Tazakka, Dessi Lestari, Ayu Purwarianti, Dipta Tanaya, Kurniawati Azizah and Sakriani Sakti
  • 15:00-15:20 (online) Improving Noisy Student Training for Low-resource Languages in End-to-End ASR Using CycleGAN and Inter-domain Losses (paper | slides)
    • Chia-Yu Li and Ngoc Thang Vu 

15:20-16:00 Panel Discussion

  • “In a post-ChatGPT world, what are the Challenges and Opportunities for Under-resourced Languages?”

16:00-16:30 Coffee Break

16:30-16:40 Online Poster Session II

  • PersianEmo: Enhancing Farsi-Dari Emotion Analysis with a Hybrid Transformer and Recurrent Neural Network Model (paper | poster)
    • Mohammad Ali Hussiny, Mohammad Arif Payenda and Lilja Øvrelid
  • Why the Unexpected? Dissecting the Political and Economic Bias in Persian Small and Large Language Models (paper | poster)
    • Ehsan Barkhordar, Surendrabikram Thapa, Ashwarya Maratha and Usman Naseem

16:40-17:20 Session 10: Data Scarcity-related Issues

  • 16:40-17:00 Quantifying the Ethical Dilemma of Using Culturally Toxic Training Data in AI Tools for Indigenous Languages (paper | slides)
    • Pedro Henrique Domingues, Claudio Santos Pinhanez, Paulo Cavalin and Julio Nogima
  • 17:00-17:20 (online) Prompting Towards Alleviating Code-Switched Data Scarcity in Under-Resourced Languages with GPT as a Pivot (paper | slides)
    • Michelle Terblanche, Kayode Olaleye and Vukosi Marivate

17:20-17:30 Closing