AI Gabriel Bueno Pimentel Borges

Gabriel Bueno Pimentel Borges

Nominated Award:
Best Application of AI in a Student Project

Website of Company (or Linkedin profile of Person):
https://www.linkedin.com/in/gabriel-bueno-pimentel-borges/

 

I have just completed the Master of Science in Data Analytics at CCT College Dublin. My dissertation applied computer vision to Augmentative and Alternative Communication (AAC), with the aim of building a tool that classifies 20 signs of Lámh, the key word signing (KWS) language used in Ireland.

AAC is an area of speech and language research that supplements, or provides alternatives to, communication for people who cannot communicate effectively through conventional spoken language. One way to support people who face communication difficulties is key word signing (KWS): manual signing used alongside the spoken word, incorporating features of sign language. Lámh is the KWS language used in Ireland. The Lámh vocabulary consists of 580 signs, categorised as actions, modifiers, objects, people and social words, which originate from Irish Sign Language. Although it is important that this type of communication support is established in the one-to-one relationship between the client and the speech therapist, communication partners (CPs) play an important role by bringing key word signing into the individual’s natural setting.

A solution focused on improving learning methods specifically for Lámh communication partners may be proposed, given the fragility of this field as described by the authors covered in the literature review. Such a solution would not undermine the practicality and portability that the literature highlights as the main advantages of KWS languages, since the assisting tool would not be introduced into the direct interaction between the user and their communication partner.

Instead, a data analytics tool could support the preparation of communication partners, at the stage of the learning process that the literature calls “Controlled Practice and Feedback”. This stage takes place in a controlled environment with only the communication partner, the Lámh user and an instructor.

In this context, if a Deep Neural Network model can classify Lámh signs in real time, it can be used in the “Controlled Practice and Feedback” stage, giving the communication partner the chance to practise the learnt signs more often. The CP thereby gains the confidence and knowledge needed to use the language in a natural environment, directly with the AAC/Lámh user.

Therefore, a solution that uses Deep Learning for sign recognition to facilitate learning for Lámh communication partners would promote more democratic, accessible and playful learning opportunities for those involved in the Lámh user’s life. Furthermore, consistent use of this communication support can bring a wide range of benefits for the user’s inclusion in society.

Reason for Nomination:

This study proposed a way to promote the learning and improvement of Lámh among the communication partners who support current users, by creating a real-time detection tool that recognises 20 Lámh signs chosen on the basis of existing studies in the field. The signs are ‘to play’, ‘to look’, ‘to sit’, ‘to go’, ‘you’, ‘good’, ‘what?’, ‘time’, ‘thank you’, ‘book’, ‘hello’, ‘to want’, ‘I/me’, ‘girl’, ‘box’, ‘to show’, ‘table’, ‘jigsaw’, ‘game’, ‘please’. The choice of these words was based on the study by Frizelle and Lyons (2022), “The development of a core key word signing vocabulary (Lámh) to facilitate communication with children with down syndrome in the first year of mainstream primary school in Ireland”.

The primary data were generated as NumPy arrays of MediaPipe landmarks, with 40 frames per recording and 45 repetitions per sign. The signs were captured with the assistance of a volunteer who works as an aim support worker. The neural networks were built with the Python library Keras and the SVM models with the library sklearn. Real-time detection was implemented by integrating these elements with the library OpenCV. Neural networks with different architectures, using Long Short-Term Memory (LSTM) and 1D Convolutional Neural Network (CNN) layers, were compared with SVM classifiers whose hyperparameters were tuned by cross-validation, in order to determine the most appropriate model.

The model finally chosen, after assessing training and testing accuracy and loss, consisted of two 1D CNN layers with 32 and 64 nodes respectively and a dropout of 0.2, followed by two LSTM layers with 32 and 64 nodes respectively and a dense layer of 32 nodes. The training accuracy was 99.86%, the testing accuracy 93.33%, the training loss 0.0035 and the testing loss 0.1791. This was the model that performed best in a real-time detection environment, easily detecting 8 different Lámh signs and detecting another 6 with reservations.
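The data layout described above (20 signs, 45 repetitions per sign, 40 frames per repetition) can be illustrated with a minimal NumPy sketch. It is not the project's code: the number of landmark values per frame (`N_FEATURES`), the helper names `build_dataset` and `stratified_split`, the 20% test fraction and the random placeholder data are all assumptions made for illustration only.

```python
import numpy as np

SIGNS = 20          # number of Lámh signs (from the text)
REPETITIONS = 45    # recordings per sign (from the text)
FRAMES = 40         # frames per recording (from the text)
N_FEATURES = 258    # ASSUMPTION: flattened landmark values per frame


def build_dataset(sequences_by_sign):
    """Stack per-sign recordings into X (samples, frames, features) and one-hot y."""
    X = np.concatenate(sequences_by_sign, axis=0)
    counts = [len(s) for s in sequences_by_sign]
    labels = np.repeat(np.arange(len(sequences_by_sign)), counts)
    y = np.eye(len(sequences_by_sign), dtype=np.float32)[labels]
    return X, y


def stratified_split(y, test_fraction, seed=0):
    """Hold out the same share of every sign (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    labels = y.argmax(axis=1)
    test_idx = []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_test = max(1, round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
    test_idx = np.sort(np.array(test_idx))
    train_idx = np.setdiff1d(np.arange(len(labels)), test_idx)
    return train_idx, test_idx


# Random placeholder data standing in for the captured landmark arrays.
rng = np.random.default_rng(0)
sequences_by_sign = [
    rng.random((REPETITIONS, FRAMES, N_FEATURES), dtype=np.float32)
    for _ in range(SIGNS)
]
X, y = build_dataset(sequences_by_sign)
train_idx, test_idx = stratified_split(y, test_fraction=0.2)  # ASSUMPTION: 20% held out
print(X.shape, y.shape, len(train_idx), len(test_idx))
```

With these assumed values, X holds 900 sequences of 40 frames each, which is the three-dimensional input shape that 1D CNN and LSTM layers expect; for an SVM the frames would first need to be flattened into one vector per sequence.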

For future work, some skeletal motion signs should be captured again and other data augmentation strategies should be adopted, such as capturing hip and leg landmarks alongside the signs and augmenting the data by applying offsets to the landmark coordinates of the skeletons captured by MediaPipe. Once these corrections to the methodology yield better real-time results, work on tool accessibility and user experience should follow, in order to produce a real-time Lámh detection tool that could promote Lámh and become a learning alternative for communication partners.
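The offset-based augmentation suggested here as future work might look roughly like the sketch below. It is an assumption about how such a step could be written, not something implemented in the project: the function name `offset_augment`, the (frames, landmarks, 2) array layout, the maximum shift and the number of copies are all hypothetical.

```python
import numpy as np


def offset_augment(sequence, max_shift=0.05, copies=3, seed=0):
    """Return `copies` shifted versions of one landmark sequence.

    ASSUMPTION: `sequence` has shape (frames, landmarks, 2) and holds
    normalised (x, y) image coordinates. Each copy adds one random
    (dx, dy) to every landmark in every frame, so the motion itself is
    unchanged and only the signer's position in the image moves.
    Values are not clipped to [0, 1].
    """
    rng = np.random.default_rng(seed)
    shifts = rng.uniform(-max_shift, max_shift, size=(copies, 1, 1, 2))
    return sequence[None, ...] + shifts


# Placeholder sequence: 40 frames of 33 landmarks, random coordinates.
example = np.random.default_rng(1).random((40, 33, 2))
augmented = offset_augment(example)
print(augmented.shape)
```

Because the same offset is applied to the whole sequence, each augmented copy keeps the relative positions of the landmarks, which is what a classifier of signs would need to stay invariant to where the signer stands in the frame.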

From a social perspective, the initiative of this research was in itself a great contribution to the Lámh sector in Ireland, in terms of promoting technologies and alternatives that can assist the area. During the development of the project, the methodology and objectives of the research were presented and explained to representatives of the language, special needs assistants and speech therapists in order to encourage their participation. This intersection between the data analytics sector and the social care sector sparked curiosity and widened the involved professionals’ awareness of how machine learning can contribute unexpected solutions even to an AAC mechanism like Lámh, which is in essence an unassisted communication tool. This integration and knowledge exchange between sectors was most certainly the best contribution of this research, above all the technical achievements of the project.

All in all, this research reconciled a data analytics background with social concern by integrating the most relevant academic literature on Lámh and AAC with the state of the art in sign language detection and classification, drawn from machine learning research on other sign languages and on motion recognition. It is intended to challenge current computer vision studies in a positive way, especially those directed towards Augmentative and Alternative Communication. The problem definition must ultimately be user oriented. Although outstanding progress has been made in machine learning models and solutions over the last two decades, very little attention has been given to integrating these advanced technologies with a greater variety of languages. There are several KWS languages, like Lámh, with a significant number of users who could benefit from machine learning advances, but at the moment there are no studies proposing tools to help promote these languages and improve the user experience. The research carried out by Alexanderson and Beskow (2015), “Towards Fully Automated Motion Capture of Signs–Development and Evaluation of a Key Word Signing Avatar”, must be mentioned, as it put forward a methodology for Swedish KWS motion capture with the final intention of creating a game avatar. For research focused more on the machine learning aspect, improving on and outperforming the existing state of the art will always be an important validation and scientific pursuit, but the final intention of this research was to open the discussion and highlight the importance of making practical social benefits the main priority of computer vision studies and experiments.

Additional Information:

The novelty of this work is the application of machine learning to the classification of Lámh signs. The main contributions are, firstly, the data acquisition methodology, since there is no existing database of Lámh signs for machine learning classification, together with the evaluation of the best models in actual real-time detection applications; and secondly, the comparison of Convolutional Neural Network, Long Short-Term Memory and Support Vector Machine models for the classification of these signs, identifying the most appropriate method for the proposed context.

An academic poster briefly describing the study and the final code with the Neural Network weights are attached to this application.

The Lámh representatives took part in and assisted the project by promoting fruitful discussions and suggesting the most relevant literature in the field.

Files:

https://aiawards.ie/wp-content/uploads/ninja-forms/4/The-use-of-deep-learning-solutions-to-develop-a-practice-tool-to-support-L%C3%A1mh-language-for-communication-partners.pdf