AI Tyndall National Institute

Tyndall National Institute

Nominated Award: Best Application of AI in an Academic Research Body

Website of Company: https://www.tyndall.ie

Tyndall is one of Europe’s leading research centres in integrated ICT hardware and systems, and the largest facility of its type in Ireland. With a network of industry partners and customers worldwide, Tyndall, a partnership between UCC, the Science Foundation of Ireland, and the Department of Enterprise, Trade and Employment, generates 85% of its income each year from competitively won contracts. Tyndall has secured over €57m to date in direct funding from EU programmes such as Horizon 2020 and the European Regional Development Fund.

Tyndall’s mission is to effect change globally, based on the excellence of its research, its state-of-the-art facilities, and its people. Tyndall launched its 2025 strategy, setting out an ambitious roadmap to build on almost 40 years of success in ICT research. The strategy represents a shift in Tyndall’s focus towards addressing the world’s major societal challenges, including energy and climate change, clean water, healthcare, disease prevention, and gender equality, using deep-tech R&D.

“Tyndall 2025” provides for a growth strategy driven by collaborating with global leaders, providing a platform to pursue new fields of research, and making impactful contributions to global projects. It will contribute to advances in technology that shape the future of a sustainable world and nurture the minds of the leaders who will drive those advances. Tyndall is ideally positioned to be a leader in deep-tech, an important and expanding field of technology based on tangible engineering innovations and scientific advances, with applications in AI, virtual reality, drones, self-driving cars, and more. Deep-tech research is at the heart of a successful Europe, and the team at Tyndall is behind some of its most advanced research. The deep-tech being developed at Tyndall will have a major impact in creating solutions in the areas of health and well-being, the energy crisis, a greener sustainable society, smart agriculture, and transport.

The Human-Centric System Cluster in Tyndall, in particular, has the vision to enable sustainable well-being for healthy populations by developing next-generation human-centric wearable systems that leverage the core technology platform capabilities available in body-centric communication, human-computer interaction, flexible electronics, and AI.

The technological advances needed to build such wearables, the unique role of the underlying AI, and how these can transform our everyday lives are the main research focus of Dr. Salvatore Tedesco’s team. For example, wearable sensors combined with AI can support real-time monitoring of patients during lower-limb rehabilitation, or of athletes during sporting activities.

Likewise, applying AI to data collected from wearable sensors may detect indications of failing health and the onset of potential health problems in the elderly population. Salvatore is the Team Lead for Wearable AI for Health and Wellbeing in Tyndall’s Human-Centric System Cluster. The Wearable-AI team led by Dr. Tedesco focuses on AI-powered on-body wearable technology for healthcare and well-being applications, ranging from fundamental to industry-oriented research. The team’s ultimate aim is to drive the ubiquitous adoption of wearables in healthcare and fitness, accessible to society at large.

Reason for Nomination:

In 2010, a UK study showed that 15% of the European population aged 65+ years consumed 60% of healthcare resources. The long-term care expenditure is expected to rise by 315% by 2051, if current approaches to geriatric care remain unchanged. Europe is home to the world’s most elderly population. By 2060, 155 million Europeans (30% of total population) will be aged 65+.

Consequently, some researchers are focused on extending the human lifespan while minimizing the overall associated healthcare costs. One research focus area is the development of predictive tools to analyze individuals’ risk-of-death. As aging does not follow a standard course, disease diagnoses and mortality predictions vary significantly even amongst individuals of the same age.

Traditionally, prognostic indices used for mortality prediction have played an important role in personalized risk management (i.e. identifying patients at high risk-of-death) to ensure effective healthcare services to patients. However, those scores are usually defined for and applied to specific cohorts, e.g. subjects in acute care or with specific conditions (such as diabetes).

These models were defined decades ago. They do not incorporate recent changes in patient-care characteristics and healthcare delivery models and outcomes (i.e., mean age, number and severity of chronic diseases presented). ML mortality risk modelling has huge potential, but limited research has been undertaken to date. Salvatore’s team thus investigated ML modelling for all-cause and cancer-related mortality prediction in an older adult cohort. Specifically, the study investigated the development of an ML model able to predict mortality with at least 2 to 7 years’ notice. The dataset used was provided by the “Healthy Ageing Initiative” conducted in Umeå, Sweden.

The initiative’s aim was to identify traditional and new risk factors for cardiovascular disorders, falls, and fractures among 70-year-olds in Umeå. For this work, the data collected between January 2013 and December 2017 was considered. The subjects’ status was monitored using population registers to determine which participants had died between their data collection and the end of the initiative (31st December 2019). The initiative aimed to generate a dataset of varied and heterogeneous features covering all the aspects that may influence older people’s daily life. The overall dataset consisted of 156 variables for 2291 participants. Only 92 subjects (~4%) died in the 2-7 year follow-up period, and of these, 50 (~2% of all participants) died from cancer-related conditions.

Imbalanced classification is a primary challenge in predictive modelling because of the severely skewed class distribution. In this case, traditional AI models and evaluation metrics perform poorly, as they generally assume a balanced class distribution. The imbalanced classification challenge is compounded by small dataset size, label noise, and data distribution. A positive class in the order of 2-4%, as in this case, represents a severely imbalanced dataset, which is particularly hard to model.
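To illustrate why accuracy-style metrics mislead at this level of imbalance, the following minimal sketch uses the cohort counts reported above (92 deaths among 2291 participants). The "nobody dies" baseline and the helper function names are hypothetical, introduced purely for illustration; this is not the team's evaluation code.

```python
# Illustrative only: cohort counts come from the text; the "classifier" is a
# hypothetical baseline that predicts the negative class for everyone.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of true positives (deaths) that were actually flagged."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged)

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as half), computed by brute force over all pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

n_total, n_deaths = 2291, 92
y_true = [1] * n_deaths + [0] * (n_total - n_deaths)
y_pred = [0] * n_total            # always predict "survives"
scores = [0.0] * n_total          # constant risk score for everyone

print(f"positive rate: {n_deaths / n_total:.3f}")
print(f"accuracy:      {accuracy(y_true, y_pred):.3f}")
print(f"recall:        {recall(y_true, y_pred):.3f}")
print(f"AUC:           {auc(y_true, scores):.3f}")
```

The baseline reaches roughly 96% accuracy while identifying none of the deaths, and its AUC is 0.5, i.e. no better than chance. This is why ranking-based metrics such as AUC are reported in this work rather than accuracy.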

Standard ML models, coupled with over-/under-sampling, cost-sensitive learning, probability calibration, and Monte Carlo-based data augmentation, did not achieve a predictive performance significantly better than a standard epidemiological Cox model (AUC 0.642 for the ML models vs AUC 0.702 for Cox). However, ensembles have shown promising results when dealing with imbalanced problems.

We overcame the challenge by developing a new ensemble model in this study. The model relies on pre-processing steps (e.g. normalization, handling of missing entries), feature engineering (creating additional informative variables), feature selection, outlier removal, hyper-parameter tuning, synthetic sample creation, and final ensemble learning.

For the all-cause mortality scenario, if all variables from the starting dataset are considered before developing the model, the model achieves a predictive performance of AUC 0.88, an excellent result compared to the Cox model performance. However, if the analysis relies on a parsimonious set of the original features (only including demographics, anthropometrics, questionnaires, and physical activity data collected via a wearable device), AUC is only slightly lower (0.763). The parsimonious set of features was chosen because they represent variables that are easy to use and easy for clinicians to collect in real-world practice. In the cancer-related mortality scenario, a high AUC (0.882) is again obtained in the case where all features are initially considered. Even more interestingly, with the parsimonious feature set, the model performance is comparable (0.857).

The results obtained show that AI is a powerful tool which can be adopted for the accurate prediction of all-cause mortality in older adults over a 2-7 year predictive range. Importantly, this conclusion also extends to cancer-related mortality. The study showed that this could be achieved based on information gathered with a limited number of simple tools (e.g. questionnaires and free-living wearable data). This work also highlights the contribution wearable technology can make in providing unbiased health markers in clinical research.

Additional Information:

The dataset adopted involves rich and diverse variables covering all aspects influencing older people’s daily life. The dataset includes: anthropometry (gender, height,…), medication/medical history, lab analysis (cholesterol, triglycerides,…), questionnaires (depression, smoking,…), examinations (gait/balance analysis, body segments’ fat and lean mass), and 1-week free-living wearable data (steps taken, minutes of sedentary/light/moderate/vigorous activities,…). Notably, this dataset is proprietary and not publicly available due to ethical and national regulations; therefore, the present analysis could not have been performed by any other research group.

For model development, the dataset was split into training, validation, test, and hold-out sets. The hold-out set was 30% of the whole dataset; the remaining 70% was split 50%-25%-25% into training, validation, and test sets, respectively. The splitting was stratified so that the proportion of positive to negative cases was the same in every set.
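The splitting scheme described above can be sketched as follows. This is a minimal standard-library illustration, not the study's code: the `stratified_split` helper, the random seed, and the synthetic labels (which only reproduce the cohort's class counts of 92 deaths out of 2291 participants) are all assumptions made for the example.

```python
import random
from collections import Counter

def stratified_split(indices, labels, fractions, seed=0):
    """Split `indices` into len(fractions) parts, keeping the class ratio
    of `labels` approximately equal in every part."""
    rng = random.Random(seed)
    parts = [[] for _ in fractions]
    for cls in sorted(set(labels[i] for i in indices)):
        members = [i for i in indices if labels[i] == cls]
        rng.shuffle(members)
        start, cumulative = 0, 0.0
        for k, frac in enumerate(fractions):
            cumulative += frac
            # The last part takes whatever remains, so no sample is dropped.
            if k == len(fractions) - 1:
                end = len(members)
            else:
                end = round(cumulative * len(members))
            parts[k].extend(members[start:end])
            start = end
    return parts

# Synthetic labels with the cohort's class counts (1 = died, 0 = survived).
labels = [1] * 92 + [0] * 2199
all_idx = list(range(len(labels)))

# Step 1: 30% hold-out, 70% remainder. Step 2: 50/25/25 of the remainder.
hold_out, rest = stratified_split(all_idx, labels, [0.30, 0.70])
train, val, test = stratified_split(rest, labels, [0.50, 0.25, 0.25])

for name, part in [("train", train), ("val", val), ("test", test), ("hold-out", hold_out)]:
    counts = Counter(labels[i] for i in part)
    print(f"{name:9s} n={len(part):4d} positives={counts[1]:2d} rate={counts[1] / len(part):.3f}")
```

Under this scheme the training set is 35% of the whole cohort (50% of 70%), and the validation and test sets are 17.5% each, with every set keeping a positive rate close to the overall ~4%.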

After data pre-processing, the training data was fed to Forward Selection Component Analysis (FSCA) for feature selection. FSCA has been successfully adopted to build interpretable, robust anomaly detection systems. Isolation Forest was then applied to the training set to remove possible outliers. Next, the algorithm separates the positive and negative class samples in the training set.

The samples in Class 0 were split into N different chunks, while N copies of the Class 1 samples were generated and each copy was assigned to a different chunk of Class 0 samples, thus generating N different subsets, each composed of the same Class 1 samples but different Class 0 samples. Then, the Random Balance algorithm was applied to each subset in order to generate a set randomly balanced between Class 1 and Class 0 samples. As a result, each subset differs from the others in the ratio between the number of original and synthetically generated samples (via SMOTE), increasing the diversity available to the learning model.
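The chunking and random balancing steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the toy data is random, and the synthetic samples are created by interpolating between two randomly chosen samples of the same class, whereas SMOTE proper interpolates towards one of the k nearest neighbours.

```python
import random

def interpolate(a, b, rng):
    """SMOTE-style synthetic point on the segment between samples a and b."""
    t = rng.random()
    return [x + t * (y - x) for x, y in zip(a, b)]

def resize(samples, target, rng):
    """Undersample (without replacement) or oversample with synthetic
    points until the class has exactly `target` samples."""
    if target <= len(samples):
        return rng.sample(samples, target)
    out = list(samples)
    while len(out) < target:
        a, b = rng.choice(samples), rng.choice(samples)
        out.append(interpolate(a, b, rng))
    return out

def random_balance(class0, class1, rng):
    """Keep the subset size fixed but draw the class proportions at random."""
    total = len(class0) + len(class1)
    n1 = rng.randint(2, total - 2)           # new size of Class 1
    return resize(class0, total - n1, rng), resize(class1, n1, rng)

def build_subsets(class0, class1, n_chunks, seed=0):
    """Split Class 0 into disjoint chunks, pair each chunk with a copy of
    all Class 1 samples, then randomly balance every resulting subset."""
    rng = random.Random(seed)
    shuffled = list(class0)
    rng.shuffle(shuffled)
    chunks = [shuffled[i::n_chunks] for i in range(n_chunks)]
    return [random_balance(chunk, list(class1), rng) for chunk in chunks]

# Toy data: 40 negative and 4 positive samples with two features each.
data_rng = random.Random(1)
class0 = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
class1 = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(4)]

for neg, pos in build_subsets(class0, class1, n_chunks=4):
    print(f"Class 0: {len(neg):2d}  Class 1: {len(pos):2d}")
```

Because the class proportions are drawn at random for every subset, each downstream classifier sees a different mix of original and synthetic samples, which is the source of diversity the ensemble relies on.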

Once each subset was properly balanced, its data was used to train a different AdaBoost classifier via a stratified 5-fold cross-validation. Consequently, N different AdaBoost classifiers have been used, each one trained on a different training subset. Each classifier’s hyper-parameters were tuned via the validation set to prevent over-fitting. The performance of each of the N classifiers was evaluated on the test set.
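For readers unfamiliar with AdaBoost, the following is a minimal sketch of the discrete AdaBoost algorithm with one-feature decision stumps as weak learners. It is an illustration only and does not reproduce the study's classifiers: the weak learner, the number of rounds, the toy data, and the +1/-1 label encoding (standing in for Class 1 / Class 0) are assumptions, and cross-validation and hyper-parameter tuning are omitted.

```python
import math

def best_stump(X, y, w):
    """Return (error, feature, threshold, polarity) of the decision stump
    with the lowest weighted error on the training data."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            for pol in (1, -1):
                pred = [pol if row[f] >= thr else -pol for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(stump, row):
    _, f, thr, pol = stump
    return pol if row[f] >= thr else -pol

def adaboost_fit(X, y, n_rounds=3):
    """Discrete AdaBoost; labels must be encoded as +1 / -1."""
    w = [1 / len(X)] * len(X)                 # start with uniform weights
    model = []
    for _ in range(n_rounds):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this stump
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, t, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Toy 1-D pattern that no single threshold can separate.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X, y, n_rounds=3)
print([adaboost_predict(model, row) for row in X])
```

On this toy pattern no single stump can be correct on more than four of the six points, yet the weighted combination of three stumps fits all six, which illustrates how boosting combines weak learners into a stronger one.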

After training and individually evaluating the N classifiers, the whole model performance was evaluated on the hold-out set, with the predictions of each classifier weighted based on the accuracy computed in the test set.
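The accuracy-weighted combination described above can be sketched as follows. The source does not state whether class votes or probabilities are combined, whether the weights are normalised, or which decision threshold is used, so the weighted average of probabilities, the 0.5 threshold, and all numbers below are assumptions made for illustration.

```python
def weighted_ensemble(probabilities, accuracies):
    """Combine per-classifier positive-class probabilities for one subject
    into a single score, weighting each classifier by its test accuracy."""
    total = sum(accuracies)
    return sum(p * a for p, a in zip(probabilities, accuracies)) / total

# Hypothetical outputs of N = 3 classifiers for two subjects.
test_accuracies = [0.80, 0.60, 0.70]
subject_a = [0.90, 0.20, 0.60]
subject_b = [0.10, 0.40, 0.30]

for name, probs in [("A", subject_a), ("B", subject_b)]:
    score = weighted_ensemble(probs, test_accuracies)
    label = "high risk" if score >= 0.5 else "low risk"
    print(f"subject {name}: score={score:.3f} -> {label}")
```

For subject A the weighted score is (0.72 + 0.12 + 0.42) / 2.1 = 0.6, so the more accurate first classifier outweighs the dissenting second one.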

The study advanced the technical/scientific space with a new ensemble model developed for a particularly challenging task, and likewise provides a societal benefit by showing that the variables selected for prediction (e.g., lean muscle mass, adipose tissue levels) are linked to all-cause and cancer-specific mortality. The model also showed that objective physical activity metrics from a wearable accelerometer worn for 1 week at home, along with demographics/questionnaires, predict subsequent cancer-specific mortality in older adults. Wearables have recently been used in oncology to predict clinical outcomes in patients undergoing treatment, but until this study by Salvatore’s team, no studies had investigated how to predict cancer-related mortality in older people even several years before symptoms occur.
