İlknur Buçan Kırkbir1,2, Burçin Kurt2, Cavit Boz3, Murat Terzi4

1Department of Public Health Nursing, Karadeniz Technical University, Faculty of Health Science, Trabzon, Türkiye
2Department of Biostatistics and Medical Informatics, Karadeniz Technical University, Institute of Medical Science, Trabzon, Türkiye
3Department of Neurology, Karadeniz Technical University Faculty of Medicine, Trabzon, Türkiye
4Department of Neurology, Ondokuz Mayıs University Faculty of Medicine, Samsun, Türkiye

Keywords: Feature selection, machine learning, multiple sclerosis.

Abstract

Objectives: This study aimed to determine important predictors of fifth-year Expanded Disability Status Scale (EDSS) scores in multiple sclerosis (MS) patients using machine learning.

Patients and methods: In this retrospective study, a baseline XGBoost model was developed to predict fifth-year EDSS scores in 1,000 patients (317 males, 683 females; mean age: 43.4±10.9 years; range, 18 to 76 years) with MS recorded between January 1999 and December 2020. Patients were categorized based on the initial symptoms of MS onset: brainstem symptoms, optic symptoms, spinal symptoms, or supratentorial symptoms. Important predictors of fifth-year EDSS scores were then identified and ranked by importance using the SHAP (SHapley Additive exPlanations) method.

Results: For patients with optic symptoms at onset, second-year EDSS scores, age, and first-year pyramidal functions were identified as the most important variables, respectively. In contrast, for those with spinal symptoms at onset, second-year pyramidal functions, age, and second-year ambulation were important predictors. In the patients with brainstem symptoms at onset, age, first-year EDSS scores, and first-year bowel and bladder functions were determined as important variables. Additionally, for patients with supratentorial symptoms at onset, second-year pyramidal functions, second-year EDSS scores, and age were the top predictors.

Conclusion: The results provided valuable insights into predictors of fifth-year EDSS scores in patients with MS grouped by their initial symptoms. Our findings indicate that the ranking of importance of functional system evaluations varies among patients with MS based on their initial symptoms, with age as a significant predictor for all symptom groups.

Introduction

Multiple sclerosis (MS) is an immune-mediated chronic and inflammatory disease of the central nervous system, typically manifesting with initial episodes of relapses and transient neurological deficits. Furthermore, neurodegeneration and chronic axonal injury lead to progressive disability accumulation over time, varying in degree among the majority of patients with MS. Various lesions such as astrogliosis, demyelination, inflammatory infiltrates, and early axonal damage can be observed in the central nervous system in MS.[1] Although MS is more widespread in young adults, it may occur at any age. The disease progresses differently in each person. Nerves are damaged in all patients, but the symptoms may vary.[1] Typically, MS presents with brainstem syndromes, unilateral optic neuritis, sensory symptoms, or internuclear ophthalmoplegia that develop within a few days. Diagnosis is established by evaluating symptoms and signs, which are components of the 2017 McDonald criteria, along with radiological imaging results and laboratory findings, such as oligoclonal bands.[2] Clinically approved drug combinations are used as therapy.[3] Disability assessment of patients with MS includes the Kurtzke Functional Systems (KFS) scores, developed by John Kurtzke and widely implemented by neurologists. The KFS score measures the impact of demyelination on body systems, including systems related to the brain (pyramidal, cerebellar, brainstem, sensory, and cerebral), as well as bowel and bladder, visual, and motor.[4]

The Expanded Disability Status Scale (EDSS) is the most commonly used tool to evaluate disability, with scores on a scale ranging from 0 to 10, based on evaluating the functional systems of the central nervous system.[5] Objective evaluation of disability in patients with MS and identification of effective variables for long-term disability status can assist healthcare providers in treating and managing MS. Early identification of patients at higher risk of developing worse disability is crucial for their clinical management. There is increasing evidence supporting improved disability outcomes with early initiation of high-efficacy therapy. For this purpose, our study focused on selecting the most important variables to predict the fifth-year EDSS scores of patients with MS grouped according to initial symptoms using machine learning.

Material and Methods

The methodology of this retrospective study is illustrated in Figure 1 and detailed in the subsections below.

Data collection

The dataset was obtained from the Departments of Neurology at Karadeniz Technical University and Ondokuz Mayıs University, comprising patients recorded in the MS database between January 1999 and December 2020. Data entry typically occurred in real-time or closely approximated real-time during visits as part of routine clinical practice. The data entry portal was the MSBase database, and quality assurance procedures were followed. A series of automated procedures were implemented to identify any invalid or erroneous data entries. The data extracted from the registry in 2020 included information on patients with MS who were followed up for at least five years since their initial clinical MS diagnosis. Additionally, patients whose EDSS scores were calculated at the first (clinical diagnosis visit), second (after 24 months from the first visit), and fifth (after 60 months from the first visit) year visits were included in the study. Patients with any systemic disease and those whose EDSS scores were calculated within one month from the date of relapse were excluded from the study. The MS course was not taken into account.
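The inclusion and exclusion rules above can be sketched as a simple eligibility filter. This is a minimal illustration, not the registry's actual extraction logic: the record layout (Patient, EdssVisit), the field names, and the 30-day approximation of "one month" are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical record layout; the registry's real schema is not given in the text.
@dataclass
class EdssVisit:
    visit_date: date
    edss: float

@dataclass
class Patient:
    patient_id: str
    has_systemic_disease: bool
    relapse_dates: List[date]
    year1: Optional[EdssVisit] = None  # clinical diagnosis visit
    year2: Optional[EdssVisit] = None  # about 24 months after the first visit
    year5: Optional[EdssVisit] = None  # about 60 months after the first visit

RELAPSE_WINDOW_DAYS = 30  # assumption: "one month" approximated as 30 days

def scored_near_relapse(visit, relapse_dates):
    """True if the EDSS was scored within the window following any relapse."""
    return any(
        0 <= (visit.visit_date - relapse).days <= RELAPSE_WINDOW_DAYS
        for relapse in relapse_dates
    )

def is_eligible(patient):
    """Apply the stated criteria: all three EDSS visits present, no systemic
    disease, and no EDSS score taken shortly after a relapse."""
    visits = [patient.year1, patient.year2, patient.year5]
    if patient.has_systemic_disease or any(v is None for v in visits):
        return False
    return not any(scored_near_relapse(v, patient.relapse_dates) for v in visits)
```

A cohort would then be obtained with a list comprehension such as [p for p in patients if is_eligible(p)].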

Data preprocessing and statistics

Following data preprocessing and filtering, 1,000 patients (317 males, 683 females; mean age: 43.4±10.9 years; range, 18 to 76 years) meeting the inclusion criteria were selected from a pool of 3,034 records. Table 1 presents the variables used in the initial stage of the study along with the corresponding number of missing values. Machine learning models generally benefit from larger samples, so deleting records with missing data was avoided. To make the most of the available information, variables with less than 30% missing data were imputed using the classification and regression tree (CART) algorithm,[6] while variables with more than 30% missing data (first-year KFS score ambulation, first-year and second-year impact on activities of daily living [ADL], and number of relapses) were excluded from the analysis. The imputed dataset, which best represents the original data, was used to determine predictors of disability status in the fifth year. Subsequently, patients were categorized based on the initial symptoms of MS onset: brainstem symptoms, optic symptoms, spinal symptoms, or supratentorial symptoms. The primary outcome of the study was the fifth-year EDSS score.
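The 30% missingness rule can be illustrated with a short sketch. The column names and toy values below are hypothetical, missing entries are represented as None, and a column with exactly 30% missing values is kept here, which is an assumption because the text does not specify that boundary case.

```python
def missing_fraction(values):
    """Fraction of entries that are None (treated here as missing)."""
    return sum(v is None for v in values) / len(values)

def split_by_missingness(columns, threshold=0.30):
    """Partition columns into those kept for imputation and those dropped
    because their missing fraction exceeds the threshold."""
    keep, drop = {}, []
    for name, values in columns.items():
        if missing_fraction(values) > threshold:
            drop.append(name)
        else:
            keep[name] = values
    return keep, drop

# Toy data with hypothetical column names (not study data)
columns = {
    "edss_year2": [1.0, None, 2.0, 1.5, 3.0, 2.5, 1.0, 2.0, 4.0, 1.5],  # 10% missing
    "relapse_count": [None, None, 1, None, 2, None, 0, 1, None, 3],     # 50% missing
}
keep, drop = split_by_missingness(columns)
print(sorted(keep), drop)
```

The retained columns would then be passed to a tree-based imputer. One possible choice, not confirmed by the text, is scikit-learn's IterativeImputer with a DecisionTreeRegressor as its estimator.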

Descriptive statistics for numerical variables were reported as mean ± standard deviation (SD) or median with minimum and maximum values. Categorical variables were described using frequency and percentage. The Wilcoxon signed-rank test was used to compare repeated measures. Statistical analysis and determination of the fifth-year EDSS predictors were performed using the open-source Python programming language.
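For readers unfamiliar with the Wilcoxon signed-rank test, the following is a minimal standard-library sketch of the statistic for paired measurements such as first- and fifth-year EDSS scores. It uses the normal approximation without tie or continuity corrections, which is a simplification; the text does not state which implementation the authors used, and in practice a library routine such as scipy.stats.wilcoxon would typically be preferred.

```python
import math

def wilcoxon_signed_rank(before, after):
    """Return (W, two-sided p) using the normal approximation.

    Zero differences are discarded; tied absolute differences receive
    average ranks. No tie or continuity correction is applied.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, p
```

For small samples the exact null distribution is more appropriate than the normal approximation used in this sketch.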

Determination of important predictors

Machine learning

The variables relevant to the fifth-year EDSS measurement were ranked by importance using a machine learning method. Machine learning is a technique based on learning from data.[6] Unlike traditional programming, in which rules are written explicitly, machine learning algorithms derive rules from the data. It can also save time compared with traditional statistical methods. Machine learning-based techniques have been successfully applied in various fields such as pattern recognition, computer vision, aerospace engineering, finance, entertainment, computational biology, and biomedical and medical applications.[7,8] In this study, the SHAP (SHapley Additive exPlanations) method was applied to a model built with the XGBoost machine learning algorithm. XGBoost is an advanced implementation of gradient boosting, a machine learning technique whose main idea is to combine many simple models.[9]
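The idea of combining many simple models can be illustrated with a toy gradient boosting routine for squared-error regression on a single feature, where each simple model is a one-split decision stump fitted to the current residuals. This is a didactic sketch only and is not XGBoost itself, which additionally uses regularization, second-order gradient information, and multi-feature trees. The function names, data, and hyperparameters are illustrative assumptions.

```python
def fit_stump(x, residuals):
    """Find the single threshold split on x that minimizes squared error,
    and return a function predicting the mean residual on each side."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Start from the mean of y, then repeatedly fit a stump to the residuals
    and add a shrunken copy of it to the ensemble."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

# Toy data (not study data): one feature with at least two distinct values
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
model = boost(x, y)
```

Each stump alone is a weak predictor, but their shrunken sum fits the training data far better than the constant mean prediction.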

SHapley Additive exPlanation Feature Importance Method

Feature selection is the process of identifying and removing redundant features from data. Feature importance, in turn, measures the individual contribution of each feature to the dependent variable.[9] The SHAP method is a recently developed approach to feature importance and selection. It is preferred because it not only indicates the importance of each feature but also quantifies how each feature affects the dependent variable, both for a single sample and across the whole dataset.[10] The SHAP method can explain the predictions of any machine learning model by calculating the contribution of each feature to the prediction, and can thereby identify the most important features and their influence.[11,12] The Python “shap” library was used to analyze the relative importance of the independent variables. The feature importance analysis was carried out in four steps. In the first step, the demographic data of patients with MS, together with clinical data gathered during the first-year and second-year visits, were used as input vectors. The second step involved developing a customized XGBoost baseline model to predict the fifth-year EDSS scores. In the third step, the input variables were ranked by importance using the SHAP algorithm. The SHAP values were calculated to determine the contribution of each independent variable to the model's prediction.[13] These values attribute the model output, which in this study was the fifth-year EDSS score of patients with MS, to the individual features. The fourth step involved sorting the input variables by their importance to the output and displaying them graphically.
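To make the notion of a per-feature contribution concrete, the sketch below computes exact Shapley values for one sample by enumerating every feature coalition. It is an illustration of the underlying definition, not the shap library's implementation: the toy model, its coefficients, and the choice to represent an absent feature by a fixed baseline value are assumptions made here. Brute-force enumeration is exponential in the number of features, which is why the shap library provides specialized explainers (for example, TreeExplainer for tree ensembles).

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value,
    a common simplifying way to represent a 'missing' feature.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy model with an interaction term (not the study's model)
def toy_model(z):
    return 0.8 * z[0] + 0.02 * z[1] + 0.1 * z[0] * z[2]

sample = [2.0, 45.0, 1.0]
baseline = [1.0, 40.0, 0.0]
phi = shapley_values(toy_model, sample, baseline)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that gives SHAP its name.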

Results

The mean ages of the optic, spinal, supratentorial, and brainstem symptoms subgroups were 42.2±10.3, 43.5±10.3, 44.6±10.8, and 41.9±11.3 years, respectively. Table 2 provides the mean EDSS score measured at the first-, second-, and fifth-year visits, along with descriptive statistics for other variables. Among patients with optic symptoms at onset, 75.8% were female, while this percentage was 66.7% for spinal symptoms at onset, 65.9% for supratentorial symptoms at onset, and 62.9% for brainstem symptoms at onset. Additionally, the dominant hand was consistently identified as the right hand across all groups (Table 3).


None of the groups had a family history of MS, and treatment was initiated within the first two years after diagnosis. The most commonly used initial treatment in all initial symptom subgroups was interferon (Table 3). According to the Wilcoxon signed-rank test, there was no significant difference between the first- and fifth-year EDSS scores of patients with optic, brainstem, or spinal symptoms at onset, whereas the difference was statistically significant (p<0.05) for patients with supratentorial symptoms at onset (Table 4).

The SHAP summary plots ranked variables based on their importance. The line of the plot was red if the independent variable increased the prediction of the dependent variable.
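As a rough illustration of how such a ranking can be derived, a global importance score per variable is commonly taken as the mean absolute SHAP value across samples, and variables are sorted by that score. The sketch below assumes this convention; the feature names and SHAP values are made-up placeholders and are not results from this study.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, largest first.

    shap_rows holds one list of per-feature SHAP values per sample.
    """
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Placeholder values for three samples and three hypothetical features
shap_rows = [
    [0.9, -0.4, 0.1],
    [-0.7, 0.5, -0.2],
    [0.8, -0.3, 0.1],
]
ranking = rank_by_mean_abs_shap(shap_rows, ["feature_a", "feature_b", "feature_c"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Using absolute values means that a variable pushing predictions strongly in either direction ranks high, while the sign of each individual SHAP value still indicates the direction of its effect.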

The graphs obtained using the SHAP method ranked important predictors of fifth-year EDSS scores in patient groups with optic, spinal, brainstem, and supratentorial symptoms at onset. The ranked importance of variables for patients with optic symptoms at onset is presented in Figure 2. The most important three variables for the optic symptoms subgroup were second-year EDSS scores, age, and first-year pyramidal functions (according to KFS score), respectively.

The three most significant variables for patients with spinal symptoms at onset were second-year pyramidal functions, age, and second-year KFS score ambulation, respectively (Figure 3). Second-year brainstem functions (according to KFS score) and second-year EDSS scores were not as crucial for predicting the fifth-year disability score compared to other variables.

In patients with brainstem symptoms at onset, the three most important variables were age, first-year EDSS scores, and first-year bowel and bladder functions (according to KFS score), in that order (Figure 4).

In patients with supratentorial symptoms at onset, the three most important predictors of fifth-year EDSS scores were second-year pyramidal functions (according to KFS score), second-year EDSS scores, and age (Figure 5).

When all these graphs were collectively evaluated, it was concluded that age was among the top three variables for predicting the fifth-year EDSS scores in all symptom groups.

Discussion

In this study, the predictors of the fifth-year EDSS measurements of patients with MS were ranked by importance according to their symptoms at onset, and the statistical difference between the first- and fifth-year EDSS measurements was also examined. Multiple sclerosis is generally more prevalent in females than in males, with patients typically diagnosed between the ages of 20 and 50 years.[14-16] In the present study, most patients in every onset symptom group were female, and the mean age of the symptom groups ranged from 41.9 to 44.6 years (Table 2). This finding is consistent with the literature.

Patients with brainstem symptoms at MS onset demonstrate a significantly better prognosis. A study indicated that MS patients with early brainstem symptoms experienced disability accumulation at a slower rate.[17] In our study, we found no statistically significant difference between the first- and fifth-year EDSS scores of patients with brainstem symptoms at onset (Table 4). This is consistent with the relatively favorable prognosis of patients with brainstem symptoms at onset. In a 2019 study, the presence of brainstem, spinal cord, and cerebellar symptoms at onset was identified as a poor prognostic factor.[18] Additionally, onset with sensory symptoms and optic neuritis was associated with a favorable prognosis. In contrast to that study, our findings showed no statistically significant difference between the first- and fifth-year EDSS scores of patients with brainstem and spinal symptoms at onset. However, there was a statistically significant difference between the first- and fifth-year EDSS scores of patients with supratentorial symptoms at onset (Table 4). In our study, among patients with optic onset, second-year EDSS scores, age, and first-year pyramidal functions were the most important predictors of fifth-year disability. In patients with spinal onset, second-year pyramidal functions, age, and second-year ambulation were the three most important predictors, respectively. In patients with brainstem onset, age, first-year EDSS scores, and first-year bowel and bladder functions were the three most important factors for the prediction of fifth-year EDSS scores. Finally, in patients with supratentorial onset, the important factors for fifth-year EDSS were second-year pyramidal functions, second-year EDSS scores, and age.

Although few studies have assessed long-term disability based on patients' initial symptoms, a 2023 study found that higher age and the use of disease-modifying drugs were associated with an increased probability of EDSS scores ≥3.[19] Regarding the importance of factors affecting disability in the fifth year, age was an important predictor in all symptom groups in our study, whereas disease-modifying drugs were more influential only in patients with optic symptoms at onset. Similar to our result, in a study aiming to identify predictors of long-term disability, first-line disease-modifying drugs were not significantly associated with long-term outcomes (10 years).[20]

This study had some limitations. It was conducted at only two centers. In addition, the evaluation of patients with MS by different physicians could be considered a limitation.

In conclusion, MS is a prevalent and chronic disease, underscoring the necessity for studies aimed at understanding the variability of MS disability status among individuals. As a result, personalized treatment planning can be facilitated. Predicting the disability of patients with MS is important in shaping treatment processes and enhancing patients' quality of life. Long-term disability status in patients with MS has been extensively discussed in the literature. In this study, significant predictors were determined using machine learning for predicting fifth-year EDSS scores of patients with MS. By analyzing various factors, such as age, initial symptoms, and functional systems evaluations, researchers can gain insights into the progression of disability over time in individuals with MS. Understanding these predictors can inform healthcare professionals in developing more personalized and effective treatment strategies. Additionally, this study’s results may encourage further research to uncover more factors affecting the progression of MS disability and enhancing patient outcomes.

Cite this article as: Buçan Kırkbir İ, Kurt B, Boz C, Terzi M. Determination of important predictors for the fifth-year Expanded Disability Status Scale scores of patients with multiple sclerosis using machine learning. Turk J Neurol 2024;30(3):157-166. doi: 10.55697/tnd.2024.107.

Data Sharing Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Ethics Committee Approval

The study protocol was approved by the Karadeniz Technical University Scientific Research Ethics Committee (date: 12.10.2020, no: 2020-232). The study was conducted in accordance with the principles of the Declaration of Helsinki.

Author Contributions

Conceptualization, methodology, formal analysis, software, writing-original draft preparation: İ.B.K.; Conceptualization, methodology, supervision: B.K.; Conceptualization, data gathering: C.B.; Data gathering: M.T.

Conflict of Interest

The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.

Financial Disclosure

The authors received no financial support for the research and/or authorship of this article.

References

  1. Özdelikara A, Taştan A. Multiple skleroz ve tamamlayıcı terapiler. JAREN 2019;5:228-32. doi: 10.5222/jaren.2019.17363.
  2. Solomon AJ, Naismith RT, Cross AH. Misdiagnosis of multiple sclerosis: Impact of the 2017 McDonald criteria on clinical practice. Neurology 2019;92:26-33. doi: 10.1212/WNL.0000000000006583.
  3. Brownlee WJ, Hardy TA, Fazekas F, Miller DH. Diagnosis of multiple sclerosis: Progress and challenges. Lancet 2017;389:1336-46. doi: 10.1016/S0140-6736(16)30959-X.
  4. Polman CH, Reingold SC, Banwell B, Clanet M, Cohen JA, Filippi M, et al. Diagnostic criteria for multiple sclerosis: 2010 revisions to the McDonald criteria. Ann Neurol 2011;69:292-302. doi: 10.1002/ana.22366.
  5. Truong CTL, Le HV, Kamauu AW, Holmen JR, Fillmore CL, Kobayashi MG, et al. Creating a real-world data, United States healthcare claims-based adaptation of Kurtzke functional systems scores for assessing multiple sclerosis severity and progression. Adv Ther 2021;38:4786-97. doi: 10.1007/s12325-021-01858-9.
  6. Uğuz S. Makine öğrenmesi-teorik yönleri ve python uygulamaları ile bir yapay zekâ ekolu. Ankara: Nobel Akademik Yayıncılık; 2019. s. 12-65.
  7. El Naqa I, Murphy MJ. What is machine learning? In: El Naqa I, Li R, Murphy MJ, editors. Machine learning in radiation oncology. Berlin: Springer; 2015. p. 3-11.
  8. Mahesh B. Machine learning algorithms: A review. IJSR[Internet] 2020;9:381-386.
  9. Chelgani SC, Nasiri H, Alidokht M. Interpretable modeling of metallurgical responses for an industrial coal column flotation circuit by XGBoost and SHAP-A “conscious-lab” development. Int J Min Sci Technol 2021;31:1135-44. doi: 10.1016/j.ijmst.2021.10.006.
  10. Balakrishnan S, Narayanaswamy R, Savarimuthu N, Samikannu R. SVM ranking with backward search for feature selection in type II diabetes databases. 2008 IEEE International Conference on Systems, Man and Cybernetics, Singapore 2008:2628-33. doi: 10.1109/ICSMC.2008.4811692.
  11. Saarela M, Jauhiainen S. Comparison of feature importance measures as explanations for classification models. SN Appl Sci 2021;3:272. doi: 10.1007/s42452-021-04148-9.
  12. Towardsdatascience. Available at: https://towardsdatascience.com/introduction-to-shap-values-and-their-application-in-machine-learning-8003718e6827 [Accessed: 20.03.2023]
  13. Marcílio WE, Eler DM. From explanations to feature selection: Assessing SHAP values as feature selection mechanism. 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Porto de Galinhas, Brazil 2020:340-7. doi: 10.1109/SIBGRAPI51738.2020.00053.
  14. Vaughn CB, Jakimovski D, Kavak KS, Ramanathan M, Benedict RHB, Zivadinov R, et al. Epidemiology and treatment of multiple sclerosis in elderly populations. Nat Rev Neurol 2019;15:329-42. doi: 10.1038/s41582-019-0183-3.
  15. Dobson R, Giovannoni G. Multiple sclerosis - a review. Eur J Neurol 2019;26:27-40. doi: 10.1111/ene.13819.
  16. Harbo HF, Gold R, Tintoré M. Sex and gender issues in multiple sclerosis. Ther Adv Neurol Disord 2013;6:237-48. doi: 10.1177/1756285613488434.
  17. Le M, Malpas C, Sharmin S, Horáková D, Havrdova E, Trojano M, et al. Disability outcomes of early cerebellar and brainstem symptoms in multiple sclerosis. Mult Scler 2021;27:755-66. doi: 10.1177/1352458520926955.
  18. Rotstein D, Montalban X. Reaching an evidence-based prognosis for personalized treatment of multiple sclerosis. Nat Rev Neurol 2019;15:287-300. doi: 10.1038/s41582-019-0170-8.
  19. Bsteh G, Hegen H, Altmann P, Auer M, Berek K, Di Pauli F, et al. Retinal layer thickness predicts disability accumulation in early relapsing multiple sclerosis. Eur J Neurol 2023;30:1025-34. doi: 10.1111/ene.15718.
  20. Jokubaitis VG, Spelman T, Kalincik T, Lorscheider J, Havrdova E, Horakova D, et al. Predictors of long-term disability accrual in relapse-onset multiple sclerosis. Ann Neurol 2016;80:89-100. doi: 10.1002/ana.24682.