COMPARISON OF MACHINE LEARNING ALGORITHMS FOR DETECTING RARE DISEASES


ALLEN JAWON PARK

Abstract

Machine learning is a powerful tool for finding important trends in data. One application of machine learning is detecting a specific disease from changes in a patient's symptoms or lab results. In practice, disease data are imbalanced: there are usually far more data from healthy subjects than from patients, and data for a rare disease may be very limited. In this study, several machine learning algorithms were compared to determine which is most effective at classifying diseases at every level of prevalence, whether rare or common. Without any hyperparameter tuning, the linear SVM, Naïve Bayes, and neural network models performed best, each with an accuracy above 90%. Further exploration of SVM classifiers, tuning the regularization parameter and the degree of the classification function, showed that the best classification boundary is either linear or a polynomial of odd degree. The study found that the disease data posed an inherently linear problem, since the degree-1 boundary had the highest accuracy overall. An even-degree polynomial did not produce negative outputs and was therefore limited in scope. Because our problem of classifying disease progression required a hyperplane that separates data points with positive changes from those with negative changes, an SVM with an odd-degree polynomial may be necessary.
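The abstract does not include the experimental code. As a hedged illustration only, the following Python sketch assumes scikit-learn and replaces the study's disease data with synthetic, imbalanced data generated by `make_classification`. The helper names (`make_imbalanced_data`, `compare_default_models`, `sweep_svm`), the 5% positive rate, the choice of `GaussianNB` as the naïve Bayes variant, the feature scaling, and the grid of `C` and degree values are all hypothetical, not the author's settings. The sketch mirrors the two steps described above: fitting near-default linear SVM, naïve Bayes, and neural network models, then sweeping the SVM regularization parameter `C` and the polynomial degree.

```python
# Hypothetical sketch (assumes scikit-learn). The data are synthetic and are
# NOT the study's disease data; every setting here is an illustrative choice.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_imbalanced_data(positive_rate=0.05, seed=0):
    """Two-class synthetic data in which the 'disease' class (label 1) is rare."""
    X, y = make_classification(
        n_samples=1000, n_features=10, n_informative=5,
        weights=[1.0 - positive_rate],  # weight of class 0; class 1 gets the rest
        class_sep=1.5, random_state=seed,
    )
    # Stratify so the rare class appears in both the training and test splits.
    return train_test_split(X, y, test_size=0.3, stratify=y, random_state=seed)


def compare_default_models(X_train, X_test, y_train, y_test):
    """Fit each model with near-default settings; return (accuracy, balanced accuracy)."""
    models = {
        "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
        "naive Bayes": GaussianNB(),
        "neural network": make_pipeline(
            StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
        ),
    }
    scores = {}
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        # Balanced accuracy averages per-class recall, so always predicting
        # "healthy" cannot score well on it even when plain accuracy is high.
        scores[name] = (accuracy_score(y_test, pred),
                        balanced_accuracy_score(y_test, pred))
    return scores


def sweep_svm(X_train, X_test, y_train, y_test,
              degrees=(1, 2, 3, 4, 5), Cs=(0.1, 1.0, 10.0)):
    """Test accuracy of a polynomial-kernel SVM for each (degree, C) pair.

    With scikit-learn's default coef0=0 the kernel is (gamma * <x, z>)**degree,
    so degree=1 gives a linear decision boundary.
    """
    results = {}
    for d in degrees:
        for C in Cs:
            model = make_pipeline(StandardScaler(),
                                  SVC(kernel="poly", degree=d, C=C))
            results[(d, C)] = model.fit(X_train, y_train).score(X_test, y_test)
    return results


if __name__ == "__main__":
    split = make_imbalanced_data()
    for name, (acc, bal) in compare_default_models(*split).items():
        print(f"{name}: accuracy={acc:.3f}, balanced accuracy={bal:.3f}")
    for (d, C), acc in sorted(sweep_svm(*split).items()):
        print(f"degree={d}, C={C}: accuracy={acc:.3f}")
```

Because the positive class is rare, the sketch reports balanced accuracy next to plain accuracy: on this synthetic data, a classifier that always predicts the majority class would already reach roughly 95% accuracy.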

Keywords:
Computational algorithm, linear SVM, machine learning, naïve Bayes, neural network models

Article Details

How to Cite
PARK, A. J. (2021). COMPARISON OF MACHINE LEARNING ALGORITHMS FOR DETECTING RARE DISEASES. Journal of Medicine and Health Research, 6(1), 26-34. Retrieved from https://www.ikprress.org/index.php/JOMAHR/article/view/6563
Section
Original Research Article
