TY - GEN
T1 - Analysis and Prediction of Breast Cancer using AzureML Platform
AU - Alshouiliy, Khaldoon
AU - Shivanna, Abhishek
AU - Ray, Sujan
AU - Alghamdi, Ali
AU - Agrawal, Dharma P.
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - The healthcare sector increasingly relies on datasets collected by clinics and other organizations to help doctors predict and analyze a patient's condition at an early stage. Among the many dangerous diseases people suffer from worldwide, cancer is one of the most serious. Recent research shows that about 12% of US women develop invasive breast cancer over the course of their lives, which makes breast cancer (BC) one of the most dangerous cancer types. This study focuses on BC using a well-known dataset, the Breast Cancer Wisconsin (Diagnostic) Data Set, which has 32 attributes and 569 instances. Some attributes have missing values and others are not needed for our work, so we removed the ID column and every instance with a missing value. Our aim in this research is to analyze the BC dataset and understand its features. We then upload it to the Microsoft Azure Machine Learning (AzureML) platform to build our model. We use the two-class Decision Jungle and two-class Decision Tree machine learning algorithms to predict whether a patient's diagnosis is benign or malignant. We assess the performance of each algorithm in terms of accuracy, precision, recall, F1 score, and AUC. Our results show that the Decision Jungle achieves an accuracy of approximately 97%, whereas the Decision Tree achieves approximately 95%.
KW - Analysis
KW - AzureML
KW - Breast Cancer
KW - Decision Tree
KW - Jungle Tree
KW - Machine Learning
KW - Prediction UCI dataset
UR - https://www.scopus.com/pages/publications/85077966174
U2 - 10.1109/IEMCON.2019.8936294
DO - 10.1109/IEMCON.2019.8936294
M3 - Conference contribution
AN - SCOPUS:85077966174
T3 - 2019 IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference, IEMCON 2019
SP - 212
EP - 218
BT - 2019 IEEE 10th Annual Information Technology, Electronics and Mobile Communication Conference, IEMCON 2019
A2 - Chakrabarti, Satyajit
A2 - Saha, Himadri Nath
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE Annual Information Technology, Electronics and Mobile Communication Conference, IEMCON 2019
Y2 - 17 October 2019 through 19 October 2019
ER -