International Journal of Secondary Computing and Applications Research



Machine Learning Framework for Phishing Detection through Email using Imbalanced Data

Mher Mkrtumyan

Affiliation: Horizon Academic Research Program

IJSCAR Vol. 3, Issue 1 (2026)  ·  pp. 24–30

DOI: 10.5281/zenodo.18435064


Abstract

Phishing emails remain a persistent cybersecurity threat as attackers employ increasingly sophisticated techniques to deceive users into disclosing sensitive information. Detection is particularly challenging in real-world environments, where legitimate emails vastly outnumber malicious ones. This study presents a dual-layer machine learning framework for phishing detection that independently analyzes sender metadata and email body content. The sender layer evaluates structural characteristics of email addresses, while the content layer extracts linguistic and statistical features from the email text. Each layer produces a probability score representing the likelihood of phishing; a meta-classification model then combines the two scores to generate a final decision. The framework is evaluated on a large real-world dataset containing over 500,000 emails with a highly imbalanced class distribution. Experimental results demonstrate that the proposed approach provides robust and reliable performance under realistic conditions, highlighting the effectiveness of integrating multiple analytical perspectives for practical phishing detection.
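
The score-fusion step described above can be illustrated with a minimal sketch. It assumes the two layers have already produced their probability scores and fits a small logistic-regression meta-classifier on the score pairs, with class weights inversely proportional to class frequency as one common way to handle imbalance. The function names, hyperparameters, and toy data are hypothetical and chosen for illustration; they are not taken from the paper, which may use a different meta-model or weighting scheme.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_classifier(scores, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on (sender_p, content_p) pairs by gradient descent.

    Assumption: each class is weighted inversely to its frequency so that the
    rare phishing class contributes as much to the loss as the majority class.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * n_neg)

    w = [0.0, 0.0]  # weights for the sender score and the content score
    b = 0.0         # bias term
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for (sender_p, content_p), y in zip(scores, labels):
            p = sigmoid(w[0] * sender_p + w[1] * content_p + b)
            err = (w_pos if y == 1 else w_neg) * (p - y)
            gw[0] += err * sender_p
            gw[1] += err * content_p
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def predict_proba(model, sender_p, content_p):
    # Final phishing probability from the two layer scores.
    w, b = model
    return sigmoid(w[0] * sender_p + w[1] * content_p + b)


# Toy layer outputs, invented for illustration: 8 legitimate, 2 phishing.
train_scores = [
    (0.05, 0.10), (0.10, 0.20), (0.20, 0.05), (0.15, 0.30),
    (0.30, 0.10), (0.08, 0.12), (0.25, 0.22), (0.12, 0.18),
    (0.85, 0.90), (0.70, 0.95),
]
train_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

model = fit_meta_classifier(train_scores, train_labels)
is_phishing = predict_proba(model, 0.90, 0.90) > 0.5
```

In practice, the meta-classifier would be trained on out-of-fold layer scores rather than on scores from the same data the layers were fitted on, so that the fusion step does not learn from overly optimistic inputs.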


Keywords: Phishing Detection, Machine Learning, Dual-Layer Framework, Email Security, Imbalanced Data, Sender Metadata, Email Content Analysis, Random Forest, XGBoost, LightGBM, ROC Curve, Logistic Regression, Cybersecurity, Dataset Preprocessing

