International Journal of Secondary Computing and Applications Research

Human Perception and Detection of AI-Generated Phishing Emails: A Red-Teaming and Multi-Layered Detection Approach

Rohan Mehra

Affiliation: Riverdale Country School

IJSCAR Vol. 3, Issue 1 (2026) · pp. 4–12

Abstract

Large Language Models (LLMs) now generate phishing emails that are indistinguishable from legitimate correspondence, challenging traditional detection systems. While baseline classifiers achieve 97-99% accuracy on established corpora, they remain untested against adversarially generated AI variants. We introduce a detectability gap metric that quantifies the difference in classifier confidence between AI-authored and human-authored phishing, enabling continuous monitoring of generative model evolution. Evaluated on 155,659 emails (72,941 legitimate; 81,519 human-phishing; 1,199 AI-phishing) from three primary sources plus eleven datasets, our Random Forest classifier achieved 98.13% test accuracy. A GAN-to-GPT adversarial pipeline that generated 280 variants revealed an 84.3% detection rate, with misclassifications concentrated in formal, non-urgent examples. Feature analysis showed that the classifier relies on traditional phishing markers rather than AI-specific patterns, suggesting vulnerability as generative attacks evolve. This framework provides an early-warning capability for tracking LLM advancement while maintaining computational efficiency suitable for resource-constrained organizations.
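
The abstract describes the detectability gap only as a confidence difference between AI-authored and human-authored phishing and does not give a formula. The Python sketch below is one possible reading, not the paper's definition: it assumes the gap is the mean classifier confidence on human-authored phishing minus the mean confidence on AI-generated phishing. The function name `detectability_gap` and the sample confidence values are hypothetical and serve only as illustration.

```python
from statistics import mean


def detectability_gap(human_phish_conf, ai_phish_conf):
    """Assumed definition: mean phishing confidence on human-authored
    samples minus mean phishing confidence on AI-generated samples.

    A larger positive value means the classifier is less confident on
    AI-generated phishing than on human-authored phishing.
    """
    if not human_phish_conf or not ai_phish_conf:
        raise ValueError("both confidence lists must be non-empty")
    return mean(human_phish_conf) - mean(ai_phish_conf)


# Hypothetical classifier outputs (probability of the "phishing" class).
# These are made-up numbers, not results from the paper.
human_conf = [0.99, 0.97, 0.95, 0.98]
ai_conf = [0.91, 0.62, 0.88, 0.45]

gap = detectability_gap(human_conf, ai_conf)
print(f"detectability gap: {gap:.4f}")
```

Under this assumed definition, recomputing the gap each time a new generative model is sampled would give the kind of trend signal the abstract calls continuous monitoring. A gap that widens over time would suggest that the classifier is losing confidence on newer AI-generated phishing.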

Keywords: Phishing Detection, Large Language Models, Machine Learning Security, Adversarial Machine Learning, Email Security, Generative AI, Random Forest, Cybersecurity
