Spam Detection Using Natural Language Processing

Authors

  • Aditya Srivastava, Amity School of Engineering and Technology Lucknow, Amity University Uttar Pradesh, India, https://orcid.org/0000-0002-7508-7927
  • P. Singh, Amity School of Engineering and Technology Lucknow, Amity University Uttar Pradesh, India

DOI:

https://doi.org/10.54060/a2zjournals.jase.70

Keywords:

Artificial Intelligence, Natural Language Processing, Naive Bayes Classifier, Spam Detection

Abstract

In the digital age, where electronic communication is ubiquitous, spam remains a pervasive problem: it degrades user experience, compromises cybersecurity, and poses significant challenges. This research paper presents a comprehensive exploration of "Spam Detection Using Natural Language Processing", examining the key components of spam detection and their implications. We begin with data collection and preprocessing, discussing the intricacies of gathering diverse datasets and transforming them into analysable forms. We then turn to feature engineering and show the pivotal role that engineered features play in distinguishing spam from legitimate content. Model building and evaluation form the core of spam detection, and we examine the algorithms, techniques, and metrics that drive the development of effective spam detection systems. Spam detection faces substantial challenges, from imbalanced datasets and evasion tactics to the ongoing difficulty of balancing false positives against false negatives. Privacy concerns and the legal landscape add further layers of complexity. Real-world applications are broad, encompassing social media moderation, review systems, chat applications, and more, and we describe how spam detection safeguards user interactions, maintains content quality, and secures digital ecosystems across these platforms. Finally, we look ahead to future trends, including the growing dominance of deep learning, multimodal detection, adversarial defense, and blockchain-based authentication. The paper brings together insights, strategies, and prospects to provide a holistic view of spam detection in a rapidly changing digital landscape.
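The abstract names the Naive Bayes classifier (see Keywords) but gives no implementation details. What follows is a minimal, illustrative sketch of the preprocessing, feature, model, and evaluation stages it describes. It assumes a multinomial Naive Bayes over bag-of-words counts with Laplace (add-one) smoothing, a simple regex tokenizer, and a small stop-word list. The functions `preprocess`, `train`, `predict`, and `evaluate`, the stop-word list, and the four-message toy dataset are all hypothetical. They are not the authors' code or data.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real filters use larger lists.
STOP_WORDS = {"the", "a", "an", "to", "is", "and", "of", "for", "you", "your"}
LABELS = ("spam", "ham")


def preprocess(text):
    """Lowercase, extract alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train(messages, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing.

    Assumes both labels occur at least once in `labels`.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(preprocess(text))  # bag-of-words features
    vocab = set().union(*word_counts.values())
    priors, likelihoods = {}, {}
    for label in LABELS:
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        # Class prior is the label frequency, so imbalance shifts the prior.
        priors[label] = math.log(doc_counts[label] / len(labels))
        likelihoods[label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return {"vocab": vocab, "priors": priors, "likelihoods": likelihoods}


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    scores = {}
    for label in LABELS:
        score = model["priors"][label]
        for token in preprocess(text):
            if token in model["vocab"]:  # out-of-vocabulary tokens are ignored
                score += model["likelihoods"][label][token]
        scores[label] = score
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="spam"):
    """Precision and recall for the spam class (false positives vs. false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data for illustration only.
train_msgs = [
    "WIN a FREE prize now, claim your cash",
    "Limited offer: free cash bonus, click now",
    "Are we still meeting for lunch tomorrow?",
    "Please review the attached project report",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = train(train_msgs, train_labels)

test_msgs = ["Claim your free prize now", "Lunch meeting moved to tomorrow"]
preds = [predict(model, m) for m in test_msgs]
print(preds)  # ['spam', 'ham']
print(evaluate(["spam", "ham"], preds))  # (1.0, 1.0)
```

Because the class priors are estimated from label frequencies, a heavily imbalanced training set pushes predictions toward the majority class. Reporting precision and recall on the spam class exposes the false-positive/false-negative trade-off that plain accuracy can hide. In practice a library implementation such as scikit-learn's `MultinomialNB` would normally replace this hand-written model.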

Published

2024-07-25

How to Cite

[1]
A. Srivastava and P. Singh, “Spam Detection Using Natural Language Processing”, J. Appl. Sci. Educ., vol. 4, no. 2, pp. 1–7, Jul. 2024.