Threat detection in security logs: a comparison between a rules-based approach and analysis with LLM
DOI: https://doi.org/10.69849/e27xgm50
Keywords: web security, threat detection, LLM, regular expressions, payload classification, artificial intelligence, cybersecurity
Abstract
This paper aims to systematically compare two approaches for detecting and classifying malicious payloads in HTTP parameters: the classical rule-based approach using regular expressions, and the use of large language models (LLMs). This study used the HttpParamsDataset, which contains 31,067 records divided into five classes: normal traffic, SQL injection, Cross-Site Scripting, path traversal, and command injection. Seven models from two providers were tested — Anthropic (Claude Haiku 4.5, Sonnet 4.6, and Opus 4.6) and OpenAI (GPT-4o-mini, GPT-4.1-mini, GPT-4.1, and GPT-5.4) — in two modalities: pure textual analysis and dynamic regex script generation. The baseline was a static engine with 40 regex rules inspired by the OWASP ModSecurity Core Rule Set. Experiments were run over 30 sub-samples of 500 records each, with paired statistical tests for validation. Results show that LLM textual analysis outperforms the rule-based approach in almost every scenario: Claude Haiku 4.5 reached Macro-F1 of 0.967 against 0.867 for the rule engine (p < 0.0001), at a cost of US$ 1.96 for the 30 sub-samples. When LLMs were asked to generate regex scripts, performance dropped below the hand-written engine across all tested models. Total cost of the 15 experiments was US$ 35.81, with 9.8 hours of aggregate computation time. Since the experiments are statistically independent, they were run in parallel (4 concurrent processes for Anthropic and 2 for OpenAI, with both providers running in parallel to each other), reducing the effective wall-clock time to approximately 3.5 hours. The findings suggest that, for payload classification, the direct use of LLMs as textual classifiers offers better cost-effectiveness than approaches based on static rules or automatic regex generation, indicating that hybrid systems — with regex rules as a first filtering layer and LLMs for ambiguous cases — represent a promising direction for production environments.
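To make the rule-based baseline concrete, the sketch below shows a minimal classifier in the spirit of the paper's 40-rule static engine: each attack class is matched by a regex pattern, and the first hit wins. The patterns, class labels, and rule ordering here are illustrative assumptions, not the authors' actual CRS-inspired rule set.

```python
import re

# Illustrative rules only -- a tiny stand-in for the paper's 40-rule engine.
# Labels follow the five HttpParamsDataset classes described in the abstract
# (normal, SQL injection, XSS, path traversal, command injection).
RULES = [
    ("sqli", re.compile(r"(?i)\b(union\s+select|or\s+1\s*=\s*1|sleep\s*\()")),
    ("xss", re.compile(r"(?i)(<script\b|onerror\s*=|javascript:)")),
    ("path-traversal", re.compile(r"(?i)(\.\./|%2e%2e%2f)")),
    ("cmdi", re.compile(r"(?i)(;\s*cat\s|\|\s*id\b|`[^`]*`|\$\([^)]*\))")),
]

def classify(payload: str) -> str:
    """Return the first matching attack class, else 'norm' (benign)."""
    for label, pattern in RULES:
        if pattern.search(payload):
            return label
    return "norm"

print(classify("1' OR 1=1 --"))               # sqli
print(classify("<script>alert(1)</script>"))  # xss
print(classify("../../etc/passwd"))           # path-traversal
print(classify("id=42"))                      # norm
```

A hybrid deployment of the kind the abstract suggests would route payloads that this fast first layer leaves as `norm` (or matches only weakly) to an LLM classifier for the ambiguous cases.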
References
ANTHROPIC. Claude API documentation and pricing. San Francisco: Anthropic, 2026. Available at: https://docs.anthropic.com. Accessed: 7 Apr. 2026.
APPELT, D.; NGUYEN, C. D.; PANICHELLA, A.; BRIAND, L. C. A machine-learning-driven evolutionary approach for testing web application firewalls. IEEE Transactions on Reliability, v. 67, n. 3, p. 733-757, 2018. DOI: 10.1109/TR.2018.2805763.
CRESPO-MARTÍNEZ, I. S. et al. SQL injection attack detection in network flow data. Computers & Security, v. 127, p. 103093, 2023. DOI: 10.1016/j.cose.2023.103093.
FONCECA, F. T. llm-vs-regex-threat-detection: comparative evaluation pipeline for HTTP payload classification. [S. l.]: GitHub, 2026. Available at: https://github.com/flafonceca/llm-vs-regex-threat-detection. Accessed: 17 Apr. 2026.
FREDJ, O. B. et al. An OWASP top ten driven survey on web application protection methods. In: Risks and Security of Internet and Systems — CRiSIS 2020. Lecture Notes in Computer Science. Cham: Springer, 2021. v. 12528, p. 235-252.
GOUTTE, C.; GAUSSIER, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In: Advances in Information Retrieval — ECIR 2005. Berlin: Springer, 2005. p. 345-359.
HALFOND, W. G. J.; VIEGAS, J.; ORSO, A. A classification of SQL-injection attacks and countermeasures. In: Proceedings of the International Symposium on Secure Software Engineering. Washington, D.C., USA: [s. n.], 2006.
HYDARA, I.; SULTAN, A. B. M.; ZULZALIL, H.; ADMODISASTRO, N. Current state of research on cross-site scripting (XSS): a systematic literature review. Information and Software Technology, v. 58, p. 170-186, 2015. DOI: 10.1016/j.infsof.2014.07.010.
KAUR, J.; GARG, U.; BATHLA, G. Detection of cross-site scripting (XSS) attacks using machine learning techniques: a review. Artificial Intelligence Review, v. 56, p. 12725-12769, 2023. DOI: 10.1007/s10462-023-10433-3.
LIU, J.; XIA, C. S.; WANG, Y.; ZHANG, L. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. In: Advances in Neural Information Processing Systems (NeurIPS), v. 36, 2023.
MEHTA, D. et al. SQLIML: a comprehensive analysis for SQL injection detection using multiple supervised and unsupervised learning schemes. SN Computer Science, v. 4, n. 3, p. 281, 2023. DOI: 10.1007/s42979-022-01626-8.
MITRE CORPORATION. Common Weakness Enumeration (CWE). Bedford: The MITRE Corporation, 2024. Available at: https://cwe.mitre.org/. Accessed: 22 Apr. 2026.
MONTGOMERY, D. C. Design and analysis of experiments. 10. ed. Hoboken: Wiley, 2019.
MORZEUX. HttpParamsDataset: a dataset containing several benign and attack samples that can be used as values in the HTTP protocol. [S. l.]: GitHub, 16 Mar. 2016. MIT License. Available at: https://github.com/Morzeux/HttpParamsDataset. Accessed: 17 Apr. 2026.
OPENAI. OpenAI API documentation and pricing. San Francisco: OpenAI, 2026. Available at: https://platform.openai.com/docs. Accessed: 7 Apr. 2026.
OWASP FOUNDATION. A01:2021 – Broken Access Control. OWASP Top 10:2021. Available at: https://owasp.org/Top10/2021/A01_2021-Broken_Access_Control/. Accessed: 15 Apr. 2026.
OWASP FOUNDATION. A03:2021 – Injection. OWASP Top 10:2021. Available at: https://owasp.org/Top10/2021/A03_2021-Injection/. Accessed: 15 Apr. 2026.
OWASP FOUNDATION. OWASP CRS — Core Rule Set. Version 4.25.0 LTS. [S. l.]: OWASP, 2026. Available at: https://coreruleset.org/. Accessed: 17 Apr. 2026.
PEARCE, H.; AHMAD, B.; TAN, B.; DOLAN-GAVITT, B.; KARRI, R. Asleep at the keyboard? Assessing the security of GitHub Copilot's code contributions. In: Proceedings of the 43rd IEEE Symposium on Security and Privacy (SP). San Francisco: IEEE, 2022. p. 754-768. DOI: 10.1109/SP46214.2022.9833571.
RASHIMO. ChCNN: a convolutional neural network approach to classify web requests. [S. l.]: GitHub, 2020. Available at: https://github.com/rashimo/ChCNN. Accessed: 7 Apr. 2026.
RISTIC, I. ModSecurity handbook: the complete guide to the popular open source web application firewall. 2. ed. London: Feisty Duck, 2017.
VASWANI, A. et al. Attention is all you need. In: Advances in Neural Information Processing Systems (NeurIPS), v. 30, p. 5998-6008, 2017.
XU, H. et al. Large language models for cyber security: a systematic literature review. arXiv preprint, arXiv:2405.04760, 2024.
YAO, Y. et al. A survey on large language model (LLM) security and privacy: the good, the bad, and the ugly. High-Confidence Computing, v. 4, n. 2, p. 100211, 2024. DOI: 10.1016/j.hcc.2024.100211.
ZHOU, X. et al. Large language model for vulnerability detection and repair: literature review and the road ahead. ACM Transactions on Software Engineering and Methodology, v. 34, n. 5, 2025. DOI: 10.1145/3708522.
License
Copyright (c) 2026 Flávia Thereza da Fonceca (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
"Authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license allows the work to be shared, copied, and adapted in any medium or format, for any purpose, including commercial use, provided that due credit is given for authorship and for initial publication in this journal.
- Authors are authorized to enter into additional contractual arrangements separately for non-exclusive distribution of the version of the work published in this journal (e.g., posting it to an institutional repository or publishing it as a book chapter), with acknowledgment of authorship and initial publication in this journal.
- The journal permits and encourages authors to post and distribute their work online (e.g., in institutional repositories or on their personal webpage) after the editing and publication process, as this can lead to productive exchanges and increase the impact and citation of the published work."