Research Article

Multimodal Threat Actor Profiling on the Tor Network: Techniques, Datasets, and Ethical Challenges

by Pavan Kumar Pativada, Rahul Karne, Akhil Dudhipala
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 20
Year of Publication: 2025
Authors: Pavan Kumar Pativada, Rahul Karne, Akhil Dudhipala
10.5120/ijca2025925278

Pavan Kumar Pativada, Rahul Karne, Akhil Dudhipala. Multimodal Threat Actor Profiling on the Tor Network: Techniques, Datasets, and Ethical Challenges. International Journal of Computer Applications. 187, 20 (Jul 2025), 1-7. DOI=10.5120/ijca2025925278

@article{ 10.5120/ijca2025925278,
author = { Pavan Kumar Pativada, Rahul Karne, Akhil Dudhipala },
title = { Multimodal Threat Actor Profiling on the Tor Network: Techniques, Datasets, and Ethical Challenges },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2025 },
volume = { 187 },
number = { 20 },
month = { Jul },
year = { 2025 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number20/multimodal-threat-actor-profiling-on-the-tor-network-techniques-datasets-and-ethical-challenges/ },
doi = { 10.5120/ijca2025925278 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-07-09T01:07:50.874530+05:30
%A Pavan Kumar Pativada
%A Rahul Karne
%A Akhil Dudhipala
%T Multimodal Threat Actor Profiling on the Tor Network: Techniques, Datasets, and Ethical Challenges
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 20
%P 1-7
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Profiling threat actors operating on the Tor network presents considerable challenges due to its intrinsic anonymity and layered encryption. This paper offers a comprehensive survey of major advancements between 2019 and 2025, with reference to foundational tools and methods developed earlier where relevant (e.g., Tor simulation, darknet datasets). Core methodological approaches include stylometric analysis of linguistic features [8, 9], content classification of hidden services [2, 5], encrypted traffic analysis [7], temporal behavioral modeling [10], and graph-based account linkage [6, 12]. A conceptual profiling system is proposed that ingests heterogeneous data sources (such as textual posts, metadata, and traffic logs), extracts modality-specific features (e.g., writing style, network flow patterns, timestamp distributions), and applies domain-aligned ML models for multimodal embedding and identity fusion. To illustrate its practical relevance, a synthetic case study is presented demonstrating how AI techniques can correlate a threat actor’s forum posts and marketplace listings to infer authorship and behavioral alignment. Key public datasets and tools that enable reproducible research in this domain are also cataloged, including VeriDark [9], CoDA [5], DUTA [2], ISCX-Tor [7], and the Shadow simulator [4]. The survey concludes with a discussion of critical ethical and legal considerations, including compliance with the EU General Data Protection Regulation (GDPR) [11], the European Union Artificial Intelligence Act [1], and U.S. surveillance law under FISA Section 702 [3]. This paper aims to provide a rigorously referenced, technically detailed, and ethically grounded synthesis of state-of-the-art methods in AI-driven threat actor profiling on the Tor network.
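
The idea of extracting modality-specific features and fusing them into one linkage score can be illustrated with a minimal sketch. This is not the system proposed in the paper: it swaps the ML embedding models for plain cosine similarity over hand-built features, and every name in it (`char_ngrams`, `hour_histogram`, `fused_similarity`, the fusion weights, and the three toy accounts) is a hypothetical choice made for illustration on made-up data.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Stylometric view: counts of overlapping character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def hour_histogram(hours):
    """Temporal view: number of posts per hour of day (0-23)."""
    return Counter(h % 24 for h in hours)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fused_similarity(acct_a, acct_b, weights=(0.7, 0.3)):
    """Late fusion: weighted mean of the per-modality similarities.

    The weights are arbitrary illustrative values, not tuned ones.
    """
    style = cosine(char_ngrams(acct_a["text"]), char_ngrams(acct_b["text"]))
    timing = cosine(hour_histogram(acct_a["hours"]), hour_histogram(acct_b["hours"]))
    w_style, w_time = weights
    return w_style * style + w_time * timing


# Synthetic accounts invented for this example (no real data).
forum_user = {
    "text": "shipping is stealth, pm me 4 details. no escrow!!",
    "hours": [1, 2, 2, 3, 23],
}
market_vendor = {
    "text": "stealth shipping only, pm me 4 info. no escrow!!",
    "hours": [2, 2, 3, 23, 0],
}
unrelated = {
    "text": "Has anyone benchmarked the new relay build on ARM?",
    "hours": [9, 10, 14, 15, 16],
}

print("forum user vs. market vendor:", fused_similarity(forum_user, market_vendor))
print("forum user vs. unrelated:   ", fused_similarity(forum_user, unrelated))
```

On these toy inputs the forum and marketplace accounts score higher than the unrelated pair because they share both phrasing and posting hours. A realistic system would replace each hand-built feature with a learned embedding and calibrate the fusion step on labeled data.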

References
  1. Artificial intelligence act high-level summary. EU AI Act Explorer, 2024.
  2. Mhd Wesam Al Nabki, Eduardo Fidalgo, Enrique Alegre, and Ivan De Paz. Classifying illegal activities on Tor network based on web textual contents. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 35–43, 2017.
  3. Brennan Center for Justice. Section 702 of FISA: A resource page, April 2024.
  4. Rob Jansen and Nicholas Hopper. Shadow: Running Tor in a box for accurate and efficient experimentation. In 19th Annual Network and Distributed System Security Symposium, NDSS 2012, 2012.
  5. Youngjin Jin, Eugene Jang, Yongjae Lee, Seungwon Shin, and Jin-Woo Chung. Shedding new light on the language of the dark web. arXiv preprint arXiv:2204.06885, 2022.
  6. Ramnath Kumar, Shweta Yadav, Raminta Daniulaityte, Francois Lamy, Krishnaprasad Thirunarayan, Usha Lokala, and Amit Sheth. eDarkFind: Unsupervised multi-view learning for Sybil account detection. In Proceedings of The Web Conference 2020, pages 1955–1965, 2020.
  7. Arash Habibi Lashkari, Gerard Draper Gil, Mohammad Saiful Islam Mamun, and Ali A Ghorbani. Characterization of Tor traffic using time based features. In International Conference on Information Systems Security and Privacy, volume 2, pages 253–262. SciTePress, 2017.
  8. Pranav Maneriker, Yuntian He, and Srinivasan Parthasarathy. SYSML: Stylometry with structure and multitask learning: Implications for darknet forum migrant analysis. arXiv preprint arXiv:2104.00764, 2021.
  9. Andrei Manolache, Florin Brad, Antonio Barbalau, Radu Tudor Ionescu, and Marius Popescu. VeriDark: A large-scale benchmark for authorship verification on the dark web. Advances in Neural Information Processing Systems, 35:15574–15588, 2022.
  10. A. Shrestha, C. Barrón-Cedeño, P. Rosso, and M. Potthast. Profile-based author clustering for short unsolicited texts. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2486–2496, 2017.
  11. Kalliopi Spyridaki. GDPR and AI: Friends, foes or something in between. Available from https://www.sas.com/en_id/insights/articles/data-management/gdpr-and-ai--friends--foes-or-something-in-between-.html#/. Accessed 26, 2020.
  12. Yiming Zhang, Yujie Fan, Wei Song, Shifu Hou, Yanfang Ye, Xin Li, Liang Zhao, Chuan Shi, Jiabin Wang, and Qi Xiong. Your style your identity: Leveraging writing and photography styles for drug trafficker identification in darknet markets over attributed heterogeneous information network. In The World Wide Web Conference, pages 3448–3454, 2019.
Index Terms

Computer Science
Information Sciences

Keywords

Tor network, Darknet forensics, threat actor profiling, stylometry, multimodal learning, deep learning, temporal behavior modeling, graph neural networks, user de-anonymization, encrypted traffic classification, darknet marketplaces, AI ethics, GDPR compliance, Shadow simulator