Research Article

Author verification using a Graph-based Representation

by Esteban Castillo, Ofelia Cervantes, Darnes Vilariño, David Báez
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 123 - Number 14
Year of Publication: 2015
Authors: Esteban Castillo, Ofelia Cervantes, Darnes Vilariño, David Báez
10.5120/ijca2015905654

Esteban Castillo, Ofelia Cervantes, Darnes Vilariño, David Báez. Author verification using a Graph-based Representation. International Journal of Computer Applications. 123, 14 (August 2015), 1-8. DOI=10.5120/ijca2015905654

@article{ 10.5120/ijca2015905654,
author = { Esteban Castillo, Ofelia Cervantes, Darnes Vilariño, David Báez },
title = { Author verification using a Graph-based Representation },
journal = { International Journal of Computer Applications },
issue_date = { August 2015 },
volume = { 123 },
number = { 14 },
month = { August },
year = { 2015 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {8},
url = { https://ijcaonline.org/archives/volume123/number14/22024-2015905654/ },
doi = { 10.5120/ijca2015905654 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:12:40.219168+05:30
%A Esteban Castillo
%A Ofelia Cervantes
%A Darnes Vilariño
%A David Báez
%T Author verification using a Graph-based Representation
%J International Journal of Computer Applications
%@ 0975-8887
%V 123
%N 14
%P 1-8
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents a methodology for tackling the authorship verification problem. The approach compares a given unknown document against each of the known documents, using a graph representation that captures the syntactic sequence of the texts together with a graph similarity measure. The unknown document is classified as written by the same author if the majority of these comparisons exceed a predefined threshold. The best results, obtained on the PAN at CLEF 2014 dataset, were 79% for Spanish and 68% for English, suggesting that the proposed methodology is a viable way to determine the authorship of a document.
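The comparison-and-vote decision rule described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each document has already been reduced to a sequence of syntactic tags (for example, part-of-speech tags), represents each document as a directed graph whose edges link consecutive tags, and uses Jaccard similarity over edge sets as a stand-in for the paper's graph similarity measure. The function names and the default threshold are hypothetical.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical graph representation: the set of directed edges (tag_i, tag_i+1)
# between consecutive syntactic tags. Not the paper's exact graph construction.
def build_sequence_graph(tags: Sequence[str]) -> set[tuple[str, str]]:
    return {(a, b) for a, b in zip(tags, tags[1:])}


# Stand-in similarity measure (assumption): Jaccard similarity of the edge sets.
def graph_similarity(g1: set[tuple[str, str]], g2: set[tuple[str, str]]) -> float:
    if not g1 and not g2:
        return 1.0  # two empty graphs are treated as identical
    return len(g1 & g2) / len(g1 | g2)


# Majority vote: the unknown document is attributed to the author when more
# than half of its comparisons with the known documents exceed the threshold.
def verify(unknown_tags: Sequence[str],
           known_tag_seqs: Sequence[Sequence[str]],
           threshold: float = 0.5) -> bool:
    unknown = build_sequence_graph(unknown_tags)
    votes = [graph_similarity(unknown, build_sequence_graph(k)) > threshold
             for k in known_tag_seqs]
    return sum(votes) > len(votes) / 2


if __name__ == "__main__":
    known = [["DET", "NOUN", "VERB", "DET", "NOUN"],
             ["DET", "ADJ", "NOUN", "VERB"],
             ["PRON", "VERB", "ADV"]]
    unknown = ["DET", "NOUN", "VERB", "ADV"]
    print(verify(unknown, known, threshold=0.4))  # False: only 1 of 3 comparisons exceeds 0.4
    print(verify(unknown, known, threshold=0.2))  # True: 2 of 3 comparisons exceed 0.2
```

With these toy tag sequences the similarities are 0.5, 0.2 and 0.25, so the verdict depends directly on where the threshold is set; in practice the threshold would need to be tuned on training data.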

Index Terms

Computer Science
Information Sciences

Keywords

Authorship Verification, Syntactic Sequence Graph, Graph Similarity