Call for Paper - August 2022 Edition
IJCA solicits original research papers for the August 2022 Edition. Last date of manuscript submission is July 20, 2022. Read More

A Sub-graph Matching Method based on Calibration of Characteristics of Topological Footprint

Print
PDF
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2015
Authors:
David F. Nettleton, Anton Dries
10.5120/ijca2015907098

David F Nettleton and Anton Dries. Article: A Sub-graph Matching Method based on Calibration of Characteristics of Topological Footprint. International Journal of Computer Applications 130(10):29-38, November 2015. Published by Foundation of Computer Science (FCS), NY, USA. BibTeX

@article{key:article,
	author = {David F. Nettleton and Anton Dries},
	title = {Article: A Sub-graph Matching Method based on Calibration of Characteristics of Topological Footprint},
	journal = {International Journal of Computer Applications},
	year = {2015},
	volume = {130},
	number = {10},
	pages = {29-38},
	month = {November},
	note = {Published by Foundation of Computer Science (FCS), NY, USA}
}

Abstract

Approximate sub-graph matching is important in many graph data mining fields. At present, current solutions can be difficult to implement, have an expensive pre-processing phase, or only work for given types of graph. In this paper a novel generic approach is presented which addresses these issues. An approximate sub-graph matcher (A-SGM) calculates the distance between the topological characteristics (footprint) of the sub-graphs to be matched, applying a weighting to the different sub-graph characteristics and those of neighbor nodes. The weights are calibrated for each dataset with a simulated annealing process using sample sets of graph nodes to reduce computational cost, and an exact isomorphism matcher as a fitness function which takes into account how well the match maintains the neighboring node degree distributions. Benchmarking is performed on several state of the art methods and real and synthetic graph datasets to evaluate the precision, recall and computational cost. The results show that the A-SGM is competitive with state of the art methods in terms of precision, recall and execution time.

References

  1. Ewing, T.J.A, Kuntz I.D. 1997. Critical evaluation of search algorithms for automated molecular docking and database screening. J. Comput. Chem., 18(9) (1997) 1175-1189.
  2. Raymond, J.W., Willett, P. 2002. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J. Computer-aided Molecular Design, 16(7) (2002) 521-533.
  3. Inokuchi, A., Washio, T., Motoda, H. Complete mining of frequent patterns from graphs: Mining graph data. Mach. Learn. 50(3) (2003) 321-354.
  4. Nettleton, D.F. 2013. Data Mining of Social Networks Represented as Graphs, Comp. Sci. Rev., 7 (2013) 1-34.
  5. Bunke, H. 1999. Error correcting graph matching: on the influence of the underlying cost function. IEEE Trans. Patt. Anal. Mach. Intell. 21, 917-922.
  6. Nettleton, D. F. and Dries, A. 2014. Local Neighbourhood Sub-Graph Matching Method. European Patent application number: 13382308.8. Presented: 30th July 2013. PCT application number: PCT/ES2014/065505. Presented 18th July 2014.
  7. Nettleton, D.F., Torra, V., Dries, A. 2014. A Comparison of Clustering and Modification based Graph Anonymization Methods with Constraints. International Journal of Computer Applications (0975 – 8887), Vol. 95– No.20, June 2014.
  8. McKay, B.D. 1981. Practical graph isomorphism, Congressus Numerantium 30, 45-87.
  9. Cordella, L.P., Foggia, P., Sansone, C. and Vento, M. 2001. An Improved Algorithm for Matching Large Graphs. In: Proc. 3rd IAPR-TC-15 Int. Workshop on Graph based Representations, pp. 149-159.
  10. Tian, Y. and Patel, J.M. 2008. TALE: A Tool for Approximate Large Graph Matching. In: Proc. IEEE ICDE 2008, 24th Int. Conf. on Data Engineering, pp. 963-972.
  11. Wei, F. 2010. Tedi: Efficient shortest path query answering on graphs. In: Proc. SIGMOD 2010, pp. 99–110.
  12. Zou, L., Chen, L., Özsu, M.T. and Zhao, D. 2012. Answering pattern match queries in large graph databases via graph embedding. J. VLDB 2012; 21(1): 97-120:
  13. Zhang, S., Yang, J. and Jin, W. 2010. SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs. In: Proc. VLDB, 3(1), 2010.
  14. Zhao, P. and Jiawei Han, J. 2010. On Graph Query Optimization in Large Networks. In: Proc. VLDB 3(1): pp. 340-351.
  15. He, H. and Singh, A.K. 2008. Graphs-at-a-time: query language and access methods for graph databases. In: Proc. SIGMOD'08, pp. 405-418.
  16. Khan, A., Li, N., Yan, X., Guan, Z., Chakraborty, S. and Tao, S. 2011. Neighborhood based fast graph search in large networks. SIGMOD '11.
  17. Sun, Z., Wang, H., Wang, H., Shao, B. and Li, J. 2012. Efficient subgraph matching on billion node graphs. In: Proc. VLDB, 5(9):788-799.
  18. Zhang, S., Li, S. and Yang, J. 2009. Gaddi: distance index based subgraph matching in biological networks. In: Proc. EDBT '09, 12th Int. Conf. on Extending Database Technology: Advances in Database Technology, pp. 192-203.
  19. Walters, W.H. Comparative Recall and Precision of Simple and Expert Searches in Google Scholar and Eight Other Databases. 2011. The Johns Hopkins University Press, Baltimore. Portal: Libraries and the Academy, vol. 11, no. 4, pp. 971–1006.
  20. Yang, J. and Leskovec, J. 2012. Defining and Evaluating Network Communities based on Ground-truth. In: Proc. ICDM 2012, pp. 745-754.
  21. Leskovec, J., Kleinberg, J. and Faloutsos, C. 2007. Graph Evolution: Densification and Shrinking Diameters. ACM Trans. on Knowledge Discovery from Data (ACM TKDD), 1(1).
  22. Sun, S., Ling, L., Zhang, N., Li, G. and Chen, R. 2003. Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Research, Vol. 31, No. 9 2443-2450.
  23. Chakrabarti, D., Zhan, Y. and Faloutsos, C. 2004. R-mat: A recursive model for graph mining. In Proc. SDM (Secure Data Management), Orlando, Florida, USA, pp. 442-446.

Keywords

Graph Matching, topology, graph characteristics, weight calibration, simulated annealing, graph queries.