ANF: a fast and scalable tool for data mining in massive graphs. Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete "neighbourhood function" for a graph. For the Internet graph (268K nodes), ANF's highly accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF's approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk-resident graphs. Finally, we present some of our results from mining large graphs using ANF.
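
For orientation, the quantity being approximated can be illustrated with a minimal sketch of the exact computation. It is not ANF itself and is not taken from the tool: it assumes the neighbourhood function N(h) counts ordered node pairs (u, v) with dist(u, v) <= h, including each node paired with itself at distance 0, and it runs one breadth-first search per node. The function name `neighbourhood_function` and the adjacency-dict input are hypothetical choices for this example.

```python
from collections import deque


def neighbourhood_function(adj, max_h):
    """Return [N(0), ..., N(max_h)] for a graph given as {node: [neighbours]}.

    Assumption of this sketch: N(h) is the number of ordered pairs (u, v)
    with dist(u, v) <= h, counting (u, u) at distance 0.
    """
    exact_counts = [0] * (max_h + 1)  # pairs at exactly distance h
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            exact_counts[dist[u]] += 1
            if dist[u] == max_h:
                continue  # do not expand beyond the requested radius
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    # Prefix sums turn "exactly h" counts into "at most h" counts.
    result, total = [], 0
    for count in exact_counts:
        total += count
        result.append(total)
    return result


# Undirected path 0 - 1 - 2 - 3, stored with edges in both directions.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(neighbourhood_function(path, 3))  # [4, 10, 14, 16]
```

Running one search per node is what makes the exact computation expensive on graphs with hundreds of thousands of nodes, which is the cost the abstract says ANF avoids by approximating.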

This software has also been peer reviewed by the journal TOMS.

References in zbMATH (referenced in 14 articles)

  1. Woodruff, David P.; Zhang, Qin: When distributed computation is communication expensive (2017)
  2. Zhao, Junzhou; Wang, Pinghui; Lui, John C. S.; Towsley, Don; Guan, Xiaohong: I/O-efficient calculation of H-group closeness centrality over disk-resident graphs (2017)
  3. Schieber, Tiago A.; Carpi, Laura; Frery, Alejandro C.; Rosso, Osvaldo A.; Pardalos, Panos M.; Ravetti, Martín G.: Information theory perspective on network robustness (2016)
  4. Li, Rong-Hua; Yu, Jeffrey Xu; Huang, Xin; Cheng, Hong; Shang, Zechao: Measuring the impact of MVC attack in large complex networks (2014)
  5. Crescenzi, Pilu; Grossi, Roberto; Habib, Michel; Lanzi, Leonardo; Marino, Andrea: On computing the diameter of real-world undirected graphs (2013)
  6. Takes, Frank W.; Kosters, Walter A.: Computing the eccentricity distribution of large graphs (2013)
  7. Crescenzi, Pierluigi; Grossi, Roberto; Lanzi, Leonardo; Marino, Andrea: A comparison of three algorithms for approximating the distance distribution in real-world graphs (2011)
  8. Vijayalakshmi, R.; Nadarajan, R.; Roddick, John F.; Thilaga, M.; Nirmala, P.: FP-GraphMiner -- a fast frequent pattern mining algorithm for network graphs (2011)
  9. Leskovec, Jure; Chakrabarti, Deepayan; Kleinberg, Jon; Faloutsos, Christos; Ghahramani, Zoubin: Kronecker graphs: an approach to modeling networks (2010)
  10. Cami, Aurel; Deo, Narsingh: Techniques for analyzing dynamic random graph models of web-like networks: An overview (2008)
  11. Bawa, Mayank; Gionis, Aristides; Garcia-Molina, Hector; Motwani, Rajeev: The price of validity in dynamic networks (2007)
  12. Fogaras, Dániel; Rácz, Balázs; Csalogány, Károly; Sarlós, Tamás: Towards scaling fully personalized PageRank: algorithms, lower bounds, and experiments (2005)
  13. Kuramochi, Michihiro; Karypis, George: Finding frequent patterns in a large sparse graph (2005)
  14. Kuramochi, Michihiro; Karypis, George: Finding frequent patterns in a large sparse graph (2005)