Graph Data Processing

Introduction:

Due to explosively growth in the number of highly connected datasets (social networks, web graphs, and protein interaction graphs) during the last years, graphs are becoming increasingly popular in both academia and industry. In the past years, we develop systems and algorithms for analyzing and managing extremely large graphs.

Graph Databases:

We develop TripleBit, a fast and compact system for large scale RDF graph. TripeBit is designed based on two important observations. First, it is important to design a storage structure that can directly and efficiently query the RDF graph. This motivates us to design a compact storage and index structure in TripleBit. Second, in order to truly scale the RDF query processor, we need efficient index structures and query evaluation algorithms to minimize the size of intermediate results generated when evaluating queries, especially complex join queries. This leads us to the design decision that we should not only reduce the size of indexes (e.g., through compression techniques) but also minimize the number of indexes used in query evaluation.

Based on TripleBit, we develop a semantic-preserving distributed RDF triple store for scalable SPARQL query processing. In SemStore, we design a partitioning algorithm for RDF data using Rooted Sub-Graph. Furthermore, a query decomposition algorithm and a query optimization strategy are used to minimize the communication cost for query execution.

Graph Processing System:

Large scale graph analysis applications typically involve datasets of massive scale. Most of existing approaches address the iterative graph computation problem by programming and executing graph computation using either vertex centric or edge centric approaches. We develop a path-centric graph processing system PathGraph for fast iterative computation on large graphs with billions of edges.

Our development is motivated by our observation that most of the iterative graph computation algorithms share three common processing requirements: (1) For each vertex to be processed, we need to examine all its outgoing or incoming edges and all of its neighbor vertices. (2) The iterative computation for each vertex will converge only after completing the traversal of the graph through the direct and indirect edges connecting to this vertex. (3) The iterative computation of a graph converges when all vertices have completed their iterative computation.

Software:

Publication:

Useful Link:

Feedback:

If you have comments, questions, or suggestions regarding TripleBit, SemStore or PathGraph, please email Pingpeng Yuan


This page has been accessed 2257 times since August 22, 2013.
Copyright © 2011-2013 by Massive Data Management Group @ SCTS & CGCL, HUST.