TripleBit

Introduction:

TripleBit is a compact and fast engine for large-scale RDF graphs. Its main techniques are briefly introduced below; please refer to our publication for more detail.

Compact Storage. We design a Triple Matrix model in which RDF triples are represented as a two-dimensional bit matrix, called the triple matrix. Subjects (S) and objects (O) form one dimension (rows), and triples (T) form the other (columns). Columns are grouped by predicate, so that triples with the same predicate are adjacent in the matrix. For each RDF dataset, we physically store two copies of the triple matrix, one in S-O order and one in O-S order. The matrix is stored in chunks, which are fixed-size storage units, and chunks holding the same predicate are placed adjacently on the storage medium.
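
To make the layout concrete, the following is a minimal in-memory sketch of a triple matrix, not TripleBit's actual storage code. It assumes that each column sets exactly the bits of its triple's subject row and object row, and it ignores chunking and compression. The function names (build_triple_matrix, to_bit_matrix) and the sample data are hypothetical.

```python
from collections import defaultdict


def build_triple_matrix(triples):
    """Assign a row ID to every subject/object and order columns by predicate.

    Returns (entity_ids, columns), where each column is a tuple
    (predicate, subject_row, object_row). Hypothetical helper, for illustration.
    """
    entity_ids = {}

    def row_of(term):
        # Subjects and objects share one ID space, i.e. one row dimension.
        return entity_ids.setdefault(term, len(entity_ids))

    by_predicate = defaultdict(list)
    for s, p, o in triples:
        by_predicate[p].append((row_of(s), row_of(o)))

    columns = []
    for p in sorted(by_predicate):
        # Within a predicate group, sort by (subject, object): the S-O order.
        for s_row, o_row in sorted(by_predicate[p]):
            columns.append((p, s_row, o_row))
    return entity_ids, columns


def to_bit_matrix(entity_ids, columns):
    """Expand the column list into an explicit rows x columns bit matrix."""
    matrix = [[0] * len(columns) for _ in entity_ids]
    for col, (_, s_row, o_row) in enumerate(columns):
        matrix[s_row][col] = 1
        matrix[o_row][col] = 1
    return matrix


sample = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]
ids, cols = build_triple_matrix(sample)
bits = to_bit_matrix(ids, cols)
```

Sorting each predicate group by (object, subject) instead would give the O-S copy. A real engine would store only the row IDs of the set bits per column rather than the dense matrix built here.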

Compact and Fast Indexing Technologies. TripleBit uses two kinds of indexes: the ID-Chunk index and aggregate indexes. The ID-Chunk matrix represents the relationship between IDs (rows) and chunks (columns). In addition, TripleBit builds two binary aggregate indexes, SP and OP, which are used to estimate the selectivity of query patterns and to determine the query plan.
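
As a rough illustration of how SP and OP aggregates can drive selectivity estimation, the sketch below keeps a triple count per (subject, predicate) pair and per (object, predicate) pair. This counting structure is an assumption made for the example; the text above does not specify the internal layout of TripleBit's aggregate indexes. The names build_aggregate_indexes and estimate_matches are hypothetical, and None stands for a query variable.

```python
from collections import Counter


def build_aggregate_indexes(triples):
    """Count triples per (subject, predicate) and per (object, predicate)."""
    sp = Counter()
    op = Counter()
    for s, p, o in triples:
        sp[(s, p)] += 1
        op[(o, p)] += 1
    return sp, op


def estimate_matches(pattern, sp, op, total):
    """Upper bound on the number of triples matching a (s, p, o) pattern.

    Falls back to the dataset size when neither aggregate applies.
    """
    s, p, o = pattern
    bounds = [total]
    if s is not None and p is not None:
        bounds.append(sp[(s, p)])
    if o is not None and p is not None:
        bounds.append(op[(o, p)])
    return min(bounds)


data = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]
sp_index, op_index = build_aggregate_indexes(data)
patterns = [(None, "knows", None), ("alice", "knows", None), (None, "age", "30")]
# Evaluate the most selective pattern (smallest estimate) first.
plan = sorted(patterns, key=lambda q: estimate_matches(q, sp_index, op_index, len(data)))
```

Ordering patterns by ascending estimate is one simple way to turn such counts into a plan; it is not claimed to be the ordering rule TripleBit uses.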

Dynamic Query Plan Generation (DQPGA). We develop a two-phase join processing framework that, after executing a plan, iteratively refines the plan for the next execution. During query processing, TripleBit employs semi-joins and full joins in different phases in order to reduce intermediate results.
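
The idea of running semi-joins before full joins can be sketched as follows. This is a generic illustration of the technique on two sets of variable bindings, not a reproduction of DQPGA: it omits plan refinement entirely, and the function names and sample bindings are hypothetical.

```python
def semi_join(left, right, key):
    """Keep the rows of `left` whose `key` value also occurs in `right`."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def full_join(left, right, key):
    """Hash join: merge every pair of rows that agree on `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def two_phase_join(left, right, key):
    # Phase 1: semi-joins discard bindings that cannot contribute to a result.
    left_reduced = semi_join(left, right, key)
    right_reduced = semi_join(right, left_reduced, key)
    # Phase 2: the full join runs only on the reduced inputs.
    return full_join(left_reduced, right_reduced, key)


# Bindings for two patterns sharing ?y: (?x knows ?y) and (?y worksAt ?z).
knows = [{"x": "alice", "y": "bob"}, {"x": "dave", "y": "erin"}]
works_at = [{"y": "bob", "z": "HUST"}, {"y": "frank", "z": "CGCL"}]
result = two_phase_join(knows, works_at, "y")
```

The semi-join phase only shrinks the inputs and never materializes combined rows, which is why it helps keep intermediate results small before the full join.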

Software:

Publication:

Useful Link:

Feedback:

If you have comments, questions, or suggestions regarding TripleBit, please email Pingpeng Yuan.


Copyright © 2011-2013 by Massive Data Management Group @ SCTS & CGCL, HUST.