What?

This library helps you build and query indexes on top of HBase, in the style of the Google App Engine datastore. In practice, this means that queries are answered by range scans over specially constructed index tables.

The goal of this library is to hide the details of working with HBase's byte[] row keys when constructing indexes. Actually pushing data into the index is not handled by this package.
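To make the idea of range-scanning over constructed row keys concrete, here is a minimal sketch in plain Java with no HBase dependency. It is not the library's actual key encoding: the class RowKeySketch, the key layout (lowercased field value, a 0x00 separator, then the entry identifier) and the TreeMap standing in for an index table are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class RowKeySketch {
    // Sorted map standing in for an index table. HBase orders rows by
    // unsigned lexicographic comparison of the raw row key bytes.
    private final TreeMap<byte[], byte[]> rows =
            new TreeMap<byte[], byte[]>((a, b) -> Arrays.compareUnsigned(a, b));

    // Hypothetical key layout: lowercased value + 0x00 separator + identifier.
    // Lowercasing mimics a case-insensitive string field.
    static byte[] rowKey(String fieldValue, byte[] identifier) {
        byte[] value = fieldValue.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[value.length + 1 + identifier.length];
        System.arraycopy(value, 0, key, 0, value.length);
        key[value.length] = 0;
        System.arraycopy(identifier, 0, key, value.length + 1, identifier.length);
        return key;
    }

    public void addEntry(String fieldValue, String identifier) {
        byte[] id = identifier.getBytes(StandardCharsets.UTF_8);
        rows.put(rowKey(fieldValue, id), id);
    }

    // Range scan: start at the first key >= prefix and stop as soon as
    // a key no longer starts with the prefix.
    public List<String> prefixScan(String prefix) {
        byte[] start = prefix.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<byte[], byte[]> row : rows.tailMap(start, true).entrySet()) {
            if (!startsWith(row.getKey(), start)) {
                break;
            }
            ids.add(new String(row.getValue(), StandardCharsets.UTF_8));
        }
        return ids;
    }

    static boolean startsWith(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RowKeySketch index = new RowKeySketch();
        for (String v : new String[] {"bird", "brown", "Bee", "ape", "dog", "cat"}) {
            index.addEntry(v, "id-" + v);
        }
        System.out.println(index.prefixScan("b")); // prints [id-Bee, id-bird, id-brown]
    }
}
```

Because the field value comes first in the key, all entries sharing a value prefix sit next to each other in sort order, so a query becomes a single contiguous scan. Appending the identifier keeps keys unique when several entries have the same field value.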

This library is complementary to the tableindexed contrib module of HBase, which does little in the way of row key construction. At this time, however, the two have not been specifically designed to work together.

Some background on the indexing approach in this package can be found in this blog post.

State

This code is experimental. It should not do any harm and probably works correctly, but we have not yet used it in real applications. It is a first iteration that we wanted to share with you.

Features

Download

July 22, 2010: Subversion access

The latest HBase indexing library is now available from Lily's source tree. The downloads will no longer be maintained.

The project can be found in the hbaseindex subdirectory of the Lily source tree. Its dependencies are listed in the pom.xml.

To get and build the code, use:

svn checkout http://dev.outerthought.org/svn_public/outerthought_lilycms lily-trunk
cd lily-trunk
mvn -Pfast install

Old downloads

The download includes source code, binary builds and javadocs, all under the Apache License.

April 6, 2010 snapshot

Download (application/x-gzip, 1.2 MB, info)

Indexes created with the previous snapshot are not compatible with those created by this release.

New in this snapshot:

February 22, 2010 snapshot

Download (application/x-gzip, 1.9 MB, info)

(initial release)

Usage

Be sure to read the javadocs for the org.lilycms.hbaseindex package.

The code was developed against HBase trunk (0.21), but the 0.20 branch (after 0.20.3) should also work: the new feature we rely on is the BinaryPrefixComparator class.

To get started, you will need the following on the classpath:

The dependencies are included, except for hbase and hadoop-core, for which it is recommended to use the jars from the actual HBase/Hadoop version that you are using.

Below is a complete sample application showing how to create an index and query it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.hbase.util.Bytes;
import org.lilycms.hbaseindex.*;

import java.net.URL;

public class Test {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.addResource(new URL("file:/path/to/hbase-0.21.0-dev/conf/hbase-default.xml"));
        config.addResource(new URL("file:/path/to/hbase-0.21.0-dev/conf/hbase-site.xml"));
        config.reloadConfiguration();

        final String META_TABLE_NAME = "indexmeta";

        //
        // Create the IndexManager.
        // Create the indexmeta table if it does not exist yet.
        //
        IndexManager indexManager = null;
        try {
            indexManager = new IndexManager(config, META_TABLE_NAME);
        } catch (TableNotFoundException e) {
            if (e.getMessage().contains(META_TABLE_NAME)) {
                IndexManager.createIndexMetaTable(config, META_TABLE_NAME);
                indexManager = new IndexManager(config, META_TABLE_NAME);
            } else {
                System.out.println(e.getMessage());
                System.exit(1);
            }
        }

        //
        // Delete the index if it already exists.
        // This is just to make it easy to run this sample multiple times.
        //
        try {
            indexManager.getIndex("index1");
            indexManager.deleteIndex("index1");
        } catch (IndexNotFoundException e) {
            // ok, the index does not exist
        }


        //
        // Define an index
        //
        IndexDefinition indexDef = new IndexDefinition("index1");
        StringIndexFieldDefinition stringField = indexDef.addStringField("stringfield");
        stringField.setCaseSensitive(false);
        indexManager.createIndex(indexDef);

        //
        // Add entries to the index
        //
        Index index = indexManager.getIndex("index1");

        String[] values = {"bird", "brown", "bee", "ape", "dog", "cat"};

        for (String value : values) {
            IndexEntry entry = new IndexEntry();
            entry.addField("stringfield", value);
            index.addEntry(entry, Bytes.toBytes("id-" + value));
        }

        //
        // Query the index
        //
        Query query = new Query();
        query.setRangeCondition("stringfield", "b", "b");
        QueryResult result = index.performQuery(query);

        System.out.println("The identifiers of the matching index entries are:");
        byte[] identifier;
        while ((identifier = result.next()) != null) {
            System.out.println(Bytes.toString(identifier));
        }
    }
}

Feedback

We are interested in hearing what you think of this library: feedback is welcome on the Lily mailing list.

Related

Other HBase indexing solutions: