Lily is based on Apache HBase and SOLR. We spent a long time thinking about these choices, as they are fundamental to how Lily will shape up. Here's what we learned.

Selecting a storage solution: HBase

In selecting a storage technology for Lily, we were initially looking for:

Deepening our understanding of the different available NoSQL options, we learned we were also looking for:

By finally choosing HBase, we also got:

Selecting a search solution: SOLR

For search, the choice of Lucene as the core technology was pretty much a given. In Daisy, our previous CMS, we used Lucene only for full-text search and performed structural searches on the SQL database. We merged the results from those two search technologies on the fly, which let us support mixed structural and full-text queries. However, this merging, combined with other high-level features of Daisy, was not designed to handle very large data sets. For Lily, we decided that a better approach would be to perform all searching with one technology: Lucene.
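To illustrate the kind of on-the-fly merging described above, here is a minimal Python sketch. It is not Daisy's actual code: the function name `merge_results`, the document IDs, and the scores are all hypothetical, and it assumes that a mixed query keeps only the full-text hits that also match the structural query.

```python
def merge_results(fulltext_hits, structural_ids):
    """Combine results from two separate search back ends.

    fulltext_hits:  list of (doc_id, score) pairs, ordered by relevance
                    (hypothetical output of a full-text index).
    structural_ids: iterable of doc_ids matching the structural query
                    (hypothetical output of a SQL query).
    """
    # The structural matches are materialized in memory before filtering.
    allowed = set(structural_ids)
    # Keep the relevance order of the full-text hits.
    return [(doc_id, score) for doc_id, score in fulltext_hits if doc_id in allowed]


# Made-up example data, for illustration only.
fulltext_hits = [("doc3", 0.92), ("doc1", 0.75), ("doc7", 0.40)]
structural_ids = ["doc1", "doc2", "doc3"]

merged = merge_results(fulltext_hits, structural_ids)
print(merged)
```

A merge like this needs the complete set of structural matches before it can filter anything, which is one plausible reason such an approach becomes costly as the data set grows. Answering both kinds of query from a single index avoids the merge step altogether.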

A downside of Lucene is that index updates become visible to searchers only after some delay, though work is ongoing to improve this. At its heart Lucene is a text-search library, but with its fielded documents and trie-range queries it also handles more data-oriented queries quite well.

Lucene by itself is a library, not a standalone application or a scalable search solution, but all of this can be built on top of it. The best-known standalone search server built on Lucene is SOLR, which we decided to use in Lily.

But before we made that choice, we considered a lot of the available options: