Project description

This project lets you create Lucene indexes through a Lucene Directory object that uses Windows Azure Blob Storage for persistent storage.



Background

Lucene.NET

Lucene is a mature, Java-based, open source full-text indexing and search engine and property store.
Lucene.NET is a mature port of Lucene to C#.
Lucene provides:

  • Super simple API for storing documents with arbitrary properties
  • Complete control over what is indexed and what is stored for retrieval
  • Robust control over where and how things are indexed, how much memory is used, etc.
  • Superfast and super rich query capabilities
    • Sorted results
    • Rich constraint semantics AND/OR/NOT etc.
    • Rich text semantics (phrase match, wildcard match, near, fuzzy match, etc.)
    • Text query syntax (example: Title:(dog AND cat) OR Body:Lucen* )
    • Programmatic expressions
    • Ranked results with custom ranking algorithms

 

AzureDirectory

AzureDirectory uses local file storage to cache files as they are created and automatically pushes them to blob storage as appropriate. Likewise, it caches blob files back to the client when they change. This provides a nice blend of just-in-time syncing of data local to indexers or searchers across multiple machines.
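
The caching behaviour described above can be illustrated with a small, self-contained sketch. It is written in Python purely for brevity and is not AzureDirectory's actual code: the class names (FileEntry, CachedDirectory), the dictionary standing in for blob storage, and the rule of comparing length and last-modified values to detect a change are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FileEntry:
    """File contents plus the properties used here to detect staleness."""
    data: bytes
    last_modified: int  # logical timestamp; a real store would use a datetime

    @property
    def length(self) -> int:
        return len(self.data)


@dataclass
class CachedDirectory:
    """Hypothetical local cache in front of a shared remote store."""
    remote: Dict[str, FileEntry]  # stands in for blob storage (assumption)
    local: Dict[str, FileEntry] = field(default_factory=dict)  # stands in for local disk
    downloads: int = 0  # counts how often a file had to be fetched

    def write(self, name: str, data: bytes, now: int) -> None:
        # Writes land in the local cache first and are then pushed to the remote store.
        self.local[name] = FileEntry(data, now)
        self.remote[name] = FileEntry(data, now)

    def read(self, name: str) -> bytes:
        blob = self.remote[name]
        cached = self.local.get(name)
        # Download only when the local copy is missing or differs from the remote one.
        stale = (
            cached is None
            or cached.length != blob.length
            or cached.last_modified != blob.last_modified
        )
        if stale:
            self.local[name] = FileEntry(blob.data, blob.last_modified)
            self.downloads += 1
        return self.local[name].data


# Two machines sharing one remote store: an indexer and a searcher.
blob_store: Dict[str, FileEntry] = {}
indexer = CachedDirectory(remote=blob_store)
searcher = CachedDirectory(remote=blob_store)

indexer.write("segments_1", b"v1", now=1)
searcher.read("segments_1")   # first read: fetched from the remote store
searcher.read("segments_1")   # unchanged: served from the local cache
indexer.write("segments_1", b"v2!", now=2)
searcher.read("segments_1")   # changed remotely: fetched again
```

In this sketch the searcher transfers a file only when it has changed, which is the property that would let many searchers follow an index cheaply as indexers update it.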

With the flexibility that Lucene provides over data in memory versus storage, and the just-in-time blob transfer that AzureDirectory provides, you have great control over the composability of where data is indexed and how it is consumed.

To be more concrete: you can have 1..N worker roles adding documents to an index, and 1..N searcher web roles searching over the catalog in near real time.

Version History

Version 2.0
  • Updated to use a Lucene.NET version later than 3.0 and Azure Storage Library 2.0 (thanks richorama)
 
Version 1.0.5
  • Replaced the existing blob lock file with blob leases to prevent orphaned lock files
Version 1.0.4
  • Replaced mutex with BlobMutexManager to solve local mutex permission problems

Thanks to Andy Hitchman for the bug fixes

Version 1.0.3
  • Added a call to persist the CachedLength and CachedLastModified metadata properties to the blob (not included in the content upload).
  • AzureDirectory.FileLength was using the actual blob length rather than the CachedLength property. The latest version of Lucene checks the length after closing an index to verify that it's correct, and was throwing an exception for compressed blobs.
  • Non-compressed blobs were not being uploaded
  • Updated the AzureDirectory constructor to use a CloudStorageAccount rather than the StorageCredentialsAccountAndKey so it's possible to use the Development store for testing
  • Works with Lucene.NET 2.9.2

Thanks to Joel Fillmore for the bug fixes

Version 1.0.2
  • Updated to use Azure SDK 1.2

Version 1.0.1
  • Rewritten to use V1.1 of Azure SDK and the Azure storage client
  • Released to MSDN Code Gallery under the MS-PL license

Version 1.0
  • Initial release, written for V1.0 CTP of Azure using the sample storage library
  • Released under restrictive MSR license on http://research.microsoft.com

 

Related

There is a LINQ to Lucene provider on CodePlex (http://linqtolucene.codeplex.com/Wiki/View.aspx?title=Project%20Documentation) that allows you to define your schema as a strongly typed object and execute LINQ expressions against the index.

Last edited Jul 10, 2013 at 7:50 PM by thermous, version 6