Lucene indexes get corrupted when we restart webrole

Feb 11, 2013 at 6:30 AM
Edited Feb 11, 2013 at 6:32 AM
I have already asked this question elsewhere (http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/15949323-844d-491c-ba25-15570a414f00), but I am cross-posting so that I get a more focussed audience here:

We are using Lucene.NET in our project, through the AzureDirectory library.

We have a single web role and a single worker role. The index is created and updated by a worker role thread. We search from the web role by creating an IndexSearcher. The issue I am facing is this: when we upgrade the cspkg through the management console to update the bits on the production server, the Lucene index that has been created suddenly stops working. We get an error like:

File _2c.fdt not found (FileNotFoundException)

at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run() in C:\Dev\code\Lucene.Net\Index\SegmentInfos.cs:line 741
at Lucene.Net.Index.DirectoryIndexReader.Open(Directory directory, Boolean closeDirectory, IndexDeletionPolicy deletionPolicy) in C:\Dev\code\Lucene.Net\Index\DirectoryIndexReader.cs:line 140
at Lucene.Net.Index.IndexReader.Open(Directory directory, Boolean closeDirectory, IndexDeletionPolicy deletionPolicy) in C:\Dev\code\Lucene.Net\Index\IndexReader.cs:line 257
at Lucene.Net.Index.IndexReader.Open(Directory directory) in C:\Dev\code\Lucene.Net\Index\IndexReader.cs:line 236
at Lucene.Net.Search.IndexSearcher..ctor(Directory directory) in C:\Dev\code\Lucene.Net\Search\IndexSearcher.cs:line 91
However, when I check the Lucene blob container, that specific .fdt file does exist. In fact, search was working perfectly fine just before the upgrade. I even made sure that both the web role and the worker role were shut down before I upgraded the bits (to be sure the index was not being updated during the upgrade), but that also resulted in the same corruption.

Note that I am using AzureDirectory with RAMDirectory as a cache.

Worker role code piece:
    public static void CreateNewEntities(List<string> smids)
    {
        AzureDirectory azureDirectory = GetAzureDir();
        IndexWriter indexWriter = new IndexWriter(azureDirectory, CommonAnalyzer.getAnalyzer());
        indexWriter.SetUseCompoundFile(false);

        foreach (string smid in smids)
        {
            List<Document> docs = GetDocs(smid);
            foreach (Document d in docs)
            {
                indexWriter.AddDocument(d);
            }
        }

        indexWriter.Close();
    }

    public static void EditEntityInIndex(List<string> smids)
    {
        // delete this surfmark from the index, and recreate the same
        AzureDirectory azureDirectory = GetAzureDir();
        IndexWriter indexWriter = new IndexWriter(azureDirectory, CommonAnalyzer.getAnalyzer());
        indexWriter.SetUseCompoundFile(false);

        foreach (string smid in smids)
        {
            indexWriter.DeleteDocuments(new Term(IndexingFields.ID, smid));
            List<Document> docs = GetDocs(smid);
            foreach (Document d in docs)
            {
                indexWriter.AddDocument(d);
            }
        }
        indexWriter.Flush();
        indexWriter.Close();
    }
Web Role code piece (for searching):
    public static IndexSearcher GetIndexSearcher()
    {
        // Returns the shared IndexSearcher, which is refreshed every 10 minutes.
        long ctime = DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond;
        if (_srchr == null || ctime - _srchrTime > 600000)  // refresh every 10 mins
        {
            _srchr = new IndexSearcher(GetAzureDir());
            _srchrTime = DateTime.Now.Ticks / TimeSpan.TicksPerMillisecond;
        }

        return _srchr;
    }

        string[] fields = { /*list of fields to be searched on*/};
        IndexSearcher searcher = GetIndexSearcher();
        Hits hits = searcher.Search(mainQuery);
Can someone please help out here?

Thanks

Kapil
Coordinator
Jul 8, 2013 at 10:55 PM
Hmm, make sure you call Flush() or Dispose() on the writer when the role is being restarted. BTW: I just uploaded a new version, which should work better with RAMDirectory objects.
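
As a rough illustration of that advice, here is a minimal sketch of wrapping the writer in try/finally so it is always closed, even if indexing throws part-way through. It assumes the same older Lucene.NET API used in the question, where the IndexWriter(Directory, Analyzer) constructor and Close() are available; newer releases use Dispose() and different constructors. The SafeIndexing class and AddDocuments method are hypothetical names made up for this example, not part of Lucene.NET or AzureDirectory.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper (not part of Lucene.NET or AzureDirectory).
public static class SafeIndexing
{
    // Adds the given documents and always closes the writer afterwards.
    // Assumes the older Lucene.NET API from the question, where
    // IndexWriter(Directory, Analyzer) and Close() exist.
    public static void AddDocuments(Directory directory, Analyzer analyzer, IEnumerable<Document> docs)
    {
        IndexWriter writer = new IndexWriter(directory, analyzer);
        try
        {
            writer.SetUseCompoundFile(false);
            foreach (Document doc in docs)
            {
                writer.AddDocument(doc);
            }
        }
        finally
        {
            // Close() commits whatever was added so far and releases the
            // write lock, even when AddDocument threw an exception.
            writer.Close();
        }
    }
}
```

This only guarantees that the writer is closed once a call returns or throws. It does not wait for an indexing call that is still running when the role is told to stop, so the worker role would probably still need to let that call finish during shutdown (for example from its RoleEntryPoint.OnStop override) before the instance goes away.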