
Closed

File last modified time is incompatible between FSDirectory and RAMDirectory

description

Sometimes it's useful to be able to use a RAMDirectory or an FSDirectory interchangeably, regardless of which Directory created the index in the first place. Unfortunately, there's a bug that makes this tricky. It results in an error message like:
Cannot overwrite: _0.fdt
In AzureIndexInput.cs, the code gets the blob's last-modified timestamp from the blob metadata entry "CachedLastModified", which was populated when the index was created.

If you create an index with an IndexWriter and a RAMDirectory, the timestamp will be the number of milliseconds from 0001-01-01 up to 1970 PLUS the number of milliseconds since 1970.

If you then read that index with an FSDirectory using IndexSearcher, the AzureIndexInput constructor will cache the blob locally, as you'd expect. However, it then checks whether the blob has changed in blob storage by comparing it with the locally cached file.

The FSDirectory.FileModified method, however, returns standard Unix epoch time: milliseconds since 1970-01-01 (UTC).

RAMDirectory.FileModified, on the other hand, returns milliseconds elapsed since 0001-01-01 (DateTime.MinValue), converted to local time.

This means the AzureIndexInput constructor compares timestamps that are incompatible with each other.
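For the same instant, the two FileModified values are related as shown below. Here $\Delta_{\text{local}}$ is the local time zone's UTC offset in milliseconds; it appears because RAMDirectory converts to local time and FSDirectory does not. The figure 719,162 is the number of days from 0001-01-01 to 1970-01-01.

$$
t_{\text{RAM}} = t_{\text{FS}} + 719\,162 \times 86\,400\,000\ \text{ms} + \Delta_{\text{local}} = t_{\text{FS}} + 62\,135\,596\,800\,000\ \text{ms} + \Delta_{\text{local}}
$$

The constant 62135596800000 is the value subtracted in the fix. Subtracting it removes the fixed offset only, not $\Delta_{\text{local}}$.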

The fix is simply to convert the timestamps so that they are compatible with each other.
Here's the fix, starting at line 55 of AzureIndexInput.cs:
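long cachedLength = CacheDirectory.FileLength(fileName);
long blobLength = blob.Properties.Length;
long metadataLength;
// Only override the blob's length if the metadata value actually parses
// (TryParse sets its out argument to 0 on failure).
if (long.TryParse(blob.Metadata["CachedLength"], out metadataLength))
    blobLength = metadataLength;

// Milliseconds between 0001-01-01 and 1970-01-01. RAMDirectory timestamps include
// this offset; FSDirectory timestamps (milliseconds since 1970) do not.
const long EpochOffsetMs = 62135596800000;

long longLastModified = 0;
DateTime blobLastModifiedUTC = blob.Properties.LastModified.Value.UtcDateTime;
if (long.TryParse(blob.Metadata["CachedLastModified"], out longLastModified))
{
    // Strip the 0001-to-1970 offset if the value came from a RAMDirectory.
    if (longLastModified > EpochOffsetMs) longLastModified -= EpochOffsetMs;
    // Note: DateTime(long) expects ticks, not milliseconds. The cached-file value
    // below is passed the same way, so the two stay comparable, but the 1-second
    // threshold below corresponds to roughly 10,000 seconds of real time.
    blobLastModifiedUTC = new DateTime(longLastModified).ToUniversalTime();
}

if (cachedLength != blobLength)
    fFileNeeded = true;
else
{
    // There seems to be an error of 1 tick which happens every once in a while,
    // so if the lengths match and the timestamps are close, treat the file as unchanged.
    var elapsed = CacheDirectory.FileModified(fileName);
    if (elapsed > EpochOffsetMs) elapsed -= EpochOffsetMs;

    DateTime cachedLastModifiedUTC = new DateTime(elapsed, DateTimeKind.Local).ToUniversalTime();
    if (cachedLastModifiedUTC != blobLastModifiedUTC)
    {
        TimeSpan timeSpan = blobLastModifiedUTC.Subtract(cachedLastModifiedUTC);
        if (timeSpan.TotalSeconds > 1)
            fFileNeeded = true;
        else
        {
#if FULLDEBUG
            Debug.WriteLine(timeSpan.TotalSeconds);
#endif
            // file not needed
        }
    }
}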
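The sketch below shows a different normalization from the one adopted above. It converts milliseconds to ticks before building a DateTime and undoes RAMDirectory's local-time conversion, so both values end up as real UTC instants before they are compared. LastModifiedNormalizer, ToUtc and SameWithin are hypothetical names, not part of AzureDirectory or Lucene.Net. The "greater than the offset means RAMDirectory" heuristic is borrowed from the fix.

```csharp
using System;

// Hypothetical helper (not part of AzureDirectory or Lucene.Net): maps a FileModified
// value from either Directory convention to a UTC DateTime before comparing.
public static class LastModifiedNormalizer
{
    // Milliseconds from 0001-01-01 to 1970-01-01: DateTime(1970, 1, 1).Ticks / TicksPerMillisecond.
    public const long EpochOffsetMs = 62135596800000L;

    static readonly DateTime UnixEpochUtc = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Same heuristic as the fix: values above EpochOffsetMs are assumed to be
    // RAMDirectory-style (local milliseconds since 0001-01-01); smaller values are
    // assumed to be FSDirectory-style (UTC milliseconds since 1970-01-01).
    public static DateTime ToUtc(long fileModifiedMs)
    {
        if (fileModifiedMs > EpochOffsetMs)
        {
            // Milliseconds -> ticks, then undo RAMDirectory's conversion to local time.
            var local = new DateTime(fileModifiedMs * TimeSpan.TicksPerMillisecond, DateTimeKind.Local);
            return local.ToUniversalTime();
        }
        return UnixEpochUtc.AddMilliseconds(fileModifiedMs);
    }

    // True if the two timestamps are within `tolerance` of each other, in either direction.
    public static bool SameWithin(long aMs, long bMs, TimeSpan tolerance)
    {
        return (ToUtc(aMs) - ToUtc(bMs)).Duration() <= tolerance;
    }

    public static void Main()
    {
        var instant = new DateTime(2013, 7, 8, 23, 54, 0, DateTimeKind.Utc);
        // The same instant expressed in each convention.
        long fsStyle = (long)(instant - UnixEpochUtc).TotalMilliseconds;
        long ramStyle = instant.ToLocalTime().Ticks / TimeSpan.TicksPerMillisecond;

        Console.WriteLine($"FSDirectory-style value:  {fsStyle}");
        Console.WriteLine($"RAMDirectory-style value: {ramStyle}");
        Console.WriteLine($"Same file? {SameWithin(fsStyle, ramStyle, TimeSpan.FromSeconds(1))}");
    }
}
```

SameWithin uses Duration(), so it ignores which side is newer. The check in the fix only re-downloads when the blob is newer than the cached copy, so a drop-in replacement would need to decide which behavior it wants.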
The difference between the timestamp calculations is shown in the Lucene.Net source:
RAMDirectory.cs
/// <summary>Returns the time the named file was last modified.</summary>
/// <throws>  IOException if the file does not exist </throws>
public override long FileModified(System.String name)
{
    EnsureOpen();
    RAMFile file;
    lock (this)
    {
        file = fileMap[name];
    }
    if (file == null)
        throw new System.IO.FileNotFoundException(name);

    // RAMOutputStream.Flush() was changed to use DateTime.UtcNow.
    // Convert it back to local time before returning (previous behavior)
    return new DateTime(file.LastModified*TimeSpan.TicksPerMillisecond, DateTimeKind.Utc).ToLocalTime().Ticks/
           TimeSpan.TicksPerMillisecond;
}
versus

FSDirectory.cs
public override long FileModified(System.String name)
{
    EnsureOpen();
    System.IO.FileInfo file = new System.IO.FileInfo(System.IO.Path.Combine(internalDirectory.FullName, name));
    return (long)file.LastWriteTime.ToUniversalTime().Subtract(new DateTime(1970, 1, 1, 0, 0, 0)).TotalMilliseconds; //{{LUCENENET-353}}
}
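The hypothetical class below makes the mismatch concrete by reproducing the two FileModified formulas quoted above for one UTC instant. It is not Lucene.Net code. It assumes RAMFile.LastModified holds UTC milliseconds since 0001-01-01, as the multiplication by TicksPerMillisecond and the DateTimeKind.Utc in RAMDirectory.FileModified imply. Once the local UTC offset is subtracted from the difference between the two results, what remains is the fixed 62135596800000 ms gap.

```csharp
using System;

// Hypothetical class: reproduces the two FileModified formulas quoted above
// for a single UTC instant, so the results can be compared side by side.
public static class FileModifiedConventions
{
    // RAMDirectory style. Assumption: RAMFile.LastModified holds UTC milliseconds
    // since 0001-01-01; FileModified then converts that value to local time.
    public static long RamStyle(DateTime utcInstant)
    {
        long lastModified = utcInstant.Ticks / TimeSpan.TicksPerMillisecond;
        return new DateTime(lastModified * TimeSpan.TicksPerMillisecond, DateTimeKind.Utc)
                   .ToLocalTime().Ticks / TimeSpan.TicksPerMillisecond;
    }

    // FSDirectory style: UTC milliseconds since 1970-01-01.
    public static long FsStyle(DateTime utcInstant)
    {
        return (long)utcInstant.Subtract(new DateTime(1970, 1, 1, 0, 0, 0)).TotalMilliseconds;
    }

    public static void Main()
    {
        var instant = new DateTime(2013, 7, 8, 23, 54, 0, DateTimeKind.Utc);
        long ram = RamStyle(instant);
        long fs = FsStyle(instant);
        // RAMDirectory's value is in local time, so remove the local UTC offset too.
        long localOffsetMs = (long)TimeZoneInfo.Local.GetUtcOffset(instant).TotalMilliseconds;

        Console.WriteLine($"RAMDirectory style: {ram}");
        Console.WriteLine($"FSDirectory style:  {fs}");
        Console.WriteLine($"Difference minus local offset: {ram - fs - localOffsetMs}");
    }
}
```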
I hope this helps.
Disclaimer: This is a very quick solution that I have tested and that works for my implementation; however, I recommend thorough testing anyway.

Many thanks
Kris
Closed Jul 8, 2013 at 11:54 PM by thermous
I adopted your changes with version 2.0.4937.26631
