Namespace Lucene.Net.Store
Binary i/o API, used for all index data.
Classes
BaseDirectory
Base implementation for a concrete Directory.
Note
This API is experimental and might change in incompatible ways in the next release.
BufferedChecksumIndexInput
Simple implementation of ChecksumIndexInput that wraps another input and delegates calls.
BufferedIndexInput
Base implementation class for buffered IndexInput.
BufferedIndexOutput
Base implementation class for buffered IndexOutput.
ByteArrayDataInput
DataInput backed by a byte array. WARNING: this class omits all low-level checks.
Note
This API is experimental and might change in incompatible ways in the next release.
ByteArrayDataOutput
DataOutput backed by a byte array. WARNING: this class omits most low-level checks, so be sure to test heavily with assertions enabled.
Note
This API is experimental and might change in incompatible ways in the next release.
ByteBufferIndexInput
Base IndexInput implementation that uses an array of J2N.IO.ByteBuffers to represent a file.
Because J2N.IO.ByteBuffer uses a System.Int32 to address its values, a file larger than System.Int32.MaxValue bytes must be accessed through multiple byte buffers.
For efficiency, this class requires that the buffer size be a power of two (chunkSizePower).
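To see why a power-of-two chunk size helps, here is a minimal sketch of how an absolute file position could be split into a buffer index and an offset with a shift and a mask instead of a division. The ChunkAddressing class and its Locate method are hypothetical names used only for illustration; the actual ByteBufferIndexInput internals may differ.

```csharp
using System;

public static class ChunkAddressing
{
    // Splits an absolute position into (buffer index, offset inside that buffer).
    // Assumes every buffer holds exactly 2^chunkSizePower bytes.
    public static (int BufferIndex, int Offset) Locate(long position, int chunkSizePower)
    {
        long chunkSizeMask = (1L << chunkSizePower) - 1L;
        int bufferIndex = (int)(position >> chunkSizePower); // which buffer
        int offset = (int)(position & chunkSizeMask);        // where inside it
        return (bufferIndex, offset);
    }

    public static void Main()
    {
        // With 1 GiB chunks (2^30), a position 5 bytes past the 3 GiB mark
        // lands in the fourth buffer (index 3) at offset 5.
        var (index, offset) = Locate(3L * (1L << 30) + 5, 30);
        Console.WriteLine($"buffer {index}, offset {offset}");
    }
}
```

Both operations are single CPU instructions, which is the efficiency the power-of-two requirement is aiming for.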
ChecksumIndexInput
Extension of IndexInput, computing checksum as it goes. Callers can retrieve the checksum via Checksum.
CompoundFileDirectory
Class for accessing a compound stream. This class implements a directory, but is limited to only read operations. Directory methods that would normally modify data throw an exception.
All files belonging to a segment have the same name with varying extensions.
The extensions correspond to the different file formats used by the Codec.
When using the Compound File format, these files are collapsed into a single .cfs file (except for the LiveDocsFormat), with a corresponding .cfe file indexing its sub-files.
Files:
- .cfs: An optional "virtual" file consisting of all the other index files, for systems that frequently run out of file handles.
- .cfe: The "virtual" compound file's entry table holding all entries in the corresponding .cfs file.
Description:
- Compound (.cfs) --> Header, FileData^FileCount
- Compound Entry Table (.cfe) --> Header, FileCount, <FileName, DataOffset, DataLength>^FileCount, Footer
- Header --> WriteHeader(DataOutput, String, Int32)
- FileCount --> WriteVInt32(Int32)
- DataOffset,DataLength --> WriteInt64(Int64)
- FileName --> WriteString(String)
- FileData --> raw file data
- Footer --> WriteFooter(IndexOutput)
Notes:
- FileCount indicates how many files are contained in this compound file. The entry table that follows has that many entries.
- Each directory entry contains a long pointer to the start of this file's data section, the file's length, and a System.String with that file's name.
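As an illustration of how FileCount is stored, the following is a minimal sketch of the variable-length integer layout that WriteVInt32(Int32) uses: seven payload bits per byte, low-order group first, with the high bit set on every byte except the last. The VInt32Sketch class and its method names are hypothetical and are not part of Lucene.NET; in real code you would call the DataOutput and DataInput methods instead.

```csharp
using System;
using System.Collections.Generic;

public static class VInt32Sketch
{
    // Encodes a value as 1 to 5 bytes, 7 bits at a time, low-order bits first.
    public static byte[] Encode(int value)
    {
        var bytes = new List<byte>();
        uint v = unchecked((uint)value);
        while ((v & ~0x7Fu) != 0)
        {
            bytes.Add((byte)((v & 0x7Fu) | 0x80u)); // high bit: more bytes follow
            v >>= 7;
        }
        bytes.Add((byte)v); // final byte has the high bit clear
        return bytes.ToArray();
    }

    // Reverses Encode by accumulating 7-bit groups until a byte without the high bit.
    public static int Decode(byte[] bytes)
    {
        int result = 0;
        int shift = 0;
        foreach (byte b in bytes)
        {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return result;
    }

    public static void Main()
    {
        byte[] encoded = Encode(300);
        Console.WriteLine(BitConverter.ToString(encoded));
        Console.WriteLine(Decode(encoded));
    }
}
```

Small counts such as the number of files in a segment fit in a single byte under this scheme, which is why it is used for FileCount while the offsets and lengths are fixed-width 64-bit values.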
Note
This API is experimental and might change in incompatible ways in the next release.
CompoundFileDirectory.FileEntry
Offset/Length for a slice inside of a compound file
DataInput
Abstract base class for performing read operations of Lucene's low-level data types.
DataInput may only be used from one thread, because it is not thread safe (it keeps internal state like file position). To allow multithreaded use, every DataInput instance must be cloned before being used in another thread. Subclasses must therefore implement Clone(), returning a new DataInput which operates on the same underlying resource, but is positioned independently.
DataOutput
Abstract base class for performing write operations of Lucene's low-level data types.
DataOutput may only be used from one thread, because it is not thread safe (it keeps internal state like file position).
Directory
A Directory is a flat list of files. Files may be written once, when they are created. Once a file is created it may only be opened for read, or deleted. Random access is permitted both when reading and writing.
.NET's i/o APIs are not used directly; rather, all i/o goes through this API. This permits things such as:
- implementation of RAM-based indices;
- implementation of indices stored in a database;
- implementation of an index as a single file.
Directory locking is implemented by an instance of LockFactory, and can be changed for each Directory instance using SetLockFactory(LockFactory).
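The write-once lifecycle described above can be sketched with an in-memory directory. This is a minimal example, assuming the Lucene.NET 4.8 method names (CreateOutput, OpenInput, WriteInt32, ReadInt32, and so on); the DirectoryDemo class and the file name demo.bin are made up for illustration.

```csharp
using System;
using Lucene.Net.Store;

public static class DirectoryDemo
{
    // Writes a file once, reads it back, then deletes it.
    public static (int Number, string Text) RoundTrip()
    {
        using (Directory dir = new RAMDirectory())
        {
            // A file is written exactly once, at creation time.
            using (IndexOutput output = dir.CreateOutput("demo.bin", IOContext.DEFAULT))
            {
                output.WriteInt32(42);
                output.WriteString("hello");
            }

            int number;
            string text;
            // After creation it can only be opened for reading...
            using (IndexInput input = dir.OpenInput("demo.bin", IOContext.DEFAULT))
            {
                number = input.ReadInt32();
                text = input.ReadString();
            }

            Console.WriteLine(string.Join(", ", dir.ListAll()));

            // ...or deleted.
            dir.DeleteFile("demo.bin");
            return (number, text);
        }
    }

    public static void Main()
    {
        var (number, text) = RoundTrip();
        Console.WriteLine($"{number} {text}");
    }
}
```

Note that the example deliberately avoids importing System.IO, since System.IO.Directory would otherwise clash with Lucene.Net.Store.Directory.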
Directory.IndexInputSlicer
Allows creating one or more sliced IndexInput instances from a single file handle. Some Directory implementations may be able to efficiently map slices of a file into memory when only certain parts of a file are required.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
Note
This API is experimental and might change in incompatible ways in the next release.
FileSwitchDirectory
Expert: A Directory instance that switches files between two other Directory instances.
Files with the specified extensions are placed in the primary directory; others are placed in the secondary directory. The provided ISet<string> must not change once passed to this class, and must allow multiple threads to call Contains at once.
Note
This API is experimental and might change in incompatible ways in the next release.
FilterDirectory
Directory implementation that delegates calls to another directory. This class can be used to add limitations on top of an existing Directory implementation, such as rate limiting (RateLimitedDirectoryWrapper), or to add additional sanity checks for tests. However, if you plan to write your own Directory implementation, you should consider directly extending Directory or BaseDirectory rather than trying to reuse the functionality of existing Directory implementations by extending this class.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
FlushInfo
A FlushInfo provides information required for a FLUSH context. It is used as part of an IOContext in case of FLUSH context.
FSDirectory
Base class for Directory implementations that store index files in the file system.
There are currently three core subclasses:
- SimpleFSDirectory is a straightforward implementation using System.IO.FileStream, which is ideal for writing without using much RAM. However, it has poor concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file.
- NIOFSDirectory uses System.IO.FileStream's positional seeking, which makes it slightly less efficient than using SimpleFSDirectory during reading, with similar write performance.
- MMapDirectory uses memory-mapped IO when reading. This is a good choice if you have plenty of virtual memory relative to your index size, e.g., if you are running on a 64-bit runtime, or you are running on a 32-bit runtime but your index sizes are small enough to fit into the virtual memory space.
Unfortunately, because of system peculiarities, there is no single overall best implementation. Therefore, we've added the Open(String) method (or one of its overloads), to allow Lucene to choose the best FSDirectory implementation given your environment, and the known limitations of each implementation. For users who have no reason to prefer a specific implementation, it's best to simply use Open(String) (or one of its overloads). For all others, you should instantiate the desired implementation directly.
The locking implementation is by default NativeFSLockFactory, but can be changed by passing in a custom LockFactory instance.
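A minimal sketch of the two options described above: letting Open(String) pick an implementation, or instantiating one directly. The index path and the FSDirectoryDemo class are hypothetical, and which subclass Open returns depends on your platform and Lucene.NET version.

```csharp
using System;
using Lucene.Net.Store;

public static class FSDirectoryDemo
{
    // Opens the same path twice: once via Open, once via an explicit subclass.
    public static string OpenAndDescribe(string path)
    {
        // Make sure the folder exists before opening it (fully qualified to
        // avoid a clash with Lucene.Net.Store.Directory).
        System.IO.Directory.CreateDirectory(path);

        string chosen;
        // No specific preference: let Lucene choose the implementation.
        using (FSDirectory dir = FSDirectory.Open(path))
        {
            chosen = dir.GetType().Name;
        }

        // Specific preference: instantiate the desired implementation directly.
        using (var mmap = new MMapDirectory(new System.IO.DirectoryInfo(path)))
        {
            Console.WriteLine($"Explicit choice: {mmap.GetType().Name}");
        }

        return chosen;
    }

    public static void Main()
    {
        string path = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "lucene-demo-index");
        Console.WriteLine($"Open() chose: {OpenAndDescribe(path)}");
    }
}
```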
NOTE: Unlike in Java, it is not recommended to use System.Threading.Thread.Interrupt in .NET in conjunction with an open FSDirectory because it is not guaranteed to exit atomically. Any lock statement or System.Threading.Monitor.Enter(System.Object) call can throw a System.Threading.ThreadInterruptedException, which makes shutting down unpredictable. To exit parallel tasks safely, we recommend using System.Threading.Tasks.Task instances and "interrupting" them with System.Threading.CancellationToken instances.
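The cooperative shutdown recommended in the note above might look like the following sketch. It uses only standard .NET types; the CancellationDemo class and the DoWork method are hypothetical, and the sleep stands in for whatever unit of work you run against an open FSDirectory.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Runs units of work until cancellation is requested, then returns normally.
    public static int DoWork(CancellationToken token)
    {
        int batches = 0;
        while (!token.IsCancellationRequested)
        {
            // Placeholder for one unit of indexing or searching work.
            Thread.Sleep(10);
            batches++;
        }
        return batches;
    }

    public static void Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            Task<int> worker = Task.Run(() => DoWork(cts.Token));

            Thread.Sleep(100);
            cts.Cancel(); // request a stop instead of calling Thread.Interrupt

            Console.WriteLine($"Stopped cleanly after {worker.Result} batches");
        }
    }
}
```

Because the worker checks the token only between units of work, it never exits in the middle of a lock statement, which is the failure mode the note warns about.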
FSDirectory.FSIndexOutput
Writes output with System.IO.FileStream.Write(System.Byte[],System.Int32,System.Int32)
FSLockFactory
Base class for file system based locking implementation.
IndexInput
Abstract base class for input from a file in a Directory. A random-access input stream. Used for all Lucene index input operations.
IndexInput may only be used from one thread, because it is not thread safe (it keeps internal state like file position). To allow multithreaded use, every IndexInput instance must be cloned before being used in another thread. Subclasses must therefore implement Clone(), returning a new IndexInput which operates on the same underlying resource, but is positioned independently. Lucene never closes cloned IndexInputs; it only closes the original one. The original instance must take care that cloned instances throw System.ObjectDisposedException when the original one is closed.
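The cloning contract can be illustrated with a short sketch, assuming the Lucene.NET 4.8 method names. The CloneDemo class and the file name are hypothetical; in a real application the clone would be handed to a second thread rather than used on the same one.

```csharp
using System;
using Lucene.Net.Store;

public static class CloneDemo
{
    // Shows that a clone shares the data but keeps its own file position.
    public static (int FromOriginal, int FromClone) ReadWithClone()
    {
        using (Directory dir = new RAMDirectory())
        {
            using (IndexOutput output = dir.CreateOutput("numbers.bin", IOContext.DEFAULT))
            {
                output.WriteInt32(1); // bytes 0..3
                output.WriteInt32(2); // bytes 4..7
            }

            // Only the original is disposed; clones are never closed individually.
            using (IndexInput original = dir.OpenInput("numbers.bin", IOContext.DEFAULT))
            {
                IndexInput clone = (IndexInput)original.Clone();
                clone.Seek(4); // moves the clone only

                int first = original.ReadInt32(); // original is still at position 0
                int second = clone.ReadInt32();   // clone reads from position 4
                return (first, second);
            }
        }
    }

    public static void Main()
    {
        var (first, second) = ReadWithClone();
        Console.WriteLine($"{first} {second}");
    }
}
```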
IndexInputExtensions
IndexOutput
Abstract base class for output to a file in a Directory. A random-access output stream. Used for all Lucene index output operations.
IndexOutput may only be used from one thread, because it is not thread safe (it keeps internal state like file position).
IndexOutputExtensions
InputStreamDataInput
A DataInput wrapping a plain System.IO.Stream.
IOContext
Lock
An interprocess mutex lock.
Typical use might look like:
var result = Lock.With.NewAnonymous<string>(
    @lock: directory.MakeLock("my.lock"),
    lockWaitTimeout: Lock.LOCK_OBTAIN_WAIT_FOREVER,
    doBody: () =>
    {
        //... code to execute while locked ...
        return "the result";
    }).Run();
Lock.With<T>
Utility class for executing code with exclusive access.
LockFactory
Base class for Locking implementation. Directory uses instances of this class to implement locking.
Lucene uses NativeFSLockFactory by default for FSDirectory-based index directories.
Special care needs to be taken if you change the locking implementation: first, be certain that no writer is in fact writing to the index; otherwise you can easily corrupt your index. Be sure to make the LockFactory change on all Lucene instances and clean up all leftover lock files before starting the new configuration for the first time. Different implementations cannot work together!
If you suspect that some LockFactory implementation is not working properly in your environment, you can easily test it by using VerifyingLockFactory, LockVerifyServer and LockStressTest.
LockObtainFailedException
This exception is thrown when the write.lock could not be acquired. This happens when a writer tries to open an index that another writer already has open.
LockReleaseFailedException
This exception is thrown when the write.lock could not be released.
LockStressTest
Simple standalone tool that forever acquires & releases a lock using a specific LockFactory. Run without any args to see usage.
LockVerifyServer
Simple standalone server that must be running when you use VerifyingLockFactory. This server simply verifies at most one process holds the lock at a time. Run without any args to see usage.
MergeInfo
A MergeInfo provides information required for a MERGE context. It is used as part of an IOContext in case of MERGE context.
MMapDirectory
File-based Directory implementation that uses System.IO.MemoryMappedFiles.MemoryMappedFile for reading, and FSDirectory.FSIndexOutput for writing.
NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this class, be sure you have plenty of virtual address space, e.g., by using a 64-bit runtime, or a 32-bit runtime with indexes that are guaranteed to fit within the address space. On 32-bit platforms, also consult MMapDirectory(DirectoryInfo, LockFactory, Int32) if you have problems with mmap failing because of fragmented address space. If you get a System.OutOfMemoryException, it is recommended to reduce the chunk size until it works.
NOTE: Unlike in Java, it is not recommended to use System.Threading.Thread.Interrupt in .NET in conjunction with an open FSDirectory because it is not guaranteed to exit atomically. Any lock statement or System.Threading.Monitor.Enter(System.Object) call can throw a System.Threading.ThreadInterruptedException, which makes shutting down unpredictable. To exit parallel tasks safely, we recommend using System.Threading.Tasks.Task instances and "interrupting" them with System.Threading.CancellationToken instances.
MMapDirectory.MMapIndexInput
NativeFSLockFactory
Implements LockFactory using native OS file locks. For NFS based access to an index, it's recommended that you try SimpleFSLockFactory first and work around the one limitation that a lock file could be left when the runtime exits abnormally.
The primary benefit of NativeFSLockFactory is that locks (not the lock file itself) will be properly removed (by the OS) if the runtime has an abnormal exit.
Note that, unlike SimpleFSLockFactory, the existence of leftover lock files in the filesystem is fine because the OS will free the locks held against these files even though the files still remain. Lucene will never actively remove the lock files, so although you see them, the index may not be locked.
Special care needs to be taken if you change the locking implementation: first, be certain that no writer is in fact writing to the index; otherwise you can easily corrupt your index. Be sure to make the LockFactory change on all Lucene instances and clean up all leftover lock files before starting the new configuration for the first time. Different implementations cannot work together!
If you suspect that this or any other LockFactory is not working properly in your environment, you can easily test it by using VerifyingLockFactory, LockVerifyServer and LockStressTest.
NIOFSDirectory
An FSDirectory implementation that uses System.IO.FileStream's positional read, which allows multiple threads to read from the same file without synchronizing.
This class only uses System.IO.FileStream when reading; writing is achieved with FSDirectory.FSIndexOutput.
NOTE: Since the .NET NIOFSDirectory uses additional seeking during reads, it will generally be slightly less efficient than SimpleFSDirectory. This class has poor concurrent read performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file. It's usually better to use MMapDirectory for reading.
NOTE: Unlike in Java, it is not recommended to use System.Threading.Thread.Interrupt in .NET in conjunction with an open FSDirectory because it is not guaranteed to exit atomically. Any lock statement or System.Threading.Monitor.Enter(System.Object) call can throw a System.Threading.ThreadInterruptedException, which makes shutting down unpredictable. To exit parallel tasks safely, we recommend using System.Threading.Tasks.Task instances and "interrupting" them with System.Threading.CancellationToken instances.
NIOFSDirectory.NIOFSIndexInput
Reads bytes with the Lucene.Net.Support.IO.StreamExtensions.Read(System.IO.Stream,J2N.IO.ByteBuffer,System.Int64) extension method for System.IO.Stream.
NoLockFactory
Use this LockFactory to disable locking entirely. Only one instance of this lock is created. You should call GetNoLockFactory() to get the instance.
NRTCachingDirectory
Wraps a RAMDirectory around any provided delegate directory, to be used during NRT search.
This class is likely only useful in a near-real-time context, where indexing rate is lowish but reopen rate is highish, resulting in many tiny files being written. This directory keeps such segments (as well as the segments produced by merging them, as long as they are small enough), in RAM.
This is safe to use: when your app calls Commit(), all cached files will be flushed from the cache and sync'd.
Here's a simple example usage:
Directory fsDir = FSDirectory.Open(new DirectoryInfo("/path/to/index"));
NRTCachingDirectory cachedFSDir = new NRTCachingDirectory(fsDir, 5.0, 60.0);
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_48, analyzer);
IndexWriter writer = new IndexWriter(cachedFSDir, conf);
This will cache all newly flushed segments and all merges whose expected segment size is <= 5 MB, unless the net cached bytes exceed 60 MB, at which point no writes will be cached (until the net cached bytes fall below 60 MB).
Note
This API is experimental and might change in incompatible ways in the next release.
OutputStreamDataOutput
A DataOutput wrapping a plain System.IO.Stream.
RAMDirectory
A memory-resident Directory implementation. Locking implementation is by default the SingleInstanceLockFactory but can be changed with SetLockFactory(LockFactory).
Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte[1024] arrays. This class is optimized for small memory-resident indexes. It also has poor concurrency in multithreaded environments.
It is recommended to materialize large indexes on disk and use MMapDirectory, which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to heap space is not useful.
RAMFile
Represents a file in RAM as a list of byte[] buffers.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
RAMInputStream
A memory-resident IndexInput implementation.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
RAMOutputStream
A memory-resident IndexOutput implementation.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
RateLimitedDirectoryWrapper
A Directory wrapper that allows IndexOutput rate limiting using IO context (IOContext.UsageContext) specific rate limiters (RateLimiter).
Note
This API is experimental and might change in incompatible ways in the next release.
RateLimiter
Abstract base class to rate limit IO. Typically, implementations are shared across multiple IndexInputs or IndexOutputs (for example, those involved in all merging). Those IndexInputs and IndexOutputs would call Pause(Int64) whenever they want to read or write bytes.
RateLimiter.SimpleRateLimiter
Simple class to rate limit IO.
SimpleFSDirectory
A straightforward implementation of FSDirectory using System.IO.FileStream.
SimpleFSDirectory is ideal for use cases where efficient writing is required without utilizing too much RAM. However, reading is less efficient than when using MMapDirectory. This class has poor concurrent read performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file. It's usually better to use MMapDirectory for reading.
NOTE: Unlike in Java, it is not recommended to use System.Threading.Thread.Interrupt in .NET in conjunction with an open FSDirectory because it is not guaranteed to exit atomically. Any lock statement or System.Threading.Monitor.Enter(System.Object) call can throw a System.Threading.ThreadInterruptedException, which makes shutting down unpredictable. To exit parallel tasks safely, we recommend using System.Threading.Tasks.Task instances and "interrupting" them with System.Threading.CancellationToken instances.
SimpleFSDirectory.SimpleFSIndexInput
Reads bytes with System.IO.FileStream.Seek(System.Int64,System.IO.SeekOrigin) followed by System.IO.FileStream.Read(System.Byte[],System.Int32,System.Int32).
SimpleFSLockFactory
Implements LockFactory using System.IO.File.WriteAllText(System.String,System.String,System.Text.Encoding) (writes the file with UTF8 encoding and no byte order mark).
Special care needs to be taken if you change the locking implementation: first, be certain that no writer is in fact writing to the index; otherwise you can easily corrupt your index. Be sure to make the LockFactory change on all Lucene instances and clean up all leftover lock files before starting the new configuration for the first time. Different implementations cannot work together!
If you suspect that this or any other LockFactory is not working properly in your environment, you can easily test it by using VerifyingLockFactory, LockVerifyServer and LockStressTest.
SingleInstanceLockFactory
Implements LockFactory for a single in-process instance, meaning all locking will take place through this one instance. Only use this LockFactory when you are certain all IndexReaders and IndexWriters for a given index are running against a single shared in-process Directory instance. This is currently the default locking for RAMDirectory.
TrackingDirectoryWrapper
A delegating Directory that records which files were written to and deleted.
VerifyingLockFactory
A LockFactory that wraps another LockFactory and verifies that each lock obtain/release is "correct" (never results in two processes holding the lock at the same time). It does this by contacting an external server (LockVerifyServer) to assert that at most one process holds the lock at a time. To use this, you should also run LockVerifyServer on the host & port matching what you pass to the constructor.
Enums
IOContext.UsageContext
IOContext.UsageContext is an enumeration that specifies the context in which the Directory is being used.
NOTE: This was Context in Lucene.