Namespace Lucene.Net.Store
Binary i/o API, used for all index data.
Classes
BaseDirectory
Base implementation for a concrete Directory.
BufferedChecksumIndexInput
Simple implementation of ChecksumIndexInput that wraps another input and delegates calls.
BufferedIndexInput
Base implementation class for buffered IndexInput.
BufferedIndexOutput
Base implementation class for buffered IndexOutput.
ByteArrayDataInput
DataInput backed by a byte array. WARNING: this class omits all low-level checks.
ByteArrayDataOutput
DataOutput backed by a byte array. WARNING: this class omits most low-level checks, so be sure to test heavily with assertions enabled.
ByteBufferIndexInput
Base IndexInput implementation that uses an array of J2N.IO.ByteBuffers to represent a file.
Because J2N.IO.ByteBuffer, like Java's ByteBuffer, uses a System.Int32 to address its values, a file larger than System.Int32.MaxValue bytes must be accessed through multiple byte buffers.
For efficiency, this class requires that each buffer's size is a power of two (chunkSizePower).
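Here is a minimal C# sketch of the address arithmetic that the power-of-two requirement enables. The class and method names are hypothetical, not Lucene.Net's API. The sketch assumes that every buffer holds exactly 2^chunkSizePower bytes, so a file position splits into a buffer index (a shift) and an offset (a mask) without any division.

```csharp
using System;

// Hypothetical helper: addresses a file of any size as chunks of 2^chunkSizePower bytes.
public static class ChunkAddressing
{
    // Which chunk holds absolute position 'pos': a shift replaces a division.
    public static int BufferIndex(long pos, int chunkSizePower) =>
        (int)(pos >> chunkSizePower);

    // Offset of 'pos' inside its chunk: a mask replaces a modulo,
    // which only works because the chunk size is a power of two.
    public static int OffsetInBuffer(long pos, int chunkSizePower) =>
        (int)(pos & ((1L << chunkSizePower) - 1L));

    // Number of chunks needed to hold 'length' bytes (the last one may be partial).
    public static int BufferCount(long length, int chunkSizePower)
    {
        long chunkSize = 1L << chunkSizePower;
        return (int)((length + chunkSize - 1) / chunkSize);
    }
}
```

For example, with chunkSizePower = 30 (1 GiB buffers), position 5 GiB + 123 maps to buffer 5, offset 123, and every offset fits in a System.Int32.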
ChecksumIndexInput
Extension of IndexInput, computing checksum as it goes. Callers can retrieve the checksum via Checksum.
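You can think of a checksum-computing input as a wrapper that folds every byte it hands out into a running checksum. This C# sketch assumes a CRC-32 (the reflected 0xEDB88320 polynomial, as used by zlib). The class name and members are hypothetical and much simpler than ChecksumIndexInput or BufferedChecksumIndexInput.

```csharp
using System.IO;

// Hypothetical reader that updates a CRC-32 with every byte read through it.
public sealed class Crc32ReadingStream
{
    private static readonly uint[] Table = BuildTable();
    private readonly Stream inner;
    private uint crc = 0xFFFFFFFFu; // standard CRC-32 initial value

    public Crc32ReadingStream(Stream inner) => this.inner = inner;

    // CRC-32 of all bytes read so far (final XOR applied).
    public uint Checksum => crc ^ 0xFFFFFFFFu;

    public int ReadByte()
    {
        int b = inner.ReadByte();
        if (b >= 0) crc = Table[(crc ^ (uint)b) & 0xFF] ^ (crc >> 8);
        return b;
    }

    public int Read(byte[] buffer, int offset, int count)
    {
        int n = inner.Read(buffer, offset, count);
        for (int i = offset; i < offset + n; i++)
            crc = Table[(crc ^ buffer[i]) & 0xFF] ^ (crc >> 8);
        return n;
    }

    // Byte-at-a-time lookup table for the reflected polynomial.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[n] = c;
        }
        return table;
    }
}
```

Both read paths update the same running value, so the checksum depends only on which bytes were read and not on how the calls were split.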
CompoundFileDirectory
Class for accessing a compound stream. This class implements a directory, but is limited to only read operations. Directory methods that would normally modify data throw an exception.
All files belonging to a segment have the same name with varying extensions.
The extensions correspond to the different file formats used by the Codec.
When using the Compound File format, these files are collapsed into a single .cfs file (except for the LiveDocsFormat), with a corresponding .cfe file indexing its sub-files.
Files:
- .cfs: An optional "virtual" file consisting of all the other index files, for systems that frequently run out of file handles.
- .cfe: The "virtual" compound file's entry table, holding all entries in the corresponding .cfs file.
Description:
- Compound (.cfs) --> Header, FileData^FileCount (that is, FileCount FileData blocks)
- Compound Entry Table (.cfe) --> Header, FileCount, <FileName, DataOffset, DataLength>^FileCount, Footer
- Header --> WriteHeader(DataOutput, String, Int32)
- FileCount --> WriteVInt32(Int32)
- DataOffset,DataLength --> WriteInt64(Int64)
- FileName --> WriteString(String)
- FileData --> raw file data
- Footer --> WriteFooter(IndexOutput)
Notes:
- FileCount indicates how many files are contained in this compound file. The entry table that follows has that many entries.
- Each directory entry contains a long pointer to the start of this file's data section, the file's length, and a System.String with that file's name.
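The primitive encodings in the entry-table grammar can be sketched directly. This is an illustrative C# sketch, not Lucene.Net's DataOutput: the helper names are hypothetical, and the Header and Footer are omitted. It assumes the usual Lucene conventions: VInt32 uses 7 bits per byte with the high bit set on every byte except the last, Int64 is written big-endian, and a string is written as a VInt32 byte length followed by its UTF-8 bytes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helpers for the FileCount / FileName / DataOffset / DataLength encodings.
public static class EntryTableSketch
{
    // VInt32: low-order 7-bit groups first; the high bit means "more bytes follow".
    public static void WriteVInt32(Stream output, int value)
    {
        uint v = (uint)value;
        while (v >= 0x80)
        {
            output.WriteByte((byte)((v & 0x7F) | 0x80));
            v >>= 7;
        }
        output.WriteByte((byte)v);
    }

    // Int64: eight bytes, most significant byte first.
    public static void WriteInt64(Stream output, long value)
    {
        for (int shift = 56; shift >= 0; shift -= 8)
            output.WriteByte((byte)(value >> shift));
    }

    // String: VInt32 byte count, then the UTF-8 bytes.
    public static void WriteString(Stream output, string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        WriteVInt32(output, utf8.Length);
        output.Write(utf8, 0, utf8.Length);
    }

    // FileCount, then <FileName, DataOffset, DataLength> once per entry.
    public static byte[] WriteEntries(IList<(string Name, long Offset, long Length)> entries)
    {
        using var ms = new MemoryStream();
        WriteVInt32(ms, entries.Count);
        foreach (var e in entries)
        {
            WriteString(ms, e.Name);
            WriteInt64(ms, e.Offset);
            WriteInt64(ms, e.Length);
        }
        return ms.ToArray();
    }
}
```

For instance, WriteVInt32(300) emits the two bytes 0xAC 0x02. A single entry named _0.fdt produces 24 bytes: 1 (FileCount) + 1 (name length) + 6 (name) + 8 (offset) + 8 (length).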
CompoundFileDirectory.FileEntry
Offset/Length for a slice inside of a compound file
DataInput
Abstract base class for performing read operations of Lucene's low-level data types.
DataInput may only be used from one thread, because it is not thread safe (it keeps internal state like file position). To allow multithreaded use, every DataInput instance must be cloned before it is used in another thread. Subclasses must therefore implement Clone(), returning a new DataInput which operates on the same underlying resource, but positioned independently.
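The following C# sketch illustrates the cloning contract. The class is hypothetical, not Lucene.Net's DataInput. Clones share the immutable underlying bytes, but each clone carries its own position, which is the only mutable and thread-unsafe state.

```csharp
using System.IO;

// Hypothetical input: shared, read-only bytes plus a per-instance position.
public sealed class SharedBytesInput
{
    private readonly byte[] data; // shared by the original and all clones; never mutated
    private int position;         // per-instance state; this is what makes it thread-unsafe

    public SharedBytesInput(byte[] data) : this(data, 0) { }

    private SharedBytesInput(byte[] data, int position)
    {
        this.data = data;
        this.position = position;
    }

    public int Position => position;

    public byte ReadByte()
    {
        if (position >= data.Length) throw new EndOfStreamException();
        return data[position++];
    }

    // Same bytes, same starting position, but advances independently afterwards.
    public SharedBytesInput Clone() => new SharedBytesInput(data, position);
}
```

Each thread would call Clone() on a shared instance and then read only through its own copy. The original and the clone never disturb each other's position.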
DataOutput
Abstract base class for performing write operations of Lucene's low-level data types.
DataOutput may only be used from one thread, because it is not thread safe (it keeps internal state like file position).
Directory
A Directory is a flat list of files. Files may be written once, when they are created. Once a file is created it may only be opened for read, or deleted. Random access is permitted both when reading and writing.
.NET's i/o APIs are not used directly; rather, all i/o goes through this API. This permits things such as:
- implementation of RAM-based indices;
- implementation of indices stored in a database;
- implementation of an index as a single file;
Directory locking is implemented by an instance of LockFactory, and can be changed for each Directory instance using SetLockFactory(LockFactory).
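The following toy C# sketch illustrates the write-once rule. It keeps files in a dictionary and throws if a name is created twice. It is hypothetical, and much simpler and stricter than any real Directory (including RAMDirectory). It only shows the create, read and delete life cycle.

```csharp
using System.Collections.Generic;
using System.IO;

// Toy, hypothetical store that enforces "write once, then read or delete".
public sealed class WriteOnceDirectory
{
    private readonly Dictionary<string, byte[]> files = new Dictionary<string, byte[]>();

    // A file's entire contents are supplied at creation time; there is no later append.
    public void CreateFile(string name, byte[] contents)
    {
        if (files.ContainsKey(name))
            throw new IOException($"file \"{name}\" already exists");
        files[name] = (byte[])contents.Clone(); // snapshot so later caller edits don't leak in
    }

    // Reads return a copy, so callers cannot modify the stored file.
    public byte[] ReadFile(string name) =>
        files.TryGetValue(name, out var bytes)
            ? (byte[])bytes.Clone()
            : throw new FileNotFoundException(name);

    public void DeleteFile(string name)
    {
        if (!files.Remove(name)) throw new FileNotFoundException(name);
    }

    public IReadOnlyCollection<string> ListAll() => files.Keys;
}
```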
Directory.IndexInputSlicer
Allows one or more sliced IndexInput instances to be created from a single file handle. Some Directory implementations may be able to efficiently map slices of a file into memory when only certain parts of a file are required.
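A slice is essentially an (offset, length) window over a larger file. This is the same pair that CompoundFileDirectory.FileEntry records. This hypothetical C# sketch reads a slice through an ordinary seekable System.IO.Stream rather than a memory map. It reports end-of-slice even when the underlying file continues.

```csharp
using System;
using System.IO;

// Hypothetical read-only view of [offset, offset + length) within a seekable stream.
public sealed class SliceReader
{
    private readonly Stream file;
    private readonly long offset;
    private readonly long length;
    private long pos; // relative to the start of the slice

    public SliceReader(Stream file, long offset, long length)
    {
        if (offset < 0 || length < 0 || offset + length > file.Length)
            throw new ArgumentOutOfRangeException(nameof(length), "slice exceeds file bounds");
        this.file = file;
        this.offset = offset;
        this.length = length;
    }

    public long Length => length;

    public int ReadByte()
    {
        if (pos >= length) return -1;            // end of slice, even if the file continues
        file.Seek(offset + pos, SeekOrigin.Begin); // translate slice position to file position
        int b = file.ReadByte();
        if (b >= 0) pos++;
        return b;
    }
}
```

Because each slice seeks before every read, several slices can share one stream from a single thread. As with other inputs, they are not safe to use from multiple threads at once.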
FileSwitchDirectory
Expert: A Directory instance that switches files between two other Directory instances.
Files with the specified extensions are placed in the primary directory; others are placed in the secondary directory. The provided ISet<string> must not change once passed to this class, and must allow multiple threads to call Contains at once.
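The routing rule fits in a few lines of C#. The helper is hypothetical. It assumes that the extension is the text after the last '.' in the file name. Under that assumption, a name without a dot, such as segments_1, has an empty extension and goes to the secondary directory.

```csharp
using System.Collections.Generic;

// Hypothetical helper mirroring the routing rule: extension in the set -> primary.
public static class FileSwitchRouting
{
    // Text after the last '.', or "" when the name has no dot.
    public static string GetExtension(string fileName)
    {
        int dot = fileName.LastIndexOf('.');
        return dot == -1 ? "" : fileName.Substring(dot + 1);
    }

    // Only Contains is called, so a set that is never modified after construction
    // can be shared by many threads.
    public static bool GoesToPrimary(ISet<string> primaryExtensions, string fileName) =>
        primaryExtensions.Contains(GetExtension(fileName));
}
```

Only Contains is called, so a HashSet<string> satisfies the thread-safety requirement above if it is fully populated before use and nothing modifies it afterwards.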
FilterDirectory
Directory implementation that delegates calls to another directory. This class can be used to add limitations on top of an existing Directory implementation, such as rate limiting (RateLimitedDirectoryWrapper), or to add additional sanity checks for tests. However, if you plan to write your own Directory implementation, you should consider extending Directory or BaseDirectory directly rather than trying to reuse the functionality of existing Directory implementations by extending this class.
FlushInfo
A FlushInfo provides information required for a FLUSH context. It is used as part of an IOContext in case of FLUSH context.
FSDirectory
Base class for Directory implementations that store index files in the file system.
There are currently three core subclasses:
- SimpleFSDirectory is a straightforward implementation using System.IO.FileStream. However, it has poor concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file.
- NIOFSDirectory uses positional I/O (as in java.nio's FileChannel) when reading, to avoid synchronization when multiple threads read from the same file. Unfortunately, due to a Windows-only Sun JRE bug, this is a poor choice for Windows, but on all other platforms this is the preferred choice. Applications using System.Threading.Thread.Interrupt or System.Threading.Tasks.Task should use SimpleFSDirectory instead. See the NIOFSDirectory documentation for details.
- MMapDirectory uses memory-mapped IO when reading. This is a good choice if you have plenty of virtual memory relative to your index size, e.g. if you are running on a 64-bit runtime, or you are running on a 32-bit runtime but your index sizes are small enough to fit into the virtual memory space. Applications using System.Threading.Thread.Interrupt or System.Threading.Tasks.Task should use SimpleFSDirectory instead. See the MMapDirectory documentation for details.
Unfortunately, because of system peculiarities, there is no single overall best implementation. Therefore, we've added the Open(String) method (or one of its overloads), to allow Lucene to choose the best FSDirectory implementation given your environment, and the known limitations of each implementation. For users who have no reason to prefer a specific implementation, it's best to simply use Open(String) (or one of its overloads). For all others, you should instantiate the desired implementation directly.
The locking implementation is by default NativeFSLockFactory, but can be changed by passing in a custom LockFactory instance.
FSDirectory.FSIndexOutput
Writes output with System.IO.FileStream.Write(System.Byte[],System.Int32,System.Int32)
FSLockFactory
Base class for file system based locking implementation.
IndexInput
Abstract base class for input from a file in a Directory. A random-access input stream. Used for all Lucene index input operations.
IndexInput may only be used from one thread, because it is not thread safe (it keeps internal state like file position). To allow multithreaded use, every IndexInput instance must be cloned before it is used in another thread. Subclasses must therefore implement Clone(), returning a new IndexInput which operates on the same underlying resource, but positioned independently. Lucene never closes cloned IndexInputs; it only closes the original one. The original instance must therefore ensure that its clones throw System.ObjectDisposedException once the original is closed.
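One way to meet the disposal requirement is to have the original and all its clones share a small state object, which the original flips when it is disposed. This C# sketch is hypothetical, not Lucene.Net's IndexInput. Making Dispose() a no-op on clones is a design choice of the sketch, consistent with Lucene never closing clones.

```csharp
using System;

// Hypothetical input demonstrating the clone/dispose contract:
// the original and all of its clones share one SharedState object.
public sealed class TrackedInput : IDisposable
{
    private sealed class SharedState { public volatile bool Disposed; }

    private readonly SharedState state;
    private readonly bool isClone;
    private long position; // independent per instance

    public TrackedInput() : this(new SharedState(), isClone: false) { }

    private TrackedInput(SharedState state, bool isClone)
    {
        this.state = state;
        this.isClone = isClone;
    }

    public long Position
    {
        get { EnsureOpen(); return position; }
    }

    public void Seek(long pos)
    {
        EnsureOpen();
        position = pos;
    }

    // Same shared state and starting position; positions diverge afterwards.
    public TrackedInput Clone()
    {
        EnsureOpen();
        return new TrackedInput(state, isClone: true) { position = this.position };
    }

    // Disposing the original invalidates every clone; disposing a clone does nothing.
    public void Dispose()
    {
        if (!isClone) state.Disposed = true;
    }

    private void EnsureOpen()
    {
        if (state.Disposed) throw new ObjectDisposedException(nameof(TrackedInput));
    }
}
```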
IndexOutput
Abstract base class for output to a file in a Directory. A random-access output stream. Used for all Lucene index output operations.
IndexOutput may only be used from one thread, because it is not thread safe (it keeps internal state like file position).
InputStreamDataInput
A DataInput wrapping a plain System.IO.Stream.
IOContext
Lock
An interprocess mutex lock.
Typical use might look like:
var result = Lock.With.NewAnonymous<string>(
    @lock: directory.MakeLock("my.lock"),
    lockWaitTimeout: Lock.LOCK_OBTAIN_WAIT_FOREVER,
    doBody: () =>
    {
        //... code to execute while locked ...
        return "the result";
    }).Run();
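The obtain, run, release pattern that Lock.With<T> wraps can be sketched without Lucene.Net. Everything in this sketch is hypothetical:
- ISimpleLock stands in for Lock.
- InProcessLock stands in for a real lock implementation.
- System.TimeoutException stands in for LockObtainFailedException.

Waiting forever (LOCK_OBTAIN_WAIT_FOREVER) is not modeled.

```csharp
using System;
using System.Threading;

// Hypothetical minimal lock abstraction (not Lucene.Net's Lock class).
public interface ISimpleLock
{
    bool TryObtain();
    void Release();
}

// In-process stand-in: 0 = free, 1 = held.
public sealed class InProcessLock : ISimpleLock
{
    private int held;
    public bool TryObtain() => Interlocked.CompareExchange(ref held, 1, 0) == 0;
    public void Release() => Volatile.Write(ref held, 0);
    public bool IsHeld => Volatile.Read(ref held) == 1;
}

public static class WithLock
{
    // Poll until the lock is obtained or the timeout elapses, run the body,
    // and always release the lock afterwards, even if the body throws.
    public static T Run<T>(ISimpleLock @lock, TimeSpan timeout, TimeSpan pollInterval, Func<T> body)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (!@lock.TryObtain())
        {
            if (DateTime.UtcNow >= deadline)
                throw new TimeoutException("could not obtain lock");
            Thread.Sleep(pollInterval);
        }
        try
        {
            return body();
        }
        finally
        {
            @lock.Release();
        }
    }
}
```

The finally block is the essential part: the lock is released even if the body throws.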
Lock.With<T>
Utility class for executing code with exclusive access.
LockFactory
Base class for Locking implementation. Directory uses instances of this class to implement locking.
Lucene uses NativeFSLockFactory by default for FSDirectory-based index directories.
Special care needs to be taken if you change the locking implementation: First be certain that no writer is in fact writing to the index, otherwise you can easily corrupt your index. Be sure to do the LockFactory change on all Lucene instances and clean up all leftover lock files before starting the new configuration for the first time. Different implementations cannot work together!
If you suspect that some LockFactory implementation is not working properly in your environment, you can easily test it by using VerifyingLockFactory, LockVerifyServer and LockStressTest.
LockObtainFailedException
This exception is thrown when the write.lock could not be acquired. This happens when a writer tries to open an index that another writer already has open.
LockReleaseFailedException
This exception is thrown when the write.lock could not be released.
LockStressTest
Simple standalone tool that forever acquires & releases a lock using a specific LockFactory. Run without any args to see usage.
LockVerifyServer
Simple standalone server that must be running when you use VerifyingLockFactory. This server simply verifies at most one process holds the lock at a time. Run without any args to see usage.
MergeInfo
A MergeInfo provides information required for a MERGE context. It is used as part of an IOContext in case of MERGE context.
MMapDirectory
File-based Directory implementation that uses System.IO.MemoryMappedFiles.MemoryMappedFile for reading, and FSDirectory.FSIndexOutput for writing.
NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this class, be sure you have plenty of virtual address space, e.g. by using a 64-bit runtime, or a 32-bit runtime with indexes that are guaranteed to fit within the address space. On 32-bit platforms also consult MMapDirectory(DirectoryInfo, LockFactory, Int32) if you have problems with mmap failing because of fragmented address space. If you get a System.OutOfMemoryException, it is recommended to reduce the chunk size until it works.
NOTE: Accessing this class either directly or indirectly from a thread while it's interrupted can close the underlying channel immediately if at the same time the thread is blocked on IO. The channel will remain closed and subsequent access to MMapDirectory will throw a System.ObjectDisposedException.
MMapDirectory.MMapIndexInput
NativeFSLockFactory
Implements LockFactory using native OS file locks. For NFS based access to an index, it's recommended that you try SimpleFSLockFactory first and work around the one limitation that a lock file could be left when the runtime exits abnormally.
The primary benefit of NativeFSLockFactory is that locks (not the lock file itself) will be properly removed (by the OS) if the runtime exits abnormally.
Note that, unlike SimpleFSLockFactory, the existence of leftover lock files in the filesystem is fine because the OS will free the locks held against these files even though the files still remain. Lucene will never actively remove the lock files, so although you see them, the index may not be locked.
Special care needs to be taken if you change the locking implementation: First be certain that no writer is in fact writing to the index, otherwise you can easily corrupt your index. Be sure to do the LockFactory change on all Lucene instances and clean up all leftover lock files before starting the new configuration for the first time. Different implementations cannot work together!
If you suspect that this or any other LockFactory is not working properly in your environment, you can easily test it by using VerifyingLockFactory, LockVerifyServer and LockStressTest.
NIOFSDirectory
An FSDirectory implementation that uses System.IO.FileStream's positional read, which allows multiple threads to read from the same file without synchronizing.
This class only uses System.IO.FileStream when reading; writing is achieved with FSDirectory.FSIndexOutput.
NOTE: NIOFSDirectory is not recommended on Windows because of a bug in how FileChannel.read is implemented in Sun's JRE. Inside of the implementation the position is apparently synchronized. See here for details.
NOTE: Accessing this class either directly or indirectly from a thread while it's interrupted can close the underlying file descriptor immediately if at the same time the thread is blocked on IO. The file descriptor will remain closed and subsequent access to NIOFSDirectory will throw a System.ObjectDisposedException. If your application uses either System.Threading.Thread.Interrupt or System.Threading.Tasks.Task you should use SimpleFSDirectory in favor of NIOFSDirectory.
NIOFSDirectory.NIOFSIndexInput
Reads bytes with the Lucene.Net.Support.IO.StreamExtensions.Read(System.IO.Stream,J2N.IO.ByteBuffer,System.Int64) extension method for System.IO.Stream.
NoLockFactory
Use this LockFactory to disable locking entirely. Only one instance of this lock is created. You should call GetNoLockFactory() to get the instance.
NRTCachingDirectory
Wraps a RAMDirectory around any provided delegate directory, to be used during NRT search.
This class is likely only useful in a near-real-time context, where indexing rate is lowish but reopen rate is highish, resulting in many tiny files being written. This directory keeps such segments (as well as the segments produced by merging them, as long as they are small enough), in RAM.
This is safe to use: when your app calls Commit(), all cached files will be flushed from the cache and synced.
Here's a simple example usage:
// "analyzer" is assumed to be any Analyzer instance, e.g. a StandardAnalyzer
Directory fsDir = FSDirectory.Open(new DirectoryInfo("/path/to/index"));
NRTCachingDirectory cachedFSDir = new NRTCachingDirectory(fsDir, 5.0, 60.0);
IndexWriterConfig conf = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
IndexWriter writer = new IndexWriter(cachedFSDir, conf);
This will cache all newly flushed segments, all merges whose expected segment size is <= 5 MB, unless the net cached bytes exceeds 60 MB at which point all writes will not be cached (until the net bytes falls below 60 MB).
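The caching decision described above can be written as a small predicate. This C# sketch is hypothetical and follows the prose rather than NRTCachingDirectory's source:
- Flushes are limited only by the total budget (maxCachedMB).
- Merges are additionally limited by maxMergeSizeMB.
- The estimated size of the new segment is assumed to be known up front.

```csharp
// Hypothetical predicate: should a new file of 'estimatedBytes' go to the RAM cache?
public static class NrtCachePolicy
{
    private const long BytesPerMB = 1024L * 1024L;

    public static bool ShouldCache(bool isMerge, long estimatedBytes, long currentlyCachedBytes,
                                   double maxMergeSizeMB, double maxCachedMB)
    {
        long maxMergeSizeBytes = (long)(maxMergeSizeMB * BytesPerMB);
        long maxCachedBytes = (long)(maxCachedMB * BytesPerMB);

        // Merged segments are only cached when they are expected to be small enough.
        if (isMerge && estimatedBytes > maxMergeSizeBytes) return false;

        // Nothing is cached if it would push the cache past its total budget.
        return currentlyCachedBytes + estimatedBytes <= maxCachedBytes;
    }
}
```

With the 5.0/60.0 settings from the example, a 4 MB merge is cached while 50 MB are already cached, but not while 58 MB are.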
OutputStreamDataOutput
A DataOutput wrapping a plain System.IO.Stream.
RAMDirectory
A memory-resident Directory implementation. Locking implementation is by default the SingleInstanceLockFactory but can be changed with SetLockFactory(LockFactory).
Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte[1024] arrays. This class is optimized for small memory-resident indexes. It also has poor concurrency in multithreaded environments.
It is recommended to materialize large indexes on disk and use MMapDirectory, which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to heap space is not useful.
RAMFile
Represents a file in RAM as a list of byte[] buffers.
RAMInputStream
A memory-resident IndexInput implementation.
RAMOutputStream
A memory-resident IndexOutput implementation.
RateLimitedDirectoryWrapper
A Directory wrapper that allows IndexOutput rate limiting using IO context (IOContext.UsageContext) specific rate limiters (RateLimiter).
RateLimiter
Abstract base class to rate limit IO. Typically implementations are shared across multiple IndexInputs or IndexOutputs (for example, those involved in merging). Those IndexInputs and IndexOutputs would call Pause(Int64) whenever they want to read or write bytes.
RateLimiter.SimpleRateLimiter
Simple class to rate limit IO.
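Rate limiting by pacing can be sketched as follows. This hypothetical C# class is not RateLimiter.SimpleRateLimiter. It differs in three ways:
- It takes the current time as a parameter, so the arithmetic can be checked without sleeping.
- It assumes 1 MB = 1024 * 1024 bytes.
- It returns how long the caller should wait instead of sleeping itself.

```csharp
using System;

// Hypothetical pacing limiter: each reported batch of bytes pushes a target time forward.
public sealed class PacingRateLimiter
{
    private readonly double secondsPerByte;
    private double targetSeconds; // time by which all bytes reported so far are "paid for"

    public PacingRateLimiter(double mbPerSec)
    {
        if (mbPerSec <= 0) throw new ArgumentOutOfRangeException(nameof(mbPerSec));
        secondsPerByte = 1.0 / (mbPerSec * 1024 * 1024);
    }

    // Returns how many seconds the caller should wait before moving 'bytes' bytes.
    public double Pause(long bytes, double nowSeconds)
    {
        // Idle time is not banked: if pacing fell behind the clock, restart from now.
        if (targetSeconds < nowSeconds) targetSeconds = nowSeconds;
        targetSeconds += bytes * secondsPerByte;
        return Math.Max(0.0, targetSeconds - nowSeconds);
    }
}
```

A caller would sleep for the returned number of seconds, for example with Thread.Sleep(TimeSpan.FromSeconds(wait)), before performing the I/O. At 1 MB/s, reporting 1 MB at time 0 yields a 1-second wait. Reporting a further 0.5 MB at time 0.5 also yields a 1-second wait, because the first megabyte has not yet been paid for.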
SimpleFSDirectory
A straightforward implementation of FSDirectory using System.IO.FileStream. However, this class has poor concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file. It's usually better to use NIOFSDirectory or MMapDirectory instead.
SimpleFSDirectory.SimpleFSIndexInput
Reads bytes with System.IO.FileStream.Seek(System.Int64,System.IO.SeekOrigin) followed by System.IO.FileStream.Read(System.Byte[],System.Int32,System.Int32).
SimpleFSLockFactory
Implements LockFactory using System.IO.File.WriteAllText(System.String,System.String,System.Text.Encoding) (writes the file with UTF8 encoding and no byte order mark).
Special care needs to be taken if you change the locking implementation: First be certain that no writer is in fact writing to the index, otherwise you can easily corrupt your index. Be sure to do the LockFactory change on all Lucene instances and clean up all leftover lock files before starting the new configuration for the first time. Different implementations cannot work together!
If you suspect that this or any other LockFactory is not working properly in your environment, you can easily test it by using VerifyingLockFactory, LockVerifyServer and LockStressTest.
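A minimal sketch of the lock-file approach shows its trade-off against native OS locks. This C# class is hypothetical and is not SimpleFSLockFactory's code. It relies on FileMode.CreateNew, which fails if the file already exists, so creating the lock file acts as an atomic test-and-set.

```csharp
using System.IO;

// Hypothetical lock that is "held" exactly while its lock file exists.
public sealed class LockFileSketch
{
    private readonly string path;

    public LockFileSketch(string path) => this.path = path;

    public bool TryObtain()
    {
        try
        {
            // CreateNew throws if the file already exists: an atomic test-and-set.
            using (new FileStream(path, FileMode.CreateNew, FileAccess.Write)) { }
            return true;
        }
        catch (IOException) when (File.Exists(path))
        {
            return false; // held by someone else, or a stale file was left behind
        }
    }

    // If the process dies before calling this, the file stays and the lock looks held.
    public void Release() => File.Delete(path);

    public bool IsLocked => File.Exists(path);
}
```

If the process exits without calling Release(), the file remains. Every later TryObtain() then fails until the stale file is removed by hand, which is the limitation of lock-file-based locking described under NativeFSLockFactory.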
SingleInstanceLockFactory
Implements LockFactory for a single in-process instance, meaning all locking will take place through this one instance. Only use this LockFactory when you are certain all IndexReaders and IndexWriters for a given index are running against a single shared in-process Directory instance. This is currently the default locking for RAMDirectory.
TrackingDirectoryWrapper
A delegating Directory that records which files were written to and deleted.
VerifyingLockFactory
A LockFactory that wraps another LockFactory and verifies that each lock obtain/release is "correct" (never results in two processes holding the lock at the same time). It does this by contacting an external server (LockVerifyServer) to assert that at most one process holds the lock at a time. To use this, you should also run LockVerifyServer on the host & port matching what you pass to the constructor.
Enums
IOContext.UsageContext
IOContext.UsageContext is an enumeration that specifies the context in which the Directory is being used.
NOTE: This was Context in Lucene