Lucene.Net
3.0.3
Lucene.Net is a .NET port of the Java Lucene Indexing Library
This class accepts multiple added documents and directly writes a single segment file. It does this more efficiently than creating a single segment per document (with DocumentWriter) and doing standard merges on those segments.
Inherits IDisposable.
Classes | |
class | AnonymousClassIndexingChain |
class | ByteBlockAllocator |
class | DocState |
class | DocWriter |
Consumer returns this on each doc. This holds any state that must be flushed synchronized "in docID order". We gather these and flush them in order. | |
class | IndexingChain |
The IndexingChain must define the GetChain(DocumentsWriter) method which returns the DocConsumer that the DocumentsWriter calls to process the documents. | |
class | PerDocBuffer |
class | SkipDocWriter |
class | WaitQueue |
Public Member Functions | |
void | Dispose () |
Properties | |
static int | BYTE_BLOCK_SIZE_ForNUnit [get] |
static int | CHAR_BLOCK_SIZE_ForNUnit [get] |
This class accepts multiple added documents and directly writes a single segment file. It does this more efficiently than creating a single segment per document (with DocumentWriter) and doing standard merges on those segments.
Each added document is passed to the DocConsumer, which in turn processes the document and interacts with other consumers in the indexing chain. Certain consumers, like StoredFieldsWriter and TermVectorsTermsWriter, digest a document and immediately write bytes to the "doc store" files (i.e., they do not consume RAM per document, except while they are processing the document).
Other consumers, e.g. FreqProxTermsWriter and NormsWriter, buffer bytes in RAM and flush only when a new segment is produced. Once we have used our allowed RAM buffer, or the number of added docs is large enough (in the case where we are flushing by doc count instead of RAM usage), we create a real segment and flush it to the Directory.
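The flush trigger described above can be modeled in a few lines. This is a minimal sketch in Java, not the Lucene.Net implementation: the class `FlushTriggerSketch`, its fields, and the `DISABLED` sentinel are hypothetical names introduced for illustration, and per-document RAM cost is simply passed in as a number.

```java
// Hypothetical model of "flush by RAM usage or by doc count"; not the Lucene.Net API.
final class FlushTriggerSketch {
    // Assumption: a sentinel meaning "do not flush on this criterion".
    static final int DISABLED = -1;

    private final long maxRamBytes;     // allowed RAM buffer, or DISABLED
    private final int maxBufferedDocs;  // doc-count limit, or DISABLED
    private long bytesUsed = 0;
    private int docsBuffered = 0;
    private int segmentsFlushed = 0;

    FlushTriggerSketch(long maxRamBytes, int maxBufferedDocs) {
        this.maxRamBytes = maxRamBytes;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Buffers one document; returns true if this add caused a segment flush. */
    boolean addDocument(long docBytes) {
        bytesUsed += docBytes;
        docsBuffered++;
        boolean ramFull = maxRamBytes != DISABLED && bytesUsed >= maxRamBytes;
        boolean docsFull = maxBufferedDocs != DISABLED && docsBuffered >= maxBufferedDocs;
        if (ramFull || docsFull) {
            flush();
            return true;
        }
        return false;
    }

    /** Models writing a real segment: the buffered state is reset. */
    private void flush() {
        segmentsFlushed++;
        bytesUsed = 0;
        docsBuffered = 0;
    }

    int segmentsFlushed() { return segmentsFlushed; }
    int docsBuffered() { return docsBuffered; }
}
```

With a 1000-byte budget and doc-count flushing disabled, the third 400-byte document pushes usage past the limit and triggers the flush; with only a doc-count limit of 2, every second document does.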
Threads:
Multiple threads are allowed into addDocument at once. There is an initial synchronized call to getThreadState, which allocates a ThreadState for this thread. The same thread will get the same ThreadState over time (thread affinity), so that if there are consistent patterns (for example, each thread is indexing a different content source) we make better use of RAM. Then processDocument is called on that ThreadState without synchronization (most of the "heavy lifting" is in this call). Finally, the synchronized "finishDocument" is called to flush changes to the directory.
When flush is called by IndexWriter, we forcefully idle all threads and flush only once they are all idle. This means you can call flush with a given thread even while other threads are actively adding/deleting documents.
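The three-step sequence for adding a document (synchronized lookup of the thread state, unsynchronized processing, synchronized finish) can be sketched as follows. This is an illustrative Java model under stated assumptions, not the Lucene.Net code: `ThreadAffinitySketch`, its nested `ThreadState`, and `runDemo` are hypothetical, each thread gets its own state (no sharing of states between threads), and "processing" is a trivial string operation. The idling of threads during flush is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the locking pattern; not the Lucene.Net API.
final class ThreadAffinitySketch {
    /** Per-thread scratch state; only its owning thread touches it. */
    static final class ThreadState {
        final int id;
        int docsProcessed = 0;
        ThreadState(int id) { this.id = id; }

        String process(String doc) {
            docsProcessed++;
            return doc.toLowerCase();
        }
    }

    private final Map<Thread, ThreadState> bindings = new HashMap<>();
    private final List<String> finished = new ArrayList<>();

    /** Step 1 (synchronized): the same thread always gets the same state. */
    synchronized ThreadState getThreadState() {
        Thread current = Thread.currentThread();
        ThreadState state = bindings.get(current);
        if (state == null) {
            state = new ThreadState(bindings.size());
            bindings.put(current, state);
        }
        return state;
    }

    void addDocument(String doc) {
        ThreadState state = getThreadState();
        String processed = state.process(doc);  // Step 2: no lock held here
        finishDocument(processed);              // Step 3: synchronized
    }

    private synchronized void finishDocument(String processed) {
        finished.add(processed);
    }

    synchronized int finishedCount() { return finished.size(); }
    synchronized int threadStateCount() { return bindings.size(); }

    /** Runs several indexing threads to completion and returns the writer. */
    static ThreadAffinitySketch runDemo(int threadCount, int docsPerThread) {
        ThreadAffinitySketch writer = new ThreadAffinitySketch();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int source = i;
            threads[i] = new Thread(() -> {
                for (int d = 0; d < docsPerThread; d++) {
                    writer.addDocument("Source" + source + "-Doc" + d);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return writer;
    }
}
```

Keeping the expensive step outside the lock is the point of the design: threads contend only briefly at the start and end of each document.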
Exceptions:
Because this class directly updates in-memory posting lists, and flushes stored fields and term vectors directly to files in the directory, there are certain limited times when an exception can corrupt this state. For example, a disk-full error while flushing stored fields leaves the stored fields file in a corrupt state. Similarly, an out-of-memory (OOM) exception while appending to the in-memory posting lists can corrupt that posting list. We call such exceptions "aborting exceptions". In these cases we must call abort() to discard all docs added since the last flush.
All other exceptions ("non-aborting exceptions") can still partially update the index structures. These updates are consistent, but they represent only the part of the document seen up until the exception was hit. When this happens, we immediately mark the document as deleted so that the document is always atomically ("all or none") added to the index.
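The two exception classes lead to two different recovery paths, which the following Java sketch illustrates. It is a simplified model, not the Lucene.Net implementation: `AbortSketch`, `AbortingException`, and the `Consumer<String>` used as a stand-in for the indexing chain are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of aborting vs. non-aborting exceptions; not the Lucene.Net API.
final class AbortSketch {
    /** Stands in for failures that corrupt shared state (disk full, OOM). */
    static final class AbortingException extends RuntimeException {
        AbortingException(String message) { super(message); }
    }

    private final List<String> buffered = new ArrayList<>();  // docs since last flush
    private final Set<Integer> deleted = new HashSet<>();     // docIDs marked deleted

    void addDocument(String doc, Consumer<String> indexingChain) {
        int docId = buffered.size();
        buffered.add(doc);  // from here on, partial updates may exist for docId
        try {
            indexingChain.accept(doc);
        } catch (AbortingException e) {
            abort();             // state may be corrupt: discard everything buffered
            throw e;
        } catch (RuntimeException e) {
            deleted.add(docId);  // state is consistent but partial: hide the doc
            throw e;
        }
    }

    /** Discards all docs added since the last flush. */
    void abort() {
        buffered.clear();
        deleted.clear();
    }

    int bufferedCount() { return buffered.size(); }
    int liveCount() { return buffered.size() - deleted.size(); }
}
```

In this model a non-aborting failure costs only the failing document, while an aborting failure costs every document buffered since the last flush.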
Definition at line 103 of file DocumentsWriter.cs.
void Lucene.Net.Index.DocumentsWriter.Dispose ()
Definition at line 868 of file DocumentsWriter.cs.
static int Lucene.Net.Index.DocumentsWriter.BYTE_BLOCK_SIZE_ForNUnit [static, get]
Definition at line 2066 of file DocumentsWriter.cs.
static int Lucene.Net.Index.DocumentsWriter.CHAR_BLOCK_SIZE_ForNUnit [static, get]
Definition at line 2071 of file DocumentsWriter.cs.