Lucene.Net  3.0.3
Lucene.Net is a .NET port of the Java Lucene Indexing Library
Lucene.Net.Index.DocumentsWriter Class Reference

This class accepts multiple added documents and directly writes a single segment file. It does this more efficiently than creating a single segment per document (with DocumentWriter) and doing standard merges on those segments.

Inherits IDisposable.

Classes

class  AnonymousClassIndexingChain
 
class  ByteBlockAllocator
 
class  DocState
 
class  DocWriter
 The consumer returns this for each document. It holds any state that must be flushed, under synchronization, in docID order. These objects are gathered and flushed in order.
 
class  IndexingChain
 The IndexingChain must define the GetChain(DocumentsWriter) method which returns the DocConsumer that the DocumentsWriter calls to process the documents.
 
class  PerDocBuffer
 
class  SkipDocWriter
 
class  WaitQueue
 

Public Member Functions

void Dispose ()
 

Properties

static int BYTE_BLOCK_SIZE_ForNUnit [get]
 
static int CHAR_BLOCK_SIZE_ForNUnit [get]
 

Detailed Description

This class accepts multiple added documents and directly writes a single segment file. It does this more efficiently than creating a single segment per document (with DocumentWriter) and doing standard merges on those segments.

Each added document is passed to the DocConsumer, which in turn processes the document and interacts with other consumers in the indexing chain. Certain consumers, like StoredFieldsWriter and TermVectorsTermsWriter, digest a document and immediately write bytes to the "doc store" files (i.e., they do not consume RAM per document, except while they are processing the document).

Other consumers, e.g. FreqProxTermsWriter and NormsWriter, buffer bytes in RAM and flush only when a new segment is produced. Once we have used our allowed RAM buffer, or the number of added docs is large enough (when we are flushing by doc count instead of RAM usage), we create a real segment and flush it to the Directory.
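
The flush trigger described above can be illustrated with a minimal sketch. It is written in Java to match the Java-style names used in this description, and it is not the Lucene.Net implementation: the class FlushPolicySketch, its fields, and the DISABLED sentinel are all hypothetical names chosen for illustration.

```java
// Hypothetical sketch of "flush by RAM usage or by doc count".
// None of these names come from Lucene.Net; they are illustrative only.
final class FlushPolicySketch {
    static final int DISABLED = -1;      // assumed sentinel: this trigger is off

    private final long maxRamBytes;      // flush once buffered bytes reach this
    private final int maxBufferedDocs;   // flush once buffered docs reach this
    private long ramBytesUsed;
    private int bufferedDocs;
    private int segmentsFlushed;

    FlushPolicySketch(long maxRamBytes, int maxBufferedDocs) {
        this.maxRamBytes = maxRamBytes;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Buffers one document; returns true if this add triggered a flush. */
    boolean addDocument(long bytesBuffered) {
        ramBytesUsed += bytesBuffered;
        bufferedDocs++;
        boolean ramFull = maxRamBytes != DISABLED && ramBytesUsed >= maxRamBytes;
        boolean countFull = maxBufferedDocs != DISABLED && bufferedDocs >= maxBufferedDocs;
        if (ramFull || countFull) {
            flush();
            return true;
        }
        return false;
    }

    /** Stands in for writing a real segment: counts it and resets the buffers. */
    private void flush() {
        segmentsFlushed++;
        ramBytesUsed = 0;
        bufferedDocs = 0;
    }

    int segmentsFlushed() { return segmentsFlushed; }
    int bufferedDocs() { return bufferedDocs; }
}
```

With a doc-count limit of 3 and the RAM trigger disabled, the third call to addDocument in this sketch produces one segment and empties the buffer.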

Threads:

Multiple threads are allowed into addDocument at once. There is an initial synchronized call to getThreadState, which allocates a ThreadState for this thread. The same thread will get the same ThreadState over time (thread affinity), so that if there are consistent patterns (for example, each thread is indexing a different content source) we make better use of RAM. Then processDocument is called on that ThreadState without synchronization (most of the "heavy lifting" is in this call). Finally, the synchronized finishDocument is called to flush changes to the directory.
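
The three steps above (synchronized state lookup, unsynchronized processing, synchronized finish) can be sketched as follows. This is a simplified illustration in Java, not the actual class: it assumes exactly one state object per thread, and the "processing" and "directory" here are placeholders (lower-casing a string and appending to a list). Only the method names addDocument, getThreadState, processDocument and finishDocument are taken from the description; their signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the threading pattern; not the Lucene.Net code.
final class ThreadAffinitySketch {
    /** Per-thread scratch state, reused across documents from the same thread. */
    static final class ThreadState {
        final StringBuilder scratch = new StringBuilder();
        int docsProcessed;
    }

    private final Map<Thread, ThreadState> states = new HashMap<>();
    private final List<String> finished = new ArrayList<>(); // stands in for the directory

    // Step 1: synchronized. The same thread always gets the same state back.
    synchronized ThreadState getThreadState() {
        return states.computeIfAbsent(Thread.currentThread(), t -> new ThreadState());
    }

    // Step 2: not synchronized. Touches only the caller's own ThreadState.
    String processDocument(ThreadState state, String text) {
        state.scratch.setLength(0);
        state.scratch.append(text.toLowerCase(Locale.ROOT)); // placeholder for real work
        state.docsProcessed++;
        return state.scratch.toString();
    }

    // Step 3: synchronized. Publishes the result to shared state.
    synchronized void finishDocument(String processed) {
        finished.add(processed);
    }

    void addDocument(String text) {
        ThreadState state = getThreadState();
        String processed = processDocument(state, text);
        finishDocument(processed);
    }

    synchronized int finishedCount() { return finished.size(); }
    synchronized int threadStateCount() { return states.size(); }
}
```

Keeping the expensive step outside the lock is what lets several threads index concurrently; only the short lookup and publish steps are serialized.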

When flush is called by IndexWriter we forcefully idle all threads and flush only once they are all idle. This means you can call flush with a given thread even while other threads are actively adding/deleting documents.
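
One way to picture "idle all threads, then flush" is a small monitor-based sketch. It is an assumption about how such a pause could be arranged, not a description of the real code: the class PauseForFlushSketch, its counters, and the enter/exit helpers are hypothetical.

```java
// Hypothetical sketch: a flush waits until no thread is inside addDocument,
// and new adds wait while a flush is pending. Not the Lucene.Net code.
final class PauseForFlushSketch {
    private int activeThreads;     // threads currently inside addDocument
    private boolean flushPending;  // true while a flush is waiting or running
    private int bufferedDocs;
    private int flushedDocs;

    private synchronized void enter() throws InterruptedException {
        while (flushPending) {
            wait();                // hold new documents back until the flush ends
        }
        activeThreads++;
    }

    private synchronized void exit() {
        bufferedDocs++;            // record the document that was just processed
        activeThreads--;
        notifyAll();               // a waiting flush may now be able to proceed
    }

    void addDocument() throws InterruptedException {
        enter();
        try {
            // unsynchronized per-document work would happen here
        } finally {
            exit();
        }
    }

    synchronized void flush() throws InterruptedException {
        while (flushPending) {
            wait();                // allow only one flush at a time
        }
        flushPending = true;
        try {
            while (activeThreads > 0) {
                wait();            // wait until every indexing thread is idle
            }
            flushedDocs += bufferedDocs;
            bufferedDocs = 0;
        } finally {
            flushPending = false;
            notifyAll();           // release the threads held back in enter()
        }
    }

    synchronized int flushedDocs() { return flushedDocs; }
    synchronized int bufferedDocs() { return bufferedDocs; }
}
```

In this sketch any thread may call flush while others are adding documents; the callers of addDocument simply pause at enter() until the flush completes.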

Exceptions:

Because this class directly updates in-memory posting lists, and flushes stored fields and term vectors directly to files in the directory, there are certain limited times when an exception can corrupt this state. For example, a disk-full error while flushing stored fields leaves the stored fields file in a corrupt state. Similarly, an out-of-memory (OOM) exception while appending to the in-memory posting lists can corrupt the posting list being updated. We call such exceptions "aborting exceptions". In these cases we must call abort() to discard all docs added since the last flush.

All other exceptions ("non-aborting exceptions") can still partially update the index structures. These updates are consistent, but they represent only the part of the document seen up until the exception was hit. When this happens, we immediately mark the document as deleted so that the document is always atomically ("all or none") added to the index.
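
The two exception classes can be contrasted in a short sketch. It is illustrative only: AtomicAddSketch, AbortingException, and the failure parameter (used here to simulate an exception thrown partway through indexing) are hypothetical; only the name abort() comes from the description above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of aborting vs. non-aborting exceptions; not the Lucene.Net code.
final class AtomicAddSketch {
    /** Assumed marker type for failures that may have corrupted buffered state. */
    static final class AbortingException extends RuntimeException {
        AbortingException(String message) { super(message); }
    }

    private final List<String> bufferedDocs = new ArrayList<>();   // docs since last flush
    private final Set<Integer> deletedDocIds = new HashSet<>();

    /** Adds a document; a non-null failure simulates an exception hit mid-document. */
    void addDocument(String doc, RuntimeException failure) {
        int docId = bufferedDocs.size();
        bufferedDocs.add(doc);             // the buffers are now partially updated
        try {
            if (failure != null) {
                throw failure;
            }
        } catch (AbortingException e) {
            abort();                       // state may be corrupt: discard everything
            throw e;
        } catch (RuntimeException e) {
            deletedDocIds.add(docId);      // state is consistent: hide the partial doc
            throw e;
        }
    }

    /** Discards all documents added since the last flush. */
    void abort() {
        bufferedDocs.clear();
        deletedDocIds.clear();
    }

    int bufferedDocCount() { return bufferedDocs.size(); }
    int liveDocCount() { return bufferedDocs.size() - deletedDocIds.size(); }
}
```

A non-aborting failure leaves the earlier documents intact and only hides the failed one, whereas an aborting failure drops the whole buffer.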

Definition at line 103 of file DocumentsWriter.cs.

Member Function Documentation

void Lucene.Net.Index.DocumentsWriter.Dispose ( )

Definition at line 868 of file DocumentsWriter.cs.

Property Documentation

int Lucene.Net.Index.DocumentsWriter.BYTE_BLOCK_SIZE_ForNUnit
[static, get]

Definition at line 2066 of file DocumentsWriter.cs.

int Lucene.Net.Index.DocumentsWriter.CHAR_BLOCK_SIZE_ForNUnit
[static, get]

Definition at line 2071 of file DocumentsWriter.cs.


The documentation for this class was generated from the following file: