Code to maintain and access indexes.
Classes
Class | Description
---|---
AbstractAllTermDocs | Base class for enumerating all but deleted docs. NOTE: this class is meant only to be used internally by Lucene; it's only public so it can be shared across packages. This means the API is freely subject to change, and the class could be removed entirely, in any Lucene release. Use directly at your own risk!
ByteBlockPool | 
ByteBlockPool.Allocator | 
ByteSliceReader | 
ByteSliceWriter | Class to write byte streams into slices of a shared byte[]. This is used by DocumentsWriter to hold the posting list for many terms in RAM.
CheckIndex | Basic tool and API to check the health of an index and write a new segments file that removes references to problematic segments. As this tool checks every byte in the index, it can take quite a long time to run on a large index. WARNING: this tool and API are new and experimental and are subject to sudden change in the next release. Please make a complete backup of your index before using this to fix your index!
CheckIndex.Status | Returned from CheckIndex() detailing the health and status of the index. WARNING: this API is new and experimental and is subject to sudden change in the next release.
CheckIndex.Status.FieldNormStatus | Status from testing field norms.
CheckIndex.Status.SegmentInfoStatus | Holds the status of each segment in the index. See segmentInfos. WARNING: this API is new and experimental and is subject to sudden change in the next release.
CheckIndex.Status.StoredFieldStatus | Status from testing stored fields.
CheckIndex.Status.TermIndexStatus | Status from testing the term index.
CheckIndex.Status.TermVectorStatus | Status from testing term vectors.
CompoundFileReader | Class for accessing a compound stream. This class implements a directory, but is limited to read operations only. Directory methods that would normally modify data throw an exception.
CompoundFileReader.CSIndexInput | Implementation of an IndexInput that reads from a portion of the compound file. The visibility is left as "package" only because this helps with testing, since JUnit test cases in a different class can then access package fields of this class.
CompoundFileWriter | Combines multiple files into a single compound file.
ConcurrentMergeScheduler | A MergeScheduler that runs each merge using a separate thread, up until a maximum number of threads (SetMaxThreadCount); when a merge is needed beyond that, the thread(s) that are updating the index will pause until one or more merges complete. This is a simple way to use concurrency in the indexing process without having to create and manage application-level threads.
ConcurrentMergeScheduler.MergeThread | 
CorruptIndexException | This exception is thrown when Lucene detects an inconsistency in the index.
DirectoryReader | An IndexReader which reads indexes with multiple segments.
DocumentsWriter | This class accepts multiple added documents and directly writes a single segment file. It does this more efficiently than creating a single segment per document (with DocumentWriter) and doing standard merges on those segments. Each added document is passed to the DocConsumer, which in turn processes the document and interacts with other consumers in the indexing chain. Certain consumers, like StoredFieldsWriter and TermVectorsTermsWriter, digest a document and immediately write bytes to the "doc store" files (i.e., they do not consume RAM per document, except while they are processing the document). Other consumers, e.g. FreqProxTermsWriter and NormsWriter, buffer bytes in RAM and flush only when a new segment is produced. Once we have used our allowed RAM buffer, or the number of added docs is large enough (in the case we are flushing by doc count instead of RAM usage), we create a real segment and flush it to the Directory. Threads: multiple threads are allowed into addDocument at once. There is an initial synchronized call to getThreadState which allocates a ThreadState for this thread. The same thread will get the same ThreadState over time (thread affinity), so that if there are consistent patterns (for example, each thread is indexing a different content source) we make better use of RAM. Then processDocument is called on that ThreadState without synchronization (most of the "heavy lifting" is in this call). Finally, the synchronized finishDocument is called to flush changes to the directory. When flush is called by IndexWriter, or internally when autoCommit=false, we forcefully idle all threads and flush only once they are all idle; this means you can call flush with a given thread even while other threads are actively adding or deleting documents. Exceptions: because this class directly updates in-memory posting lists, and flushes stored fields and term vectors directly to files in the directory, there are certain limited times when an exception can corrupt this state. For example, a disk full while flushing stored fields leaves that file in a corrupt state, and an OOM exception while appending to the in-memory posting lists can corrupt that posting list. We call such exceptions "aborting exceptions": in these cases we must call abort() to discard all docs added since the last flush. All other exceptions ("non-aborting exceptions") can still partially update the index structures. These updates are consistent, but they represent only a part of the document seen up until the exception was hit. When this happens, we immediately mark the document as deleted so that the document is always atomically ("all or none") added to the index.
DoubleFieldEnumerator | Implementation for enumerating over all of the terms in a double numeric field.
EmptyVector | A simple TermFreqVector implementation for an empty vector, for use with a deleted document or a document that does not have the field that is being enumerated.
FieldEnumerator<T> | 
FieldEnumerator<T>.TermEnumerator | The enumerator over the terms in an index.
FieldInfo | 
FieldInfos | Access to the Fieldable Info file that describes document fields and whether or not they are indexed. Each segment has a separate Fieldable Info file. Objects of this class are thread-safe for multiple readers, but only one thread can be adding documents at a time, with no other reader or writer threads accessing this object.
FieldInvertState | This class tracks the number and position/offset parameters of terms being added to the index. The information collected in this class is also used to calculate the normalization factor for a field. WARNING: This API is new and experimental, and may suddenly change.
FieldReaderException | 
FieldSortedTermVectorMapper | For each Field, store a sorted collection of TermVectorEntrys. This is not thread-safe.
FieldsReader | Class responsible for access to stored document fields. It uses <segment>.fdt and <segment>.fdx files.
FilterIndexReader | A FilterIndexReader contains another IndexReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality. The class FilterIndexReader itself simply implements all abstract methods of IndexReader with versions that pass all requests to the contained index reader. Subclasses of FilterIndexReader may further override some of these methods and may also provide additional methods and fields.
FilterIndexReader.FilterTermDocs | Base class for filtering TermDocs implementations.
FilterIndexReader.FilterTermEnum | Base class for filtering TermEnum implementations.
FilterIndexReader.FilterTermPositions | Base class for filtering TermPositions implementations.
FloatFieldEnumerator | Implementation for enumerating over all of the terms in a float numeric field.
IndexCommit | Expert: represents a single commit into an index as seen by the IndexDeletionPolicy or IndexReader. Changes to the content of an index are made visible only after the writer who made that change commits, by writing a new segments file (segments_N).
IndexFileDeleter | 
IndexFileNameFilter | Filename filter that accepts only filenames and extensions created by Lucene.
IndexFileNames | Useful constants representing filenames and extensions used by Lucene.
IndexModifier | Obsolete. [Note that as of 2.1, all but one of the methods in this class are available via IndexWriter. The one method that is not available is DeleteDocument(int).] A class to modify an index, i.e. to delete and add documents. This class hides IndexReader and IndexWriter so that you do not need to care about implementation details such as that adding documents is done via IndexWriter and deletion is done via IndexReader. Note that you cannot create more than one IndexModifier object on the same directory at the same time.
IndexReader | IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable. Concrete subclasses of IndexReader are usually constructed with a call to one of the static Open() methods.
IndexReader.FieldOption | Constants describing field properties, for example used for IndexReader.GetFieldNames(FieldOption).
IndexWriter | An IndexWriter creates and maintains an index. The create argument to the constructor determines whether a new index is created or an existing index is opened: with create=true any existing index is replaced. When autoCommit is false, flushed changes are not made visible to readers (via a new segments_N file) until the writer commits or closes; when autoCommit is true, the writer periodically commits on its own. (A usage sketch follows this table.)
IndexWriter.IndexReaderWarmer | If GetReader has been called (i.e., this writer is in near real-time mode), then after a merge completes, this class can be invoked to warm the reader on the newly merged segment, before the merge commits. This is not required for near real-time search, but will reduce search latency on opening a new near real-time reader after a merge completes. NOTE: This API is experimental and might change in incompatible ways in the next release. NOTE: warm is called before any deletes have been carried over to the merged segment.
IndexWriter.MaxFieldLength | Specifies the maximum field length (in number of tokens/terms) in IndexWriter constructors. SetMaxFieldLength(int) overrides the value set by the constructor.
IntFieldEnumerator | Implementation for enumerating over all of the terms in an int numeric field.
KeepOnlyLastCommitDeletionPolicy | This IndexDeletionPolicy implementation keeps only the most recent commit and immediately removes all prior commits after a new commit is done. This is the default deletion policy.
LogByteSizeMergePolicy | This is a LogMergePolicy that measures the size of a segment as the total byte size of the segment's files.
LogDocMergePolicy | This is a LogMergePolicy that measures the size of a segment as the number of documents (not taking deletions into account).
LogMergePolicy | This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using GetMergeFactor() and SetMergeFactor(int) respectively. This class is abstract and requires a subclass to define the size method, which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.
LongFieldEnumerator | Implementation for enumerating over all of the terms in a long numeric field.
MergePolicy | Expert: a MergePolicy determines the sequence of primitive merge operations to be used for overall merge and optimize operations. Whenever the segments in an index have been altered by IndexWriter (the addition of a newly flushed segment, the addition of many segments from addIndexes* calls, or a previous merge that may now need to cascade), IndexWriter invokes FindMerges to give the MergePolicy a chance to pick merges that are now required. This method returns a MergeSpecification instance describing the set of merges that should be done, or null if no merges are necessary. When IndexWriter.Optimize is called, it calls FindMergesForOptimize and the MergePolicy should then return the necessary merges. Note that the policy can return more than one merge at a time; in this case, if the writer is using SerialMergeScheduler the merges will be run sequentially, but if it is using ConcurrentMergeScheduler they will be run concurrently. The default MergePolicy is LogByteSizeMergePolicy. NOTE: This API is new and still experimental (subject to change suddenly in the next release). NOTE: This class typically requires access to package-private APIs (e.g. SegmentInfos) to do its job; if you implement your own MergePolicy, you'll need to put it in package Lucene.Net.Index in order to use these APIs.
MergePolicy.MergeAbortedException | 
MergePolicy.MergeException | Exception thrown if there are any problems while executing a merge.
MergePolicy.MergeSpecification | A MergeSpecification instance provides the information necessary to perform multiple merges. It simply contains a list of OneMerge instances.
MergePolicy.OneMerge | OneMerge provides the information necessary to perform an individual primitive merge operation, resulting in a single new segment. The merge spec includes the subset of segments to be merged as well as whether the new segment should use the compound file format.
MergeScheduler | Expert: IndexWriter uses an instance implementing this interface to execute the merges selected by a MergePolicy. The default MergeScheduler is ConcurrentMergeScheduler. NOTE: This API is new and still experimental (subject to change suddenly in the next release). NOTE: This class typically requires access to package-private APIs (e.g. SegmentInfos) to do its job; if you implement your own MergeScheduler, you'll need to put it in package Lucene.Net.Index in order to use these APIs.
MultipleTermPositions | Allows you to iterate over the TermPositions for multiple Terms as a single TermPositions.
MultiReader | An IndexReader which reads multiple indexes, appending their content.
NumericFieldEnum<T> | Base for enumerating over numeric fields.
ParallelReader | An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typically each contains different fields. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field. This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently: the smaller fields may be re-indexed in a new index and both indexes may be searched together. Warning: it is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior. (See the ParallelReader sketch after this table.)
Payload | A Payload is metadata that can be stored together with each occurrence of a term. This metadata is stored inline in the posting list of the specific term. To store payloads in the index, a TokenStream has to be used that produces payload data. Use TermPositions.GetPayloadLength() and TermPositions.GetPayload(byte[], int) to retrieve the payloads from the index.
PositionBasedTermVectorMapper | For each Field, store position-by-position information. It ignores frequency information. This is not thread-safe.
PositionBasedTermVectorMapper.TVPositionInfo | Container for a term at a position.
ReadOnlyDirectoryReader | 
ReadOnlySegmentReader | 
SegmentInfo | Information about a segment, such as its name, directory, and files related to the segment. NOTE: This API is new and still experimental (subject to change suddenly in the next release).
SegmentInfos | A collection of SegmentInfo objects with methods for operating on those segments in relation to the file system. NOTE: This API is new and still experimental (subject to change suddenly in the next release).
SegmentInfos.FindSegmentsFile | Utility class for executing code that needs to do something with the current segments file. This is necessary with lock-less commits because, from the time you locate the current segments file name until you actually open it, read its contents, or check its modified time, it could have been deleted due to a writer commit finishing.
SegmentMerger | The SegmentMerger class combines two or more Segments, each represented by an IndexReader (see Add), into a single Segment. After adding the appropriate readers, call the Merge method to combine the segments. If the compoundFile flag is set, the segments will be merged into a compound file.
SegmentReader | NOTE: This API is new and still experimental (subject to change suddenly in the next release).
SegmentReader.CoreReaders | 
SegmentReader.Norm | Byte[] referencing is used because a new norm object needs to be created for each clone, and the byte array is all that is needed for sharing between cloned readers. The current norm referencing is for sharing between readers, whereas the byte[] referencing is for copy-on-write, which is independent of reader references (i.e. incRef, decRef).
SegmentReader.Ref | 
SegmentsGenCommit | Class that will force an index writer to open an index based on the generation in the segments.gen file, as opposed to the highest generation found in a directory listing. A use case for using this IndexCommit when opening an IndexWriter would be if index snapshots (source) are being copied over an existing index (target) and the source now has a lower generation than the target due to initiating a rebuild of the index.
SegmentTermDocs | 
SegmentTermEnum | 
SegmentTermPositions | 
SerialMergeScheduler | A MergeScheduler that simply does each merge sequentially, using the current thread.
SnapshotDeletionPolicy | An IndexDeletionPolicy that wraps any other IndexDeletionPolicy and adds the ability to hold and later release a single "snapshot" of an index. While the snapshot is held, the IndexWriter will not remove any files associated with it, even if the index is otherwise being actively, arbitrarily changed. Because it wraps another arbitrary IndexDeletionPolicy, this gives you the freedom to continue using whatever IndexDeletionPolicy you would normally want to use with your index. Note that you can re-use a single instance of SnapshotDeletionPolicy across multiple writers, as long as they are against the same index Directory; any snapshot held when a writer is closed will "survive" when the next writer is opened. WARNING: This API is new and experimental and may suddenly change. (See the hot-backup sketch after this table.)
SortedTermVectorMapper | Store a sorted collection of Lucene.Net.Index.TermVectorEntrys, collecting all term information into a single sorted set. NOTE: This Mapper ignores all Field information for the Document. This means that if you are using offsets/positions you will not know what Fields they correlate with. This is not thread-safe.
StaleReaderException | This exception is thrown when an IndexReader tries to make changes to the index (via IndexReader.DeleteDocument, IndexReader.UndeleteAll or IndexReader.SetNorm) but changes have already been committed to the index since this reader was instantiated. When this happens you must open a new reader on the current index to make the changes.
StringFieldEnumerator | Implementation for enumerating over terms with a string value.
Term | A Term represents a word from text. This is the unit of search. It is composed of two elements: the text of the word, as a string, and the name of the field that the text occurred in, an interned string. Note that terms may represent more than words from text fields; they may also be things like dates, email addresses, URLs, etc.
TermDocEnumerator | Class to handle creating a TermDocs and allowing for seeking and enumeration. Used when you have a set of one or more terms for which you want to enumerate over the documents that contain those terms.
TermDocEnumerator.TermDocUsingTermsEnumerator | Class to handle enumeration over the TermDocs that does NOT close them on a call to Dispose!
TermEnum | Abstract class for enumerating terms. Term enumerations are always ordered by Term.CompareTo(). Each term in the enumeration is greater than all that precede it.
TermVectorEntry | Convenience class for holding TermVector information.
TermVectorEntryFreqSortedComparator | Compares Lucene.Net.Index.TermVectorEntrys first by frequency and then by the term (case-sensitive).
TermVectorEnumerator | Class to allow for enumerating over the documents in the index to retrieve the term vector for each one.
TermVectorMapper | The TermVectorMapper can be used to map Term Vectors into your own structure instead of the parallel array structure used by Lucene.Net.Index.IndexReader.GetTermFreqVector(int, String). It is up to the implementation to make sure it is thread-safe.
TermVectorOffsetInfo | The TermVectorOffsetInfo class holds information pertaining to a Term in a Lucene.Net.Index.TermPositionVector's offset information. This offset information is the character offset as set during the Analysis phase (and thus may not be the actual offset in the original content).
TermVectorsReader | 
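
The IndexWriter entry above describes the create and autoCommit constructor arguments. As a concrete illustration, here is a minimal sketch of building an index, assuming the Lucene.Net 2.9-era API this page documents; constructor overloads, the StandardAnalyzer constructor, and setter names may differ slightly between releases.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

class IndexWriterSketch
{
    static void Main()
    {
        // Open (or create) an index directory on disk.
        Directory dir = FSDirectory.Open(new System.IO.DirectoryInfo("index"));

        // create=true replaces any existing index; MaxFieldLength caps tokens per field.
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                                             true, IndexWriter.MaxFieldLength.UNLIMITED);

        // Optional tuning hooks described above (ConcurrentMergeScheduler is the default).
        writer.SetMergeScheduler(new ConcurrentMergeScheduler());
        writer.SetMergeFactor(10); // LogMergePolicy's merge factor

        Document doc = new Document();
        doc.Add(new Field("title", "hello world", Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);

        writer.Commit(); // writes a new segments_N file, making changes visible
        writer.Close();
    }
}
```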
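
The ParallelReader entry warns that all parallel indexes must be created and modified identically. A small sketch of combining two such indexes, assuming the Open/Add signatures of the 2.9-era port:

```csharp
using Lucene.Net.Index;
using Lucene.Net.Store;

class ParallelReaderSketch
{
    static IndexReader OpenParallel(Directory rarelyChanging, Directory frequentlyChanging)
    {
        // Both indexes must have been built with identical document numbering.
        ParallelReader pr = new ParallelReader();
        pr.Add(IndexReader.Open(rarelyChanging, true));     // large, rarely re-indexed fields
        pr.Add(IndexReader.Open(frequentlyChanging, true)); // small, frequently re-indexed fields
        return pr; // document N exposes the union of both indexes' fields for document N
    }
}
```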
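
The SnapshotDeletionPolicy entry describes holding a snapshot so a live index can be backed up while it is being written. A hedged sketch of that pattern; the Snapshot()/Release() names and the IndexCommit cast follow the Java original and may differ in a given Lucene.Net release:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

class HotBackupSketch
{
    static void Backup(Directory dir)
    {
        var policy = new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        var writer = new IndexWriter(dir, new StandardAnalyzer(), policy,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            // While the snapshot is held, IndexWriter will not delete its files.
            IndexCommit commit = (IndexCommit)policy.Snapshot();
            foreach (string fileName in commit.GetFileNames())
            {
                // Copy fileName out of the index directory here.
            }
        }
        finally
        {
            policy.Release(); // let the snapshotted commit be deleted again
            writer.Close();
        }
    }
}
```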
Interfaces
Interface | Description
---|---
IndexCommitPoint | Obsolete.
IndexDeletionPolicy | Expert: policy for deletion of stale index commits. Implement this interface, and pass it to one of the IndexWriter or IndexReader constructors, to customize when older point-in-time commits (IndexCommit) are deleted from the index directory. The default deletion policy is KeepOnlyLastCommitDeletionPolicy, which always removes old commits as soon as a new commit is done (this matches the behavior before 2.2). One expected use case for this (and the reason why it was first created) is to work around problems with an index directory accessed via filesystems like NFS, because NFS does not provide the "delete on last close" semantics that Lucene's "point in time" search normally relies on. By implementing a custom deletion policy, such as "a commit is only removed once it has been stale for more than X minutes", you can give your readers time to refresh to the new commit before IndexWriter removes the old commits. Note that doing so will increase the storage requirements of the index. See LUCENE-710 for details.
TermDocs | TermDocs provides an interface for enumerating <document, frequency> pairs for a term. The document portion names each document containing the term; documents are indicated by number. The frequency portion gives the number of times the term occurred in each document. The pairs are ordered by document number. (See the TermDocs sketch after this table.)
TermFreqVector | Provides access to a stored term vector of a document field. The vector consists of the name of the field, an array of the terms that occur in the field of the Lucene.Net.Documents.Document, and a parallel array of frequencies. Thus, GetTermFrequencies()[5] corresponds with the frequency of GetTerms()[5], assuming there are at least 5 terms in the Document. (See the term-vector sketch after this table.)
TermPositions | TermPositions provides an interface for enumerating the <document, frequency, <position>*> tuples for a term. The document and frequency are the same as for a TermDocs. The positions portion lists the ordinal positions of each occurrence of a term in a document.
TermPositionVector | Extends TermFreqVector to provide additional information about the positions in which each of the terms is found.
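
The TermDocs entry describes enumerating <document, frequency> pairs. A minimal sketch, assuming the 2.9-era port exposes Doc() and Freq() as methods like the Java original (later versions turn these into properties):

```csharp
using Lucene.Net.Index;
using Lucene.Net.Store;

class TermDocsSketch
{
    static void DumpPostings(Directory dir)
    {
        IndexReader reader = IndexReader.Open(dir, true); // read-only reader
        TermDocs td = reader.TermDocs(new Term("title", "hello"));
        try
        {
            while (td.Next())
            {
                // Pairs arrive ordered by document number, as described above.
                System.Console.WriteLine("doc=" + td.Doc() + " freq=" + td.Freq());
            }
        }
        finally
        {
            td.Close();
            reader.Close();
        }
    }
}
```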
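
The TermFreqVector entry describes its parallel term/frequency arrays. A short sketch of reading them, assuming IndexReader.GetTermFreqVector(int, string) as named above:

```csharp
using Lucene.Net.Index;
using Lucene.Net.Store;

class TermVectorSketch
{
    static void DumpVector(Directory dir, int docId)
    {
        IndexReader reader = IndexReader.Open(dir, true);
        TermFreqVector tfv = reader.GetTermFreqVector(docId, "title");
        if (tfv != null) // null when the field stored no term vector
        {
            string[] terms = tfv.GetTerms();
            int[] freqs = tfv.GetTermFrequencies();
            for (int i = 0; i < terms.Length; i++)
            {
                // freqs[i] is the frequency of terms[i] (parallel arrays).
                System.Console.WriteLine(terms[i] + ": " + freqs[i]);
            }
        }
        reader.Close();
    }
}
```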
Enumerations
Enumeration | Description
---|---
FieldParser | The type of parser for the value of the term.