An IndexWriter creates and maintains an index.

The create argument to the IndexWriter(Directory, Analyzer, boolean) constructor determines whether a new index is created or an existing index is opened. Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. There are also IndexWriter(Directory, Analyzer) constructors with no create argument, which create a new index if there is not already an index at the provided path and otherwise open the existing index.

In either case, documents are added with AddDocument(Document) and removed with DeleteDocuments(Term) or DeleteDocuments(Query). A document can be updated with UpdateDocument(Term, Document) (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, Close() should be called.
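As a rough illustration of the calls named above, the following sketch adds two documents, updates one, deletes the other, and closes the writer. It assumes the Lucene.Net 2.9-era API (the four-argument constructor taking an IndexWriter.MaxFieldLength, and a StandardAnalyzer that takes a Version). The class name, the field names "id" and "body", and the use of RAMDirectory are illustrative choices, not part of this reference.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class IndexWriterLifecycleSketch
{
    // Returns the number of live documents after an add, an update and a delete.
    public static int Run()
    {
        Directory dir = new RAMDirectory();   // in-memory index, for illustration only
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);

        // create = true: start a new index even if one already exists in dir.
        var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
        try
        {
            writer.AddDocument(MakeDoc("1", "first version"));
            writer.AddDocument(MakeDoc("2", "to be removed"));

            // UpdateDocument deletes every document matching the term, then adds the new one.
            writer.UpdateDocument(new Term("id", "1"), MakeDoc("1", "second version"));
            writer.DeleteDocuments(new Term("id", "2"));
        }
        finally
        {
            writer.Close();   // commits pending changes and releases the write lock
        }

        IndexReader reader = IndexReader.Open(dir, true);   // read-only reader
        try
        {
            return reader.NumDocs();
        }
        finally
        {
            reader.Close();
        }
    }

    // Hypothetical helper: builds a document with an exact-match id and an analyzed body.
    private static Document MakeDoc(string id, string body)
    {
        var doc = new Document();
        // NOT_ANALYZED keeps the id as a single term so Term("id", ...) matches it exactly.
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }
}
```

With this sequence, one live document (the updated id "1") should remain once the writer is closed.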

These changes are buffered in memory and periodically flushed to the Directory (during the above method calls). A flush is triggered when there are enough buffered deletes (see SetMaxBufferedDeleteTerms) or enough added documents since the last flush, whichever is sooner. For the added documents, flushing is triggered either by RAM usage of the documents (see SetRAMBufferSizeMB) or the number of added documents. The default is to flush when RAM usage hits 16 MB. For best indexing speed you should flush by RAM usage with a large RAM buffer. Note that flushing just moves the internal buffered state in IndexWriter into the index, but these changes are not visible to IndexReader until either Commit() or Close() is called. A flush may also trigger one or more segment merges, which by default run in a background thread so as not to block the AddDocument calls (see below for changing the MergeScheduler).
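The advice to flush by RAM usage could be applied along the following lines. This is a minimal sketch assuming the Lucene.Net 2.9-era setters SetRAMBufferSizeMB and SetMaxBufferedDocs and the IndexWriter.DISABLE_AUTO_FLUSH constant. The class and method names are hypothetical, and the buffer size is left to the caller because a good value depends on available memory.

```csharp
using Lucene.Net.Index;

public static class FlushSettingsSketch
{
    // Switches the writer to flush by RAM usage only, with a buffer of ramBufferMB megabytes.
    public static void FlushByRam(IndexWriter writer, double ramBufferMB)
    {
        // Set the RAM trigger first: the writer rejects a configuration
        // in which both flush triggers are disabled at the same time.
        writer.SetRAMBufferSizeMB(ramBufferMB);

        // Turn off the document-count trigger so that only RAM usage causes a flush.
        writer.SetMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
    }
}
```

A larger buffer only changes when flushes happen. Readers still see nothing new until Commit() or Close() is called.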

The optional autoCommit argument to the IndexWriter(Directory, boolean, Analyzer) constructors controls visibility of the changes to IndexReader instances reading the same index. When this is false, changes are not visible until Close() or Commit() is called. Note that changes will still be flushed to the Directory as new files, but are not committed (no new segments_N file is written referencing the new files, nor are the files sync'd to stable storage) until Close() or Commit() is called. If something goes terribly wrong (for example the JVM crashes), then the index will reflect none of the changes made since the last commit, or the starting state if commit was not called. You can also call Rollback(), which closes the writer without committing any changes, and removes any index files that had been flushed but are now unreferenced. This mode is useful for preventing readers from refreshing at a bad time (for example after you've done all your deletes but before you've done your adds). It can also be used to implement simple single-writer transactional semantics ("all or none"). You can do a two-phase commit by calling PrepareCommit() followed by Commit(). This is necessary when Lucene is working with an external resource (for example, a database) and both must either commit or roll back the transaction.
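The "all or none" behaviour and the two-phase commit described above might look like the sketch below. It assumes the Lucene.Net 2.9-era API and a writer opened without autoCommit. The class and method names, the "id" field, and the commitExternal callback (standing in for something like a database transaction commit) are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class CommitRollbackSketch
{
    // Commits one document, then discards a second one with Rollback().
    // Returns the number of documents a newly opened reader sees afterwards.
    public static int RollbackDemo()
    {
        Directory dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);

        writer.AddDocument(MakeDoc("kept"));
        writer.Commit();                       // this document is now durable and visible

        writer.AddDocument(MakeDoc("discarded"));
        writer.Rollback();                     // closes the writer and drops the uncommitted add

        IndexReader reader = IndexReader.Open(dir, true);
        try
        {
            return reader.NumDocs();
        }
        finally
        {
            reader.Close();
        }
    }

    // Two-phase commit against a hypothetical external resource.
    public static void CommitBoth(IndexWriter writer, System.Action commitExternal)
    {
        writer.PrepareCommit();                // phase 1: prepare the commit without publishing it
        try
        {
            commitExternal();                  // hypothetical: commit the external transaction
        }
        catch
        {
            writer.Rollback();                 // external side failed: discard the prepared commit
            throw;
        }
        writer.Commit();                       // phase 2: publish the prepared commit
    }

    // Hypothetical helper: a document with a single exact-match id field.
    private static Document MakeDoc(string id)
    {
        var doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }
}
```

RollbackDemo should leave only the committed document in the index. CommitBoth is deliberately simplified: it does not handle a failure of the final Commit() after the external resource has already committed, which a real coordinator would need to address.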

When autoCommit is true then the writer will periodically commit on its own. [Deprecated: Note that in 3.0, IndexWriter will no longer accept autoCommit=true (it will be hardwired to false). You can always call Commit() yourself when needed]. There is no guarantee when exactly an auto commit will occur (it used to be after every flush, but it is now after every completed merge, as of 2.4). If you want to force a commit, call Commit(), or close the writer. Once a commit has finished, newly opened IndexReader instances will see the changes to the index as of that commit. When running in this mode, be careful not to refresh your readers while optimize or segment merges are taking place as this can tie up substantial disk space.

Expert: IndexWriter allows an optional IndexDeletionPolicy implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is KeepOnlyLastCommitDeletionPolicy, which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.

Expert: IndexWriter allows you to separately change the MergePolicy and the MergeScheduler. The MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. It also selects merges to do for optimize(). (The default is LogByteSizeMergePolicy.) Then, the MergeScheduler is invoked with the requested merges and it decides when and how to run the merges. The default is ConcurrentMergeScheduler.
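Swapping the two components independently could look like the sketch below. It assumes the Lucene.Net 2.9-era setters SetMergePolicy and SetMergeScheduler, a LogDocMergePolicy constructor that takes the writer, and the SerialMergeScheduler class. The wrapper class and method name are hypothetical, and the particular policy and scheduler are chosen only to differ from the defaults.

```csharp
using Lucene.Net.Index;

public static class MergeSettingsSketch
{
    // Replaces the default merge policy and merge scheduler on an open writer.
    public static void UseSerialDocCountMerging(IndexWriter writer)
    {
        // Policy: select merges by document count instead of the default byte-size policy.
        writer.SetMergePolicy(new LogDocMergePolicy(writer));

        // Scheduler: run the selected merges one at a time on the calling thread
        // instead of in background threads.
        writer.SetMergeScheduler(new SerialMergeScheduler());
    }
}
```

The policy decides which merges to request and the scheduler decides when and how they run, so either one can be changed without touching the other.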

NOTE: if you hit an OutOfMemoryError then IndexWriter will quietly record this fact and block all future segment commits. This is a defensive measure in case any internal state (buffered documents and deletions) were corrupted. Any subsequent calls to Commit() will throw an IllegalStateException. The only course of action is to call Close(), which internally will call Rollback(), to undo any changes to the index since the last commit. If you opened the writer with autoCommit false you can also just call Rollback() directly.

NOTE: IndexWriter instances are completely thread safe, meaning multiple threads can call any of its methods concurrently. If your application requires external synchronization, you should not synchronize on the IndexWriter instance as this may cause deadlock; use your own (non-Lucene) objects instead.

The IndexWriter.MaxFieldLength type exposes the following members.

Constructors

  Access  Name                        Description
  Public  IndexWriter.MaxFieldLength  Public constructor to allow users to specify the maximum field size limit.

Methods

  Access     Name             Description
  Public     Equals           Determines whether the specified Object is equal to the current Object. (Inherited from Object.)
  Protected  Finalize         Allows an Object to attempt to free resources and perform other cleanup operations before the Object is reclaimed by garbage collection. (Inherited from Object.)
  Public     GetHashCode      Serves as a hash function for a particular type. (Inherited from Object.)
  Public     GetLimit
  Public     GetType          Gets the Type of the current instance. (Inherited from Object.)
  Protected  MemberwiseClone  Creates a shallow copy of the current Object. (Inherited from Object.)
  Public     ToString         (Overrides Object.ToString().)

Fields

  Access         Name       Description
  Public static  LIMITED    Sets the maximum field length to DEFAULT_MAX_FIELD_LENGTH.
  Public static  UNLIMITED  Sets the maximum field length to Integer.MAX_VALUE.
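The members listed above can be exercised as in the following sketch, which reads the limit carried by each predefined field and by a custom instance. It assumes the Lucene.Net 2.9-era API, where the public constructor takes an int limit and GetLimit() returns it. The class name and the value 500 are arbitrary illustrations.

```csharp
using Lucene.Net.Index;

public static class MaxFieldLengthSketch
{
    // Returns the limits carried by LIMITED, UNLIMITED and a custom instance, in that order.
    public static int[] Limits()
    {
        var custom = new IndexWriter.MaxFieldLength(500);   // 500 is an arbitrary example limit
        return new[]
        {
            IndexWriter.MaxFieldLength.LIMITED.GetLimit(),
            IndexWriter.MaxFieldLength.UNLIMITED.GetLimit(),
            custom.GetLimit()
        };
    }
}
```

Any of these values can be passed as the MaxFieldLength argument of the IndexWriter constructors that accept one. In C#, the UNLIMITED limit corresponds to int.MaxValue.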

See Also