Apache Lucene.NET 4.8.0 Migration Guide
.NET API Conventions
Several Java conventions were replaced with their .NET counterparts:
Classes suffixed with Comparator are now suffixed with Comparer.
Most iterator classes were converted to .NET enumerators:
Instead of Iterator(), call GetEnumerator() (in some cases, it may be GetIterator()).
Instead of HasNext(), call MoveNext(); note, however, that this advances the position of the enumerator.
Instead of Next(), retrieve the return value from the Current property after calling MoveNext().
Classes and members that include numeric type names now use the language-agnostic .NET name. For example:
Instead of Short or GetShort(), use Int16 or GetInt16().
Instead of Integer or GetInteger(), use Int32 or GetInt32().
Instead of Long or GetLong(), use Int64 or GetInt64().
Instead of Float, use Single. Note that Lucene.Net.Queries.Function.ValueSources.SingleFunction was renamed Lucene.Net.Queries.Function.ValueSources.SingularFunction to distinguish it from the System.Single data type.
For collections, the Size property is now named Count.
For arrays and files, the Size property is now named Length.
For IndexInput and IndexOutput subclasses, the GetFilePointer() method has been changed to a Position property to match System.IO.FileStream.Position.
Some classes, enums, and interfaces have been de-nested from their original Lucene location to make them easier to find when using IntelliSense.
Some methods were lacking a verb, so the verb Get was added to make the method's function clearer. For example, instead of Analyzer.TokenStream() we now have Analyzer.GetTokenStream().
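The HasNext()/Next() to MoveNext()/Current translation described above can be sketched with a plain .NET collection. This is only an illustration of the pattern; the EnumeratorDemo class and Drain method are hypothetical names and not part of Lucene.NET.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Collects every element by driving the enumerator manually, the way a
    // Java hasNext()/next() loop is typically translated to .NET.
    public static List<string> Drain(IEnumerable<string> source)
    {
        var result = new List<string>();
        using (IEnumerator<string> e = source.GetEnumerator())
        {
            // MoveNext() both tests for another element and advances to it,
            // so it is called exactly once per element.
            while (e.MoveNext())
            {
                // Current replaces the value that Java's next() returned.
                result.Add(e.Current);
            }
        }
        return result;
    }
}
```

Because MoveNext() advances the position, calling it twice per loop iteration (once as a "has next" check and once to fetch) would silently skip every other element.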
Four-dimensional enumerations
Flexible indexing changed the low level fields/terms/docs/positions enumeration APIs. Here are the major changes:
Terms are now binary in nature (arbitrary byte[]), represented by the BytesRef class (which provides an offset + length "slice" into an existing byte[]).
Fields are separately enumerated (Fields.GetEnumerator()) from the terms within each field (TermsEnum). So instead of this:
TermEnum termsEnum = ...;
while (termsEnum.Next())
{
    Term t = termsEnum.Term;
    Console.WriteLine("field=" + t.Field + "; text=" + t.Text);
}
Do this:
foreach (string field in fields)
{
    Terms terms = fields.GetTerms(field);
    TermsEnum termsEnum = terms.GetEnumerator();
    while (termsEnum.MoveNext())
    {
        Console.WriteLine("field=" + field + "; text=" + termsEnum.Current.Utf8ToString());
    }
}
TermDocs is renamed to DocsEnum. Instead of this:
while (td.Next())
{
    int doc = td.Doc;
    ...
}
do this:
int doc;
while ((doc = td.NextDoc()) != DocsEnum.NO_MORE_DOCS)
{
    ...
}
Instead of this:
if (td.SkipTo(target))
{
    int doc = td.Doc;
    ...
}
do this:
if ((doc = td.Advance(target)) != DocsEnum.NO_MORE_DOCS)
{
    ...
}
TermPositions is renamed to DocsAndPositionsEnum, and no longer extends the docs-only enumerator (DocsEnum).
Deleted docs are no longer implicitly filtered from docs/positions enums. Instead, you pass an IBits SkipDocs (set bits are skipped) when obtaining the enums. Also, you can now ask a reader for its deleted docs.
The docs/positions enums cannot seek to a term. Instead, TermsEnum is able to seek, and then you request the docs/positions enum from that TermsEnum.
TermsEnum's seek method returns more information. So instead of this:
Term t;
TermEnum termEnum = reader.Terms(t);
if (t.Equals(termEnum.Term))
{
    ...
}
do this:
TermsEnum termsEnum = ...;
BytesRef text;
if (termsEnum.Seek(text) == TermsEnum.SeekStatus.FOUND)
{
    ...
}
SeekStatus also contains END (enumerator is done) and NOT_FOUND (term was not found but enumerator is now positioned to the next term).
TermsEnum has an Ord property, returning the long numeric ordinal (ie, first term is 0, next is 1, and so on) for the term it's now positioned to. There is also a corresponding Seek(long ord) method. Note that these members are optional; in particular the MultiFields TermsEnum does not implement them.
How you obtain the enums has changed. The primary entry point is the Fields class. If you know your reader is a single segment reader, do this:
Fields fields = reader.Fields();
if (fields != null)
{
    ...
}
If the reader might be multi-segment, you must do this:
Fields fields = MultiFields.GetFields(reader);
if (fields != null)
{
    ...
}
The fields may be null (eg if the reader has no fields).
Note that the MultiFields approach entails a performance hit on MultiReaders, as it must merge terms/docs/positions on the fly. It's generally better to instead get the sequential readers (use Lucene.Net.Util.ReaderUtil) and then step through those readers yourself, if you can (this is how Lucene drives searches).
If you pass a SegmentReader to MultiFields.GetFields() it will simply return reader.GetFields(), so there is no performance hit in that case.
Once you have a non-null Fields you can do this:
Terms terms = fields.GetTerms("field");
if (terms != null)
{
    ...
}
The terms may be null (eg if the field does not exist).
Once you have a non-null terms you can get an enum like this:
TermsEnum termsEnum = terms.GetIterator();
The returned TermsEnum will not be null.
You can then call Next() to step through the TermsEnum, or Seek. If you want a DocsEnum, do this:
IBits liveDocs = reader.GetLiveDocs();
DocsEnum docsEnum = null;
docsEnum = termsEnum.Docs(liveDocs, docsEnum, needsFreqs);
You can pass in a prior DocsEnum and it will be reused if possible.
Likewise for DocsAndPositionsEnum.
IndexReader has several sugar methods (which just go through the above steps, under the hood). Instead of:
Term t;
TermDocs termDocs = reader.TermDocs;
termDocs.Seek(t);
do this:
Term t;
DocsEnum docsEnum = reader.GetTermDocsEnum(t);
Likewise for DocsAndPositionsEnum.
LUCENE-2380: FieldCache.GetStrings/Index --> FieldCache.GetDocTerms/Index
The field values returned when sorting by SortField.STRING are now BytesRef. You can call value.Utf8ToString() to convert back to string, if necessary.
In FieldCache, GetStrings (returning string[]) has been replaced with GetTerms (returning a BinaryDocValues instance). BinaryDocValues provides a Get method, taking a docID and a BytesRef to fill (which must not be null), and it fills it in with the reference to the bytes for that term.
If you had code like this before:
string[] values = FieldCache.DEFAULT.GetStrings(reader, field);
...
string aValue = values[docID];
you can do this instead:
BinaryDocValues values = FieldCache.DEFAULT.GetTerms(reader, field);
...
BytesRef term = new BytesRef();
values.Get(docID, term);
string aValue = term.Utf8ToString();
Note however that it can be costly to convert to String, so it's better to work directly with the BytesRef.
Similarly, in FieldCache, GetStringIndex (returning a StringIndex instance, with direct arrays int[] order and String[] lookup) has been replaced with GetTermsIndex (returning a SortedDocValues instance). SortedDocValues provides the GetOrd(int docID) method to look up the int order for a document, LookupOrd(int ord, BytesRef result) to look up the term from a given order, and the sugar method Get(int docID, BytesRef result) which internally calls GetOrd and then LookupOrd.
If you had code like this before:
StringIndex idx = FieldCache.DEFAULT.GetStringIndex(reader, field);
...
int ord = idx.order[docID];
String aValue = idx.lookup[ord];
you can do this instead:
SortedDocValues idx = FieldCache.DEFAULT.GetTermsIndex(reader, field);
...
int ord = idx.GetOrd(docID);
BytesRef term = new BytesRef();
idx.LookupOrd(ord, term);
string aValue = term.Utf8ToString();
Note however that it can be costly to convert to String, so it's better to work directly with the BytesRef.
SortedDocValues also has a GetTermsEnum() method, which returns an iterator (TermsEnum) over the term values in the index (ie, iterates ord = 0..NumOrd-1).
FieldComparator.StringComparatorLocale has been removed. It was very CPU costly, since it converts BytesRef -> String on the fly and does not compare using indexed collation keys; use CollationKeyFilter for better performance.
FieldComparator.StringOrdValComparator has been renamed to FieldComparer.TermOrdValComparer, and now uses BytesRef for its values. Likewise for StringValComparator, renamed to TermValComparer. This means when sorting by SortField.STRING or SortField.STRING_VAL (or directly invoking these comparers) the values returned in the FieldDoc.Fields array will be BytesRef not String. You can call the .Utf8ToString() method on the BytesRef instances, if necessary.
LUCENE-2600: IndexReaders are now read-only
Instead of IndexReader.IsDeleted(int n), do this:
using Lucene.Net.Util;
using Lucene.Net.Index;
IBits liveDocs = MultiFields.GetLiveDocs(indexReader);
if (liveDocs != null && !liveDocs.Get(docID))
{
// document is deleted...
}
LUCENE-2858, LUCENE-3733: IndexReader --> AtomicReader/CompositeReader/DirectoryReader refactoring
The abstract class IndexReader has been refactored to expose only essential methods to access stored fields during display of search results. It is no longer possible to retrieve terms or postings data from the underlying index; not even deletions are visible anymore. You can still pass IndexReader as a constructor parameter to IndexSearcher and execute your searches; Lucene will automatically delegate procedures like query rewriting and document collection to atomic subreaders.
If you want to dive deeper into the index and want to write your own queries, take a closer look at the new abstract sub-classes AtomicReader and CompositeReader:
AtomicReader instances are now the only source of Terms, Postings, DocValues and FieldCache. Queries are forced to execute on an AtomicReader on a per-segment basis and FieldCaches are keyed by AtomicReaders.
Its counterpart CompositeReader exposes a utility method to retrieve its composites. But watch out, composites are not necessarily atomic.
Next to the added type-safety we also removed the notion of index-commits and version numbers from the abstract IndexReader; the associations with IndexWriter were pulled into a specialized DirectoryReader. To open Directory-based indexes use DirectoryReader.Open(); the corresponding method in IndexReader is now deprecated for easier migration. Only DirectoryReader supports commits, versions, and reopening with OpenIfChanged(). Terms, postings, docvalues, and norms can from now on only be retrieved using AtomicReader; DirectoryReader and MultiReader extend CompositeReader, only offering stored fields and access to the sub-readers (which may be composite or atomic).
If you have more advanced code dealing with custom Filters, you might have noticed another new class hierarchy in Lucene (see LUCENE-2831): IndexReaderContext with corresponding Atomic-/CompositeReaderContext.
The move towards per-segment search in Lucene 2.9 exposed lots of custom Query and Filter implementations that couldn't handle it. For example, some Filter implementations expected the IndexReader passed in to be identical to the IndexReader passed to IndexSearcher, with all its advantages like absolute document IDs etc. Obviously this "paradigm-shift" broke lots of applications and especially those that utilized cross-segment data structures (like Apache Solr).
In Lucene 4.0, we introduce IndexReaderContexts, a "searcher-private" reader hierarchy. During Query or Filter execution Lucene no longer passes raw readers down to Querys, Filters or Collectors; instead components are provided an AtomicReaderContext (essentially a hierarchy leaf) holding relative properties like the document-basis in relation to the top-level reader. This allows Querys and Filters to build up logic based on document IDs, despite the per-segment orientation.
There are still valid use-cases where top-level readers, ie "atomic views" on the index, are desirable. Let's say you want to iterate all terms of a complete index for auto-completion or faceting; Lucene provides utility wrappers like SlowCompositeReaderWrapper (LUCENE-2597) emulating an AtomicReader. Note: using "atomicity emulators" can cause serious slowdowns due to the need to merge terms, postings, DocValues, and FieldCache, so use them with care!
LUCENE-4306: GetSequentialSubReaders(), ReaderUtil.Gather()
The method IndexReader.GetSequentialSubReaders() was moved to CompositeReader (see LUCENE-2858, LUCENE-3733) and made protected. It is solely used by CompositeReader itself to build its reader tree. To get all atomic leaves of a reader, use IndexReader.Leaves, which also provides the doc base of each leaf. Readers that are already atomic return themselves as a leaf with doc base 0. To emulate Lucene 3.x GetSequentialSubReaders(), use Context.Children.
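As a rough sketch of how the leaves and their doc bases fit together, the helper below walks IndexReader.Leaves and converts segment-relative document IDs into top-level IDs. The LeafDemo class and LiveDocIds method are hypothetical names chosen for illustration; the member names used (Leaves, AtomicReader, DocBase, LiveDocs, MaxDoc) are assumed to match the Lucene.NET 4.8 API.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Util;

public static class LeafDemo
{
    // Returns the top-level IDs of all live (non-deleted) documents by
    // visiting each atomic leaf and adding that leaf's doc base.
    public static List<int> LiveDocIds(IndexReader reader)
    {
        var ids = new List<int>();
        foreach (AtomicReaderContext leaf in reader.Leaves)
        {
            AtomicReader segment = leaf.AtomicReader;
            // LiveDocs is null when the segment has no deletions.
            IBits liveDocs = segment.LiveDocs;
            for (int doc = 0; doc < segment.MaxDoc; doc++)
            {
                if (liveDocs == null || liveDocs.Get(doc))
                {
                    // Segment-relative ID + doc base = top-level ID.
                    ids.Add(leaf.DocBase + doc);
                }
            }
        }
        return ids;
    }
}
```

Scanning every document like this is only sensible for small indexes or diagnostics; it is shown here to make the relationship between per-segment IDs and the doc base concrete.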
LUCENE-2413, LUCENE-3396: Analyzer package changes
Lucene's core and contrib analyzers, along with Solr's analyzers, were consolidated into lucene/analysis. During the refactoring some package names have changed, and ReusableAnalyzerBase was renamed to Analyzer:
Lucene.Net.Analysis.KeywordAnalyzer
->Lucene.Net.Analysis.Core.KeywordAnalyzer
Lucene.Net.Analysis.KeywordTokenizer
->Lucene.Net.Analysis.Core.KeywordTokenizer
Lucene.Net.Analysis.LetterTokenizer
->Lucene.Net.Analysis.Core.LetterTokenizer
Lucene.Net.Analysis.LowerCaseFilter
->Lucene.Net.Analysis.Core.LowerCaseFilter
Lucene.Net.Analysis.LowerCaseTokenizer
->Lucene.Net.Analysis.Core.LowerCaseTokenizer
Lucene.Net.Analysis.SimpleAnalyzer
->Lucene.Net.Analysis.Core.SimpleAnalyzer
Lucene.Net.Analysis.StopAnalyzer
->Lucene.Net.Analysis.Core.StopAnalyzer
Lucene.Net.Analysis.StopFilter
->Lucene.Net.Analysis.Core.StopFilter
Lucene.Net.Analysis.WhitespaceAnalyzer
->Lucene.Net.Analysis.Core.WhitespaceAnalyzer
Lucene.Net.Analysis.WhitespaceTokenizer
->Lucene.Net.Analysis.Core.WhitespaceTokenizer
Lucene.Net.Analysis.PorterStemFilter
->Lucene.Net.Analysis.En.PorterStemFilter
Lucene.Net.Analysis.ASCIIFoldingFilter
->Lucene.Net.Analysis.Miscellaneous.ASCIIFoldingFilter
Lucene.Net.Analysis.ISOLatin1AccentFilter
->Lucene.Net.Analysis.Miscellaneous.ISOLatin1AccentFilter
Lucene.Net.Analysis.KeywordMarkerFilter
->Lucene.Net.Analysis.Miscellaneous.KeywordMarkerFilter
Lucene.Net.Analysis.LengthFilter
->Lucene.Net.Analysis.Miscellaneous.LengthFilter
Lucene.Net.Analysis.PerFieldAnalyzerWrapper
->Lucene.Net.Analysis.Miscellaneous.PerFieldAnalyzerWrapper
Lucene.Net.Analysis.TeeSinkTokenFilter
->Lucene.Net.Analysis.Sinks.TeeSinkTokenFilter
Lucene.Net.Analysis.CharFilter
->Lucene.Net.Analysis.CharFilter.CharFilter
Lucene.Net.Analysis.BaseCharFilter
->Lucene.Net.Analysis.CharFilter.BaseCharFilter
Lucene.Net.Analysis.MappingCharFilter
->Lucene.Net.Analysis.CharFilter.MappingCharFilter
Lucene.Net.Analysis.NormalizeCharMap
->Lucene.Net.Analysis.CharFilter.NormalizeCharMap
Lucene.Net.Analysis.CharArraySet
->Lucene.Net.Analysis.Util.CharArraySet
Lucene.Net.Analysis.CharArrayMap
->Lucene.Net.Analysis.Util.CharArrayMap
Lucene.Net.Analysis.ReusableAnalyzerBase
->Lucene.Net.Analysis.Analyzer
Lucene.Net.Analysis.StopwordAnalyzerBase
->Lucene.Net.Analysis.Util.StopwordAnalyzerBase
Lucene.Net.Analysis.WordListLoader
->Lucene.Net.Analysis.Util.WordListLoader
Lucene.Net.Analysis.CharTokenizer
->Lucene.Net.Analysis.Util.CharTokenizer
Lucene.Net.Util.CharacterUtils
->Lucene.Net.Analysis.Util.CharacterUtils
LUCENE-2514: Collators
The option to use a Collator's order (instead of binary order) for sorting and range queries has been moved to lucene/queries. The Collated TermRangeQuery/Filter has been moved to SlowCollatedTermRangeQuery/Filter, and the collated sorting has been moved to SlowCollatedStringComparer.
Note: this functionality isn't very scalable and if you are using it, consider indexing collation keys with the collation support in the analysis module instead.
To perform collated range queries, use the collating analyzer ICUCollationKeyAnalyzer, and set qp.AnalyzeRangeTerms = true.
TermRangeQuery and TermRangeFilter now work purely on bytes. Both have helper factory methods (NewStringRange) similar to the NumericRange API, to easily perform range queries on Strings.
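A minimal sketch of the NewStringRange helper mentioned above follows. The RangeDemo class and the "author" field name are hypothetical; the five-argument signature (field, lower term, upper term, include lower, include upper) is assumed to match Lucene.NET 4.8.

```csharp
using Lucene.Net.Search;

public static class RangeDemo
{
    // Builds a range query over string terms: lower <= term < upper.
    public static TermRangeQuery AuthorRange(string lower, string upper)
    {
        return TermRangeQuery.NewStringRange(
            "author", // hypothetical field name
            lower,
            upper,
            true,     // includeLower
            false);   // includeUpper
    }
}
```

The helper converts the string bounds to BytesRef internally, so calling code does not need to build the byte representation itself.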
LUCENE-2883: ValueSource changes
Lucene's Lucene.Net.Search.Function.ValueSource-based functionality was consolidated into Lucene.Net/Lucene.Net.Queries along with Solr's similar functionality. The following classes were moved:
Lucene.Net.Search.Function.CustomScoreQuery
->Lucene.Net.Queries.CustomScoreQuery
Lucene.Net.Search.Function.CustomScoreProvider
->Lucene.Net.Queries.CustomScoreProvider
Lucene.Net.Search.Function.NumericIndexDocValueSource
->Lucene.Net.Queries.Function.ValueSource.NumericIndexDocValueSource
The following lists the replacement classes for those removed:
Lucene.Net.Search.Function.DocValues
->Lucene.Net.Queries.Function.DocValues
Lucene.Net.Search.Function.FieldCacheSource
->Lucene.Net.Queries.Function.ValueSources.FieldCacheSource
Lucene.Net.Search.Function.FieldScoreQuery
->Lucene.Net.Queries.Function.FunctionQuery
Lucene.Net.Search.Function.FloatFieldSource
->Lucene.Net.Queries.Function.ValueSources.FloatFieldSource
Lucene.Net.Search.Function.IntFieldSource
->Lucene.Net.Queries.Function.ValueSources.IntFieldSource
Lucene.Net.Search.Function.OrdFieldSource
->Lucene.Net.Queries.Function.ValueSources.OrdFieldSource
Lucene.Net.Search.Function.ReverseOrdFieldSource
->Lucene.Net.Queries.Function.ValueSources.ReverseOrdFieldSource
Lucene.Net.Search.Function.ShortFieldSource
->Lucene.Net.Queries.Function.ValueSources.ShortFieldSource
Lucene.Net.Search.Function.ValueSource
->Lucene.Net.Queries.Function.ValueSource
Lucene.Net.Search.Function.ValueSourceQuery
->Lucene.Net.Queries.Function.FunctionQuery
DocValues are now named FunctionValues, to avoid confusion with Lucene's per-document values.
LUCENE-2392: Enable flexible scoring
The existing Similarity API is now TFIDFSimilarity; if you were extending Similarity before, you should likely extend this instead.
Weight.Normalize() no longer takes a norm value that incorporates the top-level boost from outer queries such as BooleanQuery; instead it takes 2 parameters, the outer boost (topLevelBoost) and the norm. Weight.SumOfSquaredWeights has been renamed to Weight.GetValueForNormalization().
The ScorePayload() method now takes a BytesRef. It is never null.
LUCENE-3283: Query parsers moved to separate module
Lucene's core Lucene.Net.QueryParsers query parsers have been consolidated into lucene/queryparser, where other QueryParsers from the codebase will also be placed. The following classes were moved:
Lucene.Net.QueryParsers.CharStream
->Lucene.Net.QueryParsers.Classic.CharStream
Lucene.Net.QueryParsers.FastCharStream
->Lucene.Net.QueryParsers.Classic.FastCharStream
Lucene.Net.QueryParsers.MultiFieldQueryParser
->Lucene.Net.QueryParsers.Classic.MultiFieldQueryParser
Lucene.Net.QueryParsers.ParseException
->Lucene.Net.QueryParsers.Classic.ParseException
Lucene.Net.QueryParsers.QueryParser
->Lucene.Net.QueryParsers.Classic.QueryParser
Lucene.Net.QueryParsers.QueryParserBase
->Lucene.Net.QueryParsers.Classic.QueryParserBase
Lucene.Net.QueryParsers.QueryParserConstants
->Lucene.Net.QueryParsers.Classic.QueryParserConstants
Lucene.Net.QueryParsers.QueryParserTokenManager
->Lucene.Net.QueryParsers.Classic.QueryParserTokenManager
Lucene.Net.QueryParsers.QueryParserToken
->Lucene.Net.QueryParsers.Classic.Token
Lucene.Net.QueryParsers.QueryParserTokenMgrError
->Lucene.Net.QueryParsers.Classic.TokenMgrError
LUCENE-2308, LUCENE-3453: Separate IndexableFieldType from Field instances
With this change, the indexing details (indexed, tokenized, norms, indexOptions, stored, etc.) are moved into a separate FieldType instance (rather than being stored directly on the Field).
This means you can create the FieldType instance once, up front, for a given field, and then re-use that instance whenever you instantiate the Field.
Certain field types are pre-defined since they are common cases:
StringField: indexes a String value as a single token (ie, does not tokenize). This field turns off norms and indexes only doc IDs (does not index term frequency nor positions). This field does not store its value, but exposes TYPE_STORED as well.
TextField: indexes and tokenizes a String, Reader or TokenStream value, without term vectors. This field does not store its value, but exposes TYPE_STORED as well.
StoredField: field that stores its value.
DocValuesField: indexes the value as a DocValues field.
NumericField: indexes the numeric value so that NumericRangeQuery can be used at search-time.
If your usage fits one of those common cases you can simply instantiate the above class. If you need to store the value, you can add a separate StoredField to the document, or you can use TYPE_STORED for the field:
Field f = new Field("field", "value", StringField.TYPE_STORED);
Alternatively, if an existing type is close to what you want but you need to make a few changes, you can copy that type and make changes:
FieldType bodyType = new FieldType(TextField.TYPE_STORED)
{
StoreTermVectors = true
};
You can of course also create your own FieldType from scratch:
FieldType t = new FieldType
{
Indexed = true,
Stored = true,
OmitNorms = true,
IndexOptions = IndexOptions.DOCS_AND_FREQS
};
t.Freeze();
FieldType has a Freeze() method to prevent further changes.
There is also a deprecated transition API, providing the same Index, Store, TermVector enums from 3.x, and Field constructors taking these enums.
When migrating from the 3.x API, if you did this before:
new Field("field", "value", Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS)
you can now do this:
new StringField("field", "value", Field.Store.NO)
(though note that StringField indexes DOCS_ONLY).
If instead the value was stored:
new Field("field", "value", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS)
you can now do this:
new Field("field", "value", StringField.TYPE_STORED)
If you didn't omit norms:
new Field("field", "value", Field.Store.YES, Field.Index.NOT_ANALYZED)
you can now do this:
FieldType ft = new FieldType(StringField.TYPE_STORED)
{
    OmitNorms = false
};
new Field("field", "value", ft)
If you did this before (value can be String or TextReader):
new Field("field", value, Field.Store.NO, Field.Index.ANALYZED)
you can now do this:
new TextField("field", value, Field.Store.NO)
If instead the value was stored:
new Field("field", value, Field.Store.YES, Field.Index.ANALYZED)
you can now do this:
new TextField("field", value, Field.Store.YES)
If in addition you omit norms:
new Field("field", value, Field.Store.YES, Field.Index.ANALYZED_NO_NORMS)
you can now do this:
FieldType ft = new FieldType(TextField.TYPE_STORED)
{
    OmitNorms = true
};
new Field("field", value, ft)
If you did this before (bytes is a byte[]):
new Field("field", bytes)
you can now do this:
new StoredField("field", bytes)
If you previously used the setter of Document.Boost, you must now pre-multiply the document boost into each Field.Boost. If you have a multi-valued field, you should do this only for the first Field instance (ie, subsequent Field instances sharing the same field name should only include their per-field boost and not the document level boost) as the boosts for multi-valued field instances are multiplied together by Lucene.
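The pre-multiplication rule for multi-valued fields can be sketched as follows. BoostDemo, Build, and the docBoost/fieldBoost parameters are hypothetical names for illustration; the sketch assumes TextField and the settable Field.Boost property behave as in Lucene.NET 4.8.

```csharp
using Lucene.Net.Documents;

public static class BoostDemo
{
    // Adds every value of one multi-valued field, folding the former
    // document-level boost into the first instance only.
    public static Document Build(string name, string[] values, float docBoost, float fieldBoost)
    {
        var doc = new Document();
        for (int i = 0; i < values.Length; i++)
        {
            var field = new TextField(name, values[i], Field.Store.NO);
            // First instance carries docBoost * fieldBoost; later instances
            // carry only their per-field boost, because Lucene multiplies
            // the boosts of same-named instances together.
            field.Boost = (i == 0) ? docBoost * fieldBoost : fieldBoost;
            doc.Add(field);
        }
        return doc;
    }
}
```

Applying the document boost to every instance would compound it once per value, which is why only the first instance receives it.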
Other changes
LUCENE-2674: A new IdfExplain method was added to Similarity (which is now TFIDFSimilarity), that accepts an incoming docFreq. If you subclass TFIDFSimilarity, make sure you also override this method on upgrade, otherwise your customizations won't run for certain MultiTermQuerys.
LUCENE-2691: The near-real-time API has moved from IndexWriter to DirectoryReader. Instead of IndexWriter.GetReader(), call DirectoryReader.Open(IndexWriter) or DirectoryReader.OpenIfChanged(IndexWriter).
LUCENE-2690: MultiTermQuery boolean rewrites now work per segment. Also MultiTermQuery.GetTermsEnum() now takes an AttributeSource. FuzzyTermsEnum is both consumer and producer of attributes: MultiTermQuery.BoostAttribute is added to the FuzzyTermsEnum and MultiTermQuery's rewrite mode consumes it. The other way round, MultiTermQuery.TopTermsBooleanQueryRewrite supplies a global AttributeSource to each segment's TermsEnum. The TermsEnum is consumer and gets the current minimum competitive boosts (MultiTermQuery.MaxNonCompetitiveBoostAttribute).
LUCENE-2374: The backwards layer in Attribute was removed. To support correct reflection of Attribute instances, where the reflection was done using deprecated ToString() parsing, you have to now override ReflectWith() to customize output. ToString() is no longer implemented by Attribute, so if you have overridden ToString(), port your customization over to ReflectWith(). ReflectAsString() would then return what ToString() did before.
LUCENE-2236, LUCENE-2912: DefaultSimilarity can no longer be set statically (and dangerously) for the entire AppDomain. Similarity can now be configured on a per-field basis (via PerFieldSimilarityWrapper). Similarity has a lower-level API; if you want the higher-level vector-space API like in previous Lucene releases, then look at TFIDFSimilarity.
LUCENE-1076: TieredMergePolicy is now the default merge policy. It's able to merge non-contiguous segments; this may cause problems for applications that rely on Lucene's internal document ID assignment. If so, you should instead use LogByteSize/DocMergePolicy during indexing.
LUCENE-3722: Similarity methods and collection/term statistics now take long instead of int (to enable distributed scoring of > 2B docs). For example, in TFIDFSimilarity Idf(int, int) is now Idf(long, long).
LUCENE-3559: The members DocFreq() and MaxDoc on IndexSearcher were removed, as these are no longer used by the scoring system. If you were using these casually in your code for reasons unrelated to scoring, call them on the IndexSearcher's reader instead: IndexSearcher.IndexReader. If you were subclassing IndexSearcher and overriding these members to alter scoring, override IndexSearcher's TermStatistics() and CollectionStatistics() methods instead.
LUCENE-3396: Analyzer.TokenStream() has been renamed Analyzer.GetTokenStream(). Analyzer.GetTokenStream() has been made sealed. ReusableTokenStream() has been removed. It is now necessary to use Analyzer.GetTokenStreamComponents() to define an analysis process. Analyzer also has its own way of managing the reuse of TokenStreamComponents (either globally, or per-field). To define another strategy, implement ReuseStrategy.
LUCENE-3464: IndexReader.Reopen() has been renamed to DirectoryReader.OpenIfChanged() (a static method), and now returns null (instead of the old reader) if there are no changes to the index, to prevent the common pitfall of accidentally closing the old reader.
LUCENE-3687: Similarity.ComputeNorm() now expects a Norm object to set the computed norm value instead of returning a fixed single byte value. Custom similarities can now set integer, float and byte values if a single byte is not sufficient.
LUCENE-2621: Term vectors are now accessed via the flexible indexing API. If you used IndexReader.GetTermFreqVectors() before, you should now use IndexReader.GetTermVectors(). The new method returns a Fields instance exposing the inverted index of the one document. From Fields you can enumerate all fields, terms, positions, offsets.
LUCENE-4227: If you were previously using Instantiated index, you may want to use DirectPostingsFormat after upgrading: it stores all postings in simple arrays (byte[] for terms, int[] for docs, freqs, positions, offsets). Note that this only covers postings, whereas Instantiated covered all other parts of the index as well.
LUCENE-3309: The expert FieldSelector API has been replaced with StoredFieldVisitor. The idea is the same (you have full control over which fields should be loaded). Instead of a single accept method, StoredFieldVisitor has a NeedsField() method: if that method returns true then the field will be loaded and the appropriate type-specific method will be invoked with that field's value.
LUCENE-4122: Removed the Payload class and replaced with BytesRef. PayloadAttribute's name is unchanged, it just uses the BytesRef class to refer to the payload bytes/start offset/end offset (or null if there is no payload).
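The null-returning behavior of DirectoryReader.OpenIfChanged() noted under LUCENE-3464 suggests a refresh pattern along the following lines. ReopenDemo and Refresh are hypothetical names, and the sketch assumes the caller holds the only reference to the old reader.

```csharp
using Lucene.Net.Index;

public static class ReopenDemo
{
    // Returns a reader reflecting the latest committed index state.
    public static DirectoryReader Refresh(DirectoryReader current)
    {
        // OpenIfChanged returns null when the index has not changed.
        DirectoryReader newer = DirectoryReader.OpenIfChanged(current);
        if (newer == null)
        {
            // No changes: keep using the old reader and do not dispose it.
            return current;
        }
        // Assumption: nobody else is still searching with the old reader.
        current.Dispose();
        return newer;
    }
}
```

If readers are shared between threads, reference counting or a manager class is usually a safer way to decide when the old reader can be released than disposing it directly as done here.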