Namespace Lucene.Net.Replicator
Files replication framework
The Replicator allows replicating files between a server and client(s). Producers publish revisions and consumers update to the latest revision available. ReplicationClient is a helper utility for performing the update operation. It can be invoked either manually or periodically by starting an update thread. HttpReplicator can be used to replicate revisions by consumers that reside on a different node than the producer.
The replication framework supports replicating any type of files, with built-in support for a single search index as well as an index and taxonomy pair. For a single index, the application should publish an IndexRevision and set IndexReplicationHandler on the client. For an index and taxonomy pair, the application should publish an IndexAndTaxonomyRevision and set IndexAndTaxonomyReplicationHandler on the client.
When the replication client detects that a newer revision is available, it copies the files of the revision and then invokes the handler to complete the operation (e.g. copy the files to the index directory, sync them, reopen an index reader etc.). By default, only files that are not already part of the handler's current revision are copied; however, this can be overridden by extending the client.
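The default "copy only what is missing" rule amounts to a per-source set difference between the newest revision's files and the handler's current files. The sketch below illustrates that idea with plain file names. `RequiredFilesSketch` and `Compute` are hypothetical names, not the ReplicationClient API, and the real client works with RevisionFile objects rather than strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the default "copy only what is missing" rule.
// Plain file names stand in for RevisionFile so the sketch compiles without Lucene.Net.
public static class RequiredFilesSketch
{
    // newest:  source name -> files of the newest published revision
    // current: source name -> files the handler already has (null on the first replication)
    public static IDictionary<string, IList<string>> Compute(
        IDictionary<string, IList<string>> newest,
        IDictionary<string, IList<string>> current)
    {
        // First replication: nothing exists locally, so every file is required.
        if (current == null)
            return newest;

        var required = new Dictionary<string, IList<string>>();
        foreach (var entry in newest)
        {
            // Files this source already has locally (empty set if the source is new).
            var existing = current.TryGetValue(entry.Key, out var files)
                ? new HashSet<string>(files)
                : new HashSet<string>();

            // Keep only the files the handler does not have yet.
            var missing = entry.Value.Where(f => !existing.Contains(f)).ToList();

            // In this sketch, sources with nothing missing are left out of the result.
            if (missing.Count > 0)
                required[entry.Key] = missing;
        }
        return required;
    }
}
```

For example, if the newest revision of source "index" lists _0.cfs, _1.cfs and segments_2 while the handler already has _0.cfs and segments_1, only _1.cfs and segments_2 would be copied.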
Using the ReplicationService
Because .NET offers several hosting frameworks and they do not share common abstractions for requests and responses, the ReplicationService defines its own abstractions so that it can be integrated easily into any of these frameworks.
To integrate the replicator into an existing hosting framework, the IReplicationRequest and IReplicationResponse interfaces must be implemented for the chosen framework.
An ASP.NET Core Implementation
Below is an example of how these wrappers can be implemented for ASP.NET Core. The example covers only the absolute minimum needed to make the replicator functional within ASP.NET Core.
It does not implement custom middleware, action results for controllers or anything else; while these would be natural additions, such implementations are beyond the scope of this document.
ASP.NET Core Request Wrapper
The first step is to wrap the ASP.NET Core request object in a class that implements the IReplicationRequest interface. This is straightforward.
using System.Linq;
using Lucene.Net.Replicator.Http.Abstractions;
using Microsoft.AspNetCore.Http;

// Wrapper class for the Microsoft.AspNetCore.Http.HttpRequest
public class AspNetCoreReplicationRequest : IReplicationRequest
{
    private readonly HttpRequest request;

    // Inject the actual request object in the constructor.
    public AspNetCoreReplicationRequest(HttpRequest request)
        => this.request = request;

    // Provide the full path relative to the host.
    // In the common case in ASP.NET Core this is simply the full path, so PathBase and Path are concatenated and returned.
    //
    // The path expected by the ReplicationService is {context}/{shard}/{action} where:
    //  - context is the same context that is provided to the ReplicationService constructor and defaults to '/replicate'
    //  - shard is the key under which the IReplicator was registered with the ReplicationService
    //  - action may be Obtain, Release or Update
    public string Path
        => request.PathBase + request.Path;

    // Return values for parameters used by the ReplicationService.
    // The ReplicationService will call this with:
    //  - version: The index revision
    //  - sessionid: The ID of the session
    //  - source: The source index for the files
    //  - filename: The file name
    //
    // In this implementation, null is returned for a missing parameter and an
    // InvalidOperationException is thrown if a parameter is provided multiple times.
    public string QueryParam(string name)
        => request.Query[name].SingleOrDefault();
}
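To make the {context}/{shard}/{action} layout concrete, the following hypothetical parser splits a path such as /api/replicate/shard_name/Obtain into its shard and action. It is an illustration under that assumed layout only; `ReplicationPathSketch.TryParse` is not part of the ReplicationService, which performs its own parsing internally.

```csharp
using System;

// Hypothetical parser for the {context}/{shard}/{action} layout.
// Not the ReplicationService's own parsing code; it only shows how a path decomposes.
public static class ReplicationPathSketch
{
    public static bool TryParse(string path, string context, out string shard, out string action)
    {
        shard = null;
        action = null;

        // The path must start with the configured context, e.g. "/api/replicate".
        if (path == null || !path.StartsWith(context, StringComparison.Ordinal))
            return false;

        // Reject look-alike prefixes such as "/api/replicateX".
        string rest = path.Substring(context.Length);
        if (rest.Length > 0 && rest[0] != '/')
            return false;

        // What remains should be "/{shard}/{action}".
        string[] parts = rest.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2)
            return false;

        shard = parts[0];
        action = parts[1];
        return true;
    }
}
```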
ASP.NET Core Response Wrapper
Next, the ASP.NET Core response object is wrapped in a class that implements the IReplicationResponse interface. This is also straightforward.
using System.IO;
using Lucene.Net.Replicator.Http.Abstractions;
using Microsoft.AspNetCore.Http;

// Wrapper class for the Microsoft.AspNetCore.Http.HttpResponse
public class AspNetCoreReplicationResponse : IReplicationResponse
{
    private readonly HttpResponse response;

    // Inject the actual response object in the constructor.
    public AspNetCoreReplicationResponse(HttpResponse response)
        => this.response = response;

    // Getter and setter for the HTTP status code; the ReplicationService sets this
    // property when a request fails.
    public int StatusCode
    {
        get => response.StatusCode;
        set => response.StatusCode = value;
    }

    // Return a stream that the ReplicationService can write the response to.
    // Depending on the action, either the file or the session token is written to this stream.
    public Stream Body => response.Body;

    // Called when the ReplicationService is done writing data to the response.
    // Here it is mapped to the Flush method of the response body stream.
    public void Flush() => response.Body.Flush();
}
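Because a response wrapper only needs to expose StatusCode, Body and Flush(), a variant backed by a MemoryStream can be handy for exercising a ReplicationService in unit tests without a web server. This is a hypothetical test double: `InMemoryReplicationResponse`, `Flushed` and `GetWrittenBytes` are illustrative names, and the `: IReplicationResponse` declaration is left out so that the sketch compiles without the Lucene.Net.Replicator package.

```csharp
using System.IO;

// Hypothetical in-memory stand-in with the same members as the ASP.NET Core wrapper.
// Add ": IReplicationResponse" when the Lucene.Net.Replicator package is referenced.
public class InMemoryReplicationResponse
{
    private readonly MemoryStream body = new MemoryStream();

    // Starts at 200, like a fresh HTTP response, until the service reports a failure.
    public int StatusCode { get; set; } = 200;

    // Stream the service writes the file or session token to.
    public Stream Body => body;

    // True once the service has signalled that it finished writing.
    public bool Flushed { get; private set; }

    public void Flush()
    {
        body.Flush();
        Flushed = true;
    }

    // Everything written so far, for assertions in tests.
    public byte[] GetWrittenBytes() => body.ToArray();
}
```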
ASP.NET Core Utility Extension Method
This part is not necessary. However, an extension method that overloads the ReplicationService Perform method to take the ASP.NET Core HttpRequest and HttpResponse objects makes it easier to call the ReplicationService from ASP.NET Core MVC controllers, from inside middleware or, for the absolute minimal solution, directly in the delegate passed to IApplicationBuilder.Run().
using Lucene.Net.Replicator.Http;
using Microsoft.AspNetCore.Http;

public static class AspNetCoreReplicationServiceExtensions
{
    // Optionally, provide an extension method for calling the Perform method directly with
    // the request and response objects from ASP.NET Core.
    public static void Perform(this ReplicationService self, HttpRequest request, HttpResponse response)
        => self.Perform(
            new AspNetCoreReplicationRequest(request),
            new AspNetCoreReplicationResponse(response));
}
Using the Implementation
Now the implementation can be used within ASP.NET Core in order to service Lucene.NET Replicator requests over HTTP.
To enable replication of indexes, the IndexWriter that writes the index should be created with a SnapshotDeletionPolicy.
IndexWriterConfig config = new IndexWriterConfig(...ver..., new StandardAnalyzer(...ver...));
config.IndexDeletionPolicy = new SnapshotDeletionPolicy(config.IndexDeletionPolicy);
IndexWriter writer = new IndexWriter(FSDirectory.Open("..."), config);
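The snapshot is what keeps a published revision's files alive: as described for IndexRevision under Classes, publishing snapshots the current commit point so that the IndexWriter does not delete its files while clients copy them, and releasing the revision drops the snapshot. The conceptual sketch below models this with a reference count per commit generation. `SnapshotRefCounts` is a hypothetical class used only to illustrate the idea; it is not Lucene's SnapshotDeletionPolicy implementation.

```csharp
using System;
using System.Collections.Generic;

// Conceptual sketch: snapshotted commit generations are pinned with a reference
// count, so their files are kept until every holder has released them.
public sealed class SnapshotRefCounts
{
    private readonly Dictionary<long, int> refCounts = new Dictionary<long, int>();

    // Pin a commit generation, e.g. while a revision of it is being replicated.
    public void Snapshot(long generation)
    {
        refCounts.TryGetValue(generation, out int count);
        refCounts[generation] = count + 1;
    }

    // Unpin it; the last release makes the generation eligible for deletion again.
    public void Release(long generation)
    {
        if (!refCounts.TryGetValue(generation, out int count))
            throw new InvalidOperationException($"Generation {generation} is not snapshotted.");

        if (count == 1)
            refCounts.Remove(generation);
        else
            refCounts[generation] = count - 1;
    }

    // A deletion policy would keep any commit that is still pinned.
    public bool IsPinned(long generation) => refCounts.ContainsKey(generation);
}
```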
For the absolute minimal solution, the ReplicationService can be wired up on the server side as follows:
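LocalReplicator replicator = new LocalReplicator();
ReplicationService service = new ReplicationService(new Dictionary<string, IReplicator>
{
    ["shard_name"] = replicator
}, "/api/replicate");

app.Map("/api/replicate", builder =>
{
    builder.Run(context =>
    {
        // Perform writes to the response body synchronously, and Kestrel disallows
        // synchronous IO by default since ASP.NET Core 3.0, so allow it for this request.
        // (IHttpBodyControlFeature lives in Microsoft.AspNetCore.Http.Features.)
        var bodyControl = context.Features.Get<IHttpBodyControlFeature>();
        if (bodyControl != null)
            bodyControl.AllowSynchronousIO = true;

        service.Perform(context.Request, context.Response);
        return Task.CompletedTask;
    });
});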
To publish a revision, call the Publish() method on the LocalReplicator:
IndexWriter writer = ...;
LocalReplicator replicator = ...;
replicator.Publish(new IndexRevision(writer));
On the client side create a new HttpReplicator and start replicating, e.g.:
IReplicator replicator = new HttpReplicator("http://{host}:{port}/api/replicate/shard_name");
ReplicationClient client = new ReplicationClient(
    replicator,
    new IndexReplicationHandler(
        FSDirectory.Open(...directory...),
        () => ...onUpdate...),
    new PerSessionDirectoryFactory(...temp-working-directory...));

// Now either start the update thread or pull manually at regular intervals.
client.UpdateNow(); // Manual pull
client.StartUpdateThread(1000, "Replicator Thread"); // Check for updates automatically every second (1000 ms).
From here, it is natural to use a SearcherManager over the directory so that searchers are updated automatically. However, the SearcherManager cannot be created before the first replication has completed, because there is no index yet and its creation would fail.
The onUpdate handler can be used to perform this first initialization.
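One way to follow this advice is a small holder that creates the searcher manager on the first update and refreshes it on later updates. The sketch uses delegates so that it runs without Lucene.Net; with Lucene.Net, `create` could be `() => new SearcherManager(directory, null)` and `refresh` could be `m => m.MaybeRefresh()`, with `OnUpdate()` called from the onUpdate lambda given to IndexReplicationHandler. `InitializeOnFirstUpdate` is a hypothetical helper, not part of the replicator.

```csharp
using System;

// Hypothetical helper: the first update creates the object (e.g. a SearcherManager),
// later updates only refresh it. Delegates keep the sketch independent of Lucene.Net.
public sealed class InitializeOnFirstUpdate<T> where T : class
{
    private readonly Func<T> create;
    private readonly Action<T> refresh;
    private readonly object syncRoot = new object();
    private T instance;

    public InitializeOnFirstUpdate(Func<T> create, Action<T> refresh)
    {
        this.create = create ?? throw new ArgumentNullException(nameof(create));
        this.refresh = refresh ?? throw new ArgumentNullException(nameof(refresh));
    }

    // Null until the first replication has completed; callers must check for that.
    public T Current
    {
        get { lock (syncRoot) return instance; }
    }

    // Intended to be invoked from the replication handler's onUpdate callback.
    public void OnUpdate()
    {
        lock (syncRoot)
        {
            if (instance == null)
                instance = create();   // first revision: the index now exists
            else
                refresh(instance);     // later revisions: pick up the new commit
        }
    }
}
```

The lock matters because the update thread started by StartUpdateThread runs the callback in the background while search threads may read Current concurrently.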
Classes
IndexAndTaxonomyReplicationHandler
An IReplicationHandler for replicating an index and taxonomy pair. See IReplicationHandler for more details. This handler ensures that the search and taxonomy indexes are replicated consistently.
IndexAndTaxonomyRevision
An IRevision of a search index and taxonomy index pair, comprising the list of files from both indexes. This revision should be used whenever a pair of search and taxonomy indexes needs to be replicated together to guarantee the consistency of both on the replicating (client) side.
IndexInputStream
A Stream which wraps a Lucene.Net.Store.IndexInput.
IndexReplicationHandler
An IReplicationHandler for replication of an index. Implements RevisionReady(string, IDictionary<string, IList<RevisionFile>>, IDictionary<string, IList<string>>, IDictionary<string, Directory>) by copying the files pointed to by the client resolver to the index Lucene.Net.Store.Directory and then touching the index with Lucene.Net.Index.IndexWriter to make sure any unused files are deleted.
IndexRevision
An IRevision of a single index's files, comprising the list of files that are part of the current Lucene.Net.Index.IndexCommit. To ensure the files are not deleted by Lucene.Net.Index.IndexWriter for as long as this revision stays alive (i.e. until Release()), the current commit point is snapshotted using Lucene.Net.Index.SnapshotDeletionPolicy (this means that the given writer's Lucene.Net.Index.IndexWriterConfig.IndexDeletionPolicy should return Lucene.Net.Index.SnapshotDeletionPolicy).
When this revision is Release()d, it releases the obtained snapshot and calls Lucene.Net.Index.IndexWriter.DeleteUnusedFiles() so that the snapshotted files are deleted (if they are no longer needed).
LocalReplicator
An IReplicator implementation for use by the side that publishes IRevisions, as well as for clients to check for updates via CheckForUpdate(string). When a client needs to be updated, it is returned a SessionToken through which it can obtain the files of that revision via ObtainFile(string, string, string). As long as a revision is being replicated, this replicator guarantees that it will not be released.
Replication sessions expire by default after
PerSessionDirectoryFactory
An ISourceDirectoryFactory which returns Lucene.Net.Store.FSDirectory under a dedicated session directory. When a session is over, the entire directory is deleted.
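The per-session layout can be pictured as {working root}/{session id}/{source}, assuming one subfolder per source, with the whole session folder removed at the end. The following sketch reproduces that layout using System.IO only. `PerSessionFolders` is a hypothetical class; unlike PerSessionDirectoryFactory, it returns plain paths rather than Lucene.Net FSDirectory instances.

```csharp
using System;
using System.IO;

// Hypothetical sketch of a per-session working layout:
//   {workingRoot}/{sessionId}/{source}
// Ending the session deletes {workingRoot}/{sessionId} with everything under it.
public sealed class PerSessionFolders
{
    private readonly string workingRoot;

    public PerSessionFolders(string workingRoot) => this.workingRoot = workingRoot;

    // Folder that the files of one source in one session are copied into.
    public string GetSourceFolder(string sessionId, string source)
    {
        string path = Path.Combine(workingRoot, sessionId, source);
        Directory.CreateDirectory(path); // no-op if it already exists
        return path;
    }

    // Called when the session is over, whether it succeeded or not.
    public void CleanupSession(string sessionId)
    {
        string sessionPath = Path.Combine(workingRoot, sessionId);
        if (Directory.Exists(sessionPath))
            Directory.Delete(sessionPath, recursive: true);
    }
}
```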
ReplicationClient
A client which monitors and obtains new revisions from an IReplicator. It can be used either to check for updates periodically by invoking StartUpdateThread(long, string), or to check manually by calling UpdateNow().
Whenever a new revision is available, the RequiredFiles(IDictionary<string, IList<RevisionFile>>) are copied to the Lucene.Net.Store.Directory specified by PerSessionDirectoryFactory and a handler is notified.
RevisionFile
Describes a file in an IRevision. A file has a source, which allows a single revision to contain files from multiple sources (e.g. multiple indexes).
SessionExpiredException
Exception indicating that a revision update session was expired due to lack of activity.
SessionToken
Token for a replication session, for guaranteeing that source replicated files will be kept safe until the replication completes.
SnapshotDirectoryTaxonomyIndexWriterFactory
An implementation of Lucene.Net.Facet.Taxonomy.Directory.DirectoryTaxonomyIndexWriterFactory which sets the underlying Lucene.Net.Index.IndexWriter's Lucene.Net.Index.IndexDeletionPolicy to Lucene.Net.Index.SnapshotDeletionPolicy.
Interfaces
IReplicationHandler
Handler for revisions obtained by the client.
IReplicator
An interface for replicating files. Allows a producer to publish IRevisions via Publish(IRevision) and consumers to check for updates via CheckForUpdate(string). When a client needs to be updated, it is given a SessionToken through which it can obtain the files of that revision via ObtainFile(string, string, string). After the client has finished obtaining all the files, it should release the given session via Release(string), so that the files can be reclaimed if they are no longer needed.
A client is always updated to the newest revision available. That is, if a client is on revision r1 and revisions r2 and r3 have been published, then the next time the client checks for updates, it will receive r3.
IRevision
A revision comprises lists of files that come from different sources and need to be replicated together to e.g. guarantee that all resources are in sync. In most cases an application will replicate a single index, and so the revision will contain files from a single source. However, some applications may need to treat a collection of indexes as a single entity so that the files from all sources are replicated together, to guarantee consistency between them. For example, an application which indexes facets will need to replicate both the search and taxonomy indexes together, to guarantee that they match at the client side.
ISourceDirectoryFactory
Resolves a session and source into a Lucene.Net.Store.Directory to copy the session files into.