Introduction

In a previous post, we covered the steps to build a custom Lucene analyzer. In this second part, I'll go over the steps to register our analyzer as an Elasticsearch plugin.

All of the code below, as well as instructions for installation and use, is available on GitHub at https://github.com/CitrineInformatics/plugin-plussign

Integrating with Elasticsearch

To use our custom analyzer, we're going to have to register it using a plugin. For the most part, this is boilerplate code. We'll need to set up the following classes:

* `PlusSignAnalyzerProvider extends AbstractIndexAnalyzerProvider<PlusSignAnalyzer>`
* `PlusSignBinderProcessor extends AnalysisModule.AnalysisBinderProcessor`
* `PlusSignPlugin extends AbstractPlugin`

I'll go through each of these classes and then show how we can use our new plugin in Elasticsearch.

Building an analyzer provider

A provider is used to generate an instance of a PlusSignAnalyzer object. We just need to create a constructor for our class and override the `get()` method. The constant `NAME` stores the name under which the analyzer is registered in Elasticsearch. This is the name that you use whenever you refer to the custom analyzer from Elasticsearch.


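    // Imports below assume the Elasticsearch 1.x package layout.
    // PlusSignAnalyzer is the analyzer built in the previous post.
    import java.io.IOException;

    import org.elasticsearch.common.inject.Inject;
    import org.elasticsearch.common.inject.assistedinject.Assisted;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.env.Environment;
    import org.elasticsearch.index.Index;
    import org.elasticsearch.index.analysis.AbstractIndexAnalyzerProvider;
    import org.elasticsearch.index.settings.IndexSettings;

    public class PlusSignAnalyzerProvider
            extends AbstractIndexAnalyzerProvider<PlusSignAnalyzer> {

        /* Name to associate with this class. We will use this in
         * PlusSignBinderProcessor.
         */
        public static final String NAME = "plus_sign";

        /* Instance of PlusSignAnalyzer class that is returned by this class. */
        protected PlusSignAnalyzer analyzer = new PlusSignAnalyzer();

        /* Constructor. Nothing special here. */
        @Inject
        public PlusSignAnalyzerProvider(Index index,
                @IndexSettings Settings indexSettings, Environment env,
                @Assisted String name, @Assisted Settings settings) throws IOException {
            super(index, indexSettings, name, settings);
        }

        /* This function needs to be overridden to return an instance of
         * PlusSignAnalyzer.
         */
        @Override
        public PlusSignAnalyzer get() {
            return this.analyzer;
        }
    }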
    

Building a binder processor

The binder processor class is used to register the provider we just created, saving its name and class in a list of analyzers that are being used by Elasticsearch.



    // Import assumes the Elasticsearch 1.x package layout.
    import org.elasticsearch.index.analysis.AnalysisModule;

    public class PlusSignBinderProcessor
            extends AnalysisModule.AnalysisBinderProcessor {

        /* This is the only function that you need. It simply adds our
         * PlusSignAnalyzerProvider class to a list of bindings.
         */
        @Override
        public void processAnalyzers(AnalyzersBindings analyzersBindings) {
            analyzersBindings.processAnalyzer(PlusSignAnalyzerProvider.NAME,
                    PlusSignAnalyzerProvider.class);
        }
    }
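
To make the idea of registering a provider under a name more concrete, here is a toy model in plain Java. It is only a sketch of the pattern, not Elasticsearch's implementation (which relies on its injection framework); `AnalyzerRegistrySketch`, `ToyAnalyzer`, and `ToyPlusSignAnalyzer` are hypothetical names invented for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of a name-to-provider registry. Hypothetical; not an Elasticsearch class. */
final class AnalyzerRegistrySketch {

    /** Hypothetical stand-in for an analyzer. */
    interface ToyAnalyzer {
        String describe();
    }

    /** Hypothetical stand-in for PlusSignAnalyzer. */
    static final class ToyPlusSignAnalyzer implements ToyAnalyzer {
        @Override
        public String describe() {
            return "toy plus_sign analyzer";
        }
    }

    /** Providers keyed by the name they were registered under. */
    private final Map<String, Supplier<ToyAnalyzer>> providers = new LinkedHashMap<>();

    /** Records a provider under a name, mirroring the role of processAnalyzer. */
    void processAnalyzer(String name, Supplier<ToyAnalyzer> provider) {
        providers.put(name, provider);
    }

    /** Looks up the provider by name and asks it for an analyzer. */
    ToyAnalyzer analyzer(String name) {
        Supplier<ToyAnalyzer> provider = providers.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("unknown analyzer: " + name);
        }
        return provider.get();
    }

    public static void main(String[] args) {
        AnalyzerRegistrySketch registry = new AnalyzerRegistrySketch();
        registry.processAnalyzer("plus_sign", ToyPlusSignAnalyzer::new);
        System.out.println(registry.analyzer("plus_sign").describe());
    }
}
```

The point of the sketch is that the name passed at registration time is the only handle that later requests use to reach the analyzer, which is why `NAME` matters.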


Building an Elasticsearch plugin

Finally, we're ready to define our custom plugin. We need to provide the name and description of the plugin and define the `onModule` method. In this case, `onModule` adds an instance of our PlusSignBinderProcessor class to the analysis module. The binder processor is what Elasticsearch uses to register our custom analyzer.


    // Imports assume the Elasticsearch 1.x package layout.
    import org.elasticsearch.index.analysis.AnalysisModule;
    import org.elasticsearch.plugins.AbstractPlugin;

    public class PlusSignPlugin extends AbstractPlugin {

        /* Set the name that will be assigned to this plugin. */
        @Override
        public String name() {
            return "plugin-plussign";
        }

        /* Return a description of this plugin. */
        @Override
        public String description() {
            return "Analyzer to split a string at + symbols, remove tokens " +
                    "containing empty strings, and convert all strings to " +
                    "lowercase";
        }

        /* This is the function that will register our analyzer with
         * Elasticsearch.
         */
        public void onModule(AnalysisModule analysisModule) {
            analysisModule.addProcessor(new PlusSignBinderProcessor());
        }
    }
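
Notice that `onModule` carries no `@Override` annotation. My understanding is that Elasticsearch 1.x finds `onModule` methods by reflection and calls the ones whose parameter type matches the module being set up, but treat that as an assumption rather than something shown in this post. The following plain-Java sketch illustrates that style of dispatch; `ToyAnalysisModule`, `ToyPlugin`, and `dispatch` are hypothetical names and none of this is Elasticsearch code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Toy model of reflective onModule dispatch. Hypothetical; not Elasticsearch code. */
final class OnModuleDispatchSketch {

    /** Hypothetical stand-in for AnalysisModule. */
    public static final class ToyAnalysisModule {
        public final List<String> processors = new ArrayList<>();

        public void addProcessor(String processorName) {
            processors.add(processorName);
        }
    }

    /** Hypothetical stand-in for PlusSignPlugin. */
    public static final class ToyPlugin {
        public void onModule(ToyAnalysisModule module) {
            module.addProcessor("plus_sign binder");
        }
    }

    /**
     * Calls every public method named onModule that takes exactly one
     * parameter compatible with the given module. Returns the number of calls.
     */
    static int dispatch(Object plugin, Object module) {
        int calls = 0;
        for (Method method : plugin.getClass().getMethods()) {
            if (method.getName().equals("onModule")
                    && method.getParameterCount() == 1
                    && method.getParameterTypes()[0].isInstance(module)) {
                try {
                    method.invoke(plugin, module);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("onModule call failed", e);
                }
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        ToyAnalysisModule module = new ToyAnalysisModule();
        int calls = dispatch(new ToyPlugin(), module);
        System.out.println(calls + " call(s), processors = " + module.processors);
    }
}
```

With this kind of dispatch, a misspelled method name or a wrong parameter type is skipped silently instead of causing a compile error, so it is worth double-checking the signature of `onModule`.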



Installing and using our plugin with Elasticsearch

The last file that we need to generate is named `es-plugin.properties`. It contains a single line, which for our plugin is:


    plugin=io.citrine.pluginplussign.plugin.PlusSignPlugin

where we have packaged our PlusSignPlugin class in io.citrine.pluginplussign.plugin. Elasticsearch expects to find this file on the classpath when it initializes a plugin.
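
Since the file uses the standard Java properties format, it can be read with `java.util.Properties`. The sketch below shows how the plugin class name could be pulled out of such a file. It is an illustration only: `PluginPropertiesSketch` and `pluginClassName` are hypothetical names, and Elasticsearch's own loading code is not shown in this post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative reader for a one-line plugin properties file. Hypothetical helper. */
final class PluginPropertiesSketch {

    /** Returns the value of the "plugin" key, or null if the key is absent. */
    static String pluginClassName(String propertiesText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return properties.getProperty("plugin");
    }

    public static void main(String[] args) {
        String text = "plugin=io.citrine.pluginplussign.plugin.PlusSignPlugin\n";
        System.out.println(pluginClassName(text));
    }
}
```

If the fully qualified class name in the file does not match the actual package of the plugin class, the plugin cannot be instantiated, so this line is worth checking first when a plugin fails to load.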

Now that we have built all of the code to register our analyzer plugin, we will need to install it in Elasticsearch. Making sure that Elasticsearch is not running, we can install the plugin from the top directory of Elasticsearch using:


    bin/plugin --url file:///path/to/plugin --install plugin-plussign

where `/path/to/plugin` is the path to the plugin that we are installing. Start Elasticsearch with:


    bin/elasticsearch

and you should see that our plugin has been registered. Something like the following will be printed to standard output:

    [2015-02-10 12:47:04,398][INFO][plugins] [Scorpion] loaded [plugin-plussign], sites []

With our plugin installed, we can test it by issuing the following commands:

    curl -XPUT 'localhost:9200/test'
    curl -XGET 'localhost:9200/test/_analyze?analyzer=plus_sign&pretty=true' -d 'This+is++some+text'

Here, the first command creates an Elasticsearch index named test, and the second analyzes the phrase "This+is++some+text" using our custom analyzer. For the second command, you should see the following response from Elasticsearch:


    {
      "tokens" : [ {
        "token" : "this",
        "start_offset" : 0,
        "end_offset" : 0,
        "type" : "word",
        "position" : 1
      }, {
        "token" : "is",
        "start_offset" : 0,
        "end_offset" : 0,
        "type" : "word",
        "position" : 2
      }, {
        "token" : "some",
        "start_offset" : 0,
        "end_offset" : 0,
        "type" : "word",
        "position" : 3
      }, {
        "token" : "text",
        "start_offset" : 0,
        "end_offset" : 0,
        "type" : "word",
        "position" : 4
      } ]
    }



You can see that our analyzer does exactly what we intended: it takes the string "This+is++some+text" and generates the tokens "this", "is", "some", and "text".
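
As a quick sanity check on the expected tokens, the same behavior (split at `+`, drop empty strings, convert to lowercase) can be modeled in a few lines of plain Java. This is only a model of the behavior described in the plugin description, not the Lucene analyzer from the previous post; `PlusSignTokenizeSketch` and `tokenize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java model of the analyzer's described behavior. Hypothetical helper. */
final class PlusSignTokenizeSketch {

    /** Splits at '+', drops empty pieces, and lowercases what remains. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // The negative limit keeps trailing empty pieces, so the filter
        // below treats leading, repeated, and trailing '+' the same way.
        for (String piece : text.split("\\+", -1)) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [this, is, some, text]
        System.out.println(tokenize("This+is++some+text"));
    }
}
```

The doubled `+` in the input produces an empty piece between "is" and "some", which the filter removes. That is why the response above contains four tokens rather than five.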

Of course, you can customize Elasticsearch in even more complex and interesting ways. You can see how we have used Elasticsearch at http://www.citrination.com
