Running SolrMeter without a UI

What is SolrMeter

SolrMeter is a nice Java program that allows you to test the performance of your Solr installation under heavy load. The tool cannot coordinate multiple clients to generate very high load, nor can it test an installation with multiple Solr machines using round robin to distribute queries across them, but it does a decent job of giving you some raw numbers about how your Solr installation performs.

SolrMeter can really help you understand how Solr performs, especially if you need to compare different schema.xml solutions, or different hardware or core configurations.

Using the tool is straightforward: you just need to follow the tutorial here, compile the latest version with Maven, then it is just a matter of creating some files containing the queries you want to issue (the q parameter), the filter queries you want to use (the fq parameter) and finally a series of fields used for faceting.
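As an example, the input files are plain text with one entry per line (one line is one q value, one fq value, or one facet field). The values below are purely hypothetical, and the comment lines are annotations for this post, not part of the files:

```text
# Set1_Queries.txt: one query (q) per line
name:ipod
category:electronics AND price:[100 TO 500]

# Set1_FilterQueries.txt: one filter query (fq) per line
inStock:true

# Set1_fields.txt: one facet field per line
category
manufacturer
```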

In the settings menu you can configure everything you need. The first tab allows you to specify all the files that contain the data used to generate queries, as well as the address of the core/collection you want to test.


Figure 1: A typical SolrMeter configuration

Now you can just specify how many queries per second you want to issue to your server, press play, and the test starts.


Figure 2: Choose queries per second and then run the test

What type of data you can grab from SolrMeter

The tool gives you lots of information and graphs that help you understand how well your Solr installation performs with the set of queries you are issuing to the server. A typical graph to look at is the distribution of the average response time of the queries during the run.


Figure 3: Query History

In Figure 3 I can see that the average query time for the first 10 seconds is about 375 ms, while for subsequent queries the average response time is under 50 ms. Another interesting graph is the distribution of query execution times during the load.


Figure 4: Average response time for queries

On the X axis you have execution time, on the Y axis the number of queries. From the Figure 4 graph you can verify that almost all of the queries were executed in under 500 ms, with a peak of some queries that needed 12 seconds to execute.

Running without a UI

There are other cool features in SolrMeter, but I want to show you one that is less known and really useful: the ability to run without a UI. Once you have set up a test load with the UI, you can choose File/Export to export the actual SolrMeter configuration to a file.

Exporting the configuration generates a nice XML file with all the options you configured in the UI.

Once you have exported the configuration file, you can copy it, along with the SolrMeter jar file and the various configuration files (the ones with queries, filters, etc.), to your target server with SSH (Solr is usually installed on a Linux machine without a UI).

Once all the files are copied, it is really simple to edit the main XML configuration file if you need to change something: you can update the address of the core to test, the location of the various configuration files, etc.

Running SolrMeter on the very same machine that runs Solr is usually not a good idea, but there are scenarios where running without a UI is the only way to go. As an example, consider an installation where your Solr machines and a web application server are installed on Linux without a UI, in an isolated environment that you can access only with SSH. In this scenario, to simulate real traffic you should run the load test from the web application machine against the Solr machine.

In many scenarios you should run your load test from the machine that is accessing Solr, and that machine has no UI.

Now you need to create an XML file, usually named solrmeter.smc.xml, that contains the whole SolrMeter configuration for the run. Here is a possible content:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>Solr Meter Configuration File. Generated at 15-gen-2016</comment>
<entry key="solr.query.queryMode">standard</entry>
<entry key="headless.performQueryOperations">true</entry>
<entry key="solr.update.documentsToCommit">100</entry>
<entry key="executor.optimizeExecutor">ondemand</entry>
<entry key="solr.server.configuration.httpAuthUser"/>
<entry key="statistic.refreshTime">2000</entry>
<entry key="solr.queryMethod">GET</entry>
<entry key="guice.headlessModule">com.plugtree.solrmeter.HeadlessModule</entry>
<entry key="statistic.configuration.filePath">statistics-config.xml</entry>
<entry key="headless.numUpdates">100</entry>
<entry key="solr.query.useFacets">true</entry>
<entry key="solr.query.useFilterQueries">true</entry>
<entry key="solr.server.configuration.followRedirect">false</entry>
<entry key="solr.query.addRandomExtraParams">true</entry>
<entry key="solr.queriesFiles">C:\Develop\xxxxxx\SolrMeter\Test1\Set1_Queries.txt</entry>
<entry key="solrConnectedButton.pingInterval">5000</entry>
<entry key="statistic.timeRange.range501_1000">true</entry>
<entry key="solr.query.filterQueriesFile">C:\Develop\xxxxxx\Tools\SolrMeter\Test1\Set1_FilterQueries.txt</entry>
<entry key="guice.solrMeterRunModeModule">com.plugtree.solrmeter.SolrMeterRunModeModule</entry>
<entry key="solr.update.timeToCommit">10000</entry>
<entry key="guice.standalonePresentationModule">com.plugtree.solrmeter.StandalonePresentationModule</entry>
<entry key="statistic.timeRange.range0_500">true</entry>
<entry key="solr.documentIdField">id</entry>
<entry key="solr.update.solrAutocommit">false</entry>
<entry key="solr.documentFieldsFile">C:\Develop\xxxxxx\Tools\SolrMeter\Test1\Set1_fields.txt</entry>
<entry key="solr.testTime">1</entry>
<entry key="files.charset">UTF-8</entry>
<entry key="statistic.timeRange.range1001_2000">true</entry>
<entry key="solr.server.configuration.allowCompression">true</entry>
<entry key="solr.server.configuration.soTimeout">60000</entry>
<entry key="headless.outputDirectory">./solrmeter-headless</entry>
<entry key="statistic.showingStatistics">all</entry>
<entry key="solr.query.echoParams">false</entry>
<entry key="solr.query.extraParameters">indent=true,debugQuery=false</entry>
<entry key="executor.queryExecutor">random</entry>
<entry key="solr.searchUrl">http://localhost:8983/solr/MyCore</entry>
<entry key="guice.modelModule">com.plugtree.solrmeter.ModelModule</entry>
<entry key="solr.query.extraParams">C:\Develop\xxxxxx\Tools\SolrMeter\Test1\Set1_extraparams.txt</entry>
<entry key="solr.server.configuration.maxTotalConnections">1000000</entry>
<entry key="statistic.timeRange.range2001_2147483647">true</entry>
<entry key="solr.errorLogStatistic.maxStored">400</entry>
<entry key="guice.statisticsModule">com.plugtree.solrmeter.StatisticsModule</entry>
<entry key="executor.updateExecutor">random</entry>
<entry key="solr.updatesFiles"/>
<entry key="headless.performUpdateOperations">false</entry>
<entry key="headless.numQueries">100</entry>
<entry key="solr.load.queriespersecond">5</entry>
<entry key=""/>
<entry key="solr.server.configuration.maxRetries">1</entry>
<entry key="solr.addUrl">http://localhost:8983/solr/techproducts</entry>
<entry key="solr.load.updatespersecond">1</entry>
<entry key="solr.server.configuration.defaultMaxConnectionsPerHost">100000</entry>
<entry key="solr.queryLogStatistic.maxStored">1000</entry>
<entry key="solr.query.facetMethod">fc</entry>
<entry key="solr.server.configuration.connectionTimeout">60000</entry>
</properties>

As you can see there are various options to configure, and most of them are really self-explanatory. If you are interested, in this github issue lots of the options are explained. The most important options are the address of the core and the paths of the files that contain the queries. Now you only need to run SolrMeter in headless mode with the following command line (note the solrmeter.runMode=headless):

java -Dsolrmeter.runMode=headless -Dsolrmeter.configurationFile=solrmeter.smc.xml -jar solrmeter-0.3.1-SNAPSHOT-jar-with-dependencies.jar

Then wait for the test to finish and you will find all the output in the folder you specified with the headless.outputDirectory setting. Now you only need to grab the results, put them in an Excel spreadsheet and do whatever you want with the data.

Thanks to headless mode, you can run SolrMeter even in scenarios where you have no UI and, most importantly, you can automate the test with a script, because there is no need to interact with a UI.
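As a sketch of such automation, a small shell script could launch every exported configuration in a directory one after the other (the jar name matches the one above; everything else is an assumption you should adapt):

```shell
#!/bin/sh
# Build the java command line for one headless SolrMeter run.
build_cmd() {
    printf 'java -Dsolrmeter.runMode=headless -Dsolrmeter.configurationFile=%s -jar %s' "$1" "$2"
}

JAR=solrmeter-0.3.1-SNAPSHOT-jar-with-dependencies.jar

# Run every exported configuration in the current directory, one at a time.
for cfg in *.smc.xml; do
    [ -e "$cfg" ] || continue
    echo "Running test $cfg"
    # $(build_cmd "$cfg" "$JAR")   # uncomment to actually launch the run
done
```

Each run writes its results under the folder configured by headless.outputDirectory, so you can collect them afterwards.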

Gian Maria.

Index documents content with Solr and Tika

I’ve blogged in the past about indexing entire folders of documents with Solr and Tika using the Data Import Handler. This approach has pros and cons. On the good side, once you’ve understood the basics, getting everything up and running is a matter of a couple of hours at most; on the bad side, using a DIH gives you little control over the entire process.

As an example, I’ve had problems with folders of jpg images, because the extractor crashed due to a missing library. If you do not configure the import handler correctly, every error stops the entire import process. Another problem is that the document content is not subdivided into pages, even if Tika can give you that kind of information. Finally, you need to have all of your documents inside a folder to be indexed. In real situations it is quite often preferable to have more control over the indexing process. Let’s examine how you can use Tika from your C# code.

The easiest way is to directly invoke the tika.jar file with Java; it is quick and does not require any other external library, just install Java and uncompress Tika in a local folder.

public TikaDocument ExtractDataFromDocument(string pathToFile)
{
    var arguments = String.Format("-jar \"{0}\" -h \"{1}\"", Configuration.TikaJarLocation, pathToFile);
    using (Process process = new Process())
    {
        process.StartInfo.FileName = Configuration.JavaExecutable;
        process.StartInfo.Arguments = arguments;
        process.StartInfo.WorkingDirectory = Path.GetDirectoryName(pathToFile);
        process.StartInfo.WindowStyle = ProcessWindowStyle.Minimized;
        process.StartInfo.UseShellExecute = false;
        process.StartInfo.ErrorDialog = false;
        process.StartInfo.CreateNoWindow = true;
        process.StartInfo.RedirectStandardOutput = true;
        var result = process.Start();
        if (!result) return TikaDocument.Error;
        var fullContent = process.StandardOutput.ReadToEnd();
        return new TikaDocument(fullContent);
    }
}


This snippet of code simply invokes Tika, passing the file you want to analyze as an argument; it uses the standard System.Diagnostics.Process .NET object and intercepts the standard output to grab Tika's output. This output is parsed by a helper object called TikaDocument that takes care of understanding how the document is structured. If you are interested in the code you can find everything in the included sample, but it is just a matter of HTML parsing with the Html Agility Pack. E.g.:

// Fragment of the TikaDocument constructor: parse the HTML that Tika emits
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(fullContent);
Success = true;
FullHtmlContent = fullContent;
FullTextContent = HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);

var titleNode = doc.DocumentNode.SelectSingleNode("//title");
if (titleNode != null)
    Title = HttpUtility.HtmlDecode(titleNode.InnerText);

var pagesList = new List<TikaPage>();
Pages = pagesList;
var pages = doc.DocumentNode.SelectNodes(@"//div[@class='page']");
if (pages != null)
    foreach (var page in pages)
        pagesList.Add(new TikaPage(page));

var meta = new Dictionary<String, String>();
var metaNodes = doc.DocumentNode.SelectNodes("//meta");
if (metaNodes != null)
    foreach (var metaNode in metaNodes)
        meta[metaNode.GetAttributeValue("name", "")] = metaNode.GetAttributeValue("content", "");
Meta = new MetaHelper(meta);
Thanks to the TikaDocument class you can index the content of single pages; in my example I simply send the entire content of the document to Solr (I do not care about subdividing the document into pages). This is the XML message for a standard document update:

public System.Xml.Linq.XDocument SolarizeTikaDocument(String fullPath, TikaDocument document)
{
    XElement elementNode;
    XDocument doc = new XDocument(
        new XElement("add", elementNode = new XElement("doc")));

    elementNode.Add(new XElement("field", new XAttribute("name", "id"), fullPath));
    elementNode.Add(new XElement("field", new XAttribute("name", "fileName"), Path.GetFileName(fullPath)));
    elementNode.Add(new XElement("field", new XAttribute("name", "title"), document.Title));
    elementNode.Add(new XElement("field", new XAttribute("name", "content"), document.FullTextContent));
    return doc;
}
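The generated add message can then be posted to the Solr update handler. As a sketch with curl (the core name and file name below are placeholders, and the request line is commented out so you can adapt it first):

```shell
#!/bin/sh
# Placeholder values: adapt the core name and document file to your installation.
SOLR_UPDATE_URL="http://localhost:8983/solr/techproducts/update"
DOC_FILE="document.xml"
# POST the <add> XML and issue a commit so the document becomes searchable:
# curl "$SOLR_UPDATE_URL?commit=true" -H "Content-Type: text/xml" --data-binary @"$DOC_FILE"
```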

To mimic how the DIH works, you can use a FileSystemWatcher to monitor a folder and index a document as soon as it gets updated or added. In my sample I only care about files being added to the directory:

static void watcher_Created(object sender, FileSystemEventArgs e)
{
    var document = _tikaHandler.ExtractDataFromDocument(e.FullPath);
    var solrDocument = _solarizer.SolarizeTikaDocument(e.FullPath, document);
    // solrDocument is now ready to be posted to the Solr update handler
}

This approach is more complex than using a plain DIH, but it gives you more control over the entire process and it is also suitable when documents are stored inside databases or in other locations.

Code is available here:

Gian Maria.

Index a folder of multilanguage documents in Solr with Tika

Previous posts in the series

Everything is up and running, but now the requirements change: documents can have multiple languages (Italian and English in my scenario) and we want to do the simplest thing that could possibly work. First of all I change the schema of the core in Solr to support language specific fields with wildcards.


Figure 1: Configuration of the Solr core to support multiple language fields.
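Figure 1 is just a screenshot; the underlying schema.xml fragment is roughly like the following sketch (the text_it/text_en field type names are assumptions taken from the standard Solr example schema):

```xml
<!-- One dynamicField per supported language: the localized copies of
     content/title land here. stored="true" is needed for highlighting. -->
<dynamicField name="*_it" type="text_it" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
```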

This is a simple modification: all fields are indexed and stored (for highlighting) and multivalued. Now we can leverage another interesting functionality of Solr + Tika, an update handler that identifies the language of every document that gets indexed. This time we need to modify the solrconfig.xml file, locating the section of the /update handler and modifying it in this way:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
   <lst name="defaults">
     <str name="update.chain">langid</str>
   </lst>
</requestHandler>

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <bool name="langid">true</bool>
      <str name="langid.fl">title,content</str>
      <str name="langid.langField">lang</str>
      <str name="langid.fallback">en</str>
      <bool name="langid.map">true</bool>
      <bool name="langid.map.keepOrig">true</bool>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I use a TikaLanguageIdentifierUpdateProcessorFactory to identify the language of documents; this runs for every document that gets indexed, because it is injected into the chain of update requests. The configuration is simple and you can find full details in the Solr wiki. Basically I want it to analyze both the title and content fields of the document and enable mapping of fields. This means that if a document is detected as Italian it will contain content_it and title_it fields, not only the content field. Thanks to the previous modification of the schema.xml to match dynamic fields with the correct language, all content_xx fields are indexed using the correct language.

This way of proceeding consumes memory and disk space, because for each field I have the original content stored as well as the localized content, but it is needed for highlighting and makes my core simple to use.

Now I want to be able to search in this multilanguage core; basically I have two choices:

  • Identify the language of the terms in the query and query the correct field
  • Query all the localized fields with an OR

Since detecting the language of the terms used in a query gives lots of false positives, the second technique sounds better. Suppose you want to find the Italian term “tipografia”: you can issue the query content_it:tipografia OR content_en:tipografia. Everything works as expected, as you can see from the following picture.


Figure 2: Sample search in all content fields.

Now if you want highlights in the results, you must specify all the localized fields; you cannot simply use the content field. As an example, if I simply ask to highlight the results of the previous query using the original content field, I get no highlights.


Figure 3: No highlighting found if you use the original Content field.

This happens because the match in the document was not an exact match: I asked for the word tipografia, but in my document the match is on the term tipografo. Thanks to language specific indexing Solr is able to match with stemming, a typical full text search. The problem is that when it is time to highlight, if you specify the content field, Solr is not able to find any match of the word tipografia in it, so you get no highlights.

To avoid the problem, you should specify all the localized fields in the hl parameters. This has no drawback, because a single document has only one non-null localized field, and the result is the expected one:


Figure 4: If you specify localized content fields you can have highlighting even with a full-text match.

In this example, when it is time to highlight, Solr will use both content_it and content_en. In my document content_en is empty, but Solr is able to find a match in content_it and to highlight the original content, because content_it has stored="true" in the configuration.
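Putting it together, the request could be issued with curl like this sketch (the core name mycore is a placeholder, and the actual request line is commented out so you can adapt it; -G with --data-urlencode takes care of the spaces and the colon in the query):

```shell
#!/bin/sh
# Placeholder core name; query both localized fields and highlight on both.
SOLR_SELECT_URL="http://localhost:8983/solr/mycore/select"
QUERY='content_it:tipografia OR content_en:tipografia'
# curl -G "$SOLR_SELECT_URL" \
#      --data-urlencode "q=$QUERY" \
#      --data-urlencode "hl=true" \
#      --data-urlencode "hl.fl=content_it,content_en"
```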

Clearly using a single core with multiple fields can slow down performance a little, but it is probably the easiest way to index multilanguage files automatically with Tika and Solr.

Gian Maria.

Installing Solr on Tomcat on windows, Error solr SEVERE: Error filterStart

If you are used to installing Solr in a Windows environment and you install a version greater than 4.2.1 for the first time, you can have trouble getting your Solr server to start. The symptom is: the service is stopped in the Tomcat Application Manager, and if you press start you get a simple error telling you that the application could not start.

To troubleshoot this kind of problem you can go to the Tomcat log directory and look at the Catalina log, but you will usually find little information there:

Mar 06, 2014 7:02:07 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error filterStart
Mar 06, 2014 7:02:07 PM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/solr47] startup failed due to previous errors

The reason for this is a change in the logging subsystem done in version 4.2.1, which is explained in the installation guide: Switching from Log4J back to JUL. I’ve blogged about this problem in the past, but it seems to still bite people, so it is worth spending another post on the subject. The solution is in the above link, but essentially you should open the folder where you unzipped the Solr distribution, go to solr/example/lib/ext and copy all the jar files you find there into Tomcat's lib subdirectory.


Figure 1: Jar files needed by Solr to start

After you have copied these jar files into the Tomcat lib directory you should restart Tomcat, and now Solr should start without problems.


Figure 2: Et Voilà, Solr is started.

Gian Maria.

Install Solr 4.3, pay attention to log libraries

After I configured Solr 4.3 on a Virtual Machine (side by side with a 4.0 instance) it refused to start, and the only error I had in the Catalina log files was

SEVERE: Error filterStart

This left me puzzled, but thanks to Alexandre and the exceptional Solr mailing list I was directed toward the solution. Solr 4.3 changed its logging mechanism, and in this link you can read about what changed and how to enable logging for Solr 4.3.

It turns out that I had entirely missed this step:

  • Copy the jars from solr/example/lib/ext into your container’s main lib directory. For tomcat this is usually tomcat/lib. These jars will set up SLF4J and log4j.
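That copy step can be scripted; here is a minimal sketch, assuming placeholder paths for the Solr distribution and the Tomcat home (the actual copy line is commented out so you can point the variables at your own layout first):

```shell
#!/bin/sh
# Placeholder paths: point these at your Solr distribution and Tomcat install.
SOLR_DIST="/opt/solr-4.3.0"
TOMCAT_HOME="/opt/tomcat"
# Copy the SLF4J/log4j jars that the Solr war no longer bundles:
# cp "$SOLR_DIST"/example/lib/ext/*.jar "$TOMCAT_HOME"/lib/
```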

And this was the only reason why my Solr instance refused to start; once the libs were inside Tomcat/lib everything worked as expected. This might not be your problem, but once the logging libraries are there you will surely get a better log that will help you troubleshoot why Solr refuses to start.

Gian Maria