Tired of troubleshooting idle search resources? Use the OpenSearch Benchmark for Performance Tuning

By Noam Schwartz | November 2022

Learn how to install OpenSearch Benchmark, create “workloads”, and compare their performance across two computing devices


OpenSearch users often want to know how their searches will perform in different environments, host types, and cluster configurations. OpenSearch Benchmark, a community-driven, open-source fork of Rally, is the ideal tool for that purpose.

OpenSearch Benchmark helps you reduce infrastructure costs by optimizing OpenSearch resource usage. The tool also enables you to find performance regressions and improve performance by running periodic benchmarks. Before benchmarking, you should try several other steps to improve performance – a topic I discussed in an earlier article.

In this article, I will compare a widely used EC2 instance with a new computing accelerator – the Associative Processing Unit (APU) by Searchium.ai – by installing OpenSearch Benchmark and running search performance benchmarks on both.

We’ll be using an m5.4xlarge (us-west-1) EC2 instance, on which I installed OpenSearch and indexed a 9.1-million-vector index called laion_text. The index is a subset of the larger LAION dataset, in which I converted the text fields to a vector representation (using the CLIP model).

Install Python 3.8+, including pip3, Git 1.9+, and the appropriate JDK to run OpenSearch. Make sure JAVA_HOME points to that JDK. Then run the following command:

sudo python3.8 -m pip install opensearch-benchmark
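If you prefer not to install system-wide with sudo, a Python virtual environment works just as well. A minimal sketch (the environment path ~/osb-env is just an example name):

python3.8 -m venv ~/osb-env        # create a virtual environment (any path will do)
source ~/osb-env/bin/activate      # activate it
pip install opensearch-benchmark   # install the benchmark inside the environment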

Tip: you may need to install some of the dependencies manually; a combined install command follows the list below.

  • sudo apt install python3.8-dev
  • sudo apt install python3.8-distutils
  • python3.8 -m pip install multidict --upgrade
  • python3.8 -m pip install attrs --upgrade
  • python3.8 -m pip install yarl --upgrade
  • python3.8 -m pip install async_timeout --upgrade
  • python3.8 -m pip install charset_normalizer --upgrade
  • python3.8 -m pip install aiosignal --upgrade
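If several of these are missing, the same packages can be installed in one go (this is simply the list above collapsed into two commands):

sudo apt install python3.8-dev python3.8-distutils
python3.8 -m pip install --upgrade multidict attrs yarl async_timeout charset_normalizer aiosignal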

To verify that the installation was successful, run the following:

opensearch-benchmark list workloads

You should see a list of the available workloads and their details.

By default, OpenSearch Benchmark keeps its metrics store “in-memory”. If set to “in-memory”, all metrics are kept in memory while the benchmark is running and are discarded afterwards. If set to “opensearch”, all metrics are written to a persistent metrics store on an OpenSearch cluster, and the data is available for further analysis.

To save the reported results to your OpenSearch cluster, open the opensearch-benchmark.ini file, which can be found in ~/.benchmark, and modify the results-publishing section so that metrics are written to your OpenSearch cluster instead of being kept in memory:
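As a rough sketch (the exact key names can differ slightly between OpenSearch Benchmark versions, so treat this as an illustration rather than a definitive config), that section looks something like this when pointing at a local, non-secured cluster:

[results_publishing]
datastore.type = opensearch
datastore.host = localhost
datastore.port = 9200
datastore.secure = false
datastore.user =
datastore.password =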


Now that we have OpenSearch Benchmark properly set up, it’s time to start benchmarking!

The plan is to use OpenSearch Benchmark to compare searches between the two computing devices. You can use the following method to benchmark and compare any setup you want. In this example, we will test the commonly used exact (flat) k-NN search (an ANN example using IVF and HNSW will be covered in my next article) and compare the APU to the m5.4xlarge EC2 instance.

You can access the APU through a plugin downloaded from Searchium.ai’s SaaS platform. You can test the following benchmarking process on your own environment and data. A free trial is available, and registration is simple.

Each test/track in OpenSearch Benchmark is called a “workload”. We will build a workload for search on the m5.4xlarge, which will serve as our baseline, and a workload for search on the APU, which will serve as our contender. Later, we will compare the performance of both workloads.

Let’s start by creating workloads for both the m5.4xlarge (CPU) and the APU from the laion_text index (make sure you run these commands from within the ~/.benchmark directory):

opensearch-benchmark create-workload --workload=laion_text_cpu --target-hosts=localhost:9200 --indices="laion_text"
opensearch-benchmark create-workload --workload=laion_text_apu --target-hosts=localhost:9200 --indices="laion_text"

Note: if the generated workloads were saved in your home folder, you will need to copy them into the ~/.benchmark/benchmarks/workloads/default directory, for example as shown below.
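A minimal sketch of that copy step, assuming create-workload placed the two workload folders (named after the workloads) in your home directory:

cp -r ~/laion_text_cpu ~/laion_text_apu ~/.benchmark/benchmarks/workloads/default/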

Run opensearch-benchmark list workloads again and note that both laion_text_cpu and laion_text_apu are now listed.

Next, we’ll add operations to the test schedule. You can add as many benchmarking tests as you want in this section. Add each test to the schedule in the workload.json file, which can be found in the folder named after the index you want to benchmark.

In our case, the files can be found in the following locations:

  • ~/.benchmark/benchmarks/workloads/default/laion_text_apu
  • ~/.benchmark/benchmarks/workloads/default/laion_text_cpu

We want to test our OpenSearch search. Create an operation called “single-vector-search” (or any other name) and include a query vector in its body. I truncated the vector itself, as a 512-dimensional vector would be a bit long to show here. Add the desired query vector, and be sure to copy the same vector into both the m5.4xlarge (CPU) and the APU workload.json files!

Next, add any parameters you want. In this example, I’ll stick with the default eight clients and 1,000 iterations.

m5.4xlarge (CPU) workload.json:

"schedule":[
{
"operation":{
"name":"single-vector-search",
"operation-type":"search",
"body":{
"size":"10",
"query":{
"script_score":{
"query":{
"match_all":{}
},
"script":{
"source":"knn_score",
"lang":"knn",
"params":{
"field":"vector",
"query_value":[INSERT VECTOR HERE],
"space_type":"cosinesimil"
}
}
}
}
}
},
"clients":8,
"warmup-iterations":1000,
"iterations":1000,
"target-throughput":100
}
]

APU workload.json:

"schedule":[
{
"operation":{
"name":"single-vector-search",
"operation-type":"search",
"body":{
"size":"10",
"query":{
"gsi_knn":{
"field":"vector",
"vector":[INSERT VECTOR HERE],
"topk":"10"
}
}
}
},
"clients":8,
"warmup-iterations":1000,
"iterations":1000,
"target-throughput":100
}
]
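After pasting the vectors in, it is worth checking that both files still parse as JSON. Python’s built-in json.tool is enough for a quick sanity check (this assumes the generated workload.json files are plain JSON without templating):

python3.8 -m json.tool ~/.benchmark/benchmarks/workloads/default/laion_text_cpu/workload.json > /dev/null && echo "laion_text_cpu OK"
python3.8 -m json.tool ~/.benchmark/benchmarks/workloads/default/laion_text_apu/workload.json > /dev/null && echo "laion_text_apu OK"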

It’s time to run our workloads! We want to run them against an already running OpenSearch cluster, so I have added a few parameters to the execute_test command:

  • distribution-version – be sure to set your correct OpenSearch version.
  • workload – the name of our workload.
  • Other parameters are available; I added pipeline, client-options, and on-error, which simplify the whole process.

Go ahead and run the following command, which will run our workload:

opensearch-benchmark execute_test --distribution-version=2.2.0 --workload=laion_text_apu --pipeline=benchmark-only --client-options="verify_certs:false,use_ssl:false,timeout:320" --on-error=abort
opensearch-benchmark execute_test --distribution-version=2.2.0 --workload=laion_text_cpu --pipeline=benchmark-only --client-options="verify_certs:false,use_ssl:false,timeout:320" --on-error=abort

And now we wait…

Our results should look like the following:

laion_text_apu (APU) results
laion_text_cpu (m5.4xlarge) results

We are finally ready to see our test results. Drumroll, please… 🥁

First, we noticed that the running time of each workload was different. The m5.4xlarge (CPU) workload took 6.45 hours, while the APU workload took 2.78 minutes (139 times faster). This is because the APU supports query aggregation, allowing for greater throughput.

Now, we want a more comprehensive comparison between our workloads. OpenSearch Benchmark lets us export the comparison to a CSV file, where we can easily compare the workloads side by side.

First, we need to find the ID of each test execution. This can be done either by looking in the benchmark-test-performance index in OpenSearch (created once we configured the persistent metrics store earlier) or in the local benchmarks folder, for example:
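Two ways to look the IDs up (the exact index names and directory layout can vary between OpenSearch Benchmark versions, so treat these as a sketch):

curl -s "localhost:9200/_cat/indices/benchmark*?v"   # list the benchmark metric/result indices in the cluster
ls ~/.benchmark/benchmarks/test_executions/          # or inspect the per-run folders created locally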

Using those IDs, run the following command to compare the two workloads and write the output to a CSV file:

opensearch-benchmark compare --results-format=csv --show-in-results=search --results-file=data.csv --baseline=ecb4af7a-d53c-4ac3-9985-b5de45daea0d --contender=b714b13a-af8e-4103-a4c6-558242b8fe6a
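To skim the resulting CSV directly in the terminal, one standard trick is to align its columns with the column utility:

column -s, -t < data.csv | less -S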

Here is a brief explanation of the metrics reported in the comparison:

  1. Throughput: the number of operations that OpenSearch can perform within a given period of time, usually per second.
  2. Latency: the time between submitting a request and receiving the complete response. It also includes wait time, i.e. the time the request spends waiting until it is ready to be served by OpenSearch.
  3. Service time: the time between sending a request and receiving the corresponding response. This metric can easily be confused with latency, but it does not include wait time. Most load-testing tools incorrectly refer to this as “latency”.
  4. Test execution time: the total runtime of the workload, from start to completion.

When we look at our results, we can see that the service time for the APU workload was 127 times faster than for the m5.4xlarge workload. From a cost perspective, running the same workload costs $0.23 on the APU, compared to $5.78 on the m5.4xlarge (25 times less expensive), and we got our search results about 6.45 hours earlier. (The EC2 figure is roughly the us-west-1 on-demand hourly price of an m5.4xlarge multiplied by the 6.45-hour runtime.)

Now, imagine the magnitude of these benefits when scaled up to large datasets, which is likely in our data-driven, fast-paced world.

I hope this helped you understand more about the power of OpenSearch’s benchmarking tool and how you can use it to benchmark your search performance.

For more information about Searchium.ai’s plugin and APU, please visit www.searchium.ai. They even offer free trials!

Many thanks to Dmitry Sosnovsky and Yaniv Vaknin for their help!
