Performance Comparison: Reduct Storage vs. MinIO

We often use blob storage like S3 when we need to store data of different formats and sizes in the cloud or on our own infrastructure. MinIO is an S3-compatible storage that you can run on a private cloud, a bare-metal server, or even edge devices. You can also adapt it to keep historical data as a time series of blobs. The simplest solution is to create a folder for each data source and save objects with timestamps in their names:

bucket
 |
 |---cv_camera
        |---1666225094312397.bin
        |---1666225094412397.bin
        |---1666225094512397.bin
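
To make the layout concrete, here is a minimal sketch of how such object names could be produced (the cv_camera source name is only an illustration):

import time

SOURCE = "cv_camera"  # hypothetical data source name

def make_object_name() -> str:
    # Use the current time in microseconds as the file name,
    # following the layout shown above
    return f"{SOURCE}/{time.time_ns() // 1000}.bin"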

If you need to query data, you should request a list of objects in the cv_camera folder and filter them by the names that fall into the given time interval.
This approach is simple to implement, but has some disadvantages:

  • The more objects in the folder, the longer the query takes.
  • Large overhead for small objects: the timestamp is stored as a string, and the minimum file size is 1 KB or 512 bytes because of the file system's block size.
  • A FIFO quota, which deletes the oldest data when a limit is reached, may not work for intensive write operations.

Reduct Storage aims to address these issues. It has a strong FIFO quota, an HTTP API for querying data by time intervals, and it packs objects (called records) into blocks for efficient disk usage and search.
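
For example, the FIFO quota is just a bucket setting. Here is a minimal sketch using the reduct-py client (assuming its BucketSettings and QuotaType helpers; the bucket name and the 1 GB quota are arbitrary):

import asyncio
from reduct import Client, BucketSettings, QuotaType

async def create_bucket_with_quota():
    client = Client("http://127.0.0.1:8383")
    # With a FIFO quota, the engine removes the oldest blocks
    # when the bucket reaches the 1 GB limit
    await client.create_bucket(
        "test",
        settings=BucketSettings(quota_type=QuotaType.FIFO, quota_size=1_000_000_000),
        exist_ok=True,
    )

asyncio.run(create_bucket_with_quota())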

MinIO and Reduct Storage both have Python SDKs, so we can use them to implement the write and read operations and compare performance.

Read/Write Data with MinIO

For the benchmark, we create two functions that write and read CHUNK_COUNT chunks of data:

import io
import time

from minio import Minio

minio_client = Minio("127.0.0.1:9000", access_key="minioadmin", secret_key="minioadmin", secure=False)


def write_to_minio():
    count = 0
    # Make sure the bucket exists before writing to it
    if not minio_client.bucket_exists(BUCKET_NAME):
        minio_client.make_bucket(BUCKET_NAME)

    for i in range(CHUNK_COUNT):
        count += CHUNK_SIZE
        # Use the current time in microseconds as the object name
        object_name = f"data/{time.time_ns() // 1000}.bin"
        minio_client.put_object(BUCKET_NAME, object_name, io.BytesIO(CHUNK),
                                CHUNK_SIZE)
    return count  # count the written data to print it in the main function


def read_from_minio(t1, t2):
    count = 0

    # Convert the interval to microseconds so it can be compared with
    # the timestamps in the object names (string comparison works here
    # because all the timestamps have the same number of digits)
    t1 = str(int(t1 * 1000_000))
    t2 = str(int(t2 * 1000_000))

    for obj in minio_client.list_objects(BUCKET_NAME, prefix="data/"):
        # Strip the "data/" prefix and ".bin" suffix to get the timestamp
        if t1 <= obj.object_name[5:-4] <= t2:
            resp = minio_client.get_object(BUCKET_NAME, obj.object_name)
            count += len(resp.read())

    return count

As you can see, minio_client doesn't provide an API to query data by time interval or name pattern, so we have to browse the entire folder on the client side to find the required objects. This stops working when you have billions of objects. You would have to store the object paths in a time series database or build a hierarchy of folders, for example, one folder per day.
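
One common workaround (not used in this benchmark) is to partition the objects by date, so that a query only has to list the folders that overlap the requested interval. A rough sketch of such a naming scheme:

import time
from datetime import datetime, timezone

def object_name_for_now() -> str:
    # Hypothetical layout: one folder per day, so a time query only
    # has to list the day prefixes that overlap the interval
    now = datetime.now(timezone.utc)
    return f"data/{now:%Y-%m-%d}/{time.time_ns() // 1000}.bin"

# A query for [t1, t2] would then list only the matching day prefixes,
# e.g. minio_client.list_objects(BUCKET_NAME, prefix="data/2022-10-20/")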

Read/Write Data with Reduct Storage

With Reduct Storage, this is much simpler:

from reduct import Client as ReductClient

reduct_client = ReductClient("http://127.0.0.1:8383")


async def write_to_reduct():
    count = 0
    # The bucket is created on the first run and reused afterwards
    bucket = await reduct_client.create_bucket("test", exist_ok=True)
    for i in range(CHUNK_COUNT):
        # Each record is stamped with the current time at write
        await bucket.write("data", CHUNK)
        count += CHUNK_SIZE
    return count


async def read_from_reduct(t1, t2):
    count = 0
    bucket = await reduct_client.get_bucket("test")
    # Query all records of the "data" entry in the given interval,
    # passed as microseconds since the UNIX epoch
    async for rec in bucket.query("data", int(t1 * 1000000), int(t2 * 1000000)):
        count += len(await rec.read_all())
    return count
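
By default, each record gets the write time as its timestamp. If your data already carries its own timestamps (e.g. camera frames), the SDK also lets you pass one explicitly; as far as I can tell from reduct-py, write() accepts a timestamp in microseconds since the UNIX epoch. A small sketch under that assumption:

import time
from reduct import Client as ReductClient

async def write_with_explicit_timestamp(data: bytes):
    client = ReductClient("http://127.0.0.1:8383")
    bucket = await client.create_bucket("test", exist_ok=True)
    # Timestamp in microseconds; here it is just the current time,
    # but it could come from the data source instead
    ts = int(time.time() * 1000_000)
    await bucket.write("data", data, timestamp=ts)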

Benchmarks

Once we have the write/read functions, we can finally write our benchmarks:

import io
import random
import time
import asyncio

from minio import Minio
from reduct import Client as ReductClient

CHUNK_SIZE = 100000
CHUNK_COUNT = 10000
BUCKET_NAME = "test"

CHUNK = random.randbytes(CHUNK_SIZE)

minio_client = Minio("127.0.0.1:9000", access_key="minioadmin", secret_key="minioadmin", secure=False)
reduct_client = ReductClient("http://127.0.0.1:8383")

# Our write/read functions from above go here...

if __name__ == "__main__":
    print(f"Chunk size={CHUNK_SIZE/1000_000} Mb, count={CHUNK_COUNT}")
    ts = time.time()
    size = write_to_minio()
    print(f"Write {size / 1000_000} Mb to Minio: {time.time() - ts} s")

    ts_read = time.time()
    size = read_from_minio(ts, time.time())
    print(f"Read {size / 1000_000} Mb from Minio: {time.time() - ts_read} s")

    loop = asyncio.new_event_loop()
    ts = time.time()
    size = loop.run_until_complete(write_to_reduct())
    print(f"Write {size / 1000_000} Mb to Reduct Storage: {time.time() - ts} s")

    ts_read = time.time()
    size = loop.run_until_complete(read_from_reduct(ts, time.time()))
    print(f"Read {size / 1000_000} Mb from Reduct Storage: {time.time() - ts_read} s")


For testing, we need to run both storages. That's easy to do with Docker Compose:

services:
  reduct-storage:
    image: reductstorage/engine:v1.0.1
    volumes:
      - ./reduct-data:/data
    ports:
      - 8383:8383

  minio:
    image: minio/minio
    volumes:
      - ./minio-data:/data
    command: minio server /data --console-address :9002
    ports:
      - 9000:9000
      - 9002:9002

Run the Docker Compose configuration and the benchmark:

docker-compose up -d
python3 main.py

Results

The script prints the results for the given CHUNK_SIZE and CHUNK_COUNT. On my device, I got the following numbers:

Chunk size               | Operation | MinIO   | Reduct Storage
10.0 MB (100 requests)   | Write     | 8.69 s  | 0.53 s
                         | Read      | 1.19 s  | 0.57 s
1.0 MB (1000 requests)   | Write     | 12.66 s | 1.30 s
                         | Read      | 2.04 s  | 1.38 s
0.1 MB (10000 requests)  | Write     | 61.86 s | 13.73 s
                         | Read      | 9.39 s  | 15.02 s

As you can see, Reduct Storage is always faster for write operations (16 times faster for 10 MB blobs!) and only slightly slower for reads when we have many small objects. You can also see that the speed decreases for both storages as the chunk size shrinks. This can be explained by HTTP overhead: we spend a dedicated HTTP request on each write or read operation.
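
To put the write numbers in perspective: each run moves CHUNK_SIZE * CHUNK_COUNT = 1000 MB of data, so the first row of the table corresponds to roughly the following throughput (my own back-of-the-envelope arithmetic):

total_mb = 10.0 * 100    # 10 MB chunks x 100 requests = 1000 MB per run
print(total_mb / 8.69)   # MinIO write:          ~115 MB/s
print(total_mb / 0.53)   # Reduct Storage write: ~1887 MB/s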

Conclusions

Reduct Storage can be a good choice for applications where you need to store blobs historically with timestamps and write data continuously. It has a strong FIFO quota to avoid running out of disk space, and it is very fast for intensive write operations.
