How to Benchmark FreeBSD Server Performance
Benchmarking is the foundation of every tuning decision. Without baseline numbers, you are guessing. This guide covers the complete benchmarking toolkit for FreeBSD servers: CPU, memory, disk I/O, ZFS, network throughput, web server performance, and database benchmarks.
Every benchmark in this guide runs on FreeBSD using packages from the official repository. Each section explains what the benchmark measures, how to run it, and how to interpret the results. For applying the results, see the FreeBSD Performance Tuning guide.
Before You Benchmark
Rules of Benchmarking
- Isolate the variable. Test one thing at a time. Do not benchmark disk I/O while a database is running.
- Run multiple iterations. A single run is an anecdote. Run at least 3-5 iterations and take the median.
- Document the environment. Record the FreeBSD version, kernel config, hardware specs, and any sysctl tuning.
- Benchmark under realistic conditions. Synthetic benchmarks reveal potential. Real workload benchmarks reveal truth.
- Compare against a known baseline. Raw numbers mean nothing in isolation. Compare before/after tuning, or against a known-good reference system.
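To make the iteration rule concrete, here is a minimal sketch of taking the median of several runs, assuming the benchmark prints one numeric result per run (sample values are inlined; in practice replace the printf with five real benchmark invocations):

```sh
# Collect one result per run (sample values inlined here) and print the median.
printf '%s\n' 4012.3 3998.7 4021.9 4005.1 3990.2 |
    sort -n | awk '{ v[NR] = $1 } END { print v[int(NR/2) + 1] }'
```

With an odd run count the middle element is the median; for an even count, average the two middle values instead.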
Documenting Your System
Before any benchmark, capture the environment:
```sh
# FreeBSD version
freebsd-version -ku

# Hardware summary
sysctl hw.model hw.ncpu hw.physmem hw.realmem

# Memory
sysctl hw.physmem | awk '{printf "%.1f GB\n", $2/1073741824}'

# Disk controller and drives
camcontrol devlist

# ZFS pool status
zpool list
zpool status

# Network interfaces
ifconfig -a | grep -E '^[a-z]|inet '

# Current sysctl tuning
sysctl -a > /tmp/sysctl-snapshot.txt
```
Save this output alongside every benchmark result.
CPU Benchmarks
sysbench
sysbench is the standard cross-platform CPU benchmark. It is useful for comparing CPU performance across different FreeBSD systems or against Linux baselines.
```sh
pkg install sysbench
```
Single-Threaded CPU Test
```sh
sysbench cpu --threads=1 --time=30 run
```
Key metric: events per second. Higher is better. This measures single-thread integer arithmetic performance.
Multi-Threaded CPU Test
```sh
# Use all available cores
sysbench cpu --threads=$(sysctl -n hw.ncpu) --time=30 run
```
Compare multi-threaded events/sec to single-threaded to measure scaling efficiency. Perfect linear scaling is rare; look for diminishing returns that indicate NUMA effects or thermal throttling.
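The efficiency figure can be computed directly from the two events/sec results; a sketch with sample numbers (substitute your own single-thread and all-core results and your core count):

```sh
# single = single-thread events/sec, multi = all-core events/sec, cores = hw.ncpu
# Efficiency = multi / (single * cores); 100% would be perfect linear scaling.
awk -v single=1200 -v multi=7680 -v cores=8 \
    'BEGIN { printf "scaling efficiency: %.1f%%\n", multi / (single * cores) * 100 }'
# prints: scaling efficiency: 80.0%
```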
CPU Prime Number Test
```sh
# Higher prime limit = longer computation per event
sysbench cpu --cpu-max-prime=20000 --threads=$(sysctl -n hw.ncpu) --time=60 run
```
Increasing --cpu-max-prime stresses the CPU harder. Use 20000 for standard comparison benchmarks.
openssl speed
OpenSSL is in the FreeBSD base system. Its built-in benchmark measures cryptographic throughput, which is directly relevant for TLS-heavy workloads:
```sh
# AES-256-GCM throughput (common TLS cipher)
openssl speed -evp aes-256-gcm

# SHA-256 throughput
openssl speed sha256

# RSA signing performance (relevant for TLS handshakes)
openssl speed rsa2048

# Test with multiple processes
openssl speed -multi $(sysctl -n hw.ncpu) -evp aes-256-gcm
```
Key metric: KB/s for symmetric ciphers, sign/verify per second for RSA. These numbers directly predict TLS throughput.
UNIX Bench
For a comprehensive CPU and system call benchmark:
```sh
pkg install unixbench
cd /usr/local/share/unixbench
./Run
```
UnixBench produces an index score normalized against a reference system. It tests process creation, shell scripts, pipe throughput, file copy, and more. The composite index is useful for overall system comparison.
Memory Benchmarks
sysbench Memory
```sh
# Sequential write throughput
sysbench memory --memory-block-size=1K --memory-total-size=10G --threads=1 run

# Sequential read throughput
sysbench memory --memory-block-size=1K --memory-total-size=10G --memory-oper=read --threads=1 run

# Multi-threaded memory bandwidth
sysbench memory --memory-block-size=1M --memory-total-size=100G --threads=$(sysctl -n hw.ncpu) run
```
Key metric: MiB/sec. Vary the block size to understand cache behavior: 1K hits L1, 1M likely hits L3 or main memory.
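The block-size sweep can be scripted in one loop; a sketch (the grep pattern assumes sysbench's "MiB transferred" summary line, which may vary between versions):

```sh
# Sweep block sizes from L1-cache-sized to DRAM-sized and keep only the throughput line
for bs in 1K 16K 64K 1M; do
    echo "=== block size: $bs ==="
    sysbench memory --memory-block-size=$bs --memory-total-size=10G --threads=1 run |
        grep 'transferred'
done
```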
STREAM Benchmark
For raw memory bandwidth measurement:
```sh
pkg install stream
stream
```
STREAM reports Copy, Scale, Add, and Triad bandwidth in MB/s. This is the standard benchmark for measuring memory subsystem performance, especially useful for NUMA systems.
Disk I/O Benchmarks
fio (Flexible I/O Tester)
fio is the definitive disk I/O benchmark. It can simulate virtually any I/O pattern.
shpkg install fio
Sequential Read Throughput
```sh
fio --name=seq-read --ioengine=posixaio --rw=read \
    --bs=1m --size=4g --numjobs=1 --runtime=60 \
    --time_based --group_reporting \
    --filename=/tmp/fio-test
```
Sequential Write Throughput
```sh
fio --name=seq-write --ioengine=posixaio --rw=write \
    --bs=1m --size=4g --numjobs=1 --runtime=60 \
    --time_based --group_reporting \
    --filename=/tmp/fio-test
```
Random Read IOPS (4K)
This is the most important benchmark for database workloads:
```sh
fio --name=rand-read --ioengine=posixaio --rw=randread \
    --bs=4k --size=4g --numjobs=4 --runtime=60 \
    --time_based --group_reporting --iodepth=32 \
    --filename=/tmp/fio-test
```
Key metric: IOPS (I/O operations per second). A good NVMe drive delivers 500K+ random read IOPS. A SATA SSD delivers 50-90K. A spinning disk delivers 100-200.
Random Write IOPS (4K)
```sh
fio --name=rand-write --ioengine=posixaio --rw=randwrite \
    --bs=4k --size=4g --numjobs=4 --runtime=60 \
    --time_based --group_reporting --iodepth=32 \
    --filename=/tmp/fio-test
```
Mixed Random Read/Write (Database Simulation)
```sh
fio --name=mixed-rw --ioengine=posixaio --rw=randrw \
    --rwmixread=70 --bs=4k --size=4g --numjobs=4 \
    --runtime=60 --time_based --group_reporting \
    --iodepth=32 --filename=/tmp/fio-test
```
A 70/30 read/write mix approximates typical OLTP database patterns.
Latency Measurement
```sh
fio --name=latency --ioengine=posixaio --rw=randread \
    --bs=4k --size=1g --numjobs=1 --runtime=30 \
    --time_based --iodepth=1 \
    --filename=/tmp/fio-test
```
Setting --iodepth=1 forces synchronous I/O and measures true single-request latency. For NVMe, expect p99 latency under 200 microseconds. For SATA SSD, under 1 ms. For spinning disks, 5-15 ms.
bonnie++
bonnie++ is a classic filesystem benchmark that tests sequential I/O, random seeks, and metadata operations:
```sh
pkg install bonnie++

# Run with 2x RAM size to defeat caching
bonnie++ -d /tmp -s $(( $(sysctl -n hw.physmem) / 1073741824 * 2 ))G -u root
```
bonnie++ is useful for quick before/after comparisons but less configurable than fio.
ZFS-Specific Benchmarks
ZFS performance depends on ARC size, record size, compression, and pool topology. Benchmark ZFS separately from raw disk I/O.
ZFS ARC Hit Rate
Before benchmarking, check your ARC effectiveness:
```sh
# ARC statistics
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# Calculate hit rate
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses | \
    awk 'BEGIN{h=0;m=0} /hits/{h=$2} /misses/{m=$2} END{printf "ARC hit rate: %.1f%%\n", h/(h+m)*100}'
```
A healthy ARC hit rate is above 90%. Below 80% indicates the ARC is undersized for the workload.
ZFS Sequential Throughput
```sh
# Test on actual ZFS pool
fio --name=zfs-seq-write --ioengine=posixaio --rw=write \
    --bs=1m --size=10g --numjobs=1 --runtime=60 \
    --time_based --group_reporting \
    --filename=/zpool/benchmark/fio-test

# Read back
fio --name=zfs-seq-read --ioengine=posixaio --rw=read \
    --bs=1m --size=10g --numjobs=1 --runtime=60 \
    --time_based --group_reporting \
    --filename=/zpool/benchmark/fio-test
```
ZFS Record Size Impact
Test with different record sizes to find the optimal value for your workload:
```sh
# Create test datasets with different record sizes
zfs create -o recordsize=4k zpool/bench4k
zfs create -o recordsize=128k zpool/bench128k
zfs create -o recordsize=1m zpool/bench1m

# Benchmark each
for rs in 4k 128k 1m; do
    echo "=== Record size: $rs ==="
    fio --name=zfs-$rs --ioengine=posixaio --rw=randrw --rwmixread=70 \
        --bs=4k --size=4g --numjobs=4 --runtime=60 \
        --time_based --group_reporting \
        --filename=/zpool/bench${rs}/fio-test
done

# Cleanup
zfs destroy zpool/bench4k
zfs destroy zpool/bench128k
zfs destroy zpool/bench1m
```
For database workloads, recordsize=16k often outperforms the default 128K. For large sequential files, 1M is faster.
ZFS Compression Benchmark
```sh
# Create datasets with different compression
zfs create -o compression=off zpool/bench-nocomp
zfs create -o compression=lz4 zpool/bench-lz4
zfs create -o compression=zstd zpool/bench-zstd

# Write test data and compare
for comp in nocomp lz4 zstd; do
    echo "=== Compression: $comp ==="
    fio --name=comp-$comp --ioengine=posixaio --rw=write \
        --bs=128k --size=4g --numjobs=1 --runtime=60 \
        --time_based --group_reporting \
        --filename=/zpool/bench-${comp}/fio-test
done
```
LZ4 compression is almost always a net win on FreeBSD because it reduces the amount of data written to disk while adding negligible CPU overhead.
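After the write pass, the achieved ratio can be read back per dataset; a quick check (dataset names follow the example above):

```sh
# Compare on-disk compression ratios across the test datasets
zfs get -o name,value compressratio zpool/bench-lz4 zpool/bench-zstd
```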
Network Benchmarks
iperf3
iperf3 measures raw network throughput between two systems.
```sh
pkg install iperf3
```
TCP Throughput
On the remote server:
```sh
iperf3 -s
```
On the FreeBSD client:
```sh
# Single-stream TCP throughput
iperf3 -c server-ip -t 30

# Multi-stream (saturate high-bandwidth links)
iperf3 -c server-ip -t 30 -P 4

# Reverse mode (measure download from server)
iperf3 -c server-ip -t 30 -R
```
Key metric: Gbits/sec. A healthy 1 Gbps link should deliver 940+ Mbps. If you see significantly less, investigate NIC offload settings, interrupt coalescing, or sysctl network buffer tuning.
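The offload state is the first thing to check when throughput falls short; a sketch, assuming an interface named em0 (substitute your own):

```sh
# Show current offload flags (TSO, LRO, checksum offload appear in the options line)
ifconfig em0 | grep options

# Toggle offloads off for a comparison run, then back on
ifconfig em0 -tso -lro -txcsum -rxcsum
iperf3 -c server-ip -t 30
ifconfig em0 tso lro txcsum rxcsum
```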
UDP Throughput and Jitter
```sh
# UDP test at specified bandwidth
iperf3 -c server-ip -u -b 500M -t 30
```
Key metrics: throughput, jitter, and packet loss percentage. For VoIP and real-time applications, jitter under 5 ms and loss under 0.1% is the target.
netperf
For latency-focused network benchmarks:
```sh
pkg install netperf
```
On the server:
```sh
netserver
```
On the client:
```sh
# Request/response latency (simulates small transactions)
netperf -H server-ip -t TCP_RR -l 30

# Bulk transfer throughput
netperf -H server-ip -t TCP_STREAM -l 30
```
TCP_RR (request/response) measures transactions per second. This is more relevant than raw throughput for web server and API workloads.
Web Server Benchmarks
wrk
wrk is a modern HTTP benchmarking tool with Lua scripting support.
```sh
pkg install wrk
```
Basic HTTP Throughput
```sh
# 4 threads, 100 concurrent connections, 30 seconds
wrk -t4 -c100 -d30s http://localhost/
```
Key metrics: requests/sec and latency distribution. The latency percentiles (p50, p90, p99) matter more than average latency.
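wrk prints the detailed percentile breakdown only when asked; pass --latency to get it:

```sh
# --latency adds the full latency distribution (50%/75%/90%/99%) to the report
wrk -t4 -c100 -d30s --latency http://localhost/
```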
HTTPS Throughput
```sh
wrk -t4 -c100 -d30s https://localhost/
```
Compare HTTP vs HTTPS results. The difference is your TLS overhead. Modern CPUs with AES-NI should show less than 10% overhead.
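Whether the CPU actually advertises AES-NI can be confirmed from the boot messages; a quick check (the AESNI flag appears in the Features2 line on typical amd64 systems):

```sh
# Look for the AESNI CPU feature flag in the boot log
grep -o 'AESNI' /var/run/dmesg.boot | head -1

# If the aesni(4) driver is loaded, it also shows up in sysctl output
sysctl -a | grep aesni
```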
Simulating Real Traffic Patterns
Create a Lua script for POST requests:
```sh
cat > /tmp/post.lua << 'LUAEOF'
wrk.method = "POST"
wrk.body = '{"key": "value", "data": "benchmark"}'
wrk.headers["Content-Type"] = "application/json"
LUAEOF

wrk -t4 -c100 -d30s -s /tmp/post.lua http://localhost/api/endpoint
```
h2load
For HTTP/2 benchmarking:
```sh
pkg install nghttp2

# 100 concurrent streams, 4 clients, 10000 requests
h2load -n 10000 -c 4 -m 100 https://localhost/
```
HTTP/2 multiplexing changes performance characteristics significantly. Always benchmark with h2load if your production traffic uses HTTP/2.
Database Benchmarks
PostgreSQL with pgbench
pgbench is included with PostgreSQL:
```sh
pkg install postgresql16-server

# Initialize benchmark database
pgbench -i -s 50 benchdb

# Run TPC-B-like benchmark
pgbench -c 10 -j 4 -T 60 benchdb
```
Key metric: transactions per second (TPS). Run with different -c (client) counts to find your saturation point.
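The saturation point can be found by sweeping the client count; a sketch (greps pgbench's "tps =" summary line, whose exact wording may vary by version):

```sh
# Increase concurrency until TPS stops climbing
for c in 1 4 8 16 32 64; do
    echo "=== clients: $c ==="
    pgbench -c $c -j 4 -T 30 benchdb | grep 'tps'
done
```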
MariaDB with sysbench
```sh
pkg install sysbench

# Prepare test tables
sysbench /usr/local/share/sysbench/oltp_read_write.lua \
    --mysql-host=localhost --mysql-user=root --mysql-db=bench \
    --tables=10 --table-size=100000 prepare

# Run OLTP read/write benchmark
sysbench /usr/local/share/sysbench/oltp_read_write.lua \
    --mysql-host=localhost --mysql-user=root --mysql-db=bench \
    --tables=10 --table-size=100000 --threads=8 --time=60 run

# Cleanup
sysbench /usr/local/share/sysbench/oltp_read_write.lua \
    --mysql-host=localhost --mysql-user=root --mysql-db=bench \
    --tables=10 --table-size=100000 cleanup
```
Key metrics: transactions/sec, queries/sec, and latency percentiles.
Automating Benchmark Runs
Create a script that runs your standard benchmark suite and saves results:
```sh
#!/bin/sh
# /usr/local/bin/benchmark-suite.sh
# Run standard benchmark suite on FreeBSD

RESULTS_DIR="/var/benchmarks/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$RESULTS_DIR"

# System info
freebsd-version -ku > "$RESULTS_DIR/system-info.txt"
sysctl hw.model hw.ncpu hw.physmem >> "$RESULTS_DIR/system-info.txt"

# CPU benchmark
echo "Running CPU benchmark..."
sysbench cpu --threads=$(sysctl -n hw.ncpu) --time=30 run > "$RESULTS_DIR/cpu.txt" 2>&1

# Memory benchmark
echo "Running memory benchmark..."
sysbench memory --memory-block-size=1M --memory-total-size=10G \
    --threads=$(sysctl -n hw.ncpu) run > "$RESULTS_DIR/memory.txt" 2>&1

# Disk I/O benchmark
echo "Running disk I/O benchmark..."
fio --name=rand-rw --ioengine=posixaio --rw=randrw --rwmixread=70 \
    --bs=4k --size=4g --numjobs=4 --runtime=60 \
    --time_based --group_reporting \
    --filename=/tmp/fio-test \
    --output="$RESULTS_DIR/disk-io.txt"
rm -f /tmp/fio-test

echo "Results saved to $RESULTS_DIR"
```
```sh
chmod +x /usr/local/bin/benchmark-suite.sh
```
Run this before and after any tuning changes to quantify the impact.
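To quantify the impact, the key number can be extracted from the saved results and compared; a self-contained sketch with sample sysbench output inlined (swap the printf lines for the real cpu.txt files from the two runs):

```sh
# Extract "events per second" from sysbench output and report the percentage change.
# Sample data is inlined; in practice read cpu.txt from the before/after result dirs.
before=$(printf 'events per second: 4000.00\n' | awk '/events per second/ { print $4 }')
after=$(printf 'events per second: 4400.00\n' | awk '/events per second/ { print $4 }')
awk -v b="$before" -v a="$after" 'BEGIN { printf "change: %+.1f%%\n", (a - b) / b * 100 }'
# prints: change: +10.0%
```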
FAQ
Should I benchmark on a production server?
Avoid it. Benchmarks generate heavy I/O and CPU load that will impact production traffic. Use a staging server with identical hardware, or run benchmarks during a maintenance window with services stopped.
How do I compare FreeBSD performance to Linux?
Use the same benchmarks (sysbench, fio, iperf3, wrk) on both systems. Ensure identical hardware, kernel parameters scaled equivalently, and the same compiler flags on both platforms. Single benchmarks lie; run the full suite.
What is a good baseline for a modern server?
For a typical Xeon E-2300 / EPYC 7003 server with NVMe storage: sysbench CPU 10,000+ events/sec per core, fio random 4K read 400K+ IOPS on NVMe, iperf3 930+ Mbps on 1G links, wrk 50K+ req/sec for static files on NGINX.
How does ZFS overhead compare to UFS?
ZFS adds 5-15% overhead for random small I/O compared to UFS, primarily from copy-on-write semantics and checksumming. For sequential I/O with compression, ZFS often outperforms UFS because it writes less data to disk. The ARC cache typically makes up for the write overhead on read-heavy workloads.
Why do my fio results differ between runs?
Filesystem caching. After the first run, data may be cached in the ARC (ZFS) or the buffer cache (UFS). For consistent results, either shrink the ARC between runs (lower vfs.zfs.arc.max temporarily, then restore it), use --direct=1 in fio to request O_DIRECT where the filesystem honors it, or use a test file larger than available RAM.
How do I benchmark network latency specifically?
Use netperf -t TCP_RR for TCP round-trip latency, or ping -c 100 -i 0.01 server-ip for ICMP latency. For application-level latency, wrk's percentile output is the most useful metric. Sub-millisecond latency requires testing on the same L2 segment.
What benchmarks matter most for a web server?
In order of importance: (1) wrk requests/sec and p99 latency, (2) random read IOPS (content serving), (3) network throughput (iperf3), (4) TLS handshake throughput (openssl speed). Optimize the bottleneck that your real traffic hits first.