How to Benchmark FreeBSD Server Performance
Benchmarking is the foundation of every tuning decision. Without baseline numbers, you are guessing. This guide covers the complete benchmarking toolkit for FreeBSD servers: CPU, memory, disk I/O, ZFS, network throughput, web server performance, and database benchmarks.
Every benchmark in this guide runs on FreeBSD using packages from the official repository. Each section explains what the benchmark measures, how to run it, and how to interpret the results. For applying the results, see the FreeBSD Performance Tuning guide.
Before You Benchmark
Rules of Benchmarking
- Isolate the variable. Test one thing at a time. Do not benchmark disk I/O while a database is running.
- Run multiple iterations. A single run is an anecdote. Run at least 3-5 iterations and take the median.
- Document the environment. Record the FreeBSD version, kernel config, hardware specs, and any sysctl tuning.
- Benchmark under realistic conditions. Synthetic benchmarks reveal potential. Real workload benchmarks reveal truth.
- Compare against a known baseline. Raw numbers mean nothing in isolation. Compare before/after tuning, or against a known-good reference system.
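To make the iteration rule concrete, here is a minimal sketch of taking the median of several runs, assuming the benchmark prints one numeric result per run (sample values are inlined; in practice replace the printf with five real benchmark invocations):

```sh
# Collect one result per run (sample values inlined here) and print the median.
printf '%s\n' 4012.3 3998.7 4021.9 4005.1 3990.2 |
    sort -n | awk '{ v[NR] = $1 } END { print v[int(NR/2) + 1] }'
```

With an odd run count the middle element is the median; for an even count, average the two middle values instead.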
Documenting Your System
Before any benchmark, capture the environment:
```sh
# FreeBSD version
freebsd-version -ku

# Hardware summary
sysctl hw.model hw.ncpu hw.physmem hw.realmem

# Memory
sysctl hw.physmem | awk '{printf "%.1f GB\n", $2/1073741824}'

# Disk controller and drives
camcontrol devlist

# ZFS pool status
zpool list
zpool status

# Network interfaces
ifconfig -a | grep -E '^[a-z]|inet '

# Current sysctl tuning
sysctl -a > /tmp/sysctl-snapshot.txt
```
Save this output alongside every benchmark result.
CPU Benchmarks
sysbench
sysbench is the standard cross-platform CPU benchmark. It is useful for comparing CPU performance across different FreeBSD systems or against Linux baselines.
```sh
pkg install sysbench
```
Single-Threaded CPU Test
```sh
sysbench cpu --threads=1 --time=30 run
```
Key metric: events per second. Higher is better. This measures single-thread integer arithmetic performance.
Multi-Threaded CPU Test
```sh
# Use all available cores
sysbench cpu --threads=$(sysctl -n hw.ncpu) --time=30 run
```
Compare multi-threaded events/sec to single-threaded to measure scaling efficiency. Perfect linear scaling is rare; look for diminishing returns that indicate NUMA effects or thermal throttling.
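The efficiency figure can be computed directly from the two events/sec results; a sketch with sample numbers (substitute your own single-thread and all-core results and your core count):

```sh
# single = single-thread events/sec, multi = all-core events/sec, cores = hw.ncpu
# Efficiency = multi / (single * cores); 100% would be perfect linear scaling.
awk -v single=1200 -v multi=7680 -v cores=8 \
    'BEGIN { printf "scaling efficiency: %.1f%%\n", multi / (single * cores) * 100 }'
# prints: scaling efficiency: 80.0%
```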
CPU Prime Number Test
```sh
# Higher prime limit = longer computation per event
sysbench cpu --cpu-max-prime=20000 --threads=$(sysctl -n hw.ncpu) --time=60 run
```
Increasing --cpu-max-prime stresses the CPU harder. Use 20000 for standard comparison benchmarks.
openssl speed
OpenSSL is in the FreeBSD base system. Its built-in benchmark measures cryptographic throughput, which is directly relevant for TLS-heavy workloads:
```sh
# AES-256-GCM throughput (common TLS cipher)
openssl speed -evp aes-256-gcm

# SHA-256 throughput
openssl speed sha256

# RSA signing performance (relevant for TLS handshakes)
openssl speed rsa2048

# Test with multiple processes
openssl speed -multi $(sysctl -n hw.ncpu) -evp aes-256-gcm
```
Key metric: KB/s for symmetric ciphers, sign/verify per second for RSA. These numbers directly predict TLS throughput.
UNIX Bench
For a comprehensive CPU and system call benchmark:
```sh
pkg install unixbench
cd /usr/local/share/unixbench
./Run
```
UnixBench produces an index score normalized against a reference system. It tests process creation, shell scripts, pipe throughput, file copy, and more. The composite index is useful for overall system comparison.
Memory Benchmarks
sysbench Memory
```sh
# Sequential write throughput
sysbench memory --memory-block-size=1K --memory-total-size=10G --threads=1 run

# Sequential read throughput
sysbench memory --memory-block-size=1K --memory-total-size=10G --memory-oper=read --threads=1 run

# Multi-threaded memory bandwidth
sysbench memory --memory-block-size=1M --memory-total-size=100G --threads=$(sysctl -n hw.ncpu) run
```
Key metric: MiB/sec. Vary the block size to understand cache behavior: 1K hits L1, 1M likely hits L3 or main memory.
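The block-size sweep can be scripted in one loop; a sketch (the grep pattern assumes sysbench's "MiB transferred" summary line, which may vary between versions):

```sh
# Sweep block sizes from L1-cache-sized to DRAM-sized and keep only the throughput line
for bs in 1K 16K 64K 1M; do
    echo "=== block size: $bs ==="
    sysbench memory --memory-block-size=$bs --memory-total-size=10G --threads=1 run |
        grep 'transferred'
done
```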
STREAM Benchmark
For raw memory bandwidth measurement:
```sh
pkg install stream
stream
```
STREAM reports Copy, Scale, Add, and Triad bandwidth in MB/s. This is the standard benchmark for measuring memory subsystem performance, especially useful for NUMA systems.
Disk I/O Benchmarks
fio (Flexible I/O Tester)
fio is the definitive disk I/O benchmark. It can simulate virtually any I/O pattern.
shpkg install fio
Sequential Read Throughput
```sh
fio --name=seq-read --ioengine=posixaio --rw=read \
    --bs=1m --size=4g --numjobs=1 --runtime=60 \
    --time_based --group_reporting \
    --filename=/tmp/fio-test
```
Sequential Write Throughput
```sh
fio --name=seq-write --ioengine=posixaio --rw=write \
    --bs=1m --size=4g --numjobs=1 --runtime=60 \
    --time_based --group_reporting \
    --filename=/tmp/fio-test
```
Random Read IOPS (4K)
This is the most important benchmark for database workloads:
```sh
fio --name=rand-read --ioengine=posixaio --rw=randread \
    --bs=4k --size=4g --numjobs=4 --runtime=60 \
    --time_based --group_reporting --iodepth=32 \
    --filename=/tmp/fio-test
```
Key metric: IOPS (I/O operations per second). A good NVMe drive delivers 500K+ random read IOPS. A SATA SSD delivers 50-90K. A spinning disk delivers 100-200.
Random Write IOPS (4K)
```sh
fio --name=rand-write --ioengine=posixaio --rw=randwrite \
    --bs=4k --size=4g --numjobs=4 --runtime=60 \
    --time_based --group_reporting --iodepth=32 \
    --filename=/tmp/fio-test
```
Mixed Random Read/Write (Database Simulation)
```sh
fio --name=mixed-rw --ioengine=posixaio --rw=randrw \
    --rwmixread=70 --bs=4k --size=4g --numjobs=4 \
    --runtime=60 --time_based --group_reporting \
    --iodepth=32 --filename=/tmp/fio-test
```
A 70/30 read/write mix approximates typical OLTP database patterns.
Latency Measurement
```sh
fio --name=latency --ioengine=posixaio --rw=randread \
    --bs=4k --size=1g --numjobs=1 --runtime=30 \
    --time_based --iodepth=1 \
    --filename=/tmp/fio-test
```
Setting --iodepth=1 forces synchronous I/O and measures true single-request latency. For NVMe, expect p99 latency under 200 microseconds. For SATA SSD, under 1 ms. For spinning disks, 5-15 ms.
bonnie++
bonnie++ is a classic filesystem benchmark that tests sequential I/O, random seeks, and metadata operations:
```sh
pkg install bonnie++

# Run with 2x RAM size to defeat caching
bonnie++ -d /tmp -s $(( $(sysctl -n hw.physmem) / 1073741824 * 2 ))G -u root
```
bonnie++ is useful for quick before/after comparisons but less configurable than fio.
ZFS-Specific Benchmarks
ZFS performance depends on ARC size, record size, compression, and pool topology. Benchmark ZFS separately from raw disk I/O.
ZFS ARC Hit Rate
Before benchmarking, check your ARC effectiveness:
```sh
# ARC statistics
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# Calculate hit rate
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses | \
    awk 'BEGIN{h=0;m=0} /hits/{h=$2} /misses/{m=$2} END{printf "ARC hit rate: %.1f%%\n", h/(h+m)*100}'
```
A healthy ARC hit rate is above 90%. Below 80% indicates the ARC is undersized for the workload.
ZFS Sequential Throughput
```sh
# Test on actual ZFS pool
fio --name=zfs-seq-write --ioengine=posixaio --rw=write \
    --bs=1m --size=10g --numjobs=1 --runtime=60 \
    --time_based --group_reporting \
    --filename=/zpool/benchmark/fio-test

# Read back
fio --name=zfs-seq-read --ioengine=posixaio --rw=read \
    --bs=1m --size=10g --numjobs=1 --runtime=60 \
    --time_based --group_reporting \
    --filename=/zpool/benchmark/fio-test
```
ZFS Record Size Impact
Test with different record sizes to find the optimal value for your workload:
```sh
# Create test datasets with different record sizes
zfs create -o recordsize=4k zpool/bench4k
zfs create -o recordsize=128k zpool/bench128k
zfs create -o recordsize=1m zpool/bench1m

# Benchmark each
for rs in 4k 128k 1m; do
    echo "=== Record size: $rs ==="
    fio --name=zfs-$rs --ioengine=posixaio --rw=randrw --rwmixread=70 \
        --bs=4k --size=4g --numjobs=4 --runtime=60 \
        --time_based --group_reporting \
        --filename=/zpool/bench${rs}/fio-test
done

# Cleanup
zfs destroy zpool/bench4k
zfs destroy zpool/bench128k
zfs destroy zpool/bench1m
```
For database workloads, recordsize=16k often outperforms the default 128K. For large sequential files, 1M is faster.
ZFS Compression Benchmark
```sh
# Create datasets with different compression
zfs create -o compression=off zpool/bench-nocomp
zfs create -o compression=lz4 zpool/bench-lz4
zfs create -o compression=zstd zpool/bench-zstd

# Write test data and compare
for comp in nocomp lz4 zstd; do
    echo "=== Compression: $comp ==="
    fio --name=comp-$comp --ioengine=posixaio --rw=write \
        --bs=128k --size=4g --numjobs=1 --runtime=60 \
        --time_based --group_reporting \
        --filename=/zpool/bench-${comp}/fio-test
done
```
LZ4 compression is almost always a net win on FreeBSD because it reduces the amount of data written to disk while adding negligible CPU overhead.
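After the write pass, the achieved ratio can be read back per dataset; a quick check (dataset names follow the example above):

```sh
# Compare on-disk compression ratios across the test datasets
zfs get -o name,value compressratio zpool/bench-lz4 zpool/bench-zstd
```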
Network Benchmarks
iperf3
iperf3 measures raw network throughput between two systems.
```sh
pkg install iperf3
```
TCP Throughput
On the remote server:
```sh
iperf3 -s
```
On the FreeBSD client:
```sh
# Single-stream TCP throughput
iperf3 -c server-ip -t 30

# Multi-stream (saturate high-bandwidth links)
iperf3 -c server-ip -t 30 -P 4

# Reverse mode (measure download from server)
iperf3 -c server-ip -t 30 -R
```
Key metric: Gbits/sec. A healthy 1 Gbps link should deliver 940+ Mbps. If you see significantly less, investigate NIC offload settings, interrupt coalescing, or sysctl network buffer tuning.
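The offload state is the first thing to check when throughput falls short; a sketch, assuming an interface named em0 (substitute your own):

```sh
# Show current offload flags (TSO, LRO, checksum offload appear in the options line)
ifconfig em0 | grep options

# Toggle offloads off for a comparison run, then back on
ifconfig em0 -tso -lro -txcsum -rxcsum
iperf3 -c server-ip -t 30
ifconfig em0 tso lro txcsum rxcsum
```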
UDP Throughput and Jitter
```sh
# UDP test at specified bandwidth
iperf3 -c server-ip -u -b 500M -t 30
```
Key metrics: throughput, jitter, and packet loss percentage. For VoIP and real-time applications, jitter under 5 ms and loss under 0.1% is the target.
netperf
For latency-focused network benchmarks:
```sh
pkg install netperf
```
On the server:
```sh
netserver
```
On the client:
```sh
# Request/response latency (simulates small transactions)
netperf -H server-ip -t TCP_RR -l 30

# Bulk transfer throughput
netperf -H server-ip -t TCP_STREAM -l 30
```
TCP_RR (request/response) measures transactions per second. This is more relevant than raw throughput for web server and API workloads.
Web Server Benchmarks
wrk
wrk is a modern HTTP benchmarking tool with Lua scripting support.
```sh
pkg install wrk
```
Basic HTTP Throughput
```sh
# 4 threads, 100 concurrent connections, 30 seconds
wrk -t4 -c100 -d30s http://localhost/
```
Key metrics: requests/sec and latency distribution. The latency percentiles (p50, p90, p99) matter more than average latency.
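wrk prints the detailed percentile breakdown only when asked; pass --latency to get it:

```sh
# --latency adds the full latency distribution (50%/75%/90%/99%) to the report
wrk -t4 -c100 -d30s --latency http://localhost/
```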
HTTPS Throughput
```sh
wrk -t4 -c100 -d30s https://localhost/
```
Compare HTTP vs HTTPS results. The difference is your TLS overhead. Modern CPUs with AES-NI should show less than 10% overhead.
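Whether the CPU actually advertises AES-NI can be confirmed from the boot messages; a quick check (the AESNI flag appears in the Features2 line on typical amd64 systems):

```sh
# Look for the AESNI CPU feature flag in the boot log
grep -o 'AESNI' /var/run/dmesg.boot | head -1

# If the aesni(4) driver is loaded, it also shows up in sysctl output
sysctl -a | grep aesni
```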
Simulating Real Traffic Patterns
Create a Lua script for POST requests:
```sh
cat > /tmp/post.lua << 'LUAEOF'
wrk.method = "POST"
wrk.body = '{"key": "value", "data": "benchmark"}'
wrk.headers["Content-Type"] = "application/json"
LUAEOF

wrk -t4 -c100 -d30s -s /tmp/post.lua http://localhost/api/endpoint
```
h2load
For HTTP/2 benchmarking:
```sh
pkg install nghttp2

# 100 concurrent streams, 4 clients, 10000 requests
h2load -n 10000 -c 4 -m 100 https://localhost/
```
HTTP/2 multiplexing changes performance characteristics significantly. Always benchmark with h2load if your production traffic uses HTTP/2.
Database Benchmarks
PostgreSQL with pgbench
pgbench is included with PostgreSQL:
```sh
pkg install postgresql16-server

# Initialize benchmark database
pgbench -i -s 50 benchdb

# Run TPC-B-like benchmark
pgbench -c 10 -j 4 -T 60 benchdb
```
Key metric: transactions per second (TPS). Run with different -c (client) counts to find your saturation point.
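The saturation point can be found by sweeping the client count; a sketch (greps pgbench's "tps =" summary line, whose exact wording may vary by version):

```sh
# Increase concurrency until TPS stops climbing
for c in 1 4 8 16 32 64; do
    echo "=== clients: $c ==="
    pgbench -c $c -j 4 -T 30 benchdb | grep 'tps'
done
```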
MariaDB with sysbench
```sh
pkg install sysbench

# Prepare test tables
sysbench /usr/local/share/sysbench/oltp_read_write.lua \
    --mysql-host=localhost --mysql-user=root --mysql-db=bench \
    --tables=10 --table-size=100000 prepare

# Run OLTP read/write benchmark
sysbench /usr/local/share/sysbench/oltp_read_write.lua \
    --mysql-host=localhost --mysql-user=root --mysql-db=bench \
    --tables=10 --table-size=100000 --threads=8 --time=60 run

# Cleanup
sysbench /usr/local/share/sysbench/oltp_read_write.lua \
    --mysql-host=localhost --mysql-user=root --mysql-db=bench \
    --tables=10 --table-size=100000 cleanup
```
Key metrics: transactions/sec, queries/sec, and latency percentiles.
Automating Benchmark Runs
Create a script that runs your standard benchmark suite and saves results:
```sh
#!/bin/sh
# /usr/local/bin/benchmark-suite.sh
# Run standard benchmark suite on FreeBSD

RESULTS_DIR="/var/benchmarks/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$RESULTS_DIR"

# System info
freebsd-version -ku > "$RESULTS_DIR/system-info.txt"
sysctl hw.model hw.ncpu hw.physmem >> "$RESULTS_DIR/system-info.txt"

# CPU benchmark
echo "Running CPU benchmark..."
sysbench cpu --threads=$(sysctl -n hw.ncpu) --time=30 run > "$RESULTS_DIR/cpu.txt" 2>&1

# Memory benchmark
echo "Running memory benchmark..."
sysbench memory --memory-block-size=1M --memory-total-size=10G \
    --threads=$(sysctl -n hw.ncpu) run > "$RESULTS_DIR/memory.txt" 2>&1

# Disk I/O benchmark
echo "Running disk I/O benchmark..."
fio --name=rand-rw --ioengine=posixaio --rw=randrw --rwmixread=70 \
    --bs=4k --size=4g --numjobs=4 --runtime=60 \
    --time_based --group_reporting \
    --filename=/tmp/fio-test \
    --output="$RESULTS_DIR/disk-io.txt"
rm -f /tmp/fio-test

echo "Results saved to $RESULTS_DIR"
```
```sh
chmod +x /usr/local/bin/benchmark-suite.sh
```
Run this before and after any tuning changes to quantify the impact.
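To quantify the impact, the key number can be extracted from the saved results and compared; a self-contained sketch with sample sysbench output inlined (swap the printf lines for the real cpu.txt files from the two runs):

```sh
# Extract "events per second" from sysbench output and report the percentage change.
# Sample data is inlined; in practice read cpu.txt from the before/after result dirs.
before=$(printf 'events per second: 4000.00\n' | awk '/events per second/ { print $4 }')
after=$(printf 'events per second: 4400.00\n' | awk '/events per second/ { print $4 }')
awk -v b="$before" -v a="$after" 'BEGIN { printf "change: %+.1f%%\n", (a - b) / b * 100 }'
# prints: change: +10.0%
```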
FAQ
Should I benchmark on a production server?
Avoid it. Benchmarks generate heavy I/O and CPU load that will impact production traffic. Use a staging server with identical hardware, or run benchmarks during a maintenance window with services stopped.
How do I compare FreeBSD performance to Linux?
Use the same benchmarks (sysbench, fio, iperf3, wrk) on both systems. Ensure identical hardware, kernel parameters scaled equivalently, and the same compiler flags on both platforms. Single benchmarks lie; run the full suite.
What is a good baseline for a modern server?
For a typical Xeon E-2300 / EPYC 7003 server with NVMe storage: sysbench CPU 10,000+ events/sec per core, fio random 4K read 400K+ IOPS on NVMe, iperf3 930+ Mbps on 1G links, wrk 50K+ req/sec for static files on NGINX.
How does ZFS overhead compare to UFS?
ZFS adds 5-15% overhead for random small I/O compared to UFS, primarily from copy-on-write semantics and checksumming. For sequential I/O with compression, ZFS often outperforms UFS because it writes less data to disk. The ARC cache typically makes up for the write overhead on read-heavy workloads.
Why do my fio results differ between runs?
Filesystem caching. After the first run, data may be cached in the ARC (ZFS) or the buffer cache (UFS). For consistent results, either shrink the ARC between runs (lower vfs.zfs.arc.max temporarily, then restore it), use --direct=1 in fio to request O_DIRECT where the filesystem honors it, or use a test file larger than available RAM.
How do I benchmark network latency specifically?
Use netperf -t TCP_RR for TCP round-trip latency, or ping -c 100 -i 0.01 server-ip for ICMP latency. For application-level latency, wrk's percentile output is the most useful metric. Sub-millisecond latency requires testing on the same L2 segment.
What benchmarks matter most for a web server?
In order of importance: (1) wrk requests/sec and p99 latency, (2) random read IOPS (content serving), (3) network throughput (iperf3), (4) TLS handshake throughput (openssl speed). Optimize the bottleneck that your real traffic hits first.