Benchmarking Cassandra & CockroachDB on a multi-node cluster with the cassandra-stress tool and YCSB

The group I worked in at Microsoft Research Asia focused on databases, distributed databases in particular. Our main project was GraphView, a middleware within Microsoft’s CosmosDB that accepts graph operations and translates them to T-SQL executed in SQL Server or Azure SQL Database. To provide transaction-as-a-service over global, persistent state, our team was formulating a new concurrency control protocol based on TicToc, a protocol from MIT, and on several key-value backends. We picked Redis as the in-memory backend and Apache Cassandra (originally developed at Facebook) as the on-disk backend.

Hence, a thorough study of Cassandra’s performance on a cluster was needed. CockroachDB, the flagship product of a startup founded by ex-Googlers, is a distributed SQL database built on a transactional, strongly consistent key-value store, which makes it a good system to compare against our own datastore. So I will also write down how to benchmark CockroachDB on our own cluster with YCSB.

    1. Environment:
      • 10 Ubuntu server nodes plus 1 Ubuntu client node
      • Cassandra 3.11.2 binaries
      • CockroachDB 2.0.2 binaries
    2. Cassandra Benchmark with Cassandra-stress tool
      • Install Java8 via:
        sudo add-apt-repository ppa:webupd8team/java
        sudo apt-get update
        sudo apt-get install oracle-java8-installer
      • Download Cassandra via:
        wget http://www-us.apache.org/dist/cassandra/3.11.2/apache-cassandra-3.11.2-bin.tar.gz
        tar -xzf apache-cassandra-3.11.2-bin.tar.gz && cd apache-cassandra-3.11.2
        
      • Make the following changes to conf/cassandra.yaml to setup the cluster
        • seeds: Internal IP address of each seed node. DO NOT MAKE ALL NODES SEED NODES. It is a best practice to have more than one seed node per datacenter, although our 10-node cluster uses a single seed node, 10.6.0.4.
        • listen_address: The IP address or hostname that Cassandra binds to for connecting to other Cassandra nodes, i.e. the private IP address of the current node.
        • [Optional] rpc_address: Private IP address of the current node.
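The three settings above can also be applied with a small script rather than by editing the file by hand on every node. This is only a minimal sketch: set_cassandra_addresses is a hypothetical helper name, and it assumes the stock cassandra.yaml layout, where the seed list appears as `- seeds: "127.0.0.1"` and listen_address and rpc_address are uncommented top-level keys.

```shell
#!/bin/sh
# Sketch: point one node's cassandra.yaml at the cluster.
# set_cassandra_addresses is a hypothetical helper, not a Cassandra tool.
# Usage: set_cassandra_addresses <yaml-file> <seed-list> <node-private-ip>
set_cassandra_addresses() {
    conf=$1; seeds=$2; ip=$3
    # Rewrite the three keys; every other line passes through unchanged.
    sed -e "s/^\([[:space:]]*- seeds:\).*/\1 \"$seeds\"/" \
        -e "s/^listen_address:.*/listen_address: $ip/" \
        -e "s/^rpc_address:.*/rpc_address: $ip/" \
        "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}
```

On node 10.6.0.5, for example, this would be called as `set_cassandra_addresses conf/cassandra.yaml 10.6.0.4 10.6.0.5`.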
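      • Start, stop and monitor the Cassandra cluster
        Start Cassandra in the background on the current node
        ./bin/cassandra &
        Stop it (note that this kills every Java process on the node)
        pkill -9 java
        Check the status of the cluster
        ./bin/nodetool status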
        
      • Run Cassandra-stress tool
        Populate the cluster with 1 million records (default) using 500 threads
        ./tools/bin/cassandra-stress write -node 10.6.0.4,10.6.0.5,10.6.0.6,10.6.0.12,10.6.0.13,10.6.0.14,10.6.0.15,10.6.0.16,10.6.0.17,10.6.0.18 -rate threads=500
        Test the cluster with 1 million read-requests and 500 threads
        ./tools/bin/cassandra-stress read -node 10.6.0.4,10.6.0.5,10.6.0.6,10.6.0.12,10.6.0.13,10.6.0.14,10.6.0.15,10.6.0.16,10.6.0.17,10.6.0.18 -rate threads=500
        Test the cluster with a mixed workload (R/W: 50%:50%) and 500 threads
        ./tools/bin/cassandra-stress mixed ratio\(write=1,read=1\) -node 10.6.0.4,10.6.0.5,10.6.0.6,10.6.0.12,10.6.0.13,10.6.0.14,10.6.0.15,10.6.0.16,10.6.0.17,10.6.0.18 -rate threads=500
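Since the three commands above differ only in the workload argument, the long node list can be kept in one place. The sketch below prints the command lines instead of running them. node_list and stress_cmd are hypothetical helper names, and the node IPs and thread count are the ones used above.

```shell
#!/bin/sh
# Sketch: build the three cassandra-stress invocations from one node list.
# node_list and stress_cmd are hypothetical helper names.
NODES="10.6.0.4 10.6.0.5 10.6.0.6 10.6.0.12 10.6.0.13 10.6.0.14 10.6.0.15 10.6.0.16 10.6.0.17 10.6.0.18"
THREADS=500

# Join the IPs given as arguments with commas, the form -node expects.
node_list() {
    printf '%s\n' "$*" | tr ' ' ','
}

# Print (not run) one cassandra-stress command line for the given workload.
stress_cmd() {
    # $NODES is left unquoted on purpose so it splits into one argument per IP.
    printf '%s\n' "./tools/bin/cassandra-stress $1 -node $(node_list $NODES) -rate threads=$THREADS"
}

stress_cmd write
stress_cmd read
stress_cmd 'mixed ratio\(write=1,read=1\)'
```

Piping the output to sh from the Cassandra directory would run the three phases in order.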
    3. Cockroach Benchmark via YCSB
      • Install Cockroach cluster
        Download Cockroach binaries
        wget -qO- https://binaries.cockroachdb.com/cockroach-v2.0.2.linux-amd64.tgz | tar  xvz
        Expand the limit on the number of file descriptors a process may have
        ulimit -HSn 51200
        Launch the root node
        nohup cockroach start --cache=0.4 --max-sql-memory=0.4 --insecure --host=10.6.0.4 &
        Launch other nodes to join this cluster
        nohup cockroach start --cache=0.4 --max-sql-memory=0.4 --insecure --host=10.6.0.5 --join=10.6.0.4:26257 &
        Verify that the cluster launched successfully by opening the admin UI from the client node (plain HTTP, since the cluster runs with --insecure):
        http://10.6.0.4:8080
        Kill Cockroach process on current node
        pkill -9 cockroach
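The root node and the joining nodes use the same start command, except that joining nodes add the --join flag. The sketch below prints the command for each host. start_cmd is a hypothetical helper name, the flags are the ones used above, and the host list in the loop is only an excerpt of the cluster. Each printed command still has to be run on its own node.

```shell
#!/bin/sh
# Sketch: print the start command for each node of the cluster.
# start_cmd is a hypothetical helper; the flags are the ones used above.
ROOT=10.6.0.4

start_cmd() {
    node=$1
    cmd="nohup cockroach start --cache=0.4 --max-sql-memory=0.4 --insecure --host=$node"
    # Every node except the root joins the cluster through the root node.
    if [ "$node" != "$ROOT" ]; then
        cmd="$cmd --join=$ROOT:26257"
    fi
    printf '%s &\n' "$cmd"
}

for h in 10.6.0.4 10.6.0.5 10.6.0.6; do
    start_cmd "$h"
done
```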
      • Create YCSB database and usertable
        Open a SQL shell from any node within our cluster
        ./cockroach sql --insecure --host=10.6.0.4
        Create database "ycsb" if it does not exist, then switch to it
        CREATE DATABASE IF NOT EXISTS ycsb;
        SET DATABASE = ycsb;
        Create the single table "usertable" if it does not exist
        CREATE TABLE IF NOT EXISTS usertable ( YCSB_KEY VARCHAR(255) PRIMARY KEY,
        	FIELD0 TEXT, FIELD1 TEXT,
        	FIELD2 TEXT, FIELD3 TEXT,
        	FIELD4 TEXT, FIELD5 TEXT,
        	FIELD6 TEXT, FIELD7 TEXT,
        	FIELD8 TEXT, FIELD9 TEXT );
      • Launch HAProxy as load balancer
        Synchronize the clocks on every node, since a Cockroach node fails easily when its clock is out of sync
        sudo apt-get install ntp
        sudo service ntp stop
        Edit /etc/ntp.conf by commenting out any lines that start with "server" or "pool" and add:
        server ntp.int.scaleway.com iburst
        server ntp.ubuntu.com iburst
        Start ntp and verify that the servers have changed
        sudo service ntp start
        sudo ntpq -p
        
        Output should be something like this
             remote           refid      st t when poll reach   delay   offset  jitter
        ==============================================================================
         10.1.31.39      .INIT.          16 u    - 1024    0    0.000    0.000   0.000
        *alphyn.canonica 192.53.103.108   2 u  832 1024  377  237.330   -0.197   0.648
        
        Generate the HAProxy configuration file on the root (primary) node
        ./cockroach gen haproxy --insecure --host=10.6.0.4 --port=26257
        Copy it to the client node, install HAproxy and run it with this .cfg file
        scp haproxy.cfg [email protected]:~/
        sudo apt-get install haproxy
        ulimit -HSn 51200
        haproxy -f haproxy.cfg
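For the clock synchronization step above, the offset column of the ntpq -p output can be checked by script instead of by eye. This is a rough sketch, assuming the column layout shown in the sample output (offset in milliseconds in the ninth column, selected peer marked with `*`). selected_offset_ms, offset_ok and the LIMIT_MS threshold are hypothetical names and values chosen for illustration, not CockroachDB settings.

```shell
#!/bin/sh
# Sketch: check the clock offset reported by `ntpq -p`.
# selected_offset_ms and offset_ok are hypothetical helper names.

# Read `ntpq -p` output on stdin and print the offset column (milliseconds)
# of the peer currently selected for synchronization (the line marked '*').
selected_offset_ms() {
    awk '$1 ~ /^\*/ { print $9; exit }'
}

# Succeed if |offset| <= limit, both in milliseconds.
offset_ok() {
    [ -n "$1" ] || return 1    # no selected peer: treat as not in sync
    awk -v o="$1" -v lim="$2" 'BEGIN { if (o < 0) o = -o; exit !(o <= lim) }'
}

# LIMIT_MS is an assumed threshold for illustration only.
LIMIT_MS=100
if command -v ntpq >/dev/null 2>&1; then
    offset=$(ntpq -p | selected_offset_ms)
    if offset_ok "$offset" "$LIMIT_MS"; then
        echo "clock offset ${offset} ms is within ${LIMIT_MS} ms"
    else
        echo "clock offset '${offset}' ms exceeds ${LIMIT_MS} ms or no peer is selected" >&2
    fi
fi
```

Running this on each node before starting Cockroach gives a quick indication of whether a node's clock has drifted.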
      • Run YCSB script
        Download YCSB binaries to the client node
        https://github.com/brianfrankcooper/YCSB/releases
        Load data from workloads
        python bin/ycsb load jdbc -P workloads/workloadc -p db.driver=org.postgresql.Driver -p db.url=jdbc:postgresql://10.6.0.9:26257/ycsb -p db.user=root -p db.passwd=root -p recordcount=200000 -threads 32 -p operationcount=500000 -p db.batchsize=100 -s
        Run workloads
        python bin/ycsb run jdbc -P workloads/workloadRO -p db.driver=org.postgresql.Driver -p db.url=jdbc:postgresql://10.6.0.9:26257/ycsb -p db.user=root -p db.passwd=root -p recordcount=200000 -threads 1024 -p operationcount=500000 -p db.batchsize=100 -s
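The load and run commands above share every property except the phase, the workload file and the thread count, so they can be generated from one template. The sketch below prints the command lines rather than running them. ycsb_cmd is a hypothetical helper name, and the thread counts in the loop are illustrative, since only 1024 appears in the run above.

```shell
#!/bin/sh
# Sketch: print YCSB command lines for a load phase and a sweep of run phases.
# ycsb_cmd is a hypothetical helper; the properties mirror the commands above.
DB_URL=jdbc:postgresql://10.6.0.9:26257/ycsb

# Usage: ycsb_cmd <load|run> <workload-file> <threads>
ycsb_cmd() {
    printf '%s\n' "python bin/ycsb $1 jdbc -P $2 -p db.driver=org.postgresql.Driver -p db.url=$DB_URL -p db.user=root -p db.passwd=root -p recordcount=200000 -threads $3 -p operationcount=500000 -p db.batchsize=100 -s"
}

ycsb_cmd load workloads/workloadc 32
# Illustrative thread counts; adjust to the sweep you want to measure.
for t in 64 256 1024; do
    ycsb_cmd run workloads/workloadRO "$t"
done
```

Piping the output to sh from the YCSB directory on the client node would run the load phase followed by each run phase.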
      • Since I have run these benchmarks a couple of times, I put a start.sh file in each cockroach folder to simplify the process. On our own servers, where all binaries are already downloaded and the cluster and HAProxy are configured, you only need to change to the cockroach directory and run start.sh (start-haproxy.sh on the client node).
      • For read-only and write-only workloads, change directory to ycsb-ro or ycsb-wo on the Ubuntu client node and run start.sh.

Useful links:

  1. Cassandra-stress tool
  2. Interpreting the output of cassandra-stress
  3. Cassandra-stress docs
  4. Learn CockroachDB SQL
  5. Install a multi-node Cockroach Database with HA proxy
  6. Troubleshoot Cluster Setup