ScyllaDB vs CockroachDB


#1

Hello.

Lately, I’ve been looking around for a database, and the one I’ve been using most is CRDB. Recently, though, I found ScyllaDB. Can anyone highlight the major differences between the two? As far as I can tell, ScyllaDB’s performance is far better than CRDB’s, and it also has the same capabilities: replication, horizontal scaling, etc.

Thanks,
Maria


(Tim O'Brien) #2

Hi @marigonzes

ScyllaDB is eventually consistent. All statements in CRDB are transactions, and all transactions are serializable.
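To make “eventually consistent” concrete: in Cassandra-style stores such as Scylla, replicas that have diverged are reconciled per cell by write timestamp, so the newest write wins (last-write-wins). The toy sketch below assumes each replica holds a row as a dict of column -> (timestamp, value); `merge_replicas` is a hypothetical name, and real reconciliation (tombstones, TTLs, exact tie-breaking) is more involved.

```python
def merge_replicas(a, b):
    """Merge two replicas' copies of one row, keeping the newest cell per column.

    Each row is a dict mapping column name -> (write_timestamp, value).
    Comparing (timestamp, value) tuples means the higher timestamp wins and a
    tie falls back to the larger value, so the result does not depend on
    argument order.
    """
    merged = dict(a)
    for column, cell in b.items():
        merged[column] = max(merged[column], cell) if column in merged else cell
    return merged


# Two replicas that missed each other's writes (hypothetical data).
replica_1 = {"name": (5, "Ann"), "email": (10, "ann@old.example")}
replica_2 = {"email": (12, "ann@new.example")}

print(merge_replicas(replica_1, replica_2))
# {'name': (5, 'Ann'), 'email': (12, 'ann@new.example')}
```

Because the merge is order-independent, replicas converge once they exchange data, but a reader that only reaches the stale replica can still see `ann@old.example` in the meantime.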

Really, it’s apples and oranges: Scylla is an eventually consistent AP system, while CockroachDB is a strongly consistent CP system. As Ben notes in that last article:

For most applications, a CAP-Consistent database like CockroachDB is often the better choice, despite potentially longer latencies, because it offers a simple contract to the application developer:

  • The most recent write is always visible to subsequent readers (single register linearizability).
  • Other developers cannot compromise an app’s consistency with optional write settings.
  • In the event of partitions, the system will block rather than return inconsistent data.
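The last bullet is the CP/AP distinction in miniature. Below is a toy sketch, not how CockroachDB or Scylla are actually implemented: it assumes three replicas, each holding a `(version, value)` pair, and a newer write that has already been committed on a majority. A CP-style read refuses to answer unless it can reach a majority; an AP-style read answers from whatever replica it can reach. The function and exception names are hypothetical.

```python
class Unavailable(Exception):
    """Raised when a read refuses to answer because too few replicas are reachable."""


def cp_read(replicas, reachable):
    """Answer only if a majority is reachable, then return the newest value seen.

    replicas: list of (version, value) pairs, one per replica.
    reachable: set of replica indexes the client can currently contact.
    Assumes every committed write reached a majority, so any reachable
    majority contains at least one replica with the latest version.
    """
    majority = len(replicas) // 2 + 1
    if len(reachable) < majority:
        raise Unavailable("cannot reach a majority; refusing rather than risk stale data")
    return max(replicas[i] for i in reachable)[1]


def ap_read(replicas, reachable):
    """Answer from the lowest-numbered reachable replica, which may be stale."""
    if not reachable:
        raise Unavailable("no replica reachable")
    return replicas[min(reachable)][1]


# Version 2 was committed on replicas 1 and 2; replica 0 still has version 1.
replicas = [(1, "old"), (2, "new"), (2, "new")]

print(ap_read(replicas, {0}))      # partitioned with only replica 0: prints old
print(cp_read(replicas, {0, 1}))   # majority reachable: prints new
```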

That said, if you share some details about your schema, DML, and test, we may be able to help close the performance gap. It’d also help to know which consistency level you’re using in Scylla, since the performance advantage may shrink or disappear if the test is run with QUORUM as the consistency level.
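One reason the gap can narrow: a coordinator must wait for as many replica acknowledgements as the consistency level demands, so a request is only as fast as the slowest replica it has to wait for. A minimal sketch, assuming the coordinator contacts every replica; the per-replica latencies are made-up numbers, not measurements, and `request_latency_ms` is a hypothetical helper.

```python
def request_latency_ms(replica_latencies_ms, acks_needed):
    """Time until the acks_needed-th fastest replica has answered."""
    if not 1 <= acks_needed <= len(replica_latencies_ms):
        raise ValueError("acks_needed must be between 1 and the number of replicas")
    return sorted(replica_latencies_ms)[acks_needed - 1]


# Hypothetical latencies for a replication factor of 3.
latencies = [20.0, 1.0, 5.0]

for level, acks in [("ONE", 1), ("QUORUM", 2), ("ALL", 3)]:
    print(level, request_latency_ms(latencies, acks))
```

With these numbers, moving from ONE to QUORUM raises the latency from 1 ms to 5 ms, because the request now waits on the second-fastest replica.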

Best,

Tim


(Piyush Katariya) #3

CRDB supports the SQL standard. Isn’t that a big difference?
Trying to do the same with Scylla using Apache Spark or Drill would make the deployment complex.


(Tim O'Brien) #4

That is indeed a big difference as well - thanks for calling it out @piyush.


(Piyush Katariya) #5

You’re welcome. :slight_smile:

IMHO, comparing a distributed KV database (NoSQL) with a distributed SQL database (NewSQL, as they say) is like comparing a branch of a tree with the whole tree.

A distributed database like CRDB, which handles not only the storage layer but also intelligent replication, a higher-level (SQL or similar) query engine, and monitoring tools, is simply a blessing. The (declarative) code is brought to the data rather than the reverse, which I think makes a big difference in the computation time needed.
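Here is a toy sketch of the “code comes to the data” point, assuming a table split across three shards and a `SUM` over one column. Pulling every row to the client moves one row per record across the network, while pushing the aggregate down to each shard moves one partial result per shard. The shard contents and function names are illustrative only.

```python
def pull_then_sum(shards):
    """Client fetches every row and aggregates locally."""
    rows_transferred = 0
    total = 0
    for shard in shards:
        for amount in shard:
            rows_transferred += 1  # each row crosses the network
            total += amount
    return total, rows_transferred


def push_down_sum(shards):
    """Each shard computes a partial sum; only the partials cross the network."""
    partials = [sum(shard) for shard in shards]
    return sum(partials), len(partials)


shards = [[10, 20, 30], [40, 50], [60]]  # hypothetical "amount" column per shard

print(pull_then_sum(shards))   # (210, 6)
print(push_down_sum(shards))   # (210, 3)
```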

I’m looking forward to CRDB supporting stored procedures and recursive common table expressions, which will make many data-intensive jobs easier.


#6

These are very different systems.

Scylla is a high-performance C++ version of Cassandra. Scylla/Cassandra are wide-column databases, or, more accurately, advanced key/value databases. The original research comes from Amazon’s Dynamo white paper, and the design is similar to Amazon DynamoDB, Google’s Bigtable, Apache HBase, Microsoft Azure Table Storage, and several other systems.

Scylla/Cassandra has consistency levels that can be changed per query, from eventual to strong consistency, but it is designed as a high-speed, write-heavy, multi-region, highly available, distributed sorted key-value store. It might look like a relational database with SQL-like queries and tables, but internally it behaves completely differently.
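A common rule of thumb for those per-query levels: a read is guaranteed to overlap the most recent successful write when the replicas answering the read plus those acknowledging the write exceed the replication factor (R + W > N). A minimal sketch, assuming a single datacenter and only the `ONE`, `QUORUM`, and `ALL` levels (multi-datacenter levels such as `LOCAL_QUORUM` are left out); the helper names are hypothetical.

```python
def required_acks(level, replication_factor):
    """Number of replicas that must respond at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(read_level, write_level, replication_factor):
    """True when every read replica set must intersect every write replica set."""
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_overlaps_write(read_level, write_level, 3))
```

Overlap is a per-request property of replica sets; it does not by itself give multi-statement transactions.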

It has tables with a primary key, where the entire row is stored as one big chunk that’s accessed by the key, and each column within that row is another key/value pair, which is why it’s better described as a nested key/value store. The primary key is the only way to access data: use the full key to get a row, or a prefix of the key to scan rows. Scylla/Cassandra support CQL, which stands for Cassandra Query Language. It looks like SQL but is very minimal, essentially an easy way to insert, update, and delete data by key.
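A toy model of that nested key/value layout: a sorted map from primary key to row, where each row is itself a map of column to value. It assumes primary keys are Python tuples, so a prefix scan is a walk through the sorted keys; real Scylla also splits the key into partition and clustering parts, which this sketch ignores. The class and method names are hypothetical.

```python
from bisect import bisect_left, insort


class NestedKVTable:
    """Sorted map of primary key -> row, where each row maps column -> value."""

    def __init__(self):
        self._keys = []   # primary keys, kept sorted for prefix scans
        self._rows = {}   # primary key -> {column: value}

    def upsert(self, key, **columns):
        """Insert or update: missing rows are created, existing ones are merged."""
        if key not in self._rows:
            insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key):
        """Point lookup by the full primary key."""
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Return (key, row) pairs whose key starts with the given tuple prefix."""
        i = bisect_left(self._keys, prefix)  # first key >= prefix
        results = []
        while i < len(self._keys) and self._keys[i][: len(prefix)] == prefix:
            key = self._keys[i]
            results.append((key, self._rows[key]))
            i += 1
        return results


messages = NestedKVTable()
messages.upsert(("user42", 1), body="hi")
messages.upsert(("user42", 2), body="see you")
messages.upsert(("user7", 1), body="hello")
messages.upsert(("user42", 1), read=True)  # same key: columns are merged

print(messages.get(("user42", 1)))
print([key for key, _ in messages.scan_prefix(("user42",))])
```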

Scylla does not support joins; there are no full secondary indexes (although you can create your own by writing duplicate data); updates and inserts are the same (everything is an upsert); there is TTL and table compaction; any column can be null and the schema is only lightly enforced; and there are no aggregations, CTEs, or other advanced queries.
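The hand-rolled secondary index mentioned above usually means keeping a second table keyed by the column you want to search on and writing to both. A minimal sketch with two in-memory dicts standing in for two tables (all names are hypothetical); in a real cluster these would be two separate writes, so the application also has to cope with them getting out of sync if one fails.

```python
class UsersByIdAndEmail:
    """Two 'tables' written together so users can be found by id or by email."""

    def __init__(self):
        self.users_by_id = {}      # user_id -> {"email": ..., "name": ...}
        self.users_by_email = {}   # email -> user_id (duplicated data acting as an index)

    def upsert_user(self, user_id, email, name):
        old = self.users_by_id.get(user_id)
        if old is not None and old["email"] != email:
            # The application, not the database, removes the stale index entry.
            self.users_by_email.pop(old["email"], None)
        self.users_by_id[user_id] = {"email": email, "name": name}
        self.users_by_email[email] = user_id

    def find_by_email(self, email):
        user_id = self.users_by_email.get(email)
        return None if user_id is None else self.users_by_id.get(user_id)


users = UsersByIdAndEmail()
users.upsert_user(1, "ann@example.com", "Ann")
users.upsert_user(1, "ann@new.example", "Ann")  # email change: index must be updated too

print(users.find_by_email("ann@example.com"))   # None
print(users.find_by_email("ann@new.example"))
```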

CockroachDB is a full-fledged relational database designed for automated sharding and replication of your data. It won’t match the raw throughput and performance of Scylla, but it is designed for much higher-level querying and general usability. Unless you really need Scylla and its specific data model (which doesn’t seem to be the case from your question), I would recommend sticking with CockroachDB.