Scalability questions

Source: Issue #380

  1. What’s the max number of nodes that can be in a cluster?

  2. Does the scalability increase by 1 with each node (assuming even usage on all ranges) or something <1 like in Postgres-XL?

  3. What’s the storage limit per node?

  4. What’s the max rows/kv per node/range/table?

  5. What’s the max number of columns in a table (e.g. we store column-id as 1-byte so 256 max or something similar)?

  6. What’s the max number of tables in a database?

  7. What’s the max number of ranges (each range has overhead, does each node know the position of each range, or only range.ranges?)

Response by @peter:

  1. What’s the max number of nodes that can be in a cluster?

    CockroachDB is designed to scale to hundreds and even thousands of nodes. In practice, tens of nodes is more realistic at this time. We’re working hard to make larger clusters run smoothly.

  2. Does the scalability increase by 1 with each node (assuming even usage on all ranges) or something <1 like in Postgres-XL?

    The scalability depends on your workload, but the short answer is that each additional node will add less than one node’s worth of performance if you have any sort of realistic schema with multiple tables and indexes. I don’t have a more concrete answer for you at this time, but I agree it would be good to provide guidance here.

  3. What’s the max storage per node (some kind of limit on the lower storage layer, ex like pg has)?

    Postgres has a limit on the disk storage it will use? There is no hardcoded limit in CockroachDB.

  4. What’s the max rows/kv per node/range/table?

    There is no limit on the number of rows/kvs per node or table. A range holds up to 64MB of row/kv data and splits automatically when it grows beyond that. The 64MB size is configurable, but you shouldn’t set it larger right now, as larger ranges have some problems which we’re working to resolve (in particular, certain operations use memory proportional to the range size). This also implies that an individual row or kv should be significantly smaller than a range.

  5. What’s the max number of columns in a table (e.g. we store column-id as 1-byte so 256 max or something similar)?

    There is no hardcoded limit. Column IDs are 32-bit integers stored with a variable-length encoding.

  6. What’s the max number of tables in a database?

    There is no hardcoded limit, though each table will consume a small amount of memory (a few KB) on every node.

  7. What’s the max number of ranges (each range has overhead, does each node know the position of each range, or only range.ranges?)

    There is no hardcoded limit. The range indexing information is cached on demand. The design here is similar to BigTable/HBase and is intended to scale to very large numbers of ranges.
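
To put answer 4 in concrete terms, the arithmetic below estimates how many ranges a given amount of data occupies at the 64MB range size. It is only a rough sketch: the function name is hypothetical, "64MB" is assumed to mean 64 * 1024 * 1024 bytes, and the result is a lower bound, since ranges that have just split are not full.

```python
# Assumption: "64MB" is taken as 64 MiB; the real setting is configurable.
RANGE_MAX_BYTES = 64 * 1024 * 1024


def estimate_min_range_count(total_bytes, range_max_bytes=RANGE_MAX_BYTES):
    """Return a lower bound on the number of ranges needed for total_bytes.

    Hypothetical helper for illustration only. Real clusters hold more
    ranges than this, because ranges split before they are completely full.
    """
    if range_max_bytes <= 0:
        raise ValueError("range_max_bytes must be positive")
    if total_bytes <= 0:
        return 1  # even an empty keyspace is covered by at least one range
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_bytes // range_max_bytes)


if __name__ == "__main__":
    one_tib = 1024 ** 4
    print(estimate_min_range_count(one_tib))
```

Under these assumptions, 1 TiB of row/kv data needs at least 16384 ranges, which is why the per-range overhead discussed in answer 7 matters.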
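
Answer 5 mentions a variable-length encoding for column IDs. The sketch below shows one common scheme of that kind, the base-128 "varint" used by Protocol Buffers, to illustrate why small IDs cost one byte while the full 32-bit space stays available. This is an assumption made for illustration; the answer does not say which encoding CockroachDB uses, and the function names are hypothetical.

```python
def encode_uvarint(n):
    """Encode an unsigned 32-bit integer as a base-128 varint.

    Each byte carries 7 bits of the value, least significant group first.
    The high bit of a byte is set when more bytes follow.
    """
    if not 0 <= n <= 0xFFFFFFFF:
        raise ValueError("value out of 32-bit unsigned range")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # low 7 bits plus continuation flag
        n >>= 7
    out.append(n)  # final byte has the continuation flag clear
    return bytes(out)


def decode_uvarint(data):
    """Decode a varint from the start of data; return (value, bytes_read)."""
    result = 0
    shift = 0
    for i, b in enumerate(data):
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")


if __name__ == "__main__":
    for column_id in (1, 127, 128, 300, 0xFFFFFFFF):
        encoded = encode_uvarint(column_id)
        print(column_id, len(encoded), decode_uvarint(encoded))
```

With this scheme, IDs up to 127 take a single byte and the largest 32-bit ID takes five, so there is no 256-column ceiling of the kind the question asks about.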
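
Answer 7 says the range indexing information is cached on demand. A minimal sketch of that idea follows, assuming a sorted in-memory cache keyed by range start key that falls back to an authoritative lookup on a miss. Every name here (`RangeCache`, `demo_lookup`, `DEMO_INDEX`, the node labels) is hypothetical and not CockroachDB's actual API, and the sketch ignores invalidation of stale entries after a split.

```python
import bisect

# Hypothetical authoritative index: (start_key, end_key, node), end exclusive.
DEMO_INDEX = [("a", "m", "node1"), ("m", "z", "node2")]


def demo_lookup(key):
    """Stand-in for a remote lookup of the range that contains key."""
    for start, end, node in DEMO_INDEX:
        if start <= key < end:
            return start, end, node
    raise KeyError(key)


class RangeCache:
    """Caches range locations lazily, so a node only learns about the
    ranges it actually touches rather than the whole index."""

    def __init__(self, lookup):
        self._lookup = lookup
        self._starts = []    # sorted start keys of cached ranges
        self._entries = {}   # start key -> (end key, node)
        self.misses = 0

    def locate(self, key):
        # Find the cached range with the greatest start key <= key.
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0:
            end, node = self._entries[self._starts[i]]
            if key < end:
                return node  # cache hit
        # Cache miss: ask the authoritative index and remember the answer.
        self.misses += 1
        start, end, node = self._lookup(key)
        if start not in self._entries:
            bisect.insort(self._starts, start)
        self._entries[start] = (end, node)
        return node


if __name__ == "__main__":
    cache = RangeCache(demo_lookup)
    print(cache.locate("b"), cache.locate("c"), cache.locate("q"), cache.misses)
```

In this sketch the second lookup inside the same range is served from the cache, so memory use grows with the number of ranges a node has actually accessed, not with the total number of ranges in the cluster.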