Migration from Existing Database

I am currently looking for distributed database which can support the maximum features we need natively. Currently we are using (latest) MongoDB 3.6 community edition. I am satisfied with developer productivity, feature set (except Transaction and multi collection update support) and cluster deployment ease but it sucks at latency for read and write when traffic goes up suddenly.

Must to have features :

  1. Distributed clock sync and transaction when updating multiple collections/tables/nodes
  2. Declarative query language to push maximum computation to database as it is more aware of data locality and constraint.
    a. complex JOINS idioms (something similar to aggregation $lookUp and $graphLookup in MongoDB 3.6 )
    b. composition of Select/Insert / update /Delete into single query (like OpenCypher)
    c. Oracle/Postgres SQL like parallel execution semantics/hints when needed
    d. ability to define custom functions and aggregate functions (to avoid the frequent round trips to application server for such computations back and forth)
  3. Optional schema validation at db level
  4. DB driver for JS and Clojure(or Java)
    a. Async support
    b. Connection Pool
    c. Cluster awareness (should not require HA proxy to load balance)
    d. maintain transaction across multiple calls to DB
    e. TLS 1.2 or later support
  5. Indexing
    a. Regular (Btree) index and composite index
    b. Unique index
    c. Text search with score and weight ordering
    d. Geo Index
  6. Custom Table/collection partitioning based on value of attribute or function
  7. Configurable replication factor or default as 3 if not configurable
  8. Either Good admin UI for efficient cluster management or Fairly nice command line tools to do the same
  9. No significant loss of throughput (read/write) while new node is joining in or leaving from cluster
  10. DB users authentication and roles authorization level security
  11. Should support all features on Single server for local development and testing environment
  12. Either Hot back up or Multi DC support
  13. Basic data encryption support

Good to have features :

  1. Materialized View (implicitly triggered computed table)
  2. Auto balancing of sharded table chunks in cluster by learning query patterns over period of time to achieve data locality (or any other mechanism to speed up queries in distributed fashion)
  3. Indexing
    a. Partial indexes - index over subset of table/collection data
    b. functional indexes - index over subset of table/collection data categorized by function output value
  4. Cloud hosting support (AWS/Azure/GCP)

Dear Piyush,
thank you for your interest in CockroachDB.
I will try to answer some of these points but as a disclaimer I think it would be better if you were to talk with someone at CockroachLabs to make this a real conversation.

  1. yes
  2. SQL 2a. yes 2b. somewhat yes 2c. somewhat yes 2d. not yet
  3. yes
  4. JDBC driver from postgres 4a not sure 4b yes 4c yes 4d unsure what you mean 4e yes
  5. regular SQL indexes 5a. yes 5b. yes 5c. not yet 5d. not yet
  6. yes
  7. yes
  8. yes
  9. yes
  10. yes
  11. yes
  12. yes
  13. yes

Good to have:

  1. not yet
  2. partial
  3. not yet
  4. yes

Hope this helps. I still advise you talk to someone at Cockroach Labs.

1 Like

Thanks @knz.

I am waiting for Recursive CTE to be a part of CockroachDB as i want to push maximum computation to DB in my project, How soon will it happen ? any casual estimates ?

About 4.d “maintain transaction across multiple calls to DB”, I meant to ask does it maintain transaction across multiple DB connections ? Postgres does not do that.

Recursive CTEs (and I assume stored procedures, based on your description) will not be part of the next release (2.1) in Q4, but may be planned at some point after that.

The transactions are not maintained across DB connections/sessions but we have seen some demand for that (XA compatibility). This is a feature we could consider if you were to tell us more about your use case.

Yes Raphael, the XA compatibility. But I am open to any technique which might work.

The use case in our Message Queue oriented architecture we have is as follows:

  1. Divide the main workflow into 4 tasks (in practice it will be more than 4) so that they can be parallelized across cluster of nodes
  2. Start transaction Node N1 (Java application server) to Execute Task T1 and Task T3 parallelly because they are indepedant
  3. Gather result from Task T1 push to Distributed Queue Q1 and T3 to Q3
  4. On Node N10 (NodeJS application server) listen to Q1 and execute T2 as T2 was depedent on T1’s result
  5. On Node N5 (Golang application server) listen to Q3 and execute T4 as T4 was dependent on T2’s result
  6. Maintain transaction across T1,T2,T3 and T4 on Node N1, N5 and N10 accessing the same database.

So basically we need Parallel transaction across not only multiple connections from given Node but also across from multiple nodes Nodes.

It might work if DB allows following behaviour:

  1. Ask DB for transaction token from any application server
  2. Use transaction token from any application server so that DB can maintain list and order of statements need to be executed as part of transactions
  3. Perform Commit or Rollback it against transaction token

Does that make sense ? or need more input about tech functionality ?

Your use case is clear; you do indeed need support for distributed transactions (a transaction held open from multiple clients talking to separate nodes concurrently).

This is not currently supported by CockroachDB but the case is interesting. I’ll forward it internally and let you know of the outcome.

Awesome thanks.

Basically we seek separation of concern between Database connection from Postgres Driver and the Transaction Identifier/Tracker. XA is one of the way to achieve it.

For example in JDBC driver or JTA in Java land used Thread Local Variables to track the different statements which can be part of Transaction, but as you can see its not sufficient for our use case as it bound to just Thread execution path.