How does CockroachDB replicate and partition data?


#1

Hi,
I am studying how to use CockroachDB.
I have some questions about how CockroachDB replicates and partitions data.

For example, suppose I start a 3-node cluster and load datum created by a benchmark like TPCC.
When and how does CockroachDB partition this datum to 3 nodes?
Will CockroachDB replicate datum on one node to the others at once?
I am also reading the documentation and source code to figure this out.

Thanks for your help!


(Raphael 'kena' Poss) #2

Hello @lemon
Thank you for your interest in CockroachDB! Your question sounds like a homework assignment :slight_smile:

CockroachDB replicates data when the data is written. The data is replicated to all nodes in the replication group; however, the write is acknowledged to the client as soon as a majority (quorum) of those nodes has stored a replica.
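To make the majority rule concrete, here is a minimal sketch in Python. It is only an illustration of the arithmetic, not CockroachDB code; the function names `quorum_size` and `write_acknowledged` are made up for this example.

```python
def quorum_size(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def write_acknowledged(stored: list[bool]) -> bool:
    """Return True once a majority of replicas report the write as stored.

    `stored` has one entry per replica in the replication group
    (hypothetical representation, for illustration only).
    """
    return sum(stored) >= quorum_size(len(stored))


if __name__ == "__main__":
    # Three replicas: two have stored the write, one is still catching up.
    print(write_acknowledged([True, True, False]))
    # Only one replica has stored the write: no majority yet.
    print(write_acknowledged([True, False, False]))
```

With 3 replicas the quorum is 2, so the client can get its acknowledgement while the third replica is still catching up.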

Does this answer your question? Feel free to ask more.


#3

Thanks a lot!
So that means when I insert a value into node1, the database will create replications on node2 and node3 as soon as I finish the insertion on node1, right?

I am also wondering: when I load datum to this 3-node cluster with a command like
workload init tpcc --warehouses=1000 "postgres://root@?sslmode=disable"
will datum be stored just on node1 or be partitioned to 3 nodes equally?


(Raphael 'kena' Poss) #4

You are not using the words correctly. There is no such thing as “create replications” and “be partitioned to 3 nodes”. Also your usage of the word “datum” is suspicious.
You must really work on your vocabulary – all these words/terms matter!

Definitions:

  • “replica” - synonym “one copy of the data”
    for example: “When you insert a value into node 1, the database will create a replica on node 2 and node 3”
  • “database”, definition 1 - a container for relational tables and views
    for example: “The TPCC benchmark uses one database with 9 tables.”
  • “dataset” - a collection of databases and their contents, to support a given application
    for example: “The TPCC benchmark defines a dataset” or “Before you can start the TPCC benchmark, you must load the TPCC dataset into the CockroachDB cluster”
  • “database”, definition 2 - a dataset of one database (database + table + contents)
  • “datum” - the intersection of a particular row and column in a table
  • “partition”, definition 1 - an arrangement of a set of datums into sub-sets so that every datum is contained in exactly one sub-set and no two sub-sets overlap; see this definition on Wikipedia
  • “partition”, definition 2 - one of the sub-sets of a partitioned set
  • “partition”, definition 3 - one of the sub-sets of the rows in a partitioned table
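
The three “partition” definitions above can be sketched in a few lines of Python. This is only an illustration of the set-theoretic idea, not how CockroachDB implements table partitioning; the helper names and the example rows are hypothetical.

```python
from itertools import combinations


def is_partition(full_set: set, subsets: list[set]) -> bool:
    """Definition 1: the sub-sets cover the whole set and no two overlap."""
    no_overlap = all(a.isdisjoint(b) for a, b in combinations(subsets, 2))
    covers_all = set().union(*subsets) == full_set
    return no_overlap and covers_all


def partition_rows(rows: list[dict], column: str) -> dict:
    """Definition 3: split table rows into sub-sets by the value of one column."""
    parts: dict = {}
    for row in rows:
        parts.setdefault(row[column], []).append(row)
    return parts


if __name__ == "__main__":
    # Hypothetical rows; the column names are made up for this example.
    rows = [
        {"id": 1, "region": "east"},
        {"id": 2, "region": "west"},
        {"id": 3, "region": "east"},
    ]
    parts = partition_rows(rows, "region")
    ids_per_part = [{r["id"] for r in part} for part in parts.values()]
    # Every row lands in exactly one sub-set, so this is a valid partition.
    print(is_partition({1, 2, 3}, ids_per_part))
```

Each value of `parts` is a partition in the sense of definitions 2 and 3, and `is_partition` checks that the whole arrangement satisfies definition 1.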

Your questions above are all confused because you have mixed the words together. It makes it very difficult for me to understand your questions.

Please rephrase using the proper words, then we can answer your questions.

Thank you


(Jesse) #5

Hi @lemon,

Our online training might help as you learn the capabilities of CockroachDB. The architecture overview and the first few Ops Basics modules focus on replication.

Best,
Jesse


#6

Hi @knz
I am sorry for the inaccurate description of my question.
Thanks a lot for your assistance and for reminding me of the difference between these words!
I will use more accurate terminology from now on.
I also found the answer in this tutorial:
https://www.cockroachlabs.com/docs/stable/partitioning.html

Thanks again :slight_smile:


#7

Hi @jesse
I am going through the slides on the website you mentioned.
I think they are helping me understand the architecture better.

Thanks a lot!