Efficient To-Do Data Modeling in CockroachDB Beta-1.x?

All,
As CockroachDB gets closer and closer to 1.x; I’ve been increasingly interested in using it for a project. I love what it promises to bring to the open source community, and database technology in general. It’s focus on scalability, durability, and consistency is especially appealing. However, as I ponder its utilization as a data layer for modern Internet facing applications, I start thinking about the SQL data model paradigm. I’m going to use a simple to-do app for my case study. Traditionally, when using a SQL database for this kind of app you would/could have a to-do table with at least the following columns (id[pk], list-id[fk], user-id[fk], done, description), with the associated user and task-list tables. This is simple enough, but it brings with it some obvious performance implications as the dataset grows. Of course this will be the first to-do app in history to become an Internet sensation, so there will be millions of task lists with a dozen or more to-do items each. With each user access, a query is performed on the entire table sorting out the correct owner, for each to-do item in their lists. While this example is a little silly, it does demonstrate how SQL may be a less desirable data organizational system for this type of mega multi-user app. Comparing this, to other data stores that offer data structures that can encapsulate the entire to-do list, and each to-do item in a single entity seems to be much more efficient. Some even offer user name spaces, while maintaining the same schema to help reduce query costs. That being said, are there any innovations that CockroachDB implements to help reduce the overhead of traditional SQL constructs? What would the ideal data model for this app look like in CockroachDB?

Many Thanks!
–Nick

1 Like

Hi Nick,
if your to-do list is indexed with the user-id first in the primary key, then look-ups by user will be efficient.
Intuitively for this type of application the PK will be (user-id, list-id, …).

(These are general data organization principles for RDBMs, not specific to CockroachDB. Best practices from other SQL engines probably apply just as well.)

Perhaps arrays or JSON support would address your interests? There is a byte/blob data type where you can store JSON. In a web app JSON would likely be the format coming back from the browser anyway. If you pointed something like SOLR at it, then you’d have pretty nice to-do list architecture. And CDB supported indexing of text content, so much the better.

CockroachDB does not yet support arrays or JSON data types (you could store your JSON data in a string column, but you wouldn’t be able to query based on its contents).

Like @knz said, a single table is fine for the schema as you’ve described it so far. But that doesn’t give you a place to put things like list titles, so you’ll probably want multiple tables (users, lists, items). And when you use multiple tables in a hierarchy like this, there’s an important CockroachDB-specific feature to be aware of: interleaved tables. You can use the INTERLEAVE IN PARENT clause to interleave the list table in the user table, and the item table in the list table. This will store all of a user’s data together so that when you join these tables together you can load all the data you need at once.

1 Like

Do you if anyone has attempted a SOLR or ElasticSearch integration with CDB yet? Such could a long way to make up for full JSON query support.

@bdarnell / All, Thanks for taking a look at this inquiry! Quote: “so you’ll probably want multiple tables (users, lists, items)”; I mentioned this in my original question, but maybe I needed to make it a little clearer :slight_smile: The Interleave table information is exactly what I was looking for to help me effectively model data in CockroachDB; Although I wish JSON support was also in place. Interleaving makes allot of sense, but I’m wondering what the best logical boundaries would be. My first inclination would be to crate a hierarchy with the appropriate interleave prefixes like this: (User -> Lists -> ToDos). However, after reading the Interleave limitations, would it be better to partition the Interleave hierarchy at just (List -> ToDo), so that you don’t break the 64MB limit? My understanding is that with the complete hierarchy, the user could only have a total of 64MB in To-Do lists before performance degrades. In the second example, the user could have several To-Do lists that are up to 64MB each, and not be effected by the collective size of other to do lists that the user has. Am I thinking about this in the right way?

– Nick

There is not a 64MB limit for interleaved tables. In some cases the benefits of interleaving are reduced if the data is more than 64MB per user, but it’s still beneficial (as long as your data and access patterns are hierarchical) no matter how big the data gets.