In CockroachDB 1.0, the server buffers the entire result set of your query in memory at once, so you need to use a
LIMIT when scanning a whole table to avoid running out of memory.
OFFSET is very bad for performance (in nearly all databases), so when doing a full table scan via repeated
LIMIT queries, you should instead use
ORDER BY to sort the results by the primary key, then use a
WHERE clause in the next query to select results that come after the ones you’ve already seen:
SELECT * FROM my_table ORDER BY id LIMIT 1000000;
-- last row returned: 123456789, asdf, ...
SELECT * FROM my_table WHERE id > 123456789 ORDER BY id LIMIT 1000000;
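The overall loop implied by those queries looks like the following sketch. It runs against an in-memory stand-in for the table so the control flow is clear; in practice each fetch_page call would issue the SQL query above through your driver (the table contents, page size, and helper names here are illustrative, not from the original post):

```python
# Simulated keyset pagination: ORDER BY id, LIMIT n, then WHERE id > last_seen.
TABLE = [(i, f"row-{i}") for i in range(10)]  # (id, payload), unique ids
PAGE_SIZE = 3

def fetch_page(after_id):
    """Equivalent of:
       SELECT * FROM my_table WHERE id > after_id ORDER BY id LIMIT PAGE_SIZE;"""
    rows = sorted((r for r in TABLE if r[0] > after_id), key=lambda r: r[0])
    return rows[:PAGE_SIZE]

def scan_all():
    last_id = -1  # sentinel below the smallest possible id
    out = []
    while True:
        page = fetch_page(last_id)
        if not page:        # empty page means the scan is complete
            return out
        out.extend(page)
        last_id = page[-1][0]  # resume after the last id we saw

rows = scan_all()
```

The key property is that each query starts strictly after the last id already seen, so no row is skipped or repeated even though the table is read in bounded-size pages.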
In CockroachDB 1.1, the server can stream results to the client, so as long as your client can handle streaming you can simply run
SELECT * FROM my_table;
For queries like
SELECT * FROM X WHERE some_id IN (...), it’s best to do this in a single query, at least up to fairly large lists of IDs. I think 10k IDs is probably fine, but I wouldn’t go higher than that. If you do split up a query like this, it’s best to sort the IDs first, so the first query gets the 10k smallest IDs, the second query the next 10k, and so on.
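The splitting step can be sketched like this (the chunk size and function name are illustrative assumptions; only the sort-then-chunk idea comes from the post):

```python
# Split a large IN (...) list into sorted chunks so each query
# covers a contiguous range of keys.
CHUNK_SIZE = 10_000

def chunked_in_lists(ids, chunk_size=CHUNK_SIZE):
    """Yield sorted chunks of at most chunk_size IDs each."""
    ordered = sorted(ids)
    for start in range(0, len(ordered), chunk_size):
        yield ordered[start:start + chunk_size]

# Example: 25k IDs become three sorted chunks of 10k, 10k, and 5k.
chunks = list(chunked_in_lists(range(25_000)))
```

Sorting first means each chunk maps to one contiguous key range, which keeps each query's scan localized instead of touching ranges all over the table.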
In both cases, you can get a consistent view of the data across multiple queries by picking a timestamp and using the same time in an
AS OF SYSTEM TIME clause for each query. Reads can conflict with concurrent writes; if you don’t need the latest data, choosing a timestamp a short time in the past can help. It’s best to pick a timestamp that is older than your longest-running write transaction (if you can figure that out).
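For example, two paginated queries pinned to the same timestamp see one consistent snapshot of the table (the timestamp, table, and id boundary below are illustrative):

```sql
-- Pick one timestamp slightly in the past and reuse it for every query.
SELECT * FROM my_table AS OF SYSTEM TIME '2017-10-03 12:45:00' WHERE id <= 123456789;
SELECT * FROM my_table AS OF SYSTEM TIME '2017-10-03 12:45:00' WHERE id > 123456789;
```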
For read-only transactions, changing the transaction isolation to
SNAPSHOT doesn’t do anything.
SNAPSHOT isolation helps read/write transactions that encounter concurrent reads.
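If I remember the 1.x syntax correctly, you set it when beginning the transaction (SNAPSHOT was removed in later releases, so check the docs for your version; the statement below is a sketch):

```sql
BEGIN TRANSACTION ISOLATION LEVEL SNAPSHOT;
UPDATE my_table SET payload = 'x' WHERE id = 123456789;
COMMIT;
```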