I need to split up a potentially large dataset (one day's worth of record inserts) for batch processing. The dataset is grouped by client device UUID, and the batch processing could be done by several servers at the same time. To mark records as ready for batch processing, I was planning on writing a query like:
UPDATE ...
SET status = 'Processing',
    processing_server = 'X_UUID'
WHERE device_id IN (
    SELECT DISTINCT device_id
    FROM ...
    WHERE status = 'New'
    LIMIT ...
)
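To make the intended claim step concrete, here is a minimal sketch using Python's built-in sqlite3 module as a single-node stand-in. The table name `records`, the helper `claim_batch`, the server names, and the batch size are all hypothetical and chosen only for illustration. It shows the shape of the statement, not how a distributed engine would plan or execute the LIMIT.

```python
import sqlite3


def claim_batch(conn, server_id, max_devices):
    """Claim every row belonging to up to `max_devices` devices that still
    have 'New' rows, and return the number of rows updated.

    `records` is a hypothetical table name used only for this sketch.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            UPDATE records
            SET status = 'Processing',
                processing_server = ?
            WHERE device_id IN (
                SELECT DISTINCT device_id
                FROM records
                WHERE status = 'New'
                LIMIT ?
            )
            """,
            (server_id, max_devices),
        )
    return cur.rowcount


# Tiny in-memory demo: three devices with two 'New' rows each.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " id INTEGER PRIMARY KEY,"
    " device_id TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'New',"
    " processing_server TEXT)"
)
conn.executemany(
    "INSERT INTO records (device_id) VALUES (?)",
    [("d1",), ("d1",), ("d2",), ("d2",), ("d3",), ("d3",)],
)
conn.commit()

first = claim_batch(conn, "server-A", 2)   # claims two devices
second = claim_batch(conn, "server-B", 2)  # only one device is left
```

Without an ORDER BY in the subquery, which devices each call picks is unspecified. The demo gives every device the same number of rows so that the row counts do not depend on that choice.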
My question is about the efficiency of the LIMIT clause. How does it work? For example:
Does the coordinating SQL server kill the SELECTs running on the individual nodes once the LIMIT is reached, or does it use some other optimization?