How does LIMIT clause work?

Hello,

I need to split up a potentially large dataset (once days worth of record inserts) for batch processing. The dataset is grouped by client device uuid. This batch processing could potentially be done by a number of servers at the same time. To identify the records as being ready for batch processing I was planning on writing a query like:

UPDATE input_data
SET
status = 'Processing’
AND processing_server = 'X_UUID’
WHERE
device_id IN (
SELECT DISTINCT device_id
FROM input_data
WHERE
status = 'New’
LIMIT 100);

My question is about efficiency of the LIMIT clause. How does it work? For example:
Does the coordinating SQL server kill the "select"s running on individual nodes once the LIMIT is reached or does it use some other optimizations?

Thanks, :slight_smile:

Hi! Yes, query processing will stop when the LIMIT is reached. In this case, as soon as we get 100 distinct device_ids. The only potential problem is if it needs to scan a lot of device_ids to find the ones with status=New (in which case an index on status would help).