Spark Scala write to CockroachDB fails


(Asvin Kumar) #1

Hi,

I use postgresql-42.2.3.jar to connect. I'm trying to write into CockroachDB from the Spark shell. I can connect and read the table data as a Spark DataFrame, but when I try to write the DataFrame back, I get this error:

```
Caused by: java.sql.BatchUpdateException: Batch entry 633 INSERT INTO demo.configuration_detail ("config_id","config_id_prefix","sales_item_id","id","created_date","created_user","last_changed_by","last_changed_date","modified_date","modified_user","price","qty","relationship_type","validity_end_date","validity_start_date") VALUES ('10','51080232','MCRNC1105',828868,'2018-04-02 01:17:16+02','CSP',NULL,'2017-12-14 07:05:25+01','2017-12-08 16:52:41+01',dfdddfd,NULL,6.0,'FIXED',NULL,NULL) was aborted: ERROR: restart transaction: HandledRetryableTxnError: TransactionAbortedError(ABORT_REASON_ABORT_SPAN): "sql txn" id=ed7a00b8 key=/Table/61/1/402460548822925313/0 rw=true pri=0.02180245 iso=SERIALIZABLE stat=ABORTED epo=0 ts=1542891678.404688447,2 orig=1542891612.270024725,0 max=1542891612.270024725,0 wto=false rop=false seq=143635 Call getNextException to see other errors in the batch.
```
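
For reference, the read side works and looks roughly like this (a sketch; the connection details are placeholders, the table name is the one from the error above):

```scala
// Rough sketch of the working read path in spark-shell; host, port,
// database, and credentials are placeholders for the real values.
val df = spark.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://x.x.x.x:26258/db")
  .option("dbtable", "demo.configuration_detail")
  .option("user", "root")
  .option("password", "")
  .load()

df.show(5) // reading works; only the write back fails
```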

But I can insert the data directly into the table using an INSERT command.

Any thoughts, please.


(Asvin Kumar) #2

It worked when I tried it like this:

```scala
df.write
  .format("jdbc")
  .mode("append")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://x.x.x.x:26258/db")
  .option("dbtable", "schema.table")
  .option("user", "root")
  .option("password", "")
  .save()
```

But I can insert only 50k records. Won't it work if the data is larger?

Asvin


(Ron Arévalo) #3

Hey @Asvin,

Regarding the first error you ran into: it means that your transaction hit an "abort span". It was trying to read from a range where it had previously laid down intents that were cleaned up in the meantime because the transaction was aborted. Retries are expected, as there will always be some contention in a distributed database. Because CockroachDB is also highly consistent, we prefer to throw these errors to make sure consistency is preserved rather than risk losing it.
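
If you ever drive these inserts through plain JDBC rather than Spark's writer, the usual pattern is to catch the retryable error (SQLSTATE 40001), roll back, and retry the transaction. A minimal sketch, with an illustrative helper name and a caller-supplied connection:

```scala
import java.sql.{Connection, SQLException}

// Minimal sketch of client-side retry for CockroachDB's retryable transaction
// errors (SQLSTATE 40001). The helper name and connection handling here are
// illustrative; the caller supplies an open JDBC Connection.
def runWithRetry(conn: Connection, maxRetries: Int = 3)(txnBody: Connection => Unit): Unit = {
  conn.setAutoCommit(false)
  var attempts = 0
  var done = false
  while (!done) {
    try {
      txnBody(conn)
      conn.commit()
      done = true
    } catch {
      case e: SQLException if e.getSQLState == "40001" && attempts < maxRetries =>
        conn.rollback() // the txn was aborted; roll back and run the body again
        attempts += 1
      case e: SQLException =>
        conn.rollback()
        throw e
    }
  }
}
```

As far as I know, Spark's JDBC writer doesn't do this retry for you, which is why the aborted batch surfaced straight back as an exception.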

As for your second question, are you running into any other errors after the 50k records?

Thanks,

Ron


(Asvin Kumar) #4

Ron,

Thanks for the reply. It is not throwing any errors, but it is not completing even after 15 minutes. I'm now letting it run for an hour to see what it returns.

Thanks,
Asvin


(Ron Arévalo) #5

Hey @Asvin,

Just wanted to follow up to see if you were able to run those inserts. Did you get any errors, or did they complete successfully? If you're still having issues, could you share your DML, DDL, and schema with us? That would give us a better idea of where the issue might be occurring.

Thanks,

Ron


(Asvin Kumar) #6

I'm using a Scala Spark DataFrame to insert the data. I will have to modify the schema and test it before sharing it with you, as it belongs to my client.

If I have to do a bulk load, I would like to insert millions of records. Do you guys support that?


(Ron Arévalo) #7

Hey @Asvin,

An insert of 1 million rows would most likely fail. Splitting the data up is recommended; it can usually be split into batches somewhere in the range of 10-100k rows, depending on row size. You can read more about some of our best practices here.
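
For example, one way to stay in that range from Spark is to split the DataFrame and append each piece separately, while also capping the rows per JDBC batch. A rough sketch (the connection details, table name, and chunk count below are placeholders, not your actual setup):

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Rough sketch: split a large DataFrame into smaller pieces and append each
// one over JDBC, keeping every write in the ~10k-100k row range. The URL,
// table, credentials, and chunk count are placeholders.
def writeInChunks(df: DataFrame, numChunks: Int): Unit = {
  val chunks = df.randomSplit(Array.fill(numChunks)(1.0))
  chunks.foreach { chunk =>
    chunk.write
      .format("jdbc")
      .mode(SaveMode.Append)
      .option("driver", "org.postgresql.Driver")
      .option("url", "jdbc:postgresql://x.x.x.x:26258/db")
      .option("dbtable", "schema.table")
      .option("user", "root")
      .option("password", "")
      .option("batchsize", "10000") // rows per JDBC batch within each Spark task
      .save()
  }
}

// e.g. for ~1M rows, something like:
// writeInChunks(df, numChunks = 20)
```

randomSplit gives roughly equal pieces; if the table has a convenient key, filtering on ranges of that key would work as well.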

Thanks,

Ron