Apache airflow + sqlalchemy

Using cockroachdb with apache airflow.

So for i love cockroachdb. It is so simple !

apache airflow version 1.8.0.
sqlalchemy version 1.2.7

First problem:
connection:
sql_alchemy_conn = cockroachdb://codenet@sjc-ads-2926:26257/airflow?sslmode=disable

exception:

sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column “is_encrypted” does not exist
[SQL: ‘UPDATE connection SET is_encrypted=%(is_encrypted)s’] [parameters: {‘is_encrypted’: False}] (Background on this error at: http://sqlalche.me/e/f405)

Quesiton:
#1 - Would cockroachdb could be used successfully with airflow?
#2 - If, yes what is the solution for this issue?

I can provide more details, upon request.

Ok, i spent quite some time to recreate the issue.
Airflow is using alembic to manage the table schema and its versions.
I followed this tutorial for a sample:
http://alembic.zzzcomputing.com/en/latest/tutorial.html

Here is the airflow file that gives trouble:
/migrations/versions/1507a7289a2f_create_is_encrypted.py

These lines throw this error:

conn = op.get_bind()
conn.execute(
connectionhelper.update().values(is_encrypted=False)
)

Here is the error again:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column “is_encrypted” does not exist
[SQL: ‘UPDATE connection SET is_encrypted=%(is_encrypted)s’] [parameters: {‘is_encrypted’: False}]

Digging more on this…
I could bring up apache airflow webserver and the celery workers.
There is some issues though with the scheduler. The scheduler is using SELECT…FOR UPDATE and UPDATE…FROM that is still not supported by cockroachdb.
https://github.com/cockroachdb/cockroach/issues/7841
https://github.com/cockroachdb/cockroach/pull/19577

Do you guys have plans to support this?

select for update is unlikely to happen. We discussed it at length and it generally doesn’t make sense for CRDB / it cannot be implemented with the same semantics as in Postgres.
About update...from I don’t know. You might want to comment on that issue to show your support.

It might be worth asking the Airflow devs to add support for CRDB as backend, without using those features. Maybe they’ll do it :stuck_out_tongue:

Support for UPDATE FROM is likely to be added in CockroachDB 2.2 (probably not earlier). It needs support first for correlated subqueries and lateral joins, which will be introduced in 2.1.

As Andrei says, SELECT FOR UPDATE on the other hand will probably not be supported. The best course of action is to create a CockroachDB adapter.

Have you tried this one? https://www.cockroachlabs.com/blog/building-application-cockroachdb-sqlalchemy-2/