sqlalchemy FULL OUTER JOIN

How to implement FULL OUTER JOIN in sqlalchemy on orm level.

Here my code:

q1 = (db.session.query(
        tb1.user_id.label('u_id'),
        func.count(tb1.id).label('tb1_c')
    )
    .group_by(tb1.user_id)
)
q2 = (db.session.query(
        tb2.user_id.label('u_id'),
        func.count(tb2.id).label('tb2_c')
    )
    .group_by(tb2.user_id)
)

above two queries and I want to apply FULL OUTER JOIN on them.

Best answer

First of all, sqlalchemy does not support FULL JOIN out of the box, and for some good reasons. So any solution proposed will consist of two parts:

  1. a work-around for missing functionality
  2. sqlalchemy syntax to build a query for that work-around

Now, for the reasons to avoid the FULL JOIN, please read some old blog Better Alternatives to a FULL OUTER JOIN.
From this very blog I will take the idea of how to avoid FULL JOIN by adding 0 values to the missing columns and aggregating (SUM) on UNION ALL intead. SA code might look something like below:

q1 = (session.query(
        tb1.user_id.label('u_id'),
        func.count(tb1.id).label('tb1_c'),
        literal(0).label('tb2_c'), # @NOTE: added 0
      ).group_by(tb1.user_id))
q2 = (session.query(
        tb2.user_id.label('u_id'),
        literal(0).label('tb1_c'), # @NOTE: added 0
        func.count(tb2.id).label('tb2_c')
      ).group_by(tb2.user_id))

qt = union_all(q1, q2).alias("united")
qr = select([qt.c.u_id, func.sum(qt.c.tb1_c), func.sum(qt.c.tb2_c)]).group_by(qt.c.u_id)

Having composed the query above, I actually might consider other options:

  • simply execute those two queries separately and aggregate the results already in Python itself (for not so large results sets)
  • given that it looks like some kind of reporting functionality rather than business model workflow, create a SQL query and execute it directly via engine. (only if it really is much better performing though)