Union without ALL

Condition

The query is in RUNNING status, uses UNION instead of UNION ALL, and the Aggregate node that UNION adds for de-duplication processes a large number of input rows (at least min_input_rows).

How to fix

  • Use UNION ALL instead of UNION. If you actually need de-duplication, consider applying GROUP BY or DISTINCT to a sub-query that combines multiple tables with UNION ALL.

  • Process data in smaller chunks, so that each UNION de-duplicates fewer rows.
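The sketch below illustrates the semantics behind the first fix. It uses Python's built-in sqlite3 purely as a stand-in for Snowflake, and the tables `a` and `b` are hypothetical. UNION ALL keeps every row, while UNION removes duplicates. Applying DISTINCT or GROUP BY to a UNION ALL sub-query returns the same rows as UNION when de-duplication is really required. It demonstrates result equivalence only, not Snowflake performance.

```python
import sqlite3

# Hypothetical tables; the value 2 appears in both, and twice in table a.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (2);
    INSERT INTO b VALUES (2), (3);
""")

def rows(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))

# UNION removes duplicates (this is the step that needs an Aggregate).
union = rows("SELECT id FROM a UNION SELECT id FROM b")

# UNION ALL simply concatenates the inputs and keeps duplicates.
union_all = rows("SELECT id FROM a UNION ALL SELECT id FROM b")

# Explicit de-duplication over a UNION ALL sub-query, when it is really needed.
distinct_over_all = rows(
    "SELECT DISTINCT id FROM (SELECT id FROM a UNION ALL SELECT id FROM b)"
)
grouped_over_all = rows(
    "SELECT id FROM (SELECT id FROM a UNION ALL SELECT id FROM b) GROUP BY id"
)

print(union)              # [1, 2, 3]
print(union_all)          # [1, 2, 2, 2, 3]
print(distinct_over_all)  # [1, 2, 3]
print(grouped_over_all)   # [1, 2, 3]
```

If the duplicates are harmless or impossible by construction, plain UNION ALL avoids the de-duplication step entirely.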

Example

    UnionWithoutAllCondition(
        min_input_rows=10_000_000,  # at least 10M input rows for UNION without ALL
        notice_duration=60 * 10,  # 10 minutes for notice
    ),

Specific arguments

  • min_input_rows (int) - minimum number of input rows that must go into the Aggregate node created by UNION for the condition to trigger (a value of at least 1M is recommended to prevent false positives)
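To make the role of min_input_rows concrete, here is a rough sketch of how such a check could be evaluated against a query profile. It is not SnowKill's implementation. The `PlanNode` class and the `union_without_all_triggered` function are hypothetical. It also assumes, without confirmation from this page, that a UNION without ALL shows up in the profile as an Aggregate operator fed directly by a UnionAll operator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical, simplified query-profile operator (not a SnowKill class)."""
    operator_id: int
    operator_type: str  # e.g. "Aggregate", "UnionAll", "TableScan"
    input_rows: int = 0
    parent_ids: List[int] = field(default_factory=list)


def union_without_all_triggered(status: str, nodes: List[PlanNode], min_input_rows: int) -> bool:
    """Hypothetical check: RUNNING query whose UnionAll feeds a large Aggregate."""
    if status != "RUNNING":
        return False
    by_id = {n.operator_id: n for n in nodes}
    for node in nodes:
        if node.operator_type != "UnionAll":
            continue
        # Assumption: the de-duplicating Aggregate is a direct parent of UnionAll.
        for parent_id in node.parent_ids:
            parent = by_id.get(parent_id)
            if (
                parent is not None
                and parent.operator_type == "Aggregate"
                and parent.input_rows >= min_input_rows
            ):
                return True
    return False


# Usage with a made-up plan: 25M rows flow into the Aggregate above UnionAll.
plan = [
    PlanNode(0, "Result"),
    PlanNode(1, "Aggregate", input_rows=25_000_000, parent_ids=[0]),
    PlanNode(2, "UnionAll", input_rows=25_000_000, parent_ids=[1]),
]
print(union_without_all_triggered("RUNNING", plan, min_input_rows=10_000_000))  # True
print(union_without_all_triggered("RUNNING", plan, min_input_rows=50_000_000))  # False
```

A higher threshold only matches queries where de-duplication is genuinely expensive, which is why small values tend to produce false positives.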
