Storage Spilling
Condition
A query in RUNNING status writes too much temporary data to disk. Spilling is usually caused by very large sorts (DISTINCT, ROW_NUMBER() window function, etc.).
How to fix
Consider adding more filters and processing a smaller amount of data in one go.
Consider adding more steps with explicit pre-aggregation before sorting, so that each sort handles fewer rows.
Example
StorageSpillingCondition(
    min_local_spilling_gb=50,  # 50 GB spilled to local storage
    min_remote_spilling_gb=1,  # 1 GB spilled to remote storage
    warning_duration=60 * 10,  # 10 minutes for warning
    kill_duration=60 * 20,  # 20 minutes for kill
),
Specific arguments
min_local_spilling_gb (int) - how much data must be spilled to local disk before the condition matches, in gigabytes (a minimum of 10 GB is recommended to prevent false positives)
min_remote_spilling_gb (int) - how much data must be spilled to remote disk before the condition matches, in gigabytes (the recommended value is 1 GB, since ideally no remote spilling should occur at all)
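To make the interaction between the spilling thresholds and the durations more concrete, here is a minimal sketch of how such a check could be evaluated. It is not the library's implementation: SpillThresholds, evaluate_spilling and BYTES_PER_GB are hypothetical names, and the sketch assumes that either threshold alone is enough to match, that gigabytes are binary (1024^3 bytes), and that durations are measured in seconds of query run time.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: gigabytes are treated as binary (1024^3 bytes).
BYTES_PER_GB = 1024 ** 3


@dataclass
class SpillThresholds:
    """Hypothetical container mirroring the arguments described above."""
    min_local_spilling_gb: int
    min_remote_spilling_gb: int
    warning_duration: int  # seconds
    kill_duration: int  # seconds


def evaluate_spilling(
    thresholds: SpillThresholds,
    local_spilled_bytes: int,
    remote_spilled_bytes: int,
    running_seconds: int,
) -> Optional[str]:
    """Return "kill", "warning", or None for a single running query."""
    # Assumption: crossing either the local or the remote threshold matches.
    spilled_enough = (
        local_spilled_bytes >= thresholds.min_local_spilling_gb * BYTES_PER_GB
        or remote_spilled_bytes >= thresholds.min_remote_spilling_gb * BYTES_PER_GB
    )
    if not spilled_enough:
        return None

    # The longer duration is checked first so that "kill" takes precedence.
    if running_seconds >= thresholds.kill_duration:
        return "kill"
    if running_seconds >= thresholds.warning_duration:
        return "warning"
    return None


thresholds = SpillThresholds(
    min_local_spilling_gb=50,
    min_remote_spilling_gb=1,
    warning_duration=60 * 10,
    kill_duration=60 * 20,
)

# 60 GB spilled locally, running for 11 minutes.
print(evaluate_spilling(thresholds, 60 * BYTES_PER_GB, 0, 11 * 60))
```

With the values from the example above, a query that has spilled 60 GB locally and has been running for 11 minutes falls between the two durations, so this sketch reports a warning rather than a kill. A query that spills only 5 GB locally and nothing remotely never matches, no matter how long it runs.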