Skip to content

[VL] Memory Leak possibly TableScan #9456

@nimesh1601

Description

@nimesh1601

Backend

VL (Velox)

Bug description

There seems to be a memory leak from Table Scan
Got this response from Velox team : Velox Issue

Sample query
WITH

cdl_summary AS (
SELECT
user_id,
CASE
WHEN <TRACKING_LABEL_COL> = 'store_front' THEN 'storefront'
ELSE <TRACKING_LABEL_COL>
END AS source,

SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_7D   AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_7d,
SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_14D  AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_14d,
SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_28D  AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_28d,
SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_56D  AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_56d,
SUM(CASE WHEN name IN (<IMPRESSION_EVENTS>)
     AND datestr BETWEEN :START_112D AND :END_DATE THEN 1 ELSE 0 END) AS impression_count_112d,
 
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_7D   AND :END_DATE THEN 1 ELSE 0 END) AS click_count_7d,
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_14D  AND :END_DATE THEN 1 ELSE 0 END) AS click_count_14d,
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_28D  AND :END_DATE THEN 1 ELSE 0 END) AS click_count_28d,
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_56D  AND :END_DATE THEN 1 ELSE 0 END) AS click_count_56d,
SUM(CASE WHEN name IN (<CLICK_EVENTS>)
     AND datestr BETWEEN :START_112D AND :END_DATE THEN 1 ELSE 0 END) AS click_count_112d,
 
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_7D   AND :END_DATE THEN 1 ELSE 0 END) AS order_count_7d,
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_14D  AND :END_DATE THEN 1 ELSE 0 END) AS order_count_14d,
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_28D  AND :END_DATE THEN 1 ELSE 0 END) AS order_count_28d,
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_56D  AND :END_DATE THEN 1 ELSE 0 END) AS order_count_56d,
SUM(CASE WHEN name IN (<ORDER_EVENTS>)
     AND datestr BETWEEN :START_112D AND :END_DATE THEN 1 ELSE 0 END) AS order_count_112d,
 
AVG(CASE 
      WHEN name = 'marketplace_scrolled'
       AND datestr BETWEEN :START_7D AND :END_DATE THEN 0
      WHEN name IN ('feed_item_card_scrolled','feed_item_dish_card_scrolled')
       AND datestr BETWEEN :START_7D AND :END_DATE
       THEN feed.display_item_position
      ELSE NULL
    END) AS avg_scroll_depth_7d,
 
 
MAX(CASE WHEN name IN (<INTERACTION_EVENTS>)
         AND datestr BETWEEN :START_112D AND :END_DATE
     THEN epoch_ms ELSE NULL END) AS latest_interaction_time

FROM <SCHEMA_CDL>.<TABLE_CDL> cdl
WHERE TRUE
AND datestr BETWEEN :START_112D AND :END_DATE
AND name IN (<IMPRESSION_EVENTS>, <CLICK_EVENTS>, <ORDER_EVENTS>)
AND COALESCE(<SESSION_ID_COL>, user_id) IS NOT NULL
AND is_first_event = TRUE
AND user_id IS NOT NULL AND user_id <> ''
AND <FEED_CONTEXT_COL> IN ('home','vertical','allstores','all_stores')

GROUP BY 1,2
),

xlb_summary AS (
SELECT
user_id,
CASE
WHEN <TRACKING_LABEL_COL> = 'store_front' THEN 'storefront'
ELSE <TRACKING_LABEL_COL>
END AS source,

MAX(CASE WHEN name IN (<INTERACTION_EVENTS>)
         AND datestr BETWEEN :START_112D AND :END_DATE
     THEN epoch_ms ELSE NULL END) AS latest_interaction_time

FROM <SCHEMA_XLB>.<TABLE_XLB> xlb
WHERE TRUE

GROUP BY 1,2
),

data_summary AS (
SELECT
COALESCE(cdl.user_id, xlb.user_id) AS user_id,
COALESCE(cdl.source, xlb.source) AS source,
COALESCE(cdl.impression_count_7d, 0)
+ COALESCE(xlb.impression_count_7d, 0) AS impression_count_7d,

GREATEST(cdl.latest_interaction_time, xlb.latest_interaction_time)
  AS latest_interaction_time

FROM cdl_summary cdl
FULL OUTER JOIN xlb_summary xlb
ON cdl.user_id = xlb.user_id
AND cdl.source = xlb.source
),

summary_agg AS (
SELECT
user_id,
SUM(impression_count_7d) AS total_impression_7d,

SUM(order_count_112d)     AS total_order_112d

FROM data_summary
GROUP BY 1
)

INSERT OVERWRITE TABLE <SCHEMA_TARGET>.<TABLE_TARGET>
PARTITION (datestr)
SELECT
CONCAT(d.user_id, '|', d.source) AS uuid,
d.user_id,
d.source,
d.impression_count_7d,

agg.total_impression_7d,

d.latest_interaction_time,
:END_DATE AS datestr
FROM data_summary d
JOIN summary_agg agg
ON d.user_id = agg.user_id
WHERE d.source IS NOT NULL;

Gluten version

No response

Spark version

None

Spark configurations

NA

System information

NA

Relevant logs

E20250427 14:32:18.128706   176 VeloxMemoryManager.cc:401] Failed to release Velox memory manager after 43350ms as there are still outstanding memory resources. 
E20250427 14:32:18.128832   176 MemoryPool.cpp:435] [MEM] Memory leak (Used memory): Memory Pool[op.0.0.0.TableScan LEAF root[root] parent[node.0] MALLOC track-usage thread-safe]<unlimited max capacity unlimited capacity used 276.00KB available 748.00KB reservation [used 276.00KB, reserved 1.00MB, min 0B] counters [allocs 114189, frees 114184, reserves 0, releases 1, collisions 0])>
E20250427 14:32:18.128959   176 Exceptions.h:66] Line: cpp/velox/memory/VeloxMemoryManager.cc:102, Function:removePool, Expression: pool->reservedBytes() == 0 (1048576 vs. 0), Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (1048576 vs. 0)
Retriable: False
Expression: pool->reservedBytes() == 0
Function: removePool
File: cpp/velox/memory/VeloxMemoryManager.cc
Line: 102
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN6gluten20ListenableArbitrator10removePoolEPN8facebook5velox6memory10MemoryPoolE
# 4  _ZN8facebook5velox6memory13MemoryManager8dropPoolEPNS1_10MemoryPoolE
# 5  _ZN8facebook5velox6memory14MemoryPoolImplD2Ev
# 6  _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE24_M_release_last_use_coldEv
# 7  _ZN6gluten18VeloxMemoryManagerD1Ev
# 8  _ZN6gluten18VeloxMemoryManagerD0Ev
# 9  _ZN6gluten13MemoryManager7releaseEPS0_
# 10 Java_org_apache_gluten_memory_NativeMemoryManagerJniWrapper_release
# 11 0x00007fc63ca15074

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingtriage

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions