[STORY] Scheduler Leader Election with pg_advisory_lock #423

@jsbattig

Description


Story: Scheduler Leader Election with pg_advisory_lock

Part of: #408

[Conversation Reference: "I think it's cleaner that one server takes over scheduling work" and "we will have to have a system to prevent duplicate schedules"]

Story Overview

Objective: Implement leader election using PostgreSQL's pg_advisory_lock so that exactly one cluster node runs the scheduler (RefreshScheduler, DataRetentionScheduler, and other periodic tasks). The advisory lock is held on a dedicated PostgreSQL connection -- if the connection drops (node crash), the lock is automatically released and another node can acquire it.

User Value: Periodic tasks (repo refresh, data retention cleanup, etc.) run exactly once across the cluster, not once per node. No duplicate scheduling, no missed schedules.

Acceptance Criteria

AC1: Advisory Lock Acquisition for Leadership

Scenario: One node acquires the scheduler leadership lock.

Given multiple CIDX nodes are starting up
When each node attempts to acquire pg_advisory_lock(lock_id)
Then exactly one node succeeds (returns true)
And the winning node starts all scheduler services
And losing nodes do NOT start scheduler services
And losing nodes continue serving queries normally

Technical Requirements:

  • LeaderElectionService in src/code_indexer/server/services/leader_election_service.py
  • Uses pg_advisory_lock(lock_id) on a DEDICATED connection (not from pool)
  • Lock ID: fixed integer constant (e.g., 0x434944585F4C4452, the ASCII bytes of "CIDX_LDR"; see the sanity check after this list)
  • Non-blocking attempt first: pg_try_advisory_lock(lock_id) to check if available
  • Dedicated connection held for the lifetime of leadership (connection = lock lifetime)
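
A quick, illustrative sanity check of the lock ID constant (not part of the implementation):

LOCK_ID = 0x434944585F4C4452

# The 8 bytes spell "CIDX_LDR", and the value fits in PostgreSQL's signed bigint,
# which pg_try_advisory_lock(bigint) expects.
assert LOCK_ID.to_bytes(8, "big").decode("ascii") == "CIDX_LDR"
assert LOCK_ID < 2**63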

AC2: Scheduler Services Gated by Leadership

Scenario: Only the leader node runs periodic schedulers.

Given Node A is the leader
When scheduler startup runs
Then Node A starts: RefreshScheduler, DataRetentionScheduler, and other periodic services
And Node B (follower) does NOT start these schedulers
And Node B's startup log says: "Not leader, skipping scheduler startup"
And Node A's startup log says: "Acquired leadership, starting schedulers"

Technical Requirements:

  • LeaderElectionService.is_leader() returns bool
  • Startup lifespan checks is_leader() before starting each scheduler
  • List of leader-only services: RefreshScheduler, DataRetentionScheduler, meta description hook timer, dependency map scheduler
  • Non-leader nodes skip scheduler initialization entirely (not just pause)

AC3: Dedicated Connection for Lock Holding

Scenario: The advisory lock is held on a connection that stays open.

Given the leader node holds pg_advisory_lock on a dedicated connection
When the connection remains open
Then the lock is held continuously
And the lock is NOT released between operations (session-level lock)
And the connection is separate from the connection pool used by backends

Technical Requirements:

  • Separate psycopg connection (NOT from the shared pool)
  • Connection opened at leader election time, kept alive
  • pg_advisory_lock (session-level, not transaction-level)
  • Connection keepalive enabled (TCP keepalive to detect network failure)
  • No other operations performed on this connection (lock-only)
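
Because the lock is session-level, any other connection can confirm which backend holds it by querying pg_locks; a minimal verification sketch follows (the DSN is a placeholder). PostgreSQL stores a 64-bit advisory key split across classid (high 32 bits) and objid (low 32 bits).

import psycopg

LOCK_ID = 0x434944585F4C4452
DSN = "postgresql://cidx@localhost/cidx"  # placeholder

with psycopg.connect(DSN) as conn:
    row = conn.execute(
        """
        SELECT pid, granted
        FROM pg_locks
        WHERE locktype = 'advisory'
          AND classid = %s::oid AND objid = %s::oid
        """,
        (LOCK_ID >> 32, LOCK_ID & 0xFFFFFFFF),
    ).fetchone()
    print(row)  # (pid of the leader's dedicated backend, True), or None if unheld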

AC4: Automatic Lock Release on Node Failure

Scenario: If the leader crashes, the lock is automatically released.

Given Node A is the leader holding the advisory lock
When Node A's process crashes (kill -9, OOM, etc.)
Then the PostgreSQL connection is dropped
And PostgreSQL automatically releases the session-level advisory lock
And the lock becomes available for other nodes to acquire

Technical Requirements:

  • PostgreSQL connection drop = automatic lock release (built-in behavior)
  • TCP keepalive settings: keepalives=1, keepalives_idle=10, keepalives_interval=5, keepalives_count=3
  • Connection drop detected by PostgreSQL within ~25 seconds of network failure (keepalives_idle 10 s + keepalives_count 3 x keepalives_interval 5 s)
  • No application-level lock release needed for crash scenarios
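
The session-level behavior can be exercised directly with two throwaway connections (illustrative sketch, placeholder DSN); closing the holder's session is the same mechanism that frees the lock when a crashed node's connection drops.

import psycopg

LOCK_ID = 0x434944585F4C4452
DSN = "postgresql://cidx@localhost/cidx"  # placeholder

conn_a = psycopg.connect(DSN, autocommit=True)  # plays the role of Node A
conn_b = psycopg.connect(DSN, autocommit=True)  # plays the role of Node B

assert conn_a.execute("SELECT pg_try_advisory_lock(%s)", (LOCK_ID,)).fetchone()[0] is True
assert conn_b.execute("SELECT pg_try_advisory_lock(%s)", (LOCK_ID,)).fetchone()[0] is False

conn_a.close()  # session ends -> PostgreSQL releases the session-level lock

assert conn_b.execute("SELECT pg_try_advisory_lock(%s)", (LOCK_ID,)).fetchone()[0] is True
conn_b.close()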

Implementation Status

  • Core implementation complete
  • Unit tests passing
  • Integration tests passing
  • E2E tests passing
  • Code review approved
  • Manual E2E testing completed
  • Documentation updated

Technical Implementation Details

File Structure

src/code_indexer/server/services/
    leader_election_service.py    # LeaderElectionService

Advisory Lock Pattern

from typing import Optional

import psycopg


class LeaderElectionService:
    LOCK_ID = 0x434944585F4C4452  # ASCII bytes of "CIDX_LDR"

    def __init__(self, postgres_url: str):
        self._postgres_url = postgres_url
        self._lock_conn: Optional[psycopg.Connection] = None
        self._is_leader = False

    def try_acquire_leadership(self) -> bool:
        """Attempt to become the leader. Non-blocking."""
        self._lock_conn = psycopg.connect(
            self._postgres_url,
            autocommit=True,
            options="-c statement_timeout=5000",
            # TCP keepalive so a dead peer is noticed (~25 s: 10 s idle + 3 probes x 5 s)
            keepalives=1,
            keepalives_idle=10,
            keepalives_interval=5,
            keepalives_count=3,
        )
        row = self._lock_conn.execute(
            "SELECT pg_try_advisory_lock(%s)", (self.LOCK_ID,)
        ).fetchone()
        self._is_leader = bool(row[0])  # True if the session-level lock was acquired
        if not self._is_leader:
            self._lock_conn.close()
            self._lock_conn = None
        return self._is_leader

    def is_leader(self) -> bool:
        return self._is_leader

    def release(self) -> None:
        """Graceful release on shutdown."""
        if self._lock_conn:
            self._lock_conn.close()  # closing the session releases the advisory lock
            self._lock_conn = None
            self._is_leader = False

Startup Integration

1. Create connection pool (shared backends)
2. Run migrations
3. Validate NFS mount
4. Try acquire leadership  <-- THIS STORY
5. If leader: start schedulers
6. If not leader: skip schedulers, log info
7. Start query serving (all nodes)
8. Start heartbeat service (all nodes)
9. Start reconciliation service (all nodes)
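
A minimal sketch of how this ordering might look in a FastAPI-style lifespan (the wiring, connection string, and helper steps are placeholders, not the actual startup code):

from contextlib import asynccontextmanager
import logging

from fastapi import FastAPI

from code_indexer.server.services.leader_election_service import LeaderElectionService

logger = logging.getLogger(__name__)
POSTGRES_URL = "postgresql://cidx@localhost/cidx"  # placeholder


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Steps 1-3 (other stories): create the shared pool, run migrations, validate the NFS mount.
    leader_election = LeaderElectionService(POSTGRES_URL)  # step 4: this story
    app.state.leader_election = leader_election
    if leader_election.try_acquire_leadership():
        logger.info("Acquired leadership, starting schedulers")
        # Step 5: start RefreshScheduler, DataRetentionScheduler, and other leader-only services.
    else:
        logger.info("Not leader, skipping scheduler startup")
        # Step 6: follower node; schedulers are never initialized here.
    # Steps 7-9 (all nodes): query serving, heartbeat, reconciliation.
    try:
        yield
    finally:
        leader_election.release()  # graceful release of the advisory lock on shutdown


app = FastAPI(lifespan=lifespan)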

Leader-Only vs All-Node Services

| Service                   | Leader Only | All Nodes |
|---------------------------|-------------|-----------|
| RefreshScheduler          | YES         | -         |
| DataRetentionScheduler    | YES         | -         |
| MetaDescriptionHook timer | YES         | -         |
| DependencyMapScheduler    | YES         | -         |
| Query serving             | -           | YES       |
| Job claiming/execution    | -           | YES       |
| Heartbeat                 | -           | YES       |
| Reconciliation            | -           | YES       |
| Health endpoint           | -           | YES       |

Testing Requirements

  • Automated: Single node acquires leadership successfully.
  • Automated: Second node fails to acquire leadership (non-blocking, returns false).
  • Automated: Leader release makes lock available again.
  • Automated: Scheduler services only start on leader node.
  • Manual E2E: Start 2 nodes, verify only one runs schedulers (check logs), kill leader, verify follower acquires leadership (Story 16).
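
A sketch of how the first three automated checks might be written (placeholder DSN; assumes the service API above):

from code_indexer.server.services.leader_election_service import LeaderElectionService

POSTGRES_URL = "postgresql://cidx@localhost/cidx_test"  # placeholder test database


def test_leadership_is_exclusive_and_released():
    node_a = LeaderElectionService(POSTGRES_URL)
    node_b = LeaderElectionService(POSTGRES_URL)
    try:
        # Single node acquires leadership successfully.
        assert node_a.try_acquire_leadership() is True
        # Second node fails to acquire (non-blocking, returns False).
        assert node_b.try_acquire_leadership() is False
        assert node_b.is_leader() is False
        # Leader release makes the lock available again.
        node_a.release()
        assert node_b.try_acquire_leadership() is True
    finally:
        node_a.release()
        node_b.release()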

Definition of Done

  • LeaderElectionService acquires pg_advisory_lock on dedicated connection
  • Only leader node starts scheduler services
  • Non-blocking attempt (pg_try_advisory_lock) for followers
  • TCP keepalive configured for leader connection
  • Lock automatically released on connection drop
  • Startup integration complete (leadership check before scheduler init)
  • All tests pass
