On ARM64, with libpq from gaussdb 506.0SPC0100, the Python process core dumps when a connection fails #30

@Dark-Athena

(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# cat test1.py 
#!/usr/bin/env python3
import argparse
import os
import sys

def main() -> int:

    try:
        import gaussdb
    except Exception as exc:
        print(f"[ERROR] Failed to import gaussdb: {exc}")
        return 3

    print("[INFO] gaussdb module:", getattr(gaussdb, "__file__", "<unknown>"))
    print("[INFO] gaussdb version:", getattr(gaussdb, "__version__", "<unknown>"))

    conn = None
    try:
        conn = gaussdb.connect('host=127.0.0.1 port=3333 dbname=database_name user=username password=password', connect_timeout=10)
        with conn.cursor() as cur:
            cur.execute("select 1")
            row = cur.fetchone()
        print("[OK] Connection succeeded, query result:", row)
        return 0
    except Exception as exc:
        print(f"[ERROR] Connection/query failed: {exc}")
        return 1
    finally:
        if conn is not None:
            try:
                conn.close()
            except Exception:
                pass


if __name__ == "__main__":
    sys.exit(main())
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# export LD_LIBRARY_PATH=/workspace/test_gaussdb/gaussdb-505.2-libpq/lib # with the 505.2 libpq there is no core dump
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# python test1.py 
[INFO] gaussdb module: /workspace/test_gaussdb/gaussdb-python/gaussdb/gaussdb/__init__.py
[INFO] gaussdb version: 1.0.4
[ERROR] Connection/query failed: connection failed: could not connect to server: Operation now in progress
        Is the server running on host "127.0.0.1" and accepting
        TCP/IP connections on port 3333?
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# export LD_LIBRARY_PATH=/workspace/test_gaussdb/gaussdb-506.0-libpq/lib  # with the 506.0 libpq the process core dumps
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# python test1.py 
[INFO] gaussdb module: /workspace/test_gaussdb/gaussdb-python/gaussdb/gaussdb/__init__.py
[INFO] gaussdb version: 1.0.4
Segmentation fault (core dumped)
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# arch
aarch64
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# python --version
Python 3.11.14

A C program that calls the 506.0 libpq directly does not core dump:

(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# cat test_pq.c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main() {
    // Connection string; adjust the database name, user, password and host for your environment
    PGconn *conn = PQconnectdb("dbname=postgres user=postgres password=secret host=localhost");

    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "Connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }

    printf("Connected to PostgreSQL\n");
    PQfinish(conn);
    return 0;
}
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# gcc -o test_pq test_pq.c -I/workspace/test_gaussdb/gaussdb-506.0-libpq/include -L/workspace/test_gaussdb/gaussdb-506.0-libpq/lib -lpq
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# ./test_pq 
Connection failed: could not connect to server: Operation now in progress
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5432?
could not connect to server: Operation now in progress
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
(.env311) [root@ae910d8786f844c4b317045140860809 test_gaussdb]# 

Below is the analysis produced by GPT-5.3-CODEX.

Detailed analysis

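  • The crash path is: after the connection fails, generators.py calls finish_pgconn(), and errors.py then reads a snapshot of the pgconn attributes.
  • These attributes (such as host and password) go through pq_ctypes.py, which calls the C API (PQhost/PQpass) internally via _call_bytes(); see pq_ctypes.py.
  • The c_char_p return value in _call_bytes() makes ctypes dereference the returned pointer; if the C layer returns a non-NULL but invalid address, the process takes a SIGSEGV before Python has any chance to raise an exception. The core you are seeing is exactly this kind of crash.
  • I built an independent minimal reproduction: calling print(conn.host) right after connect_poll fails segfaults immediately, with no higher-level framework involved. So the problem is not in constructing the exception object itself but in the combination "failed-state PGconn + PQhost/PQpass access".
  • Your C program does not crash because it only calls PQstatus/PQerrorMessage/PQfinish and never reaches the PQhost/PQpass path.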
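
The c_char_p behaviour described above can be illustrated without libpq. The following is a minimal sketch, assuming a Unix-like system where ctypes.CDLL(None) exposes the C runtime; it uses getenv from libc rather than PQhost, and the environment variable names are hypothetical. It only shows how the declared return type changes who dereferences the pointer; it is not the driver's code.

```python
import ctypes
import os

# Assumption: a Unix-like system where CDLL(None) exposes the C runtime.
libc = ctypes.CDLL(None)

# Same C function, two different return-type declarations.
# c_char_p: ctypes itself dereferences the returned pointer to build bytes.
getenv_auto = ctypes.CFUNCTYPE(ctypes.c_char_p, ctypes.c_char_p)(("getenv", libc))
# c_void_p: ctypes hands back the raw address (int), or None for NULL.
getenv_raw = ctypes.CFUNCTYPE(ctypes.c_void_p, ctypes.c_char_p)(("getenv", libc))

os.environ["GR_DEMO_HOST"] = "127.0.0.1"  # hypothetical variable name

print(getenv_auto(b"GR_DEMO_HOST"))      # bytes object built by ctypes
addr = getenv_raw(b"GR_DEMO_HOST")       # plain integer address
if addr is not None:
    print(ctypes.string_at(addr))        # explicit dereference, done by us
print(getenv_raw(b"GR_DEMO_UNSET_VAR"))  # NULL is reported as None
```

Both declarations handle NULL safely. Neither can detect a non-NULL pointer that refers to invalid memory, which is why the dereference ends in SIGSEGV instead of a Python exception.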

Why this looks like a libpq combination problem

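  • In misc.py the driver locates the library dynamically with find_library("pq"), so which library actually gets loaded depends on LD_LIBRARY_PATH and on the system library search order.
  • Your current library reports PQlibVersion = 90204 (9.2.4) yet comes from the gaussdb-506.0-libpq package directory, which points to a specific combination of "Gauss distribution library + current system runtime"; in that combination, calling certain string getters on a failed-state connection is unsafe.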
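
To check which libpq a Python process would pick up, and what version it reports, something like the following sketch may help. It uses ctypes.util.find_library and libpq's PQlibVersion; the helper names decode_pq_version and describe_libpq are made up for this example and are not part of the driver.

```python
import ctypes
import ctypes.util

def decode_pq_version(v: int) -> str:
    """Turn the integer returned by PQlibVersion() into dotted form."""
    if v >= 100000:  # two-part numbering used since PostgreSQL 10
        return f"{v // 10000}.{v % 10000}"
    return f"{v // 10000}.{(v // 100) % 100}.{v % 100}"

def describe_libpq():
    """Return (library name, version string), or None if libpq is unavailable."""
    name = ctypes.util.find_library("pq")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        fn = lib.PQlibVersion
    except (OSError, AttributeError):
        return None
    fn.argtypes = []
    fn.restype = ctypes.c_int
    return name, decode_pq_version(fn())

print(decode_pq_version(90204))  # the value quoted in the analysis above
print(describe_libpq())          # depends on the local environment
```

Running it once per LD_LIBRARY_PATH setting shows whether the two environments really load different libraries.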

Limits of the conclusion

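  • I can confirm with 100% certainty that the crash is triggered by calling PQhost/PQpass on a failed-state PGconn.
  • Without libpq source code or symbols, I cannot state with 100% certainty whether this is a bug inside libpq or an ABI compatibility problem; in engineering terms, both cases should be handled as an "unsafe call path".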
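
One possible way to treat the getters as an unsafe call path is to skip them whenever the connection is not in the OK state. The sketch below is only an illustration under that assumption: FakePGconn and safe_host are hypothetical stand-ins, not the driver's classes or its actual fix, and CONNECTION_OK = 0 mirrors libpq's ConnStatusType enum.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CONNECTION_OK = 0   # libpq ConnStatusType: CONNECTION_OK
CONNECTION_BAD = 1  # libpq ConnStatusType: CONNECTION_BAD

@dataclass
class FakePGconn:
    """Hypothetical stand-in for a connection wrapper; not the driver's class."""
    status: int
    read_host: Callable[[], bytes]  # a real wrapper would call PQhost here

def safe_host(conn: FakePGconn) -> Optional[bytes]:
    """Read the host only when the connection is in the OK state."""
    if conn.status != CONNECTION_OK:
        return None  # never touch the getter on a failed connection
    return conn.read_host()

def exploding_getter() -> bytes:
    # Stands in for a getter that would crash the process.
    raise RuntimeError("getter must not be called on a failed connection")

failed = FakePGconn(status=CONNECTION_BAD, read_host=exploding_getter)
healthy = FakePGconn(status=CONNECTION_OK, read_host=lambda: b"127.0.0.1")

print(safe_host(failed))   # getter is skipped
print(safe_host(healthy))  # getter is called
```

The trade-off is that error messages built from a failed connection lose the host and similar details, in exchange for never dereferencing a pointer the library may not have initialised.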
