Fix cudf.pandas --line-profile clobbering __file__ #23017

Merged
rapids-bot[bot] merged 2 commits into rapidsai:main from galipremsagar:23010
Jun 27, 2026

@galipremsagar (Contributor) commented Jun 27, 2026

Description

closes #23010

python -m cudf.pandas --line-profile <script> writes an instrumented copy of the script to a temporary file and executes it via runpy.run_path(<temp>), which sets __file__ to that temporary path. Scripts that locate sibling resources relative to __file__ (e.g. Path(__file__).resolve().parent.parent / "data" / "file.parquet") then resolve to the wrong location and fail:

FileNotFoundError: /data/nyc_parking_violations_2022.parquet

The same script runs fine without --line-profile (it is executed directly, so __file__ is correct).
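The failure mode can be reproduced without cuDF. The following is a minimal sketch, assuming a plain copy of the script stands in for the instrumented temp file; `observed_file`, `demo.py`, and `instrumented.py` are hypothetical names used only for illustration.

```python
import runpy
import shutil
import tempfile
from pathlib import Path


def observed_file(script: Path, via_temp_copy: bool) -> str:
    """Run `script` as __main__ and return the __file__ value it saw."""
    if not via_temp_copy:
        return runpy.run_path(str(script), run_name="__main__")["__file__"]
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the instrumented copy written by the line profiler.
        copy = Path(tmp) / "instrumented.py"
        shutil.copy(script, copy)
        return runpy.run_path(str(copy), run_name="__main__")["__file__"]


with tempfile.TemporaryDirectory() as project:
    script = Path(project) / "scripts" / "demo.py"
    script.parent.mkdir()
    script.write_text(
        "from pathlib import Path\n"
        "DATA_DIR = Path(__file__).resolve().parent.parent / 'data'\n"
    )
    direct = observed_file(script, via_temp_copy=False)
    copied = observed_file(script, via_temp_copy=True)
    original = str(script)

# `direct` is the original script path; `copied` is the temp copy's path,
# so DATA_DIR would be computed relative to the temp directory instead.
print(direct == original, copied == original)
```

Because `runpy.run_path` sets `__file__` to whatever path it is given, any path arithmetic in the script follows the temp copy rather than the original.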

Root cause / fix

The per-line profiler needs the executed code object's filename to be the instrumented temp file: it reads source lines from the code_context of inspect.stack() frames (which goes through linecache, keyed on co_filename) and shifts line numbers back to the original. The temp filename therefore can't simply be swapped for the real one.
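To illustrate why the code object's filename matters, here is a small sketch (not the profiler's actual code) showing that `code_context` is only available when `co_filename` names a file whose contents can be read. It uses `inspect.getframeinfo` on the current frame, which is the same record type `inspect.stack()` returns for each frame; `context_seen` and the file names are hypothetical.

```python
import os
import tempfile

# Source that records the code_context of its own second line.
SOURCE = (
    "import inspect\n"
    "context = inspect.getframeinfo(inspect.currentframe()).code_context\n"
)


def context_seen(filename: str):
    """Compile SOURCE under `filename`, run it, and return the recorded context."""
    namespace = {"__name__": "__sketch__"}
    exec(compile(SOURCE, filename, "exec"), namespace)
    return namespace["context"]


with tempfile.TemporaryDirectory() as tmp:
    on_disk = os.path.join(tmp, "instrumented.py")
    with open(on_disk, "w", encoding="utf-8") as f:
        f.write(SOURCE)
    # co_filename points at a real file: the source line can be looked up.
    with_file = context_seen(on_disk)
    # co_filename points at a path that does not exist: no source is found.
    without_file = context_seen(os.path.join(tmp, "missing.py"))

print(with_file)
print(without_file)
```

If the filename were pointed at the original script instead, the lookup would return lines from the uninstrumented file, which would no longer correspond to the line numbers of the code actually running.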

This PR keeps the code object's filename pointed at the temp file (per-line profiling output and tracebacks are unchanged) but executes it in a __main__ module whose __file__ — and sys.argv[0] — refer to the original script. This matches the behavior of running without --line-profile, so scripts that resolve paths relative to __file__ keep working.
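The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `_run_instrumented_as_main`: `run_as_main` is a hypothetical helper, and the "instrumented" file here is an unmodified copy of the script.

```python
import shutil
import sys
import tempfile
import types
from pathlib import Path


def run_as_main(code_path: Path, original_path: Path) -> dict:
    """Execute code_path's source in a __main__ module that claims original_path."""
    # The code object keeps the temp file as co_filename.
    code = compile(code_path.read_text(), str(code_path), "exec")
    main = types.ModuleType("__main__")
    main.__file__ = str(original_path)

    saved_main = sys.modules["__main__"]
    saved_argv = list(sys.argv)
    sys.modules["__main__"] = main
    sys.argv[:1] = [str(original_path)]
    try:
        exec(code, main.__dict__)
    finally:
        # Restore interpreter state even if the script raises.
        sys.modules["__main__"] = saved_main
        sys.argv[:] = saved_argv
    return main.__dict__


SCRIPT = (
    "import sys\n"
    "from pathlib import Path\n"
    "payload = (Path(__file__).resolve().parent / 'data.txt').read_text()\n"
    "argv0 = sys.argv[0]\n"
    "code_file = sys._getframe().f_code.co_filename\n"
)

with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as tmp:
    original = Path(project) / "script.py"
    original.write_text(SCRIPT)
    (Path(project) / "data.txt").write_text("hello")
    instrumented = Path(tmp) / "instrumented.py"
    shutil.copy(original, instrumented)

    result = run_as_main(instrumented, original)
    original_str, instrumented_str = str(original), str(instrumented)

print(result["payload"], result["code_file"] == instrumented_str)
```

The script sees the original path through `__file__` and `sys.argv[0]`, while frames and tracebacks still report the temp file through `co_filename`.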

Tests

Adds test_run_cudf_pandas_line_profile_preserves_file: runs python -m cudf.pandas --line-profile on a script that reads a sibling file via __file__ and asserts it succeeds (and that __file__ is the original script path). The function profiler (--profile) was never affected — it executes the original script directly.
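The shape of such a check might look like the sketch below. It is not the test added in this PR: `run_sibling_reader` and the file names are hypothetical, and the example runs the script with the plain interpreter so it works without cuDF installed. Passing `launcher=("-m", "cudf.pandas", "--line-profile")` would exercise the profiled path; the profiler's additional output is not assumed here, so only the first two lines are inspected.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path


def run_sibling_reader(launcher=()):
    """Run a script that reads a sibling file via __file__; return (path, output lines)."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "sibling.txt").write_text("sibling-contents")
        script = Path(tmp) / "reader.py"
        script.write_text(
            textwrap.dedent(
                """\
                from pathlib import Path
                here = Path(__file__).resolve()
                print(here)
                print((here.parent / "sibling.txt").read_text())
                """
            )
        )
        # An argv list (no shell) keeps paths with spaces intact.
        output = subprocess.check_output(
            [sys.executable, *launcher, str(script)], text=True
        )
        return str(script.resolve()), output.splitlines()


expected_path, lines = run_sibling_reader()
print(lines[0] == expected_path, lines[1])
```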

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot Bot commented Jun 27, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the Python (Affects Python cuDF API) and cudf.pandas (Issues specific to cudf.pandas) labels Jun 27, 2026
@galipremsagar galipremsagar added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Jun 27, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python Jun 27, 2026
@galipremsagar galipremsagar changed the title from "fix" to "Fix cudf.pandas --line-profile clobbering __file__" Jun 27, 2026
@galipremsagar galipremsagar marked this pull request as ready for review June 27, 2026 00:54
@galipremsagar galipremsagar requested a review from a team as a code owner June 27, 2026 00:54
coderabbitai Bot commented Jun 27, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Improved line-profiling runs so scripts preserve the expected __file__ and command-line behavior.
    • Script execution now better maintains correct paths for accessing nearby files when profiling is enabled.
  • Tests

    • Added coverage for --line-profile to verify temporary scripts keep the correct resolved file path and still read sibling files as expected.

Walkthrough

cudf.pandas --line-profile now runs instrumented scripts through a dedicated __main__ wrapper that preserves the original script path metadata. A regression test verifies __file__-based file access and the reported script path.

Changes

cudf.pandas line-profile execution

Layer / File(s) | Summary
Instrumented main execution (python/cudf/cudf/pandas/__main__.py) | Adds _run_instrumented_as_main(...) and routes --line-profile script execution through it while restoring the original script path semantics.
Line-profile regression test (python/cudf/cudf_pandas_tests/test_main.py) | Adds a test that runs a temporary script with --line-profile and checks __file__-based file access and the emitted script path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly states the main change: fixing line-profile mode clobbering __file__.
Description check | ✅ Passed | The description matches the changeset and explains the bug, fix, and test coverage.
Linked Issues check | ✅ Passed | The changes address #23010 by preserving __file__ and sys.argv[0] for line-profile runs while keeping normal execution intact.
Out of Scope Changes check | ✅ Passed | No obvious unrelated changes were introduced beyond the bug fix and its test.

@galipremsagar

Contributor Author

/merge

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cudf/cudf_pandas_tests/test_main.py (1)

9-17: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

HIGH: Drop shell=True from _run_python.

This helper currently treats command as shell text, so valid temp script paths with spaces or shell metacharacters will be split/mangled by the shell. In this test file the immediate risk is flaky/broken path handling, and it also leaves a latent injection footgun for future callers. Pass an argv list to subprocess.check_output instead.

Proposed fix
+import sys
 import subprocess
 import tempfile
 import textwrap
@@
-def _run_python(*, cudf_pandas, command):
-    executable = "python "
-    if cudf_pandas:
-        executable += "-m cudf.pandas "
-    return subprocess.check_output(
-        executable + command,
-        shell=True,
-        text=True,
-        encoding="utf-8",
-    )
+def _run_python(*, cudf_pandas, command):
+    argv = [sys.executable]
+    if cudf_pandas:
+        argv.extend(["-m", "cudf.pandas"])
+    argv.extend(command)
+    return subprocess.check_output(argv, text=True, encoding="utf-8")
@@
-        res = _run_python(cudf_pandas=True, command=f.name)
-        expect = _run_python(cudf_pandas=False, command=f.name)
+        res = _run_python(cudf_pandas=True, command=[f.name])
+        expect = _run_python(cudf_pandas=False, command=[f.name])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cudf/cudf_pandas_tests/test_main.py` around lines 9 - 17, The
_run_python helper currently builds a shell command string and passes shell=True
to subprocess.check_output, which can mangle paths and is unsafe for future
callers. Update _run_python to construct and pass an argv list instead of a
single command string, keeping the cudf_pandas toggle behavior in the executable
selection and preserving the existing text/encoding handling.

Source: Linters/SAST tools

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 459cdb8b-337f-45c9-8951-84983363779d

📥 Commits

Reviewing files that changed from the base of the PR and between b40a6e1 and 34c20b6.

📒 Files selected for processing (1)
  • python/cudf/cudf_pandas_tests/test_main.py

@rapids-bot rapids-bot Bot merged commit 613d7ea into rapidsai:main Jun 27, 2026
126 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python Jun 27, 2026

Labels

bug (Something isn't working)
cudf.pandas (Issues specific to cudf.pandas)
non-breaking (Non-breaking change)
Python (Affects Python cuDF API)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BUG] cudf.pandas --line-profile broken

3 participants