Fix `cudf.pandas --line-profile` clobbering `__file__` by galipremsagar · Pull Request #23017 · rapidsai/cudf

galipremsagar · 2026-06-27T00:52:13Z

Description

`python -m cudf.pandas --line-profile <script>` writes an instrumented copy of the script to a temporary file and executes it via `runpy.run_path(<temp>)`, which sets `__file__` to that temporary path. Scripts that locate sibling resources relative to `__file__` (e.g. `Path(__file__).resolve().parent.parent / "data" / "file.parquet"`) then resolve to the wrong location and fail:

FileNotFoundError: /data/nyc_parking_violations_2022.parquet

The same script runs fine without `--line-profile` (it is executed directly, so `__file__` is correct).
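The failure mode can be reproduced without cudf. The following is a minimal sketch, not the code in `cudf.pandas`: the file names are made up, and a plain copy stands in for the instrumented temporary file. Only `runpy.run_path` is taken from the description above.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # A script that records where it thinks it lives.
    original = tmp / "project" / "script.py"
    original.parent.mkdir()
    original.write_text("SEEN_FILE = __file__\n")

    # Stand-in for the instrumented copy written to a temporary file.
    instrumented = tmp / "instrumented_copy.py"
    instrumented.write_text(original.read_text())

    # run_path returns the globals of the executed module.
    direct = runpy.run_path(str(original), run_name="__main__")
    copied = runpy.run_path(str(instrumented), run_name="__main__")

    direct_file = direct["SEEN_FILE"]   # path of the original script
    copied_file = copied["SEEN_FILE"]   # path of the copy, not the original
    print(direct_file)
    print(copied_file)
```

Any path the script derives from `__file__` in the second run is therefore rooted at the copy's directory rather than the project's.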

Root cause / fix

The per-line profiler needs the executed code object's filename to be the instrumented temp file: it reads source lines from the `code_context` of the frames returned by `inspect.stack()` (which goes through `linecache` on `co_filename`) and shifts line numbers back to the original. The temp filename therefore can't simply be swapped for the real one.
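To illustrate why the filename matters, here is a small sketch (file names and contents are invented for the example). The same source is compiled twice with different filenames, and `code_context` reports whatever text is on disk at `co_filename`, regardless of what was actually compiled.

```python
import inspect
import tempfile
from pathlib import Path

# Source that records the context line inspect reports for its own frame.
INSTRUMENTED_SOURCE = (
    "import inspect\n"
    "CONTEXT = inspect.stack()[0].code_context\n"
)

with tempfile.TemporaryDirectory() as tmp:
    instrumented = Path(tmp) / "instrumented.py"
    instrumented.write_text(INSTRUMENTED_SOURCE)

    # A different file standing in for the user's original script.
    original = Path(tmp) / "original.py"
    original.write_text("x = 1\ny = 2\n")

    def context_for(filename):
        # The filename given to compile() becomes co_filename.
        code = compile(INSTRUMENTED_SOURCE, str(filename), "exec")
        namespace = {"__name__": "__main__"}
        exec(code, namespace)
        return namespace["CONTEXT"]

    context_from_instrumented = context_for(instrumented)
    context_from_original = context_for(original)

print(context_from_instrumented)
print(context_from_original)
```

With the instrumented filename the reported line matches the code that ran; with the original filename it is line 2 of a different file, which is the mismatch the profiler has to avoid.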

This PR keeps the code object's filename pointed at the temp file (per-line profiling output and tracebacks are unchanged) but executes it in a `__main__` module whose `__file__` and `sys.argv[0]` refer to the original script. This matches the behavior of running without `--line-profile`, so scripts that resolve paths relative to `__file__` keep working.
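One way to get this split is sketched below. It is an assumption about the general technique, not the PR's actual implementation: `run_as_main` and its parameters are hypothetical names, and the real helper may differ in how it builds the module and restores state.

```python
import sys
import tempfile
import types
from pathlib import Path


def run_as_main(instrumented_path, original_path, args=()):
    """Hypothetical helper: execute the instrumented file as __main__
    while __file__ and sys.argv[0] point at the original script."""
    source = Path(instrumented_path).read_text()
    # co_filename stays on the instrumented file for the profiler.
    code = compile(source, str(instrumented_path), "exec")

    main_module = types.ModuleType("__main__")
    main_module.__file__ = str(original_path)

    saved_main = sys.modules["__main__"]
    saved_argv = sys.argv[:]
    sys.modules["__main__"] = main_module
    sys.argv = [str(original_path), *args]
    try:
        exec(code, main_module.__dict__)
    finally:
        # Restore interpreter state even if the script raises.
        sys.modules["__main__"] = saved_main
        sys.argv = saved_argv
    return main_module.__dict__


BODY = (
    "import sys\n"
    "SEEN = (__file__, sys.argv[0], __name__)\n"
    "CO_FILENAME = sys._getframe().f_code.co_filename\n"
)

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "script.py"
    instrumented = Path(tmp) / "instrumented.py"
    original.write_text(BODY)
    instrumented.write_text(BODY)

    result = run_as_main(instrumented, original)
    seen_file, seen_argv0, seen_name = result["SEEN"]
    co_filename = result["CO_FILENAME"]
    original_str, instrumented_str = str(original), str(instrumented)
```

The script sees the original path through `__file__` and `sys.argv[0]`, while the code object still reports the instrumented file, so line lookups by filename are unaffected.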

Tests

Adds `test_run_cudf_pandas_line_profile_preserves_file`, which runs `python -m cudf.pandas --line-profile` on a script that reads a sibling file via `__file__` and asserts it succeeds (and that `__file__` is the original script path). The function profiler (`--profile`) was never affected, since it executes the original script directly.
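The shape of such a regression test might look like the sketch below. It is not the test added by this PR: `run_script` and `launcher_args` are hypothetical, and the launcher arguments are left empty so the example runs without cudf installed. The real test would pass `-m cudf.pandas --line-profile` in that position.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

SCRIPT = """\
from pathlib import Path
sibling = Path(__file__).resolve().parent / "data.txt"
print(sibling.read_text().strip())
print(Path(__file__).resolve())
"""


def run_script(script, launcher_args=()):
    # Hypothetical helper. With cudf installed, launcher_args would be
    # ["-m", "cudf.pandas", "--line-profile"].
    argv = [sys.executable, *launcher_args, str(script)]
    return subprocess.check_output(argv, text=True, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "script.py"
    script.write_text(SCRIPT)
    (Path(tmp) / "data.txt").write_text("hello\n")

    lines = run_script(script).splitlines()
    # The sibling file was found through __file__ ...
    read_ok = lines[0] == "hello"
    # ... and __file__ is the script we asked to run.
    file_ok = Path(lines[1]) == script.resolve()
```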

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2026-06-27T00:52:17Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-27T00:58:02Z

📝 Walkthrough

Summary by CodeRabbit

Bug Fixes
- Improved line-profiling runs so scripts preserve the expected __file__ and command-line behavior.
- Script execution now better maintains correct paths for accessing nearby files when profiling is enabled.
Tests
- Added coverage for --line-profile to verify temporary scripts keep the correct resolved file path and still read sibling files as expected.

Walkthrough

cudf.pandas --line-profile now runs instrumented scripts through a dedicated __main__ wrapper that preserves the original script path metadata. A regression test verifies __file__-based file access and the reported script path.

Changes

cudf.pandas line-profile execution

Layer / File(s)	Summary
Instrumented main execution `python/cudf/cudf/pandas/__main__.py`	Adds `_run_instrumented_as_main(...)` and routes `--line-profile` script execution through it while restoring the original script path semantics.
Line-profile regression test `python/cudf/cudf_pandas_tests/test_main.py`	Adds a test that runs a temporary script with `--line-profile` and checks `__file__`-based file access and the emitted script path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly states the main change: fixing line-profile mode clobbering `__file__`.
Description check	✅ Passed	The description matches the changeset and explains the bug, fix, and test coverage.
Linked Issues check	✅ Passed	The changes address `#23010` by preserving `__file__` and `sys.argv[0]` for line-profile runs while keeping normal execution intact.
Out of Scope Changes check	✅ Passed	No obvious unrelated changes were introduced beyond the bug fix and its test.


galipremsagar · 2026-06-27T04:29:41Z

/merge

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

python/cudf/cudf_pandas_tests/test_main.py (1)

9-17: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

HIGH: Drop shell=True from _run_python.

This helper currently treats command as shell text, so valid temp script paths with spaces or shell metacharacters will be split/mangled by the shell. In this test file the immediate risk is flaky/broken path handling, and it also leaves a latent injection footgun for future callers. Pass an argv list to subprocess.check_output instead.

Proposed fix

+import sys
 import subprocess
 import tempfile
 import textwrap
@@
-def _run_python(*, cudf_pandas, command):
-    executable = "python "
-    if cudf_pandas:
-        executable += "-m cudf.pandas "
-    return subprocess.check_output(
-        executable + command,
-        shell=True,
-        text=True,
-        encoding="utf-8",
-    )
+def _run_python(*, cudf_pandas, command):
+    argv = [sys.executable]
+    if cudf_pandas:
+        argv.extend(["-m", "cudf.pandas"])
+    argv.extend(command)
+    return subprocess.check_output(argv, text=True, encoding="utf-8")

-        res = _run_python(cudf_pandas=True, command=f.name)
-        expect = _run_python(cudf_pandas=False, command=f.name)
+        res = _run_python(cudf_pandas=True, command=[f.name])
+        expect = _run_python(cudf_pandas=False, command=[f.name])
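The splitting concern raised in this comment can be seen in a short sketch. The path below is invented for the example, and `shlex.split` is used only to approximate how a POSIX shell tokenizes the command text; it is not what `subprocess` calls internally.

```python
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path

# A command string with an unquoted space in the script path.
command_text = "python /tmp/my dir/script.py"
shell_tokens = shlex.split(command_text)
print(shell_tokens)  # the path is broken into two arguments

with tempfile.TemporaryDirectory() as tmp:
    spaced_dir = Path(tmp) / "my dir"
    spaced_dir.mkdir()
    script = spaced_dir / "script.py"
    script.write_text("print('ok')\n")

    # An argv list passes the path through as a single argument,
    # with no shell involved.
    output = subprocess.check_output(
        [sys.executable, str(script)], text=True, encoding="utf-8"
    )
    print(output.strip())
```

Using `sys.executable` in the list form also guarantees the child runs under the same interpreter as the test, which a bare `python` on `PATH` does not.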

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cudf/cudf_pandas_tests/test_main.py` around lines 9 - 17, The
_run_python helper currently builds a shell command string and passes shell=True
to subprocess.check_output, which can mangle paths and is unsafe for future
callers. Update _run_python to construct and pass an argv list instead of a
single command string, keeping the cudf_pandas toggle behavior in the executable
selection and preserving the existing text/encoding handling.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@python/cudf/cudf_pandas_tests/test_main.py`:
- Around line 9-17: The _run_python helper currently builds a shell command
string and passes shell=True to subprocess.check_output, which can mangle paths
and is unsafe for future callers. Update _run_python to construct and pass an
argv list instead of a single command string, keeping the cudf_pandas toggle
behavior in the executable selection and preserving the existing text/encoding
handling.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 459cdb8b-337f-45c9-8951-84983363779d

📥 Commits

Reviewing files that changed from the base of the PR and between b40a6e1 and 34c20b6.

📒 Files selected for processing (1)

python/cudf/cudf_pandas_tests/test_main.py

fix

b40a6e1

github-actions Bot assigned galipremsagar Jun 27, 2026

github-actions Bot added the Python (Affects Python cuDF API) and cudf.pandas (Issues specific to cudf.pandas) labels Jun 27, 2026

github-project-automation Bot added this to cuDF Python Jun 27, 2026

galipremsagar added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Jun 27, 2026

GPUtester moved this to In Progress in cuDF Python Jun 27, 2026

galipremsagar changed the title from "fix" to "Fix `cudf.pandas --line-profile` clobbering `__file__`" Jun 27, 2026

galipremsagar marked this pull request as ready for review June 27, 2026 00:54

galipremsagar requested a review from a team as a code owner June 27, 2026 00:54

galipremsagar requested review from TomAugspurger and vyasr June 27, 2026 00:54

bdice approved these changes Jun 27, 2026


fix

34c20b6

coderabbitai Bot reviewed Jun 27, 2026


rapids-bot Bot merged commit 613d7ea into rapidsai:main Jun 27, 2026
126 checks passed

github-project-automation Bot moved this from In Progress to Done in cuDF Python Jun 27, 2026

rapids-bot[bot] merged 2 commits into rapidsai:main from galipremsagar:23010
