Skip to content

Conversation

@vdiez
Copy link
Contributor

@vdiez vdiez commented Dec 16, 2025

Summary

This PR introduces a new filesystem cache module (packages/shared/src/fs-cache/) that intercepts ALL filesystem operations, including those from dependencies like TypeScript and ESLint. The cache uses a unified tree structure for efficient storage and supports disk persistence via protobuf serialization.

Key Features

  • Global fs interception: Patches Node.js fs module to cache all filesystem operations
  • Unified tree structure: Single FsNode data structure represents all cached path information
  • Negative caching: Caches "file doesn't exist" results to avoid repeated failed lookups
  • Per-project isolation: Separate cache per project with independent lifecycles
  • Disk persistence: Protobuf serialization for saving/loading cache between runs
  • Memory management: Automatic flush to disk when memory threshold is reached

Design Decisions Explored

1. Unified Tree vs Separate Maps

Initially considered separate Maps for each operation type (files, directories, stats, etc.). Chose unified FsNode structure because:

  • Better represents filesystem reality (one node = one path)
  • Avoids redundant storage (stat info stored once, not duplicated)
  • Easier to reason about cache state
  • Natural handling of partial knowledge (e.g., know directory exists but haven't listed children)

2. Cache Lookup Result Pattern

Implemented CacheLookupResult<T> to distinguish three states:

  • undefined returned: path not in cache (cache miss)
  • { exists: false }: cached knowledge that path doesn't exist
  • { exists: true, value: T }: cached data for existing path

3. opendir/opendirSync Caching

fs.opendir returns a Dir object (async iterator). Implemented CachedDir class that:

  • Reads all directory entries on first access
  • Implements full Dir interface including Symbol.asyncDispose/Symbol.dispose
  • Returns cached entries on iteration

4. Type Reuse

Reuses @types/node types where possible via Pick<Stats, ...> instead of maintaining custom type definitions.

Files Created

File Purpose
cache-types.ts Core type definitions (FsNode, FsNodeStat, CacheStats)
cache-utils.ts Path normalization utilities
project-cache.ts Per-project cache implementation with tree structure
cache-manager.ts Singleton managing multiple project caches
fs-patch.ts Monkey-patching for fs module + CachedDir class
index.ts Public API exports
proto/fs-cache.proto Protobuf schema for disk serialization
proto/fs-cache.js Generated protobuf code
proto/fs-cache.d.ts Generated TypeScript definitions

Cached Operations

  • readFileSync / readFile
  • readdirSync / readdir (with withFileTypes support)
  • statSync / stat
  • lstatSync / lstat
  • existsSync
  • realpathSync / realpath
  • accessSync / access
  • opendirSync / opendir

Next Steps for Integration

  1. Initialize at startup: Call initFsCache() when gRPC/HTTP server starts

    import { initFsCache } from '@sonar/shared/fs-cache';
    initFsCache({ memoryThreshold: 500 * 1024 * 1024 });
  2. Set active project before analysis: Call setActiveProject() with project info

    import { setActiveProject } from '@sonar/shared/fs-cache';
    setActiveProject(projectKey, baseDir, cacheDir);
  3. Load cache from previous run (optional):

    import { loadProjectCache } from '@sonar/shared/fs-cache';
    await loadProjectCache(projectKey, cachePath);
  4. Save cache after analysis:

    import { saveProjectCache } from '@sonar/shared/fs-cache';
    await saveProjectCache(projectKey);
  5. Add cache invalidation: Implement file watcher or use Sonar's file change detection to call invalidatePath() when files change

  6. Add tests: Unit tests for cache operations, integration tests for fs patching

  7. Performance metrics: Use getFsCacheStats() to measure cache effectiveness


🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

@hashicorp-vault-sonar-prod hashicorp-vault-sonar-prod bot changed the title Add unified filesystem cache module JS-1001 Add unified filesystem cache module Dec 16, 2025
@hashicorp-vault-sonar-prod
Copy link

hashicorp-vault-sonar-prod bot commented Dec 16, 2025

JS-1001

@vdiez
Copy link
Contributor Author

vdiez commented Dec 16, 2025

Additional Next Steps

Complete fs Module Mirror

Currently we only intercept and cache specific fs methods. To get full visibility into filesystem usage, we should create a 1:1 mirror of the entire fs module:

  • Intercept ALL methods: Even methods we don't cache should be wrapped
  • Log non-cached calls: Track when uncached methods are called (e.g., writeFile, unlink, rename, chmod, etc.)
  • Measure frequency: Count calls to each method to identify optimization opportunities
  • Identify missing caches: If a read-only method is called frequently, we can add caching for it

Example metrics to track:

interface FsMethodStats {
  [methodName: string]: {
    calls: number;
    cached: boolean;
    totalTimeMs: number;
  };
}

This will help us:

  1. Understand which fs methods TypeScript/ESLint actually use
  2. Prioritize which methods to add caching for
  3. Detect any unexpected write operations during analysis

Building Protobuf Sources

The protobuf files in packages/shared/src/fs-cache/proto/ are generated from the .proto schema. To regenerate them after modifying fs-cache.proto:

Prerequisites

Ensure protobufjs is installed (already a dependency in the project).

Generate JavaScript code

npx pbjs -t static-module -w es6 -o packages/shared/src/fs-cache/proto/fs-cache.js packages/shared/src/fs-cache/proto/fs-cache.proto

Generate TypeScript definitions

npx pbts -o packages/shared/src/fs-cache/proto/fs-cache.d.ts packages/shared/src/fs-cache/proto/fs-cache.js

Both commands in sequence

npx pbjs -t static-module -w es6 -o packages/shared/src/fs-cache/proto/fs-cache.js packages/shared/src/fs-cache/proto/fs-cache.proto && \
npx pbts -o packages/shared/src/fs-cache/proto/fs-cache.d.ts packages/shared/src/fs-cache/proto/fs-cache.js

Note: The order matters - pbts needs the generated .js file to produce the .d.ts file.

Consider adding an npm script in packages/shared/package.json for convenience:

{
  "scripts": {
    "generate:fs-cache-proto": "pbjs -t static-module -w es6 -o src/fs-cache/proto/fs-cache.js src/fs-cache/proto/fs-cache.proto && pbts -o src/fs-cache/proto/fs-cache.d.ts src/fs-cache/proto/fs-cache.js"
  }
}

@zglicz zglicz changed the base branch from master to typescript-program-caching December 17, 2025 09:43
@vdiez
Copy link
Contributor Author

vdiez commented Dec 17, 2025

FS Loader Hook for ESM Compatibility

Added a Node.js loader hook at packages/shared/src/fs-loader/ to make fs monkey-patching work with all ESM import patterns.

Why We Need It

In ESM, monkey-patching fs only works for default imports:

import fs from 'fs';
fs.readFileSync = patched;  // ✅ Works

import { readFileSync } from 'fs';
readFileSync('./file.txt');  // ❌ Still uses original!

import * as fs from 'fs';
fs.readFileSync('./file.txt');  // ❌ Still uses original!

This is because ESM named imports bind directly to the module's export at import time (immutable binding), not to a property on an object.

When We Need It

Scenario Loader Needed
CommonJS (require) NO
ESM default imports only NO
ESM named imports ({ readFileSync }) YES
ESM namespace imports (* as fs) YES
ESM patching CJS libraries (e.g., TypeScript) NO

For the FS cache use case: If we only need to intercept TypeScript's fs calls, we don't need the loader because TypeScript uses require('fs') internally (CJS), and ESM patches to fs affect CJS require calls.

The loader is only needed if SonarJS's own code or ESM dependencies use named imports.

How It Works

The loader intercepts fs and fs/promises imports and replaces them with a synthetic module that wraps all exports as delegating functions:

// What the loader generates:
import { createRequire } from 'node:module';
const require = createRequire('<loader-url>');
const fs = require('fs');  // Bypasses ESM loader, gets real fs

export function readFileSync(...args) { return fs.readFileSync(...args); }
export function existsSync(...args) { return fs.existsSync(...args); }
// ... all other fs functions
export default fs;

Why this works: The wrapper functions are ESM exports (immutable), but their bodies perform a dynamic property lookup on the mutable fs object at call time, not export time. When you patch fs.readFileSync, the wrappers see the patch because they delegate through fs.X.

Usage

node --import ./packages/shared/src/fs-loader/register.mjs your-app.mjs

Then patch fs normally - all import patterns will see the patch.

Files

  • fs-wrapper-loader.mjs - Main loader with detailed documentation
  • register.mjs - Entry point for --import flag, includes decision guide for when loader is needed

@vdiez
Copy link
Contributor Author

vdiez commented Dec 18, 2025

Today's Changes (Dec 18)

1. Dynamic FS Operation Tracking & Cache Control

  • Added caller tracking via stack trace analysis to identify which packages make fs calls
  • Tracks uncached/passthrough operations separately from cached ones
  • Added environment variables for cache control:
    FS_CACHE=0       # Disable caching (baseline mode)
    FS_CACHE=cold    # Enable caching but don't load existing cache
    FS_CACHE=1       # Enable caching with warm start (default)

2. File Descriptor Caching

Added caching for fs.openSync/fs.closeSync/fs.fstatSync/fs.readSync:

  • Virtual file descriptors mapped to cached file content
  • Tracks read positions for sequential reading
  • Falls back to real filesystem on cache miss

3. Portable Cache with Relative Paths

  • Cache now stores relative paths instead of absolute
  • Resolved against baseDir when loading
  • Makes cache files portable across machines/environments

4. TypeScript CompilerHost Blocking

Prevents TypeScript from walking up parent directories looking for node_modules:

  • Added isAllowedPath() that only allows paths under baseDir or TypeScript's lib directory
  • Modified readFile(), fileExists(), added directoryExists() to block external paths
  • Impact: ~1,000 fs calls blocked, ~5 second improvement
private isAllowedPath(filePath: string): boolean {
  return filePath.startsWith(this.baseDirPrefix) || filePath.startsWith(this.tsLibDirPrefix);
}

vdiez and others added 3 commits December 19, 2025 15:39
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: zglicz <michal.zgliczynski@sonarsource.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
@zglicz zglicz force-pushed the typescript-program-caching branch from e85b35b to 1de976f Compare December 19, 2025 14:39
@vdiez
Copy link
Contributor Author

vdiez commented Dec 19, 2025

FS Cache Caller Tracking Study - Findings

I conducted a study to understand which npm dependencies are responsible for filesystem operations during analysis. Here are the comprehensive findings:

FS Access Distribution (16,511 total tracked ops):

| Dependency | Total Ops | Disk | Cache | Hit% | Top Operations | |-----------------------|-----------|-------|--------|--------|--------------------------------------------------------|
| resolve | 15,609 | 2,634 | 12,975 | 83.1% | stat:14,282, readFile:1,327 |
| typescript | 727 | 154 | 573 | 78.8% | stat:651, readFile:22, openSync:22, realpath.native:16 |
| sonarjs/other | 123 | 122 | 1 | 0.8% | readFile:98, readdir:22, stat:2 |
| find-up-simple | 36 | 10 | 26 | 72.2% | stat:36 |
| tsx | 15 | 15 | 0 | 0.0% | realpath:10, readFile:5 |
| eslint-plugin-unicorn | 1 | 0 | 1 | 100.0% | readFile:1 |

Now the totals match:

  • Cache Stats: 16,567 total fs operations
  • FS Access Distribution: 16,511 tracked ops (difference of 56 is likely edge cases like non-string paths)

Key insights:

  • resolve dominates with 94% of all fs operations (15,609 ops), mostly doing stat calls to find module paths
  • Most external path operations are in the "Disk" column - e.g., resolve's 2,634 disk reads include the 1,713 external:stat operations
  • typescript uses 727 ops total, including fd operations (openSync/readSync/closeSync) and realpath.native
  • tsx (the test runner) does 15 external realpath and readFile operations
  • Cache hit rate across all tracked ops: 82.2% (13,576 cache / 16,511 total)

We don't use resolve directly. It's a transitive dependency:

  ├─┬ eslint-plugin-import
  │ └─┬ eslint-import-resolver-node
  │   └── resolve@1.22.11
  ├─┬ eslint-plugin-react
  │ └── resolve@2.0.0-next.5

📦 Resolve Callers (15609 total calls): eslint-import-resolver-node 15609 (100.0%)
Result: All 15,609 resolve calls (100%) come from eslint-import-resolver-node, which is a dependency of eslint-plugin-import.

  • eslint-plugin-import: 100% of resolve calls (via eslint-import-resolver-node)
  • eslint-plugin-react: 0% of resolve calls

So eslint-plugin-react doesn't use the resolve package at all for module resolution - only eslint-plugin-import does.
This is a big finding - the ESLint import plugin is doing massive amounts of module resolution (14,282 stat calls alone) to validate imports. That's likely a significant performance bottleneck.

Key Findings

  1. resolve package is the biggest fs consumer

    • All calls come from eslint-import-resolver-node (100%)
    • This is a transitive dependency of eslint-plugin-import
    • eslint-plugin-react does NOT use resolve for fs operations
    • Most operations are stat calls checking for module existence
  2. TypeScript's uncached operations

    • TypeScript uses fd-based operations (openSync/readSync/closeSync) for .d.ts files
    • These bypass our caching layer by design (we found that caching fd ops causes TS to lose type information)
    • Only 202 operations total - relatively small footprint
  3. External path operations (~1,800 ops)

    • Operations outside baseDir are tracked but not cached
    • Mostly from resolve checking node_modules paths up the directory tree
  4. Cache effectiveness

    • Operations inside baseDir: 92%+ hit rate
    • stat operations: 12,247 hits / 1,011 misses (92.4%)
    • readFile operations: 1,333 hits / 107 misses (92.6%)

Performance Impact of Stats Tracking

The stats tracking itself has ~3% overhead due to stack trace analysis on every fs call:

  • Without stats: 21,972ms analysis
  • With stats: 22,630ms analysis

Changes Made

  1. Removed caller tracking code - The dependency analysis code was temporary for this study
  2. Made stats optional - Added FS_CACHE_STATS=1 env var to enable detailed stats (disabled by default)
  3. Normalized baseDir - Fixed path normalization for consistent cache key handling

Usage

# Default - no stats overhead
npx tsx --test packages/ruling/projects/Joust.ruling.test.ts

# With detailed stats (for debugging)
FS_CACHE_STATS=1 npx tsx --test packages/ruling/projects/Joust.ruling.test.ts

Conclusions

  • The fs-cache is working effectively with 92%+ hit rates on cacheable operations
  • The main remaining disk I/O comes from:
    • resolve package (module resolution for eslint-plugin-import)
    • TypeScript's fd-based operations (by design, for correct type info)
    • External path checks (outside project baseDir)

vdiez and others added 7 commits December 19, 2025 17:07
Implement a global filesystem cache that intercepts all fs operations,
including those from dependencies like TypeScript and ESLint.

Key features:
- Unified tree structure (FsNode) instead of separate maps per operation
- Negative caching - caches "file doesn't exist" results
- Per-project cache isolation with disk persistence (protobuf)
- Full opendir/opendirSync support with cached Dir implementation
- Reuses @types/node types where possible

Cached operations: readFile, readdir, opendir, stat, lstat, exists,
realpath, realpathSync.native, access (both sync and async variants)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Track all fs operations dynamically (cached and uncached)
- Add caller tracking per operation with Map-based stats
- Add FS_CACHE env variable for cache control:
  - FS_CACHE=0: disable caching (baseline mode)
  - FS_CACHE=cold: cold start (no cache load)
  - default: warm start with cache loading
- Add detailed timing breakdown for cache load/save
- Update testProject.ts with comprehensive stats display

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Implement openSync, readSync, closeSync, fstatSync caching
- Use real kernel fds when files exist (TypeScript compatibility)
- Fall back to fake fds for offline mode when files don't exist
- Track fd -> path mapping for serving cached content
- Add comparison/debug modes for troubleshooting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Only cache files/directories under project baseDir
- Files outside baseDir pass through to real filesystem
- Store paths relative to baseDir for cache portability
- Cache size reduced ~82% (only project files, not TypeScript libs)
- Add isUnderBaseDir() and toRelativeCachePath() utilities

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Restrict TypeScript's filesystem access to project directory and TS lib files
to prevent unnecessary parent directory walks during module resolution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add STATS_ENABLED flag controlled by FS_CACHE_STATS=1 env var
- Wrap all recordMiss/recordExternal/recordUncached/recordHit calls
  with STATS_ENABLED conditional to skip stack trace analysis
- Make printCacheStats output conditional on STATS_ENABLED
- Normalize baseDir to Unix-style path in ProjectFsCache

This improves performance by ~3% by avoiding stack trace analysis
on every fs call when detailed stats are not needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@vdiez vdiez force-pushed the fs-cache-unified-tree branch from 2fea59e to 544c257 Compare December 19, 2025 16:07
@zglicz zglicz force-pushed the typescript-program-caching branch from 80f82ed to 23da93c Compare December 22, 2025 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants