JS-1001 Add unified filesystem cache module #6086
base: typescript-program-caching
Additional Next Steps
Complete fs Module Mirror
Currently we only intercept and cache specific fs methods. To get full visibility into filesystem usage, we should create a 1:1 mirror of the entire
Example metrics to track:
interface FsMethodStats {
  [methodName: string]: {
    calls: number;
    cached: boolean;
    totalTimeMs: number;
  };
}
This will help us:
Building Protobuf Sources
The protobuf files in
Prerequisites
Ensure
Generate JavaScript code
npx pbjs -t static-module -w es6 -o packages/shared/src/fs-cache/proto/fs-cache.js packages/shared/src/fs-cache/proto/fs-cache.proto
Generate TypeScript definitions
npx pbts -o packages/shared/src/fs-cache/proto/fs-cache.d.ts packages/shared/src/fs-cache/proto/fs-cache.js
Both commands in sequence
npx pbjs -t static-module -w es6 -o packages/shared/src/fs-cache/proto/fs-cache.js packages/shared/src/fs-cache/proto/fs-cache.proto && \
npx pbts -o packages/shared/src/fs-cache/proto/fs-cache.d.ts packages/shared/src/fs-cache/proto/fs-cache.js
Note: The order matters -
Consider adding an npm script in
{
  "scripts": {
    "generate:fs-cache-proto": "pbjs -t static-module -w es6 -o src/fs-cache/proto/fs-cache.js src/fs-cache/proto/fs-cache.proto && pbts -o src/fs-cache/proto/fs-cache.d.ts src/fs-cache/proto/fs-cache.js"
  }
}
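Returning to the "Complete fs Module Mirror" idea above, here is a minimal sketch of how per-method metrics in the `FsMethodStats` shape could be collected. It is an illustration only: `mirrorWithStats`, `fakeFs`, and the `cachedMethods` set are hypothetical names, and a plain object stands in for the real `fs` module.

```typescript
// Shape taken from the comment above.
interface FsMethodStats {
  [methodName: string]: { calls: number; cached: boolean; totalTimeMs: number };
}

type AnyFn = (...args: unknown[]) => unknown;

// Hypothetical helper: returns a mirror of `target` whose function properties
// record call counts and elapsed time before delegating to the original.
function mirrorWithStats<T extends Record<string, unknown>>(
  target: T,
  cachedMethods: ReadonlySet<string>,
  stats: FsMethodStats,
): T {
  const mirror: Record<string, unknown> = {};
  for (const name of Object.keys(target)) {
    const value: unknown = target[name];
    if (typeof value !== 'function') {
      mirror[name] = value; // non-function members pass through unchanged
      continue;
    }
    const original = value as AnyFn;
    mirror[name] = (...args: unknown[]): unknown => {
      const entry = (stats[name] ??= {
        calls: 0,
        cached: cachedMethods.has(name),
        totalTimeMs: 0,
      });
      const start = performance.now();
      try {
        return original.apply(target, args);
      } finally {
        entry.calls += 1;
        entry.totalTimeMs += performance.now() - start;
      }
    };
  }
  return mirror as T;
}

// Stand-in object; in practice `target` would be the real fs module.
const fakeFs = {
  readFileSync: (path: string) => `contents of ${path}`,
  existsSync: (_path: string) => true,
};
const methodStats: FsMethodStats = {};
const mirroredFs = mirrorWithStats(fakeFs, new Set(['readFileSync']), methodStats);
mirroredFs.readFileSync('a.ts');
mirroredFs.readFileSync('b.ts');
mirroredFs.existsSync('a.ts');
```

The sketch only times synchronous calls; callback and promise variants would need their completion measured separately.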
FS Loader Hook for ESM Compatibility
Added a Node.js loader hook at
Why We Need It
In ESM, monkey-patching
import fs from 'fs';
fs.readFileSync = patched; // ✅ Works
import { readFileSync } from 'fs';
readFileSync('./file.txt'); // ❌ Still uses original!
import * as fs from 'fs';
fs.readFileSync('./file.txt'); // ❌ Still uses original!
This is because ESM named imports bind directly to the module's export at import time (immutable binding), not to a property on an object.
When We Need It
For the FS cache use case: If we only need to intercept TypeScript's fs calls, we don't need the loader because TypeScript uses
The loader is only needed if SonarJS's own code or ESM dependencies use named imports.
How It Works
The loader intercepts
// What the loader generates:
import { createRequire } from 'node:module';
const require = createRequire('<loader-url>');
const fs = require('fs'); // Bypasses ESM loader, gets real fs
export function readFileSync(...args) { return fs.readFileSync(...args); }
export function existsSync(...args) { return fs.existsSync(...args); }
// ... all other fs functions
export default fs;
Why this works: The wrapper functions are ESM exports (immutable), but their bodies perform a dynamic property lookup on the mutable
Usage
node --import ./packages/shared/src/fs-loader/register.mjs your-app.mjs
Then patch
Files
|
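The "Why this works" point in the loader comment above can be illustrated in isolation. The sketch below is not the loader itself: `fsObject` is a hypothetical stand-in for the mutable object returned by `require('fs')`, and the two functions model a reference resolved before patching versus a wrapper that looks the property up on each call.

```typescript
type ReadFileSync = (path: string) => string;

// Stand-in for the mutable CommonJS fs object (an assumption for illustration).
const fsObject: { readFileSync: ReadFileSync } = {
  readFileSync: (path) => `disk:${path}`,
};

// Models a binding resolved before any patch is applied:
// it keeps pointing at the original function.
const capturedReadFileSync: ReadFileSync = fsObject.readFileSync;

// Models the loader-generated export: the function identity never changes,
// but its body reads fsObject.readFileSync again on every call.
function wrappedReadFileSync(path: string): string {
  return fsObject.readFileSync(path);
}

// The cache patch is installed later, after both of the above exist.
fsObject.readFileSync = (path) => `cache:${path}`;

const viaCaptured = capturedReadFileSync('file.txt'); // still the original
const viaWrapper = wrappedReadFileSync('file.txt'); // sees the patched function
```

Here `viaCaptured` is `"disk:file.txt"` while `viaWrapper` is `"cache:file.txt"`, which is the behaviour the generated wrappers rely on.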
Today's Changes (Dec 18)
1. Dynamic FS Operation Tracking & Cache Control
2. File Descriptor Caching
Added caching for
3. Portable Cache with Relative Paths
4. TypeScript CompilerHost Blocking
Prevents TypeScript from walking up parent directories looking for
private isAllowedPath(filePath: string): boolean {
  return filePath.startsWith(this.baseDirPrefix) || filePath.startsWith(this.tsLibDirPrefix);
}
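For context, a self-contained sketch of how such a prefix check might be set up follows. The `PathGuard` class, `toUnixPath`, and `fileExists` are hypothetical names, not the PR's code; the sketch assumes the prefixes are stored with a trailing slash so that a sibling directory such as `/work/project2` does not match `/work/project`.

```typescript
// Normalize to forward slashes and drop trailing slashes so prefixes compare cleanly.
function toUnixPath(p: string): string {
  return p.replace(/\\/g, '/').replace(/\/+$/, '');
}

class PathGuard {
  private readonly baseDirPrefix: string;
  private readonly tsLibDirPrefix: string;

  constructor(baseDir: string, tsLibDir: string) {
    this.baseDirPrefix = toUnixPath(baseDir) + '/';
    this.tsLibDirPrefix = toUnixPath(tsLibDir) + '/';
  }

  isAllowedPath(filePath: string): boolean {
    // Appending '/' lets the directory itself match its own prefix.
    const candidate = toUnixPath(filePath) + '/';
    return candidate.startsWith(this.baseDirPrefix) || candidate.startsWith(this.tsLibDirPrefix);
  }

  // Example of a guarded host method: disallowed paths are reported
  // as missing without consulting the real filesystem.
  fileExists(filePath: string, realFileExists: (p: string) => boolean): boolean {
    return this.isAllowedPath(filePath) && realFileExists(filePath);
  }
}

const guard = new PathGuard('/work/project', '/work/project/node_modules/typescript/lib');
```

The sketch does not resolve `..` segments or handle case-insensitive filesystems; both would need attention before relying on a check like this.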
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: zglicz <michal.zgliczynski@sonarsource.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
e85b35b to 1de976f
FS Cache Caller Tracking Study - Findings
I conducted a study to understand which npm dependencies are responsible for filesystem operations during analysis. Here are the comprehensive findings:
FS Access Distribution (16,511 total tracked ops):
| Dependency | Total Ops | Disk | Cache | Hit% | Top Operations |
|-----------------------|-----------|-------|--------|--------|--------------------------------------------------------|
Now the totals match:
Key insights:
We don't use resolve directly. It's a transitive dependency:
📦 Resolve Callers (15609 total calls): eslint-import-resolver-node 15609 (100.0%)
So eslint-plugin-react doesn't use the resolve package at all for module resolution - only eslint-plugin-import does.
Key Findings
Performance Impact of Stats Tracking
The stats tracking itself has ~3% overhead due to stack trace analysis on every fs call:
Changes Made
Usage
# Default - no stats overhead
npx tsx --test packages/ruling/projects/Joust.ruling.test.ts
# With detailed stats (for debugging)
FS_CACHE_STATS=1 npx tsx --test packages/ruling/projects/Joust.ruling.test.ts
Conclusions
|
Implement a global filesystem cache that intercepts all fs operations, including those from dependencies like TypeScript and ESLint.
Key features:
- Unified tree structure (FsNode) instead of separate maps per operation
- Negative caching - caches "file doesn't exist" results
- Per-project cache isolation with disk persistence (protobuf)
- Full opendir/opendirSync support with cached Dir implementation
- Reuses @types/node types where possible
Cached operations: readFile, readdir, opendir, stat, lstat, exists, realpath, realpathSync.native, access (both sync and async variants)
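A minimal sketch of how a unified tree with negative caching might look, under stated assumptions: the `FsNode` fields, the `FsTree` class, and its method names are hypothetical, and only file content is modelled. The three-state result follows the `CacheLookupResult<T>` pattern named in the PR summary (`undefined` for a miss, `{ exists: false }` for a cached negative, `{ exists: true, value }` for a hit).

```typescript
type CacheLookupResult<T> = { exists: false } | { exists: true; value: T } | undefined;

// Hypothetical node shape: one node per path segment.
interface FsNode {
  exists?: boolean; // undefined = not yet known, false = cached "does not exist"
  content?: string; // file content, once it has been read
  children: Map<string, FsNode>;
}

class FsTree {
  private readonly root: FsNode = { exists: true, children: new Map() };

  private segments(path: string): string[] {
    return path.split('/').filter((s) => s.length > 0);
  }

  private find(path: string): FsNode | undefined {
    let node: FsNode = this.root;
    for (const segment of this.segments(path)) {
      const child = node.children.get(segment);
      if (!child) return undefined;
      node = child;
    }
    return node;
  }

  // Creates intermediate nodes as "unknown" rather than assuming they exist.
  private ensure(path: string): FsNode {
    let node: FsNode = this.root;
    for (const segment of this.segments(path)) {
      let child = node.children.get(segment);
      if (!child) {
        child = { children: new Map() };
        node.children.set(segment, child);
      }
      node = child;
    }
    return node;
  }

  recordFile(path: string, content: string): void {
    const node = this.ensure(path);
    node.exists = true;
    node.content = content;
  }

  recordMissing(path: string): void {
    const node = this.ensure(path);
    node.exists = false;
    delete node.content;
    node.children.clear();
  }

  lookupFile(path: string): CacheLookupResult<string> {
    const node = this.find(path);
    if (!node || node.exists === undefined) return undefined; // cache miss
    if (!node.exists) return { exists: false }; // cached negative result
    if (node.content === undefined) return undefined; // exists, content not cached
    return { exists: true, value: node.content };
  }
}

const tree = new FsTree();
tree.recordFile('/p/src/a.ts', 'export {}');
tree.recordMissing('/p/tsconfig.json');
```

Keeping "unknown" separate from "known missing" is what lets a caller skip the disk for `/p/tsconfig.json` while still falling through to the real filesystem for paths the cache has never seen.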
- Track all fs operations dynamically (cached and uncached)
- Add caller tracking per operation with Map-based stats
- Add FS_CACHE env variable for cache control:
  - FS_CACHE=0: disable caching (baseline mode)
  - FS_CACHE=cold: cold start (no cache load)
  - default: warm start with cache loading
- Add detailed timing breakdown for cache load/save
- Update testProject.ts with comprehensive stats display
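As an illustration of the three `FS_CACHE` modes listed in this commit message, the variable could be interpreted roughly as follows. The type and function names (`FsCacheMode`, `resolveFsCacheMode`, `planFor`) are hypothetical; only the variable name and its three meanings come from the commit message.

```typescript
type FsCacheMode = 'disabled' | 'cold' | 'warm';

// Takes the environment as a parameter so it can be exercised without process.env.
function resolveFsCacheMode(env: Record<string, string | undefined>): FsCacheMode {
  switch (env['FS_CACHE']) {
    case '0':
      return 'disabled'; // baseline mode: no caching at all
    case 'cold':
      return 'cold'; // caching on, but start with an empty cache
    default:
      return 'warm'; // caching on, previous cache loaded
  }
}

interface FsCachePlan {
  cachingEnabled: boolean;
  loadExistingCache: boolean;
}

function planFor(mode: FsCacheMode): FsCachePlan {
  return {
    cachingEnabled: mode !== 'disabled',
    loadExistingCache: mode === 'warm',
  };
}

const defaultPlan = planFor(resolveFsCacheMode({}));
```

In real use the argument would be `process.env`; any unrecognized value falls back to the warm default in this sketch, which is an assumption rather than documented behaviour.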
- Implement openSync, readSync, closeSync, fstatSync caching
- Use real kernel fds when files exist (TypeScript compatibility)
- Fall back to fake fds for offline mode when files don't exist
- Track fd -> path mapping for serving cached content
- Add comparison/debug modes for troubleshooting
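The fd -> path mapping can be pictured with a small sketch covering only the fake-descriptor fallback. Everything here is hypothetical (`CachedFdTable`, negative numbers as fake fds, a `Map` of byte arrays as the cache); the `read` method merely mirrors the argument order of `fs.readSync(fd, buffer, offset, length, position)` and requires an explicit position.

```typescript
class CachedFdTable {
  private readonly fdToPath = new Map<number, string>();
  private readonly contents: Map<string, Uint8Array>;
  // Assumption: negative values cannot collide with real kernel descriptors.
  private nextFakeFd = -1;

  constructor(contents: Map<string, Uint8Array>) {
    this.contents = contents;
  }

  // Returns a fake descriptor for a cached file, or undefined on a cache miss.
  open(path: string): number | undefined {
    if (!this.contents.has(path)) return undefined;
    const fd = this.nextFakeFd--;
    this.fdToPath.set(fd, path);
    return fd;
  }

  // Copies up to `length` bytes starting at `position` into `buffer` at `offset`;
  // returns the number of bytes copied (0 at end of file).
  read(fd: number, buffer: Uint8Array, offset: number, length: number, position: number): number {
    const path = this.fdToPath.get(fd);
    const data = path === undefined ? undefined : this.contents.get(path);
    if (!data) throw new Error(`EBADF: unknown cached fd ${fd}`);
    const count = Math.min(length, data.length - position, buffer.length - offset);
    if (count <= 0) return 0;
    buffer.set(data.subarray(position, position + count), offset);
    return count;
  }

  // Returns true if the descriptor was open.
  close(fd: number): boolean {
    return this.fdToPath.delete(fd);
  }
}

const exampleContents = new Map<string, Uint8Array>([
  ['/p/a.ts', new TextEncoder().encode('hello world')],
]);
const exampleFds = new CachedFdTable(exampleContents);
```

A patched `readSync` would consult such a table first and fall through to the real implementation for descriptors it does not recognize.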
- Only cache files/directories under project baseDir
- Files outside baseDir pass through to real filesystem
- Store paths relative to baseDir for cache portability
- Cache size reduced ~82% (only project files, not TypeScript libs)
- Add isUnderBaseDir() and toRelativeCachePath() utilities
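The two utilities named in this commit message might look roughly like the sketch below. The function names come from the commit message, but the signatures and bodies are assumptions; `normalizePath` is a hypothetical helper.

```typescript
// Forward slashes, no trailing slash.
function normalizePath(p: string): string {
  return p.replace(/\\/g, '/').replace(/\/+$/, '');
}

function isUnderBaseDir(baseDir: string, filePath: string): boolean {
  const base = normalizePath(baseDir);
  const file = normalizePath(filePath);
  // The '/' boundary keeps '/w/p2' from matching base '/w/p'.
  return file === base || file.startsWith(base + '/');
}

// Returns the path relative to baseDir ('' for baseDir itself),
// or undefined when the path lies outside baseDir and should pass through.
function toRelativeCachePath(baseDir: string, filePath: string): string | undefined {
  if (!isUnderBaseDir(baseDir, filePath)) return undefined;
  return normalizePath(filePath).slice(normalizePath(baseDir).length + 1);
}

const relativeExample = toRelativeCachePath('/w/p/', '/w/p/src/a.ts');
```

Storing `src/a.ts` rather than `/w/p/src/a.ts` is what would allow a cache written on one machine to be reused where the project is checked out at a different absolute location.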
Restrict TypeScript's filesystem access to project directory and TS lib files to prevent unnecessary parent directory walks during module resolution.
- Add STATS_ENABLED flag controlled by FS_CACHE_STATS=1 env var
- Wrap all recordMiss/recordExternal/recordUncached/recordHit calls with STATS_ENABLED conditional to skip stack trace analysis
- Make printCacheStats output conditional on STATS_ENABLED
- Normalize baseDir to Unix-style path in ProjectFsCache
This improves performance by ~3% by avoiding stack trace analysis on every fs call when detailed stats are not needed.
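To show why gating matters, here is a minimal sketch of caller attribution that skips the stack capture entirely when stats are off. `packageFromStack` and `CallerStats` are hypothetical names, and the frame format assumed is the usual V8 `at fn (/path/file.js:line:col)` layout.

```typescript
// Returns the package owning the first stack frame located under node_modules.
function packageFromStack(stack: string): string | undefined {
  for (const line of stack.split('\n')) {
    const normalized = line.replace(/\\/g, '/');
    const idx = normalized.lastIndexOf('node_modules/');
    if (idx === -1) continue; // frame from our own code or from Node internals
    const parts = normalized.slice(idx + 'node_modules/'.length).split('/');
    // Scoped packages span two segments, e.g. @typescript-eslint/parser.
    return parts[0].startsWith('@') ? `${parts[0]}/${parts[1]}` : parts[0];
  }
  return undefined;
}

class CallerStats {
  readonly countsByPackage = new Map<string, number>();
  private readonly enabled: boolean;

  constructor(enabled: boolean) {
    this.enabled = enabled;
  }

  // `captureStack` is injectable for testing; by default it captures a real stack.
  recordCall(captureStack: () => string = () => new Error().stack ?? ''): void {
    if (!this.enabled) return; // no stack capture, no parsing
    const pkg = packageFromStack(captureStack()) ?? '(own code)';
    this.countsByPackage.set(pkg, (this.countsByPackage.get(pkg) ?? 0) + 1);
  }
}

const sampleStack = [
  'Error',
  '    at patchedStatSync (/app/packages/shared/src/fs-cache/fs-patch.ts:10:3)',
  '    at isFile (/app/node_modules/resolve/lib/sync.js:30:12)',
  '    at resolve (/app/node_modules/eslint-import-resolver-node/index.js:5:1)',
].join('\n');
```

With the flag off, `recordCall` returns before `new Error()` is ever constructed, which is the cost the commit message describes avoiding.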
2fea59e to 544c257
80f82ed to 23da93c
Summary
This PR introduces a new filesystem cache module (packages/shared/src/fs-cache/) that intercepts ALL filesystem operations, including those from dependencies like TypeScript and ESLint. The cache uses a unified tree structure for efficient storage and supports disk persistence via protobuf serialization.
Key Features
- fs module to cache all filesystem operations
- FsNode data structure represents all cached path information
Design Decisions Explored
1. Unified Tree vs Separate Maps
Initially considered separate Maps for each operation type (files, directories, stats, etc.). Chose unified FsNode structure because:
2. Cache Lookup Result Pattern
Implemented CacheLookupResult<T> to distinguish three states:
- undefined returned: path not in cache (cache miss)
- { exists: false }: cached knowledge that path doesn't exist
- { exists: true, value: T }: cached data for existing path
3. opendir/opendirSync Caching
fs.opendir returns a Dir object (async iterator). Implemented CachedDir class that:
- Dir interface including Symbol.asyncDispose/Symbol.dispose
4. Type Reuse
Reuses @types/node types where possible via Pick<Stats, ...> instead of maintaining custom type definitions.
Files Created
- cache-types.ts (FsNode, FsNodeStat, CacheStats)
- cache-utils.ts
- project-cache.ts
- cache-manager.ts
- fs-patch.ts (CachedDir class)
- index.ts
- proto/fs-cache.proto
- proto/fs-cache.js
- proto/fs-cache.d.ts
Cached Operations
- readFileSync/readFile
- readdirSync/readdir (with withFileTypes support)
- statSync/stat
- lstatSync/lstat
- existsSync
- realpathSync/realpath
- accessSync/access
- opendirSync/opendir
Next Steps for Integration
- Initialize at startup: Call initFsCache() when gRPC/HTTP server starts
- Set active project before analysis: Call setActiveProject() with project info
- Load cache from previous run (optional):
- Save cache after analysis:
- Add cache invalidation: Implement file watcher or use Sonar's file change detection to call invalidatePath() when files change
- Add tests: Unit tests for cache operations, integration tests for fs patching
- Performance metrics: Use getFsCacheStats() to measure cache effectiveness