Merged
4 changes: 2 additions & 2 deletions AGENTS.md
@@ -48,8 +48,8 @@ cargo clippy --all-features -- -D warnings
### Language support

- v0.1: TypeScript, JavaScript, Java, Python, Go, C#
-- v0.2: Kotlin (Spring, Ktor)
-- planned: Ruby, PHP
+- v0.2: Kotlin (Spring, Ktor), Ruby (Rails, Pundit, CanCanCan, Devise)
+- planned: PHP

## Conventional Commits & Versioning

11 changes: 11 additions & 0 deletions Cargo.lock

Generated file; diff not rendered.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -33,6 +33,7 @@ tree-sitter-python = "0.25"
tree-sitter-go = "0.25"
tree-sitter-c-sharp = "0.23"
tree-sitter-kotlin-ng = "1.1"
+tree-sitter-ruby = "0.23"
ignore = "0.4"
sha2 = "0.11"
regex = "1"
6 changes: 3 additions & 3 deletions README.md
@@ -6,7 +6,7 @@

Sift through your codebase for embedded authorization logic. Extract it into Policy as Code (PaC) — [Rego](https://www.openpolicyagent.org/docs/latest/policy-language/) for [OPA](https://www.openpolicyagent.org/), or [Cedar](https://www.cedarpolicy.com/) for [AWS Verified Permissions](https://aws.amazon.com/verified-permissions/), Arbiter, and other Cedar-compatible engines.

-> **Status:** v0.2 — structural scanning ready for TypeScript, JavaScript, Java, Python, Go, and C#. `--deep` (LLM-assisted) mode functional via any OpenAI-compatible endpoint or MCP-capable agent host.
+> **Status:** v0.2 — structural scanning ready for TypeScript, JavaScript, Java, Python, Go, C#, Kotlin, and Ruby. `--deep` (LLM-assisted) mode functional via any OpenAI-compatible endpoint or MCP-capable agent host.

## What is zift?

@@ -27,7 +27,7 @@ zift report . # detailed findings report

1. **Structural scan** (tree-sitter) — fast, deterministic, zero-cost. Finds known authorization patterns: role checks, permission guards, auth middleware, security annotations.

-2. **Semantic scan** (`--deep`, opt-in) — sends candidate code regions to an LLM that classifies authorization logic the structural pass missed or misjudged. Useful for business rules that implicitly encode access control, and for languages where structural support hasn't shipped yet (Ruby, PHP, etc.).
+2. **Semantic scan** (`--deep`, opt-in) — sends candidate code regions to an LLM that classifies authorization logic the structural pass missed or misjudged. Useful for business rules that implicitly encode access control, and for languages where structural support hasn't shipped yet (PHP, etc.).

## Supported languages

@@ -39,7 +39,7 @@ zift report . # detailed findings report
| Go | yes (v0.1) | yes (v0.1) | Gin, Echo |
| C# | yes (v0.2) | yes (v0.1) | ASP.NET Core |
| Kotlin | yes (v0.2) | yes (v0.1) | Spring (Kotlin), Ktor |
-| Ruby | planned (v0.2) | yes (v0.1) | Rails |
+| Ruby | yes (v0.2) | yes (v0.1) | Rails, Pundit, CanCanCan, Devise |
| PHP | planned (v0.2) | yes (v0.1) | Laravel |

Deep mode walks the full source tree by extension and detects auth-y function names with regex — so it produces useful results in any language well before structural support lands.
4 changes: 2 additions & 2 deletions docs/DESIGN.md
@@ -200,7 +200,7 @@ allow if {

## Language support

-These priorities describe release milestones. C# shipped in the v0.2 milestone; Kotlin (Spring + Ktor) followed in v0.2.x.
+These priorities describe release milestones. C# shipped in the v0.2 milestone; Kotlin (Spring + Ktor) and Ruby (Rails + Pundit + CanCanCan + Devise) followed in v0.2.x.

### Priority 1 (v0.1)

@@ -218,12 +218,12 @@ These priorities describe release milestones. C# shipped in the v0.2 milestone;
| Go | Custom middleware, Casbin, chi/gorilla middleware chains, `if claims.Role` |
| C# | ASP.NET Core `[Authorize]`, policy-based authorization, `ClaimsPrincipal` checks |
| Kotlin | Spring Security (same patterns as Java), Ktor `install(Authentication)` + `authenticate { ... }`, idiomatic role checks |
+| Ruby | Pundit (`authorize @post`, `policy(...).<action>?`, `*Policy < ApplicationPolicy`), CanCanCan (`authorize!`, `can :read, Article`, `user.can?(:action)`), Devise / Rails (`before_action :authenticate_user!`, `current_user.admin?`), idiomatic role checks |

### Priority 3 (v0.3)

| Language | Key frameworks / patterns |
|----------|--------------------------|
-| Ruby | Pundit, CanCanCan, Devise, `before_action` guards |
| PHP | Laravel Gates/Policies, Symfony Voters |

### Adding a new language
1 change: 1 addition & 0 deletions docs/corpus/README.md
@@ -30,6 +30,7 @@ We are **not** shipping policies for these projects. The runs exist to stress-te
| Go | [go-gitea/gitea](https://github.com/go-gitea/gitea) | 18 | 23 (perm subset) | Deep surfaces the entire `IsAdmin`/`IsOwner`/`Has*` family the structural pass missed; one predicate widening on `go-has-role-call` closes most of the gap. See [go.md](go.md). |
| C# | [bitwarden/server](https://github.com/bitwarden/server) | 318 | 88 (AdminConsole subset) | ASP.NET Core resource authorization dominates structurally; deep surfaces generic `[Authorize<TRequirement>]`, ownership checks, and helper gates. See [csharp.md](csharp.md). |
| Kotlin | [ktorio/ktor-samples](https://github.com/ktorio/ktor-samples) | 13 | — | Ktor `install(Authentication)` + named `authenticate(...) { ... }` route guards account for every finding; Spring-Kotlin rules need a separate corpus target to calibrate. See [kotlin.md](kotlin.md). |
+| Ruby | [discourse/discourse](https://github.com/discourse/discourse) | 339 | — | Rails `before_action` filters (180) and `current_user.<role>?` predicates (159) carry every finding; Discourse's `Guardian` call sites are the gap. Pundit/CanCanCan rules need a Pundit-flavored target to calibrate. See [ruby.md](ruby.md). |

> The "deep" column is intentionally a **scoped subset** rather than the whole repo — running deep against 5,000+ files per language is neither cheap nor necessary to surface gaps. Each per-language doc explains the subset and why.
>
117 changes: 117 additions & 0 deletions docs/corpus/ruby.md
@@ -0,0 +1,117 @@
# Ruby — Discourse

Real-world results from running Zift against [discourse/discourse](https://github.com/discourse/discourse), the open-source forum platform.

## Why this target

Discourse is one of the largest mature open-source Rails apps in active use. It carries 10K+ Ruby files, deep controller hierarchies, and a hand-rolled `Guardian` authz layer that wraps Rails' `before_action` filter mechanism, so it exercises the Rails-side rules (`before_action`, `skip_before_action`) at scale and the Devise-style role-predicate rule (`current_user.staff?`, `current_user.admin?`) on a real privilege hierarchy. It does **not** use Pundit or CanCanCan, which makes it a useful contrast target: it confirms the Pundit/CanCanCan rules don't false-positive on Rails-flavored idioms that happen to share keywords.

## Target metadata

| Field | Value |
|---|---|
| Repo | [discourse/discourse](https://github.com/discourse/discourse) |
| Commit | `947c99ef` |
| Ruby files | 10,117 |
| LOC (`.rb`) | 1,103,264 |
| Externalized PaC | None observed |
| Zift version | 0.2.2 |

## Structural pass

```bash
zift scan ~/zift-corpus/ruby/discourse --language ruby --format json -o structural.json
```

| Metric | Value |
|---|---|
| Wall time | 10.7s |
| Peak RSS | ~34 MB |
| Total findings | **339** |
| Files with findings | 172 |
| Externalized % | 0% (no policy-import enforcement points emitted) |
| Files skipped (policy-impl path) | 63 |

**Findings per rule**

| Rule | Count |
|------|------:|
| `ruby-rails-before-action-filter` | 180 |
| `ruby-current-user-role-predicate` | 159 |

**Findings per category**

| Category | Count |
|----------|------:|
| `middleware` | 180 |
| `rbac` | 159 |

**Top filter names (from `before_action` / `skip_before_action`)**

| Symbol | Count |
|--------|------:|
| `:check_xhr` | 74 |
| `:verify_authenticity_token` | 29 |
| `:ensure_logged_in` | 24 |
| `:ensure_admin` | 10 |
| `:ensure_staff` | 4 |
| `:check_permissions` | 2 |
| `:ensure_can_see` | 2 |

**Top role predicates (from `current_user.<role>?`)**

| Predicate | Approximate share |
|-----------|------:|
| `current_user.staff?` | majority |
| `current_user.admin?` | second |
| `current_user&.staff?` (safe-navigation) | also matched |

**Top findings (sample)**

| File | Line | Snippet |
|------|-----:|---------|
| `app/controllers/admin/admin_controller.rb` | 5 | `before_action :ensure_admin` |
| `app/controllers/admin/dashboard_controller.rb` | 13 | `current_user.admin?` |
| `app/controllers/application_controller.rb` | 758 | `current_user.staff?` |
| `app/controllers/application_controller.rb` | 762 | `current_user.admin?` |
| `app/controllers/groups_controller.rb` | 45 | `current_user&.staff?` |
| `app/controllers/admin/impersonate_controller.rb` | 4 | `skip_before_action :ensure_admin, only: :destroy` |
| `app/controllers/about_controller.rb` | 6 | `skip_before_action :check_xhr, only: [:index]` |

Each of the two rules that fired exposes a legitimate authz surface. `before_action` carries the privilege-gate logic for every admin and staff-only controller; the role-predicate rule catches Discourse's `current_user.staff?`-centric privilege model. Both rules show the expected density of high-confidence findings on `app/controllers/admin/*` and `app/controllers/application_controller.rb`, which is Discourse's actual authz surface.

## Zero-coverage rules (intentional)

Seven Ruby rules contribute zero findings on Discourse — and that's correct:

| Rule | Why zero is expected |
|------|---------------------|
| `ruby-pundit-authorize` | Discourse doesn't use Pundit; it has its own `Guardian` |
| `ruby-pundit-policy-method` | Same |
| `ruby-pundit-policy-class` | Same — no `*Policy < ApplicationPolicy` shape in tree |
| `ruby-cancancan-can-declaration` | No CanCanCan in Discourse |
| `ruby-cancancan-can-check` | Same |
| `ruby-role-equals-check` | Discourse never spells RBAC as `user.role == "admin"` |
| `ruby-role-collection-include` | Privileges go through `Guardian`, not collection `.include?` |

The Pundit/CanCanCan rules are exercised by the inline rule tests (`cargo run -- rules test`) and need a Pundit-flavored or CanCanCan-flavored corpus target — a typical Rails-Pundit app — for end-to-end calibration. The Pundit sample suite (`varvet/pundit` test fixtures) lights up `ruby-pundit-policy-class` cleanly (18 findings, all in `spec/support/policies/`), confirming the class-declaration shape works as designed.
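For orientation, the class-declaration shape that `ruby-pundit-policy-class` targets looks roughly like the sketch below. It is a gem-free stand-in, not Pundit itself: `ApplicationPolicy` is a hand-written stub, and `User`, `Post`, and `PostPolicy` are hypothetical names chosen for illustration.

```ruby
# Stub for the base class a Pundit app would normally provide
# (assumption: the gem is not loaded here).
class ApplicationPolicy
  attr_reader :user, :record

  def initialize(user, record)
    @user = user
    @record = record
  end
end

# Hypothetical domain objects.
User = Struct.new(:name, :admin) do
  def admin?
    admin
  end
end
Post = Struct.new(:author)

# The `*Policy < ApplicationPolicy` shape with a predicate-style action method.
class PostPolicy < ApplicationPolicy
  def update?
    user.admin? || record.author == user
  end
end

alice = User.new("alice", false)
post = Post.new(alice)

puts PostPolicy.new(alice, post).update?                   # the author
puts PostPolicy.new(User.new("bob", false), post).update?  # an unrelated user
```

A tree like Discourse's contains no class with this superclass, which is consistent with the zero count reported for that rule.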

## Gaps & follow-ups

**FP risk: medium on framework filters.** `check_xhr` (74 hits) and `verify_authenticity_token` (29 hits) match `ruby-rails-before-action-filter` because the rule's filter-name regex includes `check_` and `verify_` prefixes. These are real Rails framework filters — `check_xhr` enforces XHR-only access on Discourse controllers; `verify_authenticity_token` is Rails' CSRF token check. Both *are* preconditions enforced via the controller filter mechanism, but a strict authz-vs-input-validation reader would dismiss them. The current breadth is deliberate (avoiding false negatives on hand-rolled `check_can_see`, `verify_user_active` etc.), but a narrower variant or a follow-up `not_match` predicate is worth filing once we see a second corpus target's filter inventory.
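The trade-off can be sketched as a broad prefix pattern plus a deny list standing in for a `not_match`-style predicate. Both patterns below are assumptions: the rule's real regex is only known to include the `check_` and `verify_` prefixes, and the deny list is hypothetical. The counts come from the "Top filter names" table above.

```ruby
# Hypothetical approximation of a broad filter-name pattern.
BROAD_FILTER = /\A(ensure_|check_|verify_|authenticate_|authorize_|require_)/

# Hypothetical deny list playing the role of a follow-up `not_match` predicate.
FRAMEWORK_FILTERS = /\A(check_xhr|verify_authenticity_token)\z/

def classify(filter_name)
  return :ignored unless filter_name.match?(BROAD_FILTER)
  return :framework if filter_name.match?(FRAMEWORK_FILTERS)

  :authz
end

# Hit counts copied from the "Top filter names" table.
observed = {
  "check_xhr" => 74,
  "verify_authenticity_token" => 29,
  "ensure_logged_in" => 24,
  "ensure_admin" => 10,
  "ensure_staff" => 4,
  "check_permissions" => 2,
  "ensure_can_see" => 2,
}

totals = Hash.new(0)
observed.each { |name, count| totals[classify(name)] += count }
totals.each { |kind, count| puts "#{kind}: #{count}" }
```

Under these assumptions, 103 of the 145 listed hits move to the framework bucket, while hand-rolled names such as `check_can_see` or `verify_user_active` still classify as authz.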

**Policy-implementation path skip is broader than intended.** The scanner's `is_policy_implementation_path` heuristic — shared with Java/Go/Python — silently skips files under any directory whose name contains a policy keyword. On Discourse that drops 63 files, including:

- 20 files under `plugins/discourse-policy/**` — a community plugin for *forum-style* policy acceptance posts, **not** an authz policy engine. These would otherwise be in scope for the consumer-side scan.
- ~6 files under `app/services/*/policy/*.rb` — Discourse's [`Service`](https://github.com/discourse/discourse/blob/main/lib/service/base.rb) result-pattern preconditions (`SettingsAreConfigurable`, `NotAlreadySilenced`). These are validators, not policy engines, but the path keyword causes the bypass.
- 4 files under `lib/content_security_policy/` and 1 under `spec/lib/content_security_policy/` — CSP header generation, not authz.

This is a known limitation of the shared keyword-bypass heuristic and a candidate for a Ruby-aware refinement: drop the bypass when the directory contains application code shapes (controllers, models) rather than implementation-side primitives. Tracked as a future rule-engine tweak.
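A rough model of the bypass and of one possible refinement is sketched below. It is not the scanner's code: the keyword, the helper names, and the path-based reading of "application code shapes" are all assumptions, and two of the file names are invented for the example.

```ruby
# Hypothetical keyword; the scanner's real keyword list is not shown here.
POLICY_KEYWORD = /policy/i

# Assumed model of the shared heuristic: skip a file when any directory
# component of its path contains the keyword.
def policy_implementation_path?(path)
  File.dirname(path).split("/").any? { |dir| dir.match?(POLICY_KEYWORD) }
end

# Assumed refinement: keep the bypass only when no application-code
# directory (controllers, models) appears in the path.
APP_CODE_DIRS = %w[controllers models].freeze

def refined_skip?(path)
  dirs = File.dirname(path).split("/")
  policy_implementation_path?(path) && (dirs & APP_CODE_DIRS).empty?
end

paths = [
  "plugins/discourse-policy/app/controllers/policy_controller.rb", # hypothetical file name
  "lib/content_security_policy/builder.rb",                        # hypothetical file name
  "app/controllers/admin/admin_controller.rb",
]

paths.each do |path|
  puts format("%-64s current=%-5s refined=%s",
              path, policy_implementation_path?(path), refined_skip?(path))
end
```

In this sketch the plugin controller comes back into scope, but the CSP path is still skipped, so a path-only refinement would not address every case listed above.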

**FN: Guardian call-site coverage.** Discourse's privilege checks largely go through `guardian.can_*?(...)` and `guardian.is_*?(...)` predicates (e.g. `guardian.can_see?(@topic)`, `guardian.is_admin?`). The current ruleset doesn't have a `guardian.*?` rule because Guardian is Discourse-specific — but the shape generalizes to any project that hand-rolls its own predicate-style authz wrapper. A follow-up rule targeting the `<receiver>.can_*?` / `<receiver>.is_*?` shape (gated by name regex to avoid `record.is_persisted?`-style noise) would convert a substantial chunk of Discourse's actual authz decisions to structural findings.
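The proposed shape can be approximated with two name gates, one on the method and one on the receiver. This is a text-level sketch only: a real rule would match tree-sitter call nodes, and both regexes (including the receiver names) are hypothetical.

```ruby
# Hypothetical gate: method names must look like `can_*?` or `is_*?`.
PREDICATE_METHOD = /\A(can_\w+|is_\w+)\?\z/

# Hypothetical receiver gate, meant to keep `record.is_persisted?`-style calls out.
AUTHZ_RECEIVER = /\A@?(guardian|ability|authorizer|permissions)\z/i

# Line-level approximation of `<receiver>.<method>?`, allowing safe navigation.
CALL_SHAPE = /(@?\w+)&?\.(\w+\?)/

def authz_predicate_calls(source)
  source.scan(CALL_SHAPE).select do |receiver, method_name|
    receiver.match?(AUTHZ_RECEIVER) && method_name.match?(PREDICATE_METHOD)
  end
end

sample = <<~'RUBY'
  return head(:forbidden) unless guardian.can_see?(@topic)
  return if guardian.is_admin?
  record.save if record.is_persisted?
RUBY

authz_predicate_calls(sample).each do |receiver, method_name|
  puts "#{receiver}.#{method_name}"
end
```

Only the two `guardian` calls pass both gates; `record.is_persisted?` matches the method pattern but is rejected by the receiver gate.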

**FP risk: low on role predicates.** Every `current_user.<role>?` match here is a real RBAC decision. Discourse's privilege hierarchy (`staff` ⊃ `admin`, `staff` ⊃ `moderator`) is exactly the kind of predicate-method idiom the rule was built for.

## Deep pass

Not run for this target. The structural pass yields a tight, high-signal slice of the privileged routes and role gates. Deep would primarily add value on the Guardian call sites flagged above (`guardian.can_*?`) — those are the gap to close before a deep pass adds new information.
82 changes: 82 additions & 0 deletions rules/ruby/cancancan-can-check.toml
@@ -0,0 +1,82 @@
[rule]
id = "ruby-cancancan-can-check"
languages = ["ruby"]
category = "rbac"
confidence = "high"
description = "CanCanCan ability predicate: user.can?(:action, resource) (Ruby)"
# The call-site check for CanCanCan: `user.can?(:read, Post)` and the related
# `cannot?` predicate. Requires the question-mark predicate form to avoid
# colliding with the `can :read, ...` declaration rule. The receiver is pinned
# to principal-shaped identifiers (`current_user`, `user`, `@account`, etc.)
# so unrelated domain objects that happen to expose a `.can?` predicate don't
# fire — `widget.can?(:render)` is not authz.
query = """
(call
  receiver: [
    (identifier) @receiver
    (instance_variable) @receiver
  ]
  method: (identifier) @method_name
  arguments: (argument_list
    (simple_symbol) @action)
) @match
"""

[rule.predicates.method_name]
match = "^(can\\?|cannot\\?)$"

[rule.predicates.receiver]
match = "(?i)^@?(current_user|user|account|member|principal|viewer|actor)$"

[rule.rego_template]
template = """
default allow := false

allow if {
    input.action == "{{action}}"
}
"""


[rule.cedar_template]
template = """
permit (
    principal,
    action,
    resource
)
when {
    principal.role == "TODO"
};
"""
[[rule.tests]]
input = """
def edit_link
  link_to 'Edit', edit_post_path(@post) if current_user.can?(:update, @post)
end
"""
expect_match = true

[[rule.tests]]
input = """
def guard
  return unless current_user.cannot?(:delete, @account)
end
"""
expect_match = true

[[rule.tests]]
input = """
def list
  user.posts.each { |p| render p }
end
"""
expect_match = false

[[rule.tests]]
input = """
def render_widget
  return unless widget.can?(:render)
end
"""
expect_match = false
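The two predicates in this rule can be checked by hand with a small sketch. It assumes the patterns are evaluated by the Rust `regex` crate listed in `Cargo.toml`, where `^` and `$` anchor the whole string; in Ruby those anchors are per-line, so `\A` and `\z` stand in for them and `(?i)` becomes the `i` flag. `fires?` is a hypothetical helper, and the sketch ignores the query's additional requirement that the first argument be a symbol.

```ruby
# Ruby translations of the rule's two predicate patterns.
METHOD_NAME = /\A(can\?|cannot\?)\z/
RECEIVER = /\A@?(current_user|user|account|member|principal|viewer|actor)\z/i

# Hypothetical helper: true when both captures would pass their predicates.
def fires?(receiver, method_name)
  receiver.match?(RECEIVER) && method_name.match?(METHOD_NAME)
end

# (receiver, method) pairs loosely based on the rule's own tests.
[
  ["current_user", "can?"],
  ["current_user", "cannot?"],
  ["@account", "can?"],
  ["widget", "can?"],
  ["user", "posts"],
].each do |receiver, method_name|
  puts "#{receiver}.#{method_name} -> #{fires?(receiver, method_name)}"
end
```

The `widget.can?` pair fails on the receiver gate and `user.posts` fails on the method gate, which lines up with the two negative tests above.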
84 changes: 84 additions & 0 deletions rules/ruby/cancancan-can-declaration.toml
@@ -0,0 +1,84 @@
[rule]
id = "ruby-cancancan-can-declaration"
languages = ["ruby"]
category = "rbac"
confidence = "high"
description = "CanCanCan ability declaration: can :action, Model (Ruby)"
# CanCanCan abilities are declared in `Ability#initialize` as
# `can :action, Resource` (or `:manage` for any action, `:all` for any
# resource). The method has no receiver — it's invoked from `Ability`'s `self`
# scope. We capture the action symbol and the resource (a constant, call, or
# identifier). The `method_name` predicate also admits `cannot`, so
# `cannot :destroy, Account`-style guards match too.
query = """
(call
  !receiver
  method: (identifier) @method_name
  arguments: (argument_list
    (simple_symbol) @action
    .
    [
      (constant) @resource
      (call) @resource
      (identifier) @resource
    ])
) @match
"""

[rule.predicates.method_name]
match = "^(can|cannot)$"

[rule.rego_template]
template = """
default allow := false

# CanCanCan ability: {{method_name}} {{action}} on {{resource}}.
allow if {
    input.action == "{{action}}"
    input.resource.type == "{{resource}}"
}
"""


[rule.cedar_template]
template = """
permit (
    principal,
    action,
    resource
)
when {
    principal.role == "TODO"
};
"""
[[rule.tests]]
input = """
class Ability
  include CanCan::Ability
  def initialize(user)
    can :read, Article
    can :manage, Comment
  end
end
"""
expect_match = true

[[rule.tests]]
input = """
class Ability
  include CanCan::Ability
  def initialize(user)
    cannot :destroy, Account if user.guest?
  end
end
"""
expect_match = true

[[rule.tests]]
input = """
class Ability
  def initialize(user)
    enable :feature_flags
  end
end
"""
expect_match = false
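To make the receiver-less declaration shape concrete, here is a gem-free sketch of a tiny ability class. `MiniAbility`, its `guest:` argument, and the three model constants are hypothetical, and the rule resolution is deliberately simplified; this is not CanCanCan's implementation.

```ruby
# Hypothetical model constants.
Article = Class.new
Account = Class.new

# Minimal stand-in for an ability class (assumption: no CanCanCan gem loaded).
class MiniAbility
  Rule = Struct.new(:allowed, :action, :resource)

  def initialize(guest:)
    @rules = []
    can :read, Article                  # receiver-less call, runs against `self`
    can :manage, Account                # `:manage` stands for any action
    cannot :destroy, Account if guest   # later rule overrides the earlier `can`
  end

  # In this sketch the last matching rule wins; no match means denied.
  def can?(action, resource)
    rule = @rules.reverse.find do |r|
      r.resource == resource && (r.action == :manage || r.action == action)
    end
    rule ? rule.allowed : false
  end

  private

  def can(action, resource)
    @rules << Rule.new(true, action, resource)
  end

  def cannot(action, resource)
    @rules << Rule.new(false, action, resource)
  end
end

guest = MiniAbility.new(guest: true)
puts guest.can?(:read, Article)     # declared directly
puts guest.can?(:update, Account)   # covered by `:manage`
puts guest.can?(:destroy, Account)  # overridden by `cannot`
```

The three calls inside `initialize` are the `can :action, Model` and `cannot :action, Model` shapes this rule captures, while `guest.can?(...)` is the call-site form handled by `ruby-cancancan-can-check`.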