Freshwater Referenced Spatial Hydrology. A composable stream network modelling engine. Query and extract stream networks, classify habitat by gradient and channel width, segment networks at barriers and break points, aggregate features upstream or downstream, and run multi-species habitat modelling with parallel workers.
Repository: NewGraphEnvironment/fresh | Primary language: R (package) | Version: 0.24.1 | License: MIT
| Package | Role |
|---|---|
| fresh | Stream network modelling engine (this package) — segment networks, classify habitat, cluster, aggregate |
| link | Feature-to-network interpretation — load and validate override CSVs, score and prioritize crossings, build per-species barrier skip lists, orchestrate bcfishpass-reproducing six-phase pipelines via lnk_pipeline_*() + lnk_config() |
| flooded | Delineate floodplain extents from DEMs and stream networks |
| drift | Track land cover change within floodplains over time |
| fly | Estimate airphoto footprints and select optimal coverage for a study area |
| diggs | Interactive explorer for fly airphoto selections (Shiny app) |
Pipelines:
- Fish habitat / connectivity: link → fresh. link's lnk_pipeline_*() helpers interpret features (crossings, observations, falls, user-defined barriers, habitat confirmations) and produce the break_sources and barrier_overrides tables that feed frs_habitat_classify(). Without link, fresh still runs on any break sources you construct yourself.
- Land cover change: fresh (network) → flooded (floodplains) → drift (land cover change).
Legend: [link] = building block used directly by link (called by lnk_pipeline_*() rather than via frs_habitat()).
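A minimal sketch of the second path (constructing break sources yourself and calling the orchestrator directly). Call shapes are illustrative: the break_sources field names follow this document, but the source table, filter, and the conn argument name are assumptions, not confirmed fresh API.

```r
library(fresh)

con <- frs_db_conn()  # reads PG_*_SHARE env vars

# Hypothetical break source spec; field names follow the conventions in this
# document, table/where values are made up for illustration.
break_sources <- list(
  list(
    table       = "bcfishpass.falls_fwa",
    where       = "barrier_ind = 'Y'",
    label       = "falls",
    label_col   = "break_type",
    col_blk     = "blue_line_key",
    col_measure = "downstream_route_measure"
  )
)

# Orchestrator: segments the network at the breaks, then classifies habitat
# per species (argument names other than break_sources/to_* are assumed).
frs_habitat(
  conn          = con,
  break_sources = break_sources,
  to_streams    = "fresh.streams",
  to_habitat    = "fresh.habitat"
)
```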
R/
fresh-package.R — package-level doc, imports
frs_db_conn.R — DB connection via PG_*_SHARE env vars
frs_db_query.R — execute SQL, return sf
frs_habitat.R — orchestrator: multi-WSG/AOI habitat pipeline
frs_habitat_classify.R — long-format habitat classification per species [link]
frs_network_segment.R — domain-agnostic network segmentation
frs_feature_find.R — locate point features on network
frs_feature_index.R — upstream/downstream relationship indexing
frs_break.R — gradient break detection (island method); exports frs_break_find / frs_break_apply / frs_break_validate [link]
frs_barriers_minimal.R — per-flowpath minimal barrier reduction [link]
frs_classify.R — segment classification by attributes/breaks
frs_categorize.R — bin continuous attributes to categorical classes
frs_cluster.R — connectivity clustering (rearing/spawning linkage) [link]
frs_extract.R — extract streams to working table
frs_clip.R — clip a table to an AOI
frs_aggregate.R — upstream/downstream feature aggregation [link]
frs_col_join.R — join attributes (channel width, discharge) [link]
frs_col_generate.R — recompute gradient from geometry [link]
frs_network.R — unified multi-table network traversal
frs_network_upstream.R — upstream network query (ltree)
frs_network_downstream.R — downstream network query (ltree)
frs_network_prune.R — prune network to connected-waterbody constraints
frs_waterbody_network.R — build the waterbody-only network subset
frs_watershed_at_measure.R — watershed polygon at a blue-line position
frs_watershed_split.R — split a watershed polygon at a position
frs_point_snap.R — snap points to nearest stream
frs_point_locate.R — locate point on stream network
frs_stream_fetch.R — fetch stream segments
frs_lake_fetch.R — fetch lakes
frs_wetland_fetch.R — fetch wetlands
frs_order_filter.R — filter on stream order / order_parent
frs_params.R — load species habitat parameters
frs_wsg_species.R — species presence per watershed group
frs_edge_types.R — FWA edge type lookup
utils.R — internal helpers
tests/testthat/ — unit tests
inst/extdata/ — bundled CSVs (falls, crossings, params, species)
data-raw/ — data refresh scripts
docker/ — local fwapg Docker setup
vignettes/ — .Rmd.orig source, .Rmd pre-knitted output
- All exported functions prefixed frs_, named noun-first: frs_network_segment(), frs_habitat_classify(), frs_feature_find()
- Two layers:
  - Generic primitives (domain-neutral): frs_classify(), frs_break_find(), frs_break_apply(), frs_network_segment(), frs_barriers_minimal(), frs_col_generate(), frs_col_join(), frs_aggregate(), frs_cluster(). Classify on any attribute into any label column, break at any position, cluster by any rule. No assumptions about what's being modelled.
  - Fish-habitat wrappers (domain-coupled): frs_habitat_classify(), frs_habitat(), frs_params(). Output schema is fixed: accessible / spawning / rearing / lake_rearing booleans per species_code. Parameters YAML is species-keyed with spawn / rear / spawn_connected rule sections. For non-fish domains, compose the primitives on your own output schema rather than shoehorning through these wrappers.
- frs_habitat() is the fish-workflow orchestrator — wraps frs_network_segment() + frs_habitat_classify() with mirai parallel workers
- frs_network_segment() is domain-agnostic — segments any network at any break points
- frs_habitat_classify() produces long-format output: one row per segment × species, joined to geometry via id_segment
- break_sources with label, label_col, label_map for flexible network-referenced classification
- label_block controls which labels restrict access (default "blocked", configurable — pass your own domain labels)
- gate = FALSE skips accessibility for raw habitat potential
- Direct SQL via DBI/RPostgres against fwapg PostgreSQL
- Returns sf objects for spatial results
- Requires PostgreSQL with fwapg (bcfishpass/bcfishobs optional, per-function)
- Vignette data cached in inst/extdata/ with update_gis YAML param (FALSE = cached, TRUE = live DB); see the sketch after this list
- Runtime: DBI, RPostgres, sf
- Suggests: mirai, tmap (>= 4.0), gq, bookdown, knitr, rmarkdown
- Database: PostgreSQL with fwapg (local Docker or remote tunnel)
- Connection: Local Docker on port 5432 or SSH tunnel on 63333; see db-newgraph skill
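A rough sketch of the update_gis caching switch mentioned above, assuming a params-driven Rmd vignette; the cached file name and the frs_stream_fetch() call shape are hypothetical.

```r
# update_gis = TRUE  -> query the live database and refresh the cached copy
# update_gis = FALSE -> read the copy bundled in inst/extdata/
path_cache <- system.file("extdata", "streams_demo.rds", package = "fresh")  # hypothetical file

if (isTRUE(params$update_gis)) {
  con     <- frs_db_conn()
  streams <- frs_stream_fetch(con)                      # assumed call shape
  saveRDS(streams, "inst/extdata/streams_demo.rds")
} else {
  streams <- readRDS(path_cache)
}
```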
Integration tests use frs_db_conn() which reads PG_*_SHARE env vars
(PG_HOST_SHARE, PG_PORT_SHARE, PG_USER_SHARE, PG_PASS_SHARE,
PG_DB_SHARE). On dev machines those are commonly wired in ~/.Renviron
to point at the SSH tunnel (localhost:63333) for the bcfishpass-shared
DB. R's user ~/.Renviron overrides shell env, so a shell-level
export PG_PORT_SHARE=5432 does NOT change what R sees.
To run fresh tests against a host's local Docker fwapg (port 5432) instead — useful when the tunnel is down, when the local box has a byte-identical fwapg, or when offloading test runs from a busy machine to a parallel host (e.g. m4 ↔ m1 on Tailscale):
```r
# Inside R, before any frs_db_conn() call:
Sys.setenv(PG_HOST_SHARE = "localhost",
           PG_PORT_SHARE = "5432",
           PG_USER_SHARE = "postgres",
           PG_PASS_SHARE = "postgres",
           PG_DB_SHARE = "fwapg")
testthat::set_max_fails(Inf)
devtools::test()
```

Or via Rscript --no-environ to skip user ~/.Renviron entirely (then shell-level export works).
Note: the local Docker fwapg typically has only the fwapg
database / whse_basemapping schema. Tests that read the bcfishpass
schema (bcfishpass.parameters_habitat_thresholds, etc.) will fail
without the tunnel — those need the shared DB. Most habitat-pipeline
tests use only whse_basemapping and run fine off-tunnel.
Cross-host run pattern (m4 → m1 over Tailscale, e.g. when m4 is busy):

```bash
# 1. Push the branch from m4 (the dev box)
git push -u origin <branch>

# 2. On m1: fetch + checkout + reinstall
ssh m1 'cd /Users/airvine/Projects/repo/fresh && \
  git fetch origin && git checkout <branch> && \
  Rscript -e "pak::pak(\"NewGraphEnvironment/fresh@<branch>\", upgrade = FALSE, ask = FALSE)"'

# 3. On m1: run tests with local-fwapg env override
ssh m1 'cd /Users/airvine/Projects/repo/fresh && Rscript -e "
  Sys.setenv(PG_PORT_SHARE = \"5432\", PG_USER_SHARE = \"postgres\",
             PG_PASS_SHARE = \"postgres\", PG_DB_SHARE = \"fwapg\",
             PG_HOST_SHARE = \"localhost\")
  testthat::set_max_fails(Inf); devtools::test()
" > /tmp/m1_fresh_test.log 2>&1 &'
```

Column naming:

- id_* prefix for fresh-generated identifiers: id_segment (sub-segment after breaking)
- FWA columns kept as-is: linear_feature_id, blue_line_key, downstream_route_measure, wscode_ltree, localcode_ltree
- Habitat classification columns are generic (not species-prefixed): accessible, spawning, rearing, lake_rearing. Species is a row value (species_code), not a column name.

Argument naming:

- to, from — complete DB table names (schema-qualified)
- to_prefix — table name prefix, suffixed by the function (e.g. "fresh.streams" → fresh.streams_co)
- to_streams, to_habitat — explicit output table names in frs_habitat()
- col_* — column name mappings for flexible table schemas: col_blk, col_measure
- break_sources — list of break source specs with table, where, label, label_col, label_map, col_blk, col_measure

Function naming:

- frs_noun_verb pattern: frs_network_segment, frs_point_snap, frs_stream_fetch
- Pipeline orchestrators wrap exported building blocks:
  - frs_habitat() wraps frs_network_segment() + frs_habitat_classify()
  - frs_network_segment() wraps frs_extract() + frs_break_find() + frs_break_apply()
  - frs_habitat_classify() wraps frs_classify() per species

Output tables:

- {to_streams} — one row per segment, geometry + base attributes + id_segment
- {to_habitat} — long format, one row per segment × species. Columns: id_segment, species_code, accessible, spawning, rearing, lake_rearing. No geometry.
- Views per species: {to_streams}_co_vw = join streams + habitat WHERE species_code = 'CO'
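A small sketch of reading those outputs back and rebuilding the per-species view client-side; table names are examples of the {to_streams}/{to_habitat} pattern and the column names follow the schema above.

```r
library(DBI)
library(sf)

con <- frs_db_conn()

# One row per segment, with geometry
streams <- sf::st_read(con, query = "SELECT * FROM fresh.streams")

# Long format: one row per segment x species, no geometry
habitat_co <- DBI::dbGetQuery(
  con, "SELECT * FROM fresh.habitat WHERE species_code = 'CO'"
)

# Equivalent of the fresh.streams_co_vw join, done in R
streams_co <- merge(streams, habitat_co, by = "id_segment")
```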
Use the gq package for all shared layer symbology. Never hardcode hex color values when a registry style exists.
```r
library(gq)
reg <- gq_reg_main()  # load once per script — 51+ layers
```

Core pattern: reg$layers$lake, reg$layers$road, reg$layers$bec_zone, etc.
| Target | Simple layer | Classified layer |
|---|---|---|
| tmap | gq_tmap_style(layer) → do.call(tm_polygons, ...) | gq_tmap_classes(layer) → field, values, labels |
| mapgl | gq_mapgl_style(layer) → paint properties | gq_mapgl_classes(layer) → match expression |
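A short sketch of the simple-layer path in the table above; assumes gq_tmap_style() accepts a registry layer entry and returns tm_polygons() arguments, and lakes_sf is your own sf object.

```r
library(gq)
library(tmap)
library(sf)

reg   <- gq_reg_main()
style <- gq_tmap_style(reg$layers$lake)   # assumed: a list of tm_polygons() arguments

tm_shape(lakes_sf) +                      # lakes_sf: your lake polygons
  do.call(tm_polygons, style)
```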
For project-specific layers not in the main registry, use a hand-curated CSV and merge:
```r
reg <- gq_reg_merge(gq_reg_main(), gq_reg_read_csv("path/to/custom.csv"))
```

Install: pak::pak("NewGraphEnvironment/gq")
| Output | Tool | When |
|---|---|---|
| PDF / print figures | tmap v4 | Bookdown PDF, static reports |
| Interactive HTML | mapgl (MapLibre GL) | Bookdown gitbook, memos, web pages |
| QGIS project | Native QML | Field work, Mergin Maps |
- sf_use_s2(FALSE) at top of every mapping script
- Compute area BEFORE simplify in SQL
- No map title — title belongs in the report caption
- Legend over least-important terrain — swap legend and logo sides when it reduces AOI occlusion. No fixed convention for which side.
- Four-corner rule — legend, logo, scale bar, keymap each get their own corner. Never stack two in the same quadrant.
- Bbox must match canvas aspect ratio — compute the ratio from geographic extents and page dimensions. Mismatch causes white space bands.
- Consistent element-to-frame spacing — all inset elements should have visually equal margins from the frame edge
- Map fills to frame — basemap extends edge-to-edge, no dead bands. Use near-zero inner.margins and outer.margins.
- Suppress auto-legends — build manual ones from registry values
- ALL CAPS labels appear larger — use title case for legend labels (gq's gq_tmap_classes() handles this automatically via a to_title() fallback)
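A minimal tm_layout() sketch for the fill-to-frame rule, assuming tmap v4 still accepts inner.margins/outer.margins as in earlier versions; aoi_sf is a placeholder object.

```r
library(tmap)

tm_shape(aoi_sf) +
  tm_polygons() +
  tm_layout(
    inner.margins = c(0, 0, 0, 0),   # basemap edge-to-edge, no dead bands
    outer.margins = c(0, 0, 0, 0)
  )
```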
Read the PNG and check before showing anyone:
- Correct polygon/study area shown? (verify source data, not just the bbox)
- Map fills the page? (no white/black bands)
- Keymap inside frame with spacing from edge?
- No element overlap? (each in its own corner)
- Legend over least-important terrain?
- Consistent spacing across all elements?
- Scale bar breaks appropriate for extent?
See the cartography skill for full reference: basemap blending, BC spatial data queries, label hierarchy, mapgl gotchas, and worked examples.
Use drift and flooded together for riparian land cover change analysis. flooded delineates floodplain extents from DEMs and stream networks; drift tracks what's changing inside them over time.
Pipeline:
```r
# 1. Delineate floodplain AOI (flooded)
valleys <- flooded::fl_valley_confine(dem, streams)

# 2. Fetch, classify, summarize (drift)
rasters <- drift::dft_stac_fetch(aoi, source = "io-lulc", years = c(2017, 2020, 2023))
classified <- drift::dft_rast_classify(rasters, source = "io-lulc")
summary <- drift::dft_rast_summarize(classified, unit = "ha")

# 3. Interactive map with layer toggle
drift::dft_map_interactive(classified, aoi = aoi)
```

- Class colors come from drift's shipped class tables (IO LULC, ESA WorldCover)
- For production COGs on S3, dft_map_interactive() serves tiles via titiler — set options(drift.titiler_url = "...")
- See the drift vignette for a worked example (Neexdzii Kwa floodplain, 2017-2023)
Structured checklist for reviewing diffs before commit. Used by /code-check.
Add new checks here when a bug class is discovered — they compound over time.
- Variables in double-quoted strings containing single quotes break if value has
' "echo '${VAR}'"— if VAR contains', shell syntax breaks- Use
printf '%s\n' "$VAR" | commandto pipe values safely - Heredocs: unquoted
<<EOFexpands variables locally,<<'EOF'does not — know which you need
- Hardcoded absolute paths (
/Users/airvine/...) break for other users - Use
REPO_ROOT="$(cd "$(dirname "$0")/<relative>" && pwd)" - After moving scripts, verify
../depth still resolves correctly - Usage comments should match actual script location
|| truehides real errors — is the failure actually safe to ignore?- Empty variable before destructive operation (rm, destroy) — add guard:
[ -n "$VAR" ] || exit 1 grepreturning empty silently — downstream commands get empty input
- Secrets passed as command-line args are visible in
ps aux - Use env files, stdin pipes, or temp files with
chmod 600instead
- Must be pure ASCII — em dashes, curly quotes, arrows cause silent parse failure
- Check with:
perl -ne 'print "$.: $_" if /[^\x00-\x7F]/' file.yaml
cloud-init cleancauses full re-provisioning on next boot — almost never what you want before snapshot- Use
tailscale logoutnottailscale downbefore snapshot (deregister vs disconnect)
- Secrets rendered via
templatefile()are readable at169.254.169.254metadata endpoint - Acceptable for ephemeral machines, document the tradeoff
- Parsing
tofu state showtext output is fragile — usetofu outputinstead - Missing outputs that scripts need — add them to main.tf
- Snapshot/image IDs in tfvars after deleting the snapshot — stale reference
- Validate resource IDs before destroy:
[ -n "$ID" ] || exit 1 tofu destroywithout-targetdestroys everything including reserved IPs- Snapshot ID extraction: use
--resource dropletandgrep -Ffor exact match
.tfvarsmust be gitignored (contains tokens, passwords).tfvars.exampleshould have all variables with empty/placeholder values- Sensitive variables need
sensitive = truein variables.tf
0.0.0.0/0for SSH is world-open — document if intentional- If access is gated by Tailscale, say so explicitly
- Passwords with special chars (
',",$,!) break naive shell quoting printf '%q'escapes values for shell safety- Temp files for secrets: create with
chmod 600, delete after use
- pak stops on first unresolvable package — all subsequent packages are skipped
- Removed CRAN packages (like
leaflet.extras) must move to GitHub source - PPPM binaries may lag a few hours behind new CRAN releases
- Branch pins (
pkg@branch) are not reproducible — document why used - Pinned download URLs (RStudio .deb) go stale — document where to update
- Moving/renaming scripts: update CLAUDE.md, READMEs, usage comments
- New variables: update .tfvars.example
- New workflows: update relevant README
Standards for external communications across New Graph Environment.
compost is the working repo for email drafts, scripts, contact management, and Gmail utilities. These conventions capture the universal principles; compost has the implementation details.
Three levels. Default to casual unless context dictates otherwise.
| Level | When | Style |
|---|---|---|
| Casual | Established working relationships | Professional but warm. Direct, concise. No slang. |
| Very casual | Close collaborators with rapport | Colloquial OK. Light humor. Slang acceptable. |
| Formal | New contacts, senior officials, formal requests | Full sentences, no contractions, state purpose early. |
Collaborative, not directive. Acknowledge their constraints:
- Avoid: "Work these in as makes sense for your lab"
- Better: "If you're able to work these in when it fits your schedule that would be really helpful"
Draft in markdown, convert to HTML at send time via gmailr. See compost for script templates, OAuth setup, and search_gmail.R.
File naming: YYYYMMDD_recipient_topic_draft.md + YYYYMMDD_recipient_topic.R
Key gotchas (documented in detail in compost):
- Gmail strips
<style> blocks — use inline styles for tables
- gm_create_draft() does NOT support thread_id — only gm_send_message() can reply into threads. Drafts land outside the conversation.
- Always use
test_mode and create_draft variables for safe workflows
- Never manually type data into tables — generate programmatically from source files
- Link to canonical sources (GitHub repos, public reports) rather than embedding raw data
- Provide both CSV and Excel when sharing tabular data
- Document ID codes — when using compressed IDs (e.g.,
id_lab), include a reference sheet so recipients can decode
- Internal QA info (blanks, control samples, calibration data)
- Internal tracking codes or SRED references
- Draft status or revision history
- Internal project management details
Keep client-facing communications focused on deliverables and technical content.
Al Irvine B.Sc., R.P.Bio.
New Graph Environment Ltd.
Cell: 250-777-1518
Email: al@newgraphenvironment.com
Website: www.newgraphenvironment.com
In HTML emails, use <br> tags between lines.
Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.
Tradeoff: These guidelines bias toward caution over speed. For trivial tasks, use judgment.
Don't assume. Don't hide confusion. Surface tradeoffs.
Before implementing:
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them - don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.
Minimum code that solves the problem. Nothing speculative.
- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.
Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
Touch only what you must. Clean up only your own mess.
When editing existing code:
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it - don't delete it.
When your changes create orphans:
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.
The test: Every changed line should trace directly to the user's request.
Define success criteria. Loop until verified.
Transform tasks into verifiable goals:
- "Add validation" → "Write tests for invalid inputs, then make them pass"
- "Fix the bug" → "Write a test that reproduces it, then make it pass"
- "Refactor X" → "Ensure tests pass before and after"
For multi-step tasks, state a brief plan:
1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]
Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
These guidelines are working if: fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
Core patterns for professional, efficient workflows across New Graph Environment repositories.
Five repos form the governance and operations layer across all New Graph Environment work:
| Repo | Purpose | Analogy |
|---|---|---|
| compass | Ethics, values, guiding principles | The "why" |
| soul | Standards, skills, conventions for LLM agents | The "how" |
| compost | Communications templates, email workflows, contact management | The "who" |
| rtj (formerly awshak) | Infrastructure as Code, deployment | The "where" |
| gq | Cartographic style management across QGIS, tmap, leaflet, web | The "look" |
Adaptive management: Conventions evolve from real project work, not theory. When a pattern is learned or refined during project work, propagate it back to soul so all projects benefit. The /claude-md-init skill builds each project's CLAUDE.md from soul conventions.
Cross-references: sred-2025-2026 tracks R&D activities across repos. Compost cross-cuts all projects as the centralized communications workflow — email drafts, contact registry, and tone guidelines live there and are copied to individual project communications/ folders as needed.
- Check for duplicates:
gh issue list --state open --search "<keywords>" -- search before creating
- Link to SRED: If work involves infrastructure, R&D, tooling, or performance benchmarking, add
Relates to NewGraphEnvironment/sred-2025-2026#N (match by repo name in SRED issue title)
- One issue, one concern. Keep focused.
Write issues with clear technical focus:
- Use normal technical language in titles and descriptions
- Focus on the problem and solution approach
- Add tracking links at the end (e.g.,
Relates to Owner/repo#N)
Issue body structure:
## Problem
<what's wrong or missing>
## Proposed Solution
<approach>
Relates to #<local>
Relates to NewGraphEnvironment/sred-2025-2026#<N>

The gh issue create command with heredoc syntax fails repeatedly with EOF errors. ALWAYS use --body-file:
cat > /tmp/issue_body.md << 'EOF'
## Problem
...
## Proposed Solution
...
EOF
gh issue create --title "Brief technical title" --body-file /tmp/issue_body.md

DO: Close issues via commit messages. The commit IS the closure and the documentation.
Fix broken DEM path in loading pipeline
Update hardcoded path to use config-driven resolution.
Fixes #20
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DON'T: Close issues with gh issue close. This breaks the audit trail — there's no linked diff showing what changed.
- Fixes #N or Closes #N — auto-closes and links the commit to the issue
- Relates to #N — partial progress, does not close
- Always close issues when work is complete. Don't leave stale open issues.
Write clear, informative commit messages:
Brief description (50 chars or less)
Detailed explanation of changes and impact.
Fixes #<issue> (or Relates to #<issue>)
Relates to NewGraphEnvironment/sred-2025-2026#<N>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When to commit:
- Logical, atomic units of work
- Working state (tests pass)
- Clear description of changes
What to avoid:
- "WIP" or "temp" commits in main branch
- Combining unrelated changes
- Vague messages like "fixes" or "updates"
Rules learned from real project sessions. These apply across all repos.
- Install missing packages, don't workaround — if a package is needed, ask the user to install it (e.g.
pak::pak("pkg")). Don't write degraded fallback code to avoid the dependency.
- Never hardcode extractable data — if coordinates, station names, or metadata can be pulled from an API or database at runtime, do that. Don't hardcode values that have a programmatic source.
- Close issues via commits, not
gh issue close — see Closing Issues above.
- Cite primary sources — see references conventions.
Pattern: noun_verb-detail -- noun first, verb second across all naming:
| What | Example |
|---|---|
| Skills | claude-md-init, gh-issue-create, planning-update |
| Scripts | stac_register-baseline.sh, stac_register-pypgstac.sh |
| Logs | 20260209_stac_register-baseline_stac-dem-bc.txt |
| Log format | yyyymmdd_noun_verb-detail_target.ext |
Scripts and logs live together: scripts/<module>/logs/
- Projects = daily cross-repo tracking (always add to relevant project)
- Milestones = iteration boundaries (only for release/claim prep)
- Don't double-track unless there's a reason
| Content | Project |
|---|---|
| R&D, experiments, SRED-related | SRED R&D Tracking (#8) |
| Data storage, sqlite, postgres, pipelines | Data Architecture (#9) |
| Fish passage field/reporting | Fish Passage 2025 (#6) |
| Restoration planning | Aquatic Restoration Planning (#5) |
| QGIS, Mergin, field forms | Collaborative GIS (#3) |
How Claude manages structured planning for complex tasks using planning-with-files (PWF).
Use PWF when a task has multiple phases, requires research, or involves more than ~5 tool calls. Triggers:
- User says "let's plan this", "plan mode", "use planning", or invokes
/planning-init - Complex issue work begins (multi-step, uncertain approach)
- Claude judges the task warrants structured tracking
Skip planning for single-file edits, quick fixes, or tasks with obvious next steps.
- Explore first — Enter plan mode (read-only). Read code, trace paths, understand the problem before proposing anything.
- Plan to files — Write the plan into 3 files in
planning/active/:
  - task_plan.md — Phases with checkbox tasks
  - findings.md — Research, discoveries, technical analysis
  - progress.md — Session log with timestamps and commit refs
- Commit the plan — Commit the planning files before starting implementation. This is the baseline.
- Work in atomic commits — Each commit bundles code changes WITH checkbox updates in the planning files. The diff shows both what was done and the checkbox marking it done.
- Code check before commit — Run
/code-check on staged diffs before committing. Don't mark a task done until the diff passes review.
- Archive when complete — Move
planning/active/ to planning/archive/ via /planning-archive. Write a README.md in the archive directory with a one-paragraph outcome summary and closing commit/PR ref — future sessions scan these to catch up fast.
Every commit that completes a planned task MUST include:
- The code/script changes
- The checkbox update in
task_plan.md (- [ ] -> - [x])
- A progress entry in
progress.md if meaningful
This creates a git audit trail where git log -- planning/ tells the full story. Each commit is self-documenting — you can backtrack with git and understand everything that happened.
Phases with checkboxes. This is the core tracking file.
# Task Plan
## Phase 1: [Name]
- [ ] Task description
- [ ] Another task
## Phase 2: [Name]
- [ ] Task description

Mark tasks done as they're completed: - [x] Task description
Append-only research log. Discoveries, technical analysis, things learned.
# Findings
## [Topic]
[What was found, with source/date]

Session entries with commit references.
# Progress
## Session YYYY-MM-DD
- Completed: [items]
- Commits: [refs]
- Next: [items]

planning/
active/ <- Current work (3 PWF files)
archive/ <- Completed issues
YYYY-MM-issue-N-slug/
If planning/ doesn't exist in the repo, run /planning-init first.
| Skill | When to use |
|---|---|
| /planning-init | First time in a repo — creates directory structure |
| /planning-update | Mid-session — sync checkboxes and progress |
| /planning-archive | Issue complete — archive and create fresh active/ |
Standards for R package development across New Graph Environment repositories. Based on R Packages (2e) by Hadley Wickham and Jenny Bryan.
Reference packages: When starting a new package, study these existing
packages for patterns: flooded, gq. They demonstrate the conventions below
in practice (DESCRIPTION fields, README layout, NEWS.md style, pkgdown setup,
test structure, hex sticker, etc.).
- tidyverse style guide: snake_case, pipe operators (
|> or %>%)
- Match existing patterns in each codebase
- Use
pak for package installation (not install.packages)
- Prefix column name vectors with
cols_ for discoverability in the environment pane: cols_all, cols_carry, cols_split, cols_writable. Same principle for other grouped vectors (params_, tbl_, etc.)
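A tiny illustration of the grouped-vector convention (column names are made up):

```r
cols_all      <- c("id_segment", "species_code", "accessible", "spawning", "rearing")
cols_carry    <- c("id_segment", "species_code")
cols_writable <- setdiff(cols_all, cols_carry)
# cols_*, params_*, tbl_* vectors then sort together in the environment pane
```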
Follow R Packages (2e) conventions:
- R/ for functions, tests/testthat/ for tests, man/ for docs
- DESCRIPTION with proper fields (Title, Description, Authors@R)
- DESCRIPTION URL field: include both the GitHub repo and the pkgdown site so pkgdown links correctly (e.g., URL: https://github.com/OWNER/PKG, https://owner.github.io/PKG/)
- NAMESPACE managed by roxygen2 (#' @export, #' @import, #' @importFrom)
- Never edit
NAMESPACE or man/ by hand
Each exported function gets its own R file and its own test file:
- R/fl_mask.R → tests/testthat/test-fl_mask.R
- Commit the function and its tests together
- Use
Fixes #N in the commit message to close the corresponding issue
File a GitHub issue for each function before building it. This creates a traceable record of what was planned, built, and verified.
For new packages or major features, work on a branch and merge via PR:
main ← scaffold-branch (PR closes with "Relates to NewGraphEnvironment/sred-2025-2026#N")
This gives one PR that contains all commits — a single SRED cross-reference
covers the entire body of work. Individual commits within the branch close
their respective function issues with Fixes #N.
Close function issues via commit messages — see Closing Issues in newgraph conventions.
- Use testthat 3e (
Config/testthat/edition: 3 in DESCRIPTION)
- Run
devtools::test() before committing
- Test files mirror source:
R/utils.R -> tests/testthat/test-utils.R
- Test for edge cases and potential failures, not just happy paths
- Tests must pass before closing the function's issue
- Always grep for errors in the same command as the test run to avoid
running twice:
Rscript -e 'devtools::test()' 2>&1 | grep -E "(FAIL|ERROR|PASS)" | tail -5

For error context, pipe the same run through grep -E "(ERROR:|FAIL )" -A 10 | head -25
Examples are how users discover what a function does. They must:
- Actually run — no
\dontrun{} unless external resources are required
- Use bundled test data via
system.file() so they work for anyone
- Show why the function is useful — not just that it runs, but what it produces and why you'd use it
- Use qualified names for non-exported dependencies (
terra::rast(), sf::st_read()) since examples run in the user's environment
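A hedged roxygen sketch of that pattern; the data file and your_function() are placeholders, not a real fresh API.

```r
#' @examples
#' # runs on bundled data so it works for anyone
#' path    <- system.file("testdata", "streams_demo.gpkg", package = "yourpkg")
#' streams <- sf::st_read(path, quiet = TRUE)
#' your_function(streams)
```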
At least one vignette showing the full pipeline on real data:
- Demonstrates the package solving an actual problem end-to-end
- Uses bundled test data (committed to
inst/testdata/)
- Hosted on pkgdown so users can read it without installing
Output format: Use bookdown::html_vignette2 (not
rmarkdown::html_vignette) for figure numbering and cross-references.
Requires bookdown in Suggests and chunks must have fig.cap for
numbered figures. Cross-reference with Figure \@ref(fig:chunk-name).
Vignettes that need external resources (DB, API, STAC): Do NOT use
the .Rmd.orig pre-knit pattern — it breaks bookdown figure numbering
because knitr evaluates chunks during pre-knit and emits 
markdown that bookdown can't number.
Instead, separate data generation from presentation:
- data-raw/vignette_data.R — runs the queries, saves results as .rds to inst/testdata/ (or inst/vignette-data/)
- Vignette loads
.rds files, all chunks run live during pkgdown build
- Note at top of vignette: "Data generated by
data-raw/script.R"
- bookdown controls all chunks — figure numbers, cross-refs work
This is the same pattern as test data: data-raw/ documents how the data
was produced, committed artifacts make vignettes reproducible without the
external resource.
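Sketch of the split under stated assumptions (object and file names are hypothetical; the frs_stream_fetch() call shape is assumed):

```r
# data-raw/vignette_data.R -- run manually; needs the external DB
con     <- frs_db_conn()
streams <- frs_stream_fetch(con)                          # assumed call shape
saveRDS(streams, "inst/vignette-data/streams_demo.rds")

# vignette chunk -- runs live during pkgdown build, no DB needed
streams <- readRDS(
  system.file("vignette-data", "streams_demo.rds", package = "fresh")
)
```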
- Created via a script in
data-raw/ that documents exactly how the data was produced (database queries, spatial crops, etc.)
- Committed to
inst/testdata/ — small enough to ship with the package
- Used by tests, examples, and vignettes — one dataset, three purposes
- roxygen2 for all exported functions
@import or @importFrom in the package-level doc (R/<pkg>-package.R) to populate NAMESPACE — don't rely on :: everywhere in function bodies
- pkgdown site for public packages with
_pkgdown.yml (bootstrap 5)
usethis::use_github_action("pkgdown"))
Run lintr::lint_package() before committing R package code. Fix all warnings — every lint should be worth fixing.
linters: linters_with_defaults(
line_length_linter(120),
object_name_linter(styles = c("snake_case", "dotted.case")),
commented_code_linter = NULL
)
exclusions: list(
"renv" = list(linters = "all")
)

- 120 char line length (default 80 is too strict for data pipelines)
- Allow dotted.case (common in base R and legacy code)
- Suppress commented code lints (exploratory R scripts often have commented alternatives)
- Exclude renv directory entirely
- Minimize Imports — use
Suggests for packages only needed in tests/vignettes
- Pin versions only when breaking changes are known
- Prefer packages already in the tidyverse ecosystem
- Update
NEWS.md — keep it concise:
  - First release: one line (e.g., "Initial release. Brief description.")
- Later releases: describe what changed and why, not function-by-function. Link to the pkgdown reference page for details — don't duplicate it.
- Don't list every function; the pkgdown reference page is the single source of truth for what's in the package.
- Bump version in
DESCRIPTION (e.g., 0.0.0.9000 → 0.1.0)
- Tag:
git tag vX.Y.Z && git push && git push --tags
Protect main from deletion and force pushes:
gh api repos/OWNER/REPO/rulesets --method POST --input - <<'EOF'
{
"name": "Protect main",
"target": "branch",
"enforcement": "active",
"bypass_actors": [
{ "actor_id": 5, "actor_type": "RepositoryRole", "bypass_mode": "always" }
],
"conditions": { "ref_name": { "include": ["refs/heads/main"], "exclude": [] } },
"rules": [ { "type": "deletion" }, { "type": "non_fast_forward" } ]
}
EOF

- usethis::create_package(".")
- usethis::use_mit_license("New Graph Environment Ltd.")
- usethis::use_testthat(edition = 3)
- usethis::use_pkgdown()
- usethis::use_github_action("pkgdown")
- usethis::use_directory("dev") — reproducible setup script
- usethis::use_directory("data-raw") — data generation scripts
- Hex sticker via
hexSticker (see data-raw/make_hexsticker.R)
gh-pagesbranch
Keep a dev/dev.R file that documents every setup step. Not idempotent —
run interactively. This is the reproducible recipe for the package scaffold.
Keep the README lean:
- Hex sticker, one-line description, install, example showing why it's useful
- Link to pkgdown vignette and function reference — don't duplicate them
- Don't maintain a function table — it's just another thing to keep updated and pkgdown's reference page is the single source of truth
When an LLM assistant modifies R package code:
- Run
lintr::lint_package()— fix issues before committing - Run
devtools::test()with error grep — ensure tests pass in one call:Rscript -e 'devtools::test()' 2>&1 | grep -E "(FAIL|ERROR|PASS)" | tail -5
- Run
devtools::document()and grep for results:Rscript -e 'devtools::document()' 2>&1 | grep -E "(Writing|Updating|warning)" | tail -10
- Check
devtools::check()passes for releases — capture results in one call:Rscript -e 'devtools::check()' 2>&1 | grep -E "(ERROR|WARNING|NOTE|errors|warnings|notes)" | tail -10
How references flow between Claude Code, Zotero, and technical writing at New Graph Environment.
Three tools, different purposes. Use the right one.
| Need | Tool | Why |
|---|---|---|
| Search by keyword, read metadata/fulltext, semantic search | MCP zotero_* tools | pyzotero, works with Zotero item keys |
| Look up by citation key (e.g., irvine2020ParsnipRiver) | /zotero-lookup skill | Citation keys are a BBT feature — pyzotero can't resolve them |
| Create items, attach PDFs, deduplicate | /zotero-api skill | Connector API for writes, JS console for attachments |
Citation keys vs item keys: Citation keys (like irvine2020ParsnipRiver) come from Better BibTeX. Item keys (like K7WALMSY) are native Zotero. The MCP works with item keys. /zotero-lookup bridges citation keys to item data.
BBT citation key storage: As of Feb 2025+, BBT stores citation keys as a citationKey field directly in zotero.sqlite (via Zotero's item data system), not in a separate BBT database. The old better-bibtex.sqlite and better-bibtex.migrated files are stale and no longer updated. Query citation keys with: SELECT idv.value FROM items i JOIN itemData id ON i.itemID = id.itemID JOIN itemDataValues idv ON id.valueID = idv.valueID JOIN fields f ON id.fieldID = f.fieldID WHERE f.fieldName = 'citationKey'.
When research turns up a reference:
- DOI available: Tell the user — Zotero's magic wand (DOI lookup) is the fastest path
- ResearchGate link: Flag to user for manual check — programmatic fetch is blocked (403), but full text is often there
- BC gov report: Search ACAT, for.gov.bc.ca library, EIRS viewer
- Paywalled: Note it, move on. Don't waste time trying to bypass.
Preferred order:
- DOI magic wand in Zotero UI (fastest, most complete metadata)
- Web API POST with
collections array (grey literature, local PDFs — targets collection directly, no UI interaction needed)
- saveItems via /zotero-api (batch creation from structured data — requires UI collection selection)
- JS console script for group library (when connector can't target the right collection)
Collection targeting: saveItems drops items into whatever collection is selected in Zotero's UI. Always confirm with the user before calling it. Web API bypasses this — include "collections": ["KEY"] in the POST body. Find collection keys with ?q=name search on the collections endpoint.
saveItems attachments silently fail. Don't use them. Instead:
- Web API S3 upload (preferred): Create attachment item → get upload auth → build S3 body (Python: prefix + file bytes + suffix) → POST to S3 → register with uploadKey. Works without Zotero running. See
/zotero-api skill section 4.
- JS console fallback: Download with
curl, attach via item_attach_pdf.js in Zotero JS console.
- Verify attachment exists via MCP:
zotero_get_item_children
After manual adds, confirm via MCP:
- zotero_search_items — find by title
- zotero_get_item_metadata — check fields are complete
- zotero_get_item_children — confirm PDF attached
If duplicates were created (common with saveItems retries):
- Run
collection_dedup.jsvia Zotero JS console - It keeps the copy with the most attachments, trashes the rest
# index.Rmd — dynamic bib from Zotero via Better BibTeX
bibliography: "`r rbbt::bbt_write_bib('references.bib', overwrite = TRUE)`"

rbbt pulls from BBT, which syncs with Zotero. Edit references in Zotero → rebuild report → bibliography updates.
Library targeting: rbbt must know which Zotero library to search. This is set globally in ~/.Rprofile:
# default library — NewGraphEnvironment group (libraryID 9, group 4733734)
options(rbbt.default.library_id = 9)

Without this option, rbbt searches only the personal library (libraryID 1) and won't find group library references. The library IDs map to Zotero's internal numbering — use /zotero-lookup with SELECT DISTINCT libraryID FROM citationkey against the BBT database to discover available libraries.
- [@key2020] — parenthetical: (Author 2020)
- @key2020 — narrative: Author (2020)
- [@key1; @key2] — multiple
- nocite: in YAML — include uncited references
When a review paper references an older study, trace back to the original and cite it. Don't attribute findings to the review when the original exists. (See LLM Agent Conventions in newgraph.md.)
When the original is unavailable (paywalled, out of print, can't locate): use secondary citation format in the prose and include bib entries for both sources:
Smith et al. (2003; as cited in Doctor 2022) found that...
Both @smith2003 and @doctor2022 go in the .bib file. The reader can then track down the original themselves. Flag incomplete metadata on the primary entry — it's better to have a partial reference than none at all.
When you need a PDF and the obvious URL doesn't work:
- DOI resolver → publisher site (often has OA link)
- Europe PMC (
europepmc.org/backend/ptpmcrender.fcgi?accid=PMC{ID}&blobtype=pdf) — ncbi blocks curl - SciELO — needs
User-Agent: Mozilla/5.0header - ResearchGate — flag to user for manual download
- Semantic Scholar — sometimes has OA links
- Ask user for institutional access
Always verify downloads: file paper.pdf should say "PDF document", not HTML.
- scripts/rag_build.R — maps citation keys to Zotero PDF attachment keys, builds DuckDB
- data/rag/ gitignored — store is local, not committed
- Dependencies: ragnar, Ollama with nomic-embed-text model
- See
/lit-searchskill for full recipe
ragnar_store_connect() then ragnar_retrieve() — returns chunks with source file attribution.
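A minimal sketch, assuming ragnar's store/retrieve interface and a local DuckDB store under data/rag/; the query, store path, and column name are illustrative.

```r
library(ragnar)

store  <- ragnar_store_connect("data/rag/store.duckdb")
chunks <- ragnar_retrieve(store, "egg-to-fry survival in spawning gravels", top_k = 5)
chunks$origin   # source file attribution for each chunk (column name assumed)
```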
- NEVER write abstracts manually — if CrossRef has no abstract, leave blank
- NEVER cite specific numbers without verifying from the source PDF via ragnar search
- NEVER paraphrase equations — copy exact notation and cite page/section
How SR&ED tracking integrates with New Graph Environment's development workflows.
All SRED-eligible work across NGE falls under a single continuous project:
Dynamic GIS-based Data Processing and Reporting Framework
- Field: Software Engineering (2.02.09)
- Start date: May 2022
- Fiscal year: May 1 – April 30
- Consultant: Boast Capital (prepares final technical report)
Do not fragment work into separate claims. Each fiscal year's work is structured as iterations within this one project. Internal tracking (experiment numbers in sred-2025-2026) maps to iterations — Boast assembles the final narrative.
Use Relates to NewGraphEnvironment/sred-2025-2026#N in commit messages when work is SRED-eligible.
Tag hours with sred_ref field linking to the relevant sred-2025-2026 issue number.
Link SRED-eligible issues to the tracking repo: Relates to NewGraphEnvironment/sred-2025-2026#N
Eligible (systematic investigation to overcome technological uncertainty):
- Building tools/functions that don't exist in standard practice
- Prototyping new integrations between systems (GIS ↔ reporting ↔ field collection)
- Testing whether an approach works and documenting why it did/didn't
- Iterating on failed approaches with new hypotheses
Not eligible:
- Standard configuration of known tools
- Routine bug fixes in working systems
- Writing reports using the framework (that's service delivery)
The test: "Did we try something we weren't sure would work, and did we learn something from the attempt?" If yes, it's likely eligible.