
steffenfritz/FileTrove


License: AGPL v3

VERSION: v1.0.0-BETA-4


FileTrove walks a directory tree, identifies every file, computes metadata, and writes all results into a SQLite database with TSV export support.

What it collects

Category                       Details
File type                      MIME type, PRONOM identifier, format version, identification proof/note, and extension (via siegfried)
File & directory timestamps    Creation, modification, and access times
Hashes                         MD5, SHA1, SHA256, SHA512, BLAKE2B-512
Entropy                        Shannon entropy (files up to 1 GB)
Extended attributes            xattr from ext3/ext4, btrfs, APFS, and others
EXIF metadata                  Extracted from image files
YARA-X                         Match results from your own rule files
NSRL                           Flags known software files via the National Software Reference Library
Dublin Core                    Optional session-level descriptive metadata

Each file and directory is assigned a UUIDv4 as its unique identifier.
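A UUIDv4 consists of 122 random bits plus fixed version and variant bits (RFC 4122). A minimal standard-library sketch of generating one follows; FileTrove may well use a UUID library instead, and newUUIDv4 is a hypothetical helper name.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4, RFC 4122 variant) UUID string.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4 in the high nibble of byte 6
	b[8] = (b[8] & 0x3f) | 0x80 // variant bits 10xx in the top of byte 8
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Because only 6 of the 128 bits are fixed, collisions between independently generated identifiers are negligible in practice, which is what makes random UUIDs usable as keys across scan sessions.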

Installation

  1. Get a distribution bundle. Download one from the releases page, or build one from source (see BUILDING.md):

    task dist:bundle    # builds binaries + bundles siegfried.sig + nsrl.bloom

    The bundle at build/<os>_<arch>/ contains everything you need.

  2. Run the installer from the bundle directory:

    cd build/darwin_arm64   # or linux_amd64, etc.
    ./ftrove --install .

    This creates the scan database (db/filetrove.db) and the logs/ directory. The siegfried signature file and the NSRL Bloom filter are already included in the bundle.

  3. You're ready.

Building from source without task dist? You can also set up the NSRL Bloom filter separately; see BUILDING.md for details on task nsrl:build-all and its disk space requirements.

YARA-X

YARA-X scanning requires a C library that is not bundled with FileTrove. It is built automatically during task build if not already present. See BUILDING.md for setup instructions.

  • Example rule files: testdata/yara/
  • When a rule matches, the rule name, session UUID, and file UUID are recorded in the yara table. The rule file itself is not stored.

NSRL

FileTrove ships a pre-built NSRL Bloom filter in the repository. When NIST publishes a new RDS version, rebuild the filter by updating NSRL_VERSION in Taskfile.nsrl.yml and running one of the NSRL build targets (such as task nsrl:build-all; see BUILDING.md).

You can also build a custom Bloom filter from any newline-delimited list of SHA1 hashes:

admftrove --creatensrl hashes.txt --nsrlversion "my-hashset-v1"

Optional flags: --nsrl-estimate (expected hash count, default 40M) and --nsrl-fpr (false positive rate, default 0.0001). Copy the resulting nsrl.bloom into db/.
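A Bloom filter answers either "possibly in the set" or "definitely not in the set", which is why the false positive rate matters. The textbook sizing for n expected items at false positive rate p is m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions; for the defaults above (n = 40M, p = 0.0001) that comes to roughly 767 million bits (about 91 MiB) and 13 hash functions. Whether admftrove uses exactly these formulas, and which hash functions it uses, is an assumption here. The toy filter below (double hashing over FNV-1 and FNV-1a, hypothetical names throughout) only illustrates the idea and cannot read nsrl.bloom.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

// optimalParams returns the textbook Bloom filter sizing for n expected
// items at false positive rate p: m bits and k hash functions.
func optimalParams(n uint64, p float64) (m uint64, k int) {
	mf := -float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)
	m = uint64(math.Ceil(mf))
	k = int(math.Round(mf / float64(n) * math.Ln2))
	if k < 1 {
		k = 1
	}
	return m, k
}

// bloom is a toy Bloom filter backed by a packed bit array.
type bloom struct {
	bits []uint64
	m    uint64
	k    int
}

func newBloom(n uint64, p float64) *bloom {
	m, k := optimalParams(n, p)
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// indexes derives k bit positions from two 64-bit hashes (double hashing).
func (b *bloom) indexes(key string) []uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(key))
	h2 := fnv.New64()
	h2.Write([]byte(key))
	a, step := h1.Sum64(), h2.Sum64()|1 // |1 forces a non-zero step
	idx := make([]uint64, b.k)
	for i := range idx {
		idx[i] = (a + uint64(i)*step) % b.m
	}
	return idx
}

// Add sets the k bits for key.
func (b *bloom) Add(key string) {
	for _, i := range b.indexes(key) {
		b.bits[i/64] |= 1 << (i % 64)
	}
}

// MayContain returns false only if key was definitely never added.
func (b *bloom) MayContain(key string) bool {
	for _, i := range b.indexes(key) {
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	m, k := optimalParams(40_000_000, 0.0001)
	fmt.Printf("m = %d bits (~%.1f MiB), k = %d\n", m, float64(m)/8/1024/1024, k)

	f := newBloom(1000, 0.0001)
	f.Add("A9993E364706816ABA3E25717850C26C9CD0D89D") // a SHA1 in hex
	fmt.Println(f.MayContain("A9993E364706816ABA3E25717850C26C9CD0D89D"))
}
```

Lowering the false positive rate grows the filter only logarithmically: each tenfold reduction in p adds about 4.8 bits per stored hash.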

Running a scan

./ftrove -i $DIRECTORY

FileTrove walks $DIRECTORY recursively. Run ./ftrove -h for all available flags.

Viewing results

List all sessions and export one to TSV:

./ftrove -l
./ftrove -t 926be141-ab75-4106-8236-34edfcf102f2

You can also query the SQLite database (db/filetrove.db) directly with any SQLite client.

Background

FileTrove is the successor to filedriller and is based on the iPRES 2021 paper "Marrying siegfried and the National Software Reference Library".
