Skip to content

A high-performance Dart library for FP-Growth algorithm and association rule mining. Efficiently discover frequent patterns and generate insightful association rules from your data.

License

Notifications You must be signed in to change notification settings

MostafaSensei106/FP-Growth

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

67 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

FP-Growth

Banner

A high-performance Dart library for the FP-Growth algorithm and association rule mining.
Efficiently discover frequent patterns and generate insightful association rules from your data.

About β€’ Features β€’ Installation β€’ Usage β€’ CLI β€’ Contributing β€’ License


πŸ“– About

Welcome to FP-Growth for Dart β€” a robust and efficient library for implementing the FP-Growth algorithm. This package is designed to help you discover frequent itemsets and generate association rules from transactional datasets. It's an essential tool for tasks like market basket analysis, user behavior prediction, and understanding relationships within large data collections.

Built with performance and ease of use in mind, fp_growth provides a comprehensive, scalable, and parallelized solution for pattern mining in Dart and Flutter applications.


✨ Features

Core Algorithm & Functionality

  • FP-Growth Algorithm: A complete and optimized implementation of the Frequent Pattern Growth algorithm.
  • FP-Tree Construction: Efficiently builds a compressed FP-Tree to represent transactional data.
  • Header Table: Utilizes a header table for quick access and traversal of item nodes within the tree.
  • Optimized Mining: Features a recursive mining approach with dynamic pruning and a single-path optimization for faster pattern discovery.
  • Association Rule Generation: Extracts all possible association rules from frequent itemsets.
    • Calculates key metrics: Support, Confidence, Lift, Leverage, and Conviction.

πŸ› οΈ Performance & Scalability

  • Parallel Processing: Harnesses the power of multiple CPU cores by using a fixed-size Isolate Pool to parallelize the mining process, efficiently speeding up analysis on large datasets without the overhead of spawning new isolates for every task.
  • Memory-Optimized Two-Pass Architecture: Built to handle massive datasets. The algorithm processes transactions in two efficient passes, calculating frequencies first and then building the FP-Tree, without holding the entire transaction list in memory during the recursive mining process. This design drastically reduces peak memory usage.
  • Efficient Data Structures: Employs internal integer mapping for items and uses weighted conditional FP-Trees to dramatically reduce memory usage and improve processing speed during recursive mining steps.

βš™οΈ Utilities

  • CSV Data Adapter: Easily load transactional data directly from CSV files using modern stream-based parsers.
  • Data Exporters: Export frequent itemsets and association rules to JSON, CSV, or formatted Text.
  • Command-Line Interface (CLI): A powerful and user-friendly CLI tool for performing analysis directly from your terminal, with support for parallelism, large files, and multiple output formats.

πŸ“¦ Installation

  1. Add this to your package's pubspec.yaml file:

    dependencies:
      fp_growth: ^1.0.3 # Replace with the latest version
  2. Install it from your terminal:

    dart pub get

    or for Flutter projects:

    flutter pub get

πŸš€ Usage

This library provides a two-step process for analysis:

  1. Mine Frequent Itemsets: Discover which groups of items appear together frequently.
  2. Generate Association Rules: Create rules ({A} => {B}) from those itemsets to uncover actionable insights.

1. Mining Frequent Itemsets

First, find frequent itemsets from your data source.

Which mining API should I use?

  • In-memory List? ==> Use fpGrowth.mineFromList() (easiest).
  • CSV file? ==> Use fpGrowth.mineFromCsv() (recommended for large files).
  • Database or custom source? ==> Use fpGrowth.mine() with a stream provider (advanced).

Example: From an In-Memory List

import 'package:fp_growth/fp_growth.dart';

final transactions = [
  ['bread', 'milk'],
  ['bread', 'diaper', 'beer', 'eggs'],
  ['milk', 'diaper', 'beer', 'cola'],
];

// Instantiate FPGrowth and find itemsets with a minimum support of 2.
final fpGrowth = FPGrowth<String>(minSupport: 2);
final (frequentItemsets, totalTransactions) = await fpGrowth.mineFromList(transactions);

// frequentItemsets is a Map<List<String>, int>
// totalTransactions is an int

2. Generating Association Rules

Once you have the frequent itemsets, you can generate association rules. The RuleGenerator class takes the frequent itemsets and the total transaction count to calculate metrics like confidence and lift.

import 'package:fp_growth/fp_growth.dart';

// (Continuing from the previous example...)

// 1. Setup the RuleGenerator
final generator = RuleGenerator<String>(
  minConfidence: 0.7, // 70%
  frequentItemsets: frequentItemsets,
  totalTransactions: totalTransactions,
);

// 2. Generate the rules
final rules = generator.generateRules();

// 3. Print the rules and their metrics
for (final rule in rules) {
  print(rule.formatWithMetrics());
  // Example Output:
  // {beer} => {diaper} [sup: 0.667, conf: 1.000, lift: 1.50, lev: 0.222, conv: ∞]
}

Other Data Sources

From a CSV File

For large files, mineFromCsv is the most memory-efficient option. It requires importing package:fp_growth/fp_growth_io.dart and is not available on the web.

import 'dart:io';
import 'package:fp_growth/fp_growth_io.dart'; // Note the IO-specific import!

Future<void> processLargeFile(String filePath) async {
  final fpGrowth = FPGrowth<String>(
    minSupport: 70,
    parallelism: Platform.numberOfProcessors,
  );

  final (itemsets, count) = await fpGrowth.mineFromCsv(filePath);

  // Now you can generate rules with the `itemsets` and `count`.
  print('Found ${itemsets.length} frequent itemsets in $count transactions.');
}

From a Custom Stream

For databases or other custom sources, use the core mine method with a stream provider function (Stream<List<T>> Function()). This function must return a new stream each time it's called.

import 'dart:async';
import 'package:fp_growth/fp_growth.dart';

Future<void> useCustomStream() async {
  Stream<List<String>> streamProvider() => Stream.fromIterable([
    ['a', 'b', 'c'], ['a', 'b'], ['b', 'c'], ['a', 'c'],
  ]);

  final fpGrowth = FPGrowth<String>(minSupport: 2);
  final (itemsets, count) = await fpGrowth.mine(streamProvider);

  // Now you can generate rules with the `itemsets` and `count`.
  print('Found ${itemsets.length} itemsets in $count transactions.');
}

πŸ“‹ CLI Usage

The fp_growth package includes a command-line interface (CLI) tool for quick analysis of CSV files without writing any Dart code. It's designed to handle large files by streaming data.

Prerequisites

Create a CSV file (e.g., data.csv) where each line represents a transaction, and items are comma-separated.

bread,milk
bread,diaper,beer,eggs
milk,diaper,beer,cola

Running the CLI

Execute the CLI tool using dart run. You can specify the minimum support, confidence, and an output file.

# Run analysis and print to console
dart run fp_growth --input data.csv --minSupport 0.6 --minConfidence 0.7

# Save results to a JSON file
dart run fp_growth -i data.csv -s 3 -c 0.7 -o results.json -f json

# Save results to a CSV file
dart run fp_growth -i data.csv -s 3 -c 0.7 --output-file results.csv --output-format csv

Options

Flag Abbreviation Description Default
--input -i (Mandatory) Path to the input CSV file.
--minSupport -s Minimum support as a percentage (0.05) or an absolute count (5). 0.05
--minConfidence -c Minimum confidence threshold for association rules. 0.7
--output-file -o Path to an output file to save results. null
--output-format -f Output format (json or csv). Only used if output-file is specified. json
--log-level Set the logging level (debug, info, warning, error, none). info
--parallelism -p Number of isolates to use for parallel processing. 1

⚑ Performance

The fp_growth library is optimized for both speed and memory efficiency. The following benchmarks were run on an AMD Ryzenβ„’ 7 5800H (16 Threads) with a dataset of 1,000,000 transactions and a minimum support of 0.05. The results highlight the performance improvements of the new implementation (v2.0.1) compared to the older version (v1.0.2).

Old Benchmark (v1.0.2):

Averaged Execution Time: ~2.73 seconds.

Benchmark Results (v2.0.1)

API Method Averaged Execution Time Speed vs. v1.0.2 Memory Usage (Delta) Notes
In-Memory (mineFromList) 1.46 s 1.87Γ— faster +28.4 MB Fastest execution, requires full dataset in RAM.
CSV Streaming (mineFromCsv) 2.32 s 1.18Γ— faster -0.70 MB Minimal memory footprint, ideal for very large files.
Custom Stream (mine) 1.82 s 1.50Γ— faster Not measured Flexible streaming for custom data sources.

🀝 Contributing

Contributions are welcome! Here’s how to get started:

  1. Fork the repository.
  2. Create a new branch: git checkout -b feature/YourFeature
  3. Commit your changes: git commit -m "Add amazing feature"
  4. Push to your branch: git push origin feature/YourFeature
  5. Open a pull request.

πŸ’‘ Please read our Contributing Guidelines and open an issue first for major feature ideas or changes.


πŸ“œ License

This project is licensed under the GPL-3.0 License. See the LICENSE file for full details.

Made with ❀️ by MostafaSensei106

About

A high-performance Dart library for FP-Growth algorithm and association rule mining. Efficiently discover frequent patterns and generate insightful association rules from your data.

Topics

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Languages