A high-performance Dart library for the FP-Growth algorithm and association rule mining.
Efficiently discover frequent patterns and generate insightful association rules from your data.
About β’ Features β’ Installation β’ Usage β’ CLI β’ Contributing β’ License
Welcome to FP-Growth for Dart β a robust and efficient library for implementing the FP-Growth algorithm. This package is designed to help you discover frequent itemsets and generate association rules from transactional datasets. It's an essential tool for tasks like market basket analysis, user behavior prediction, and understanding relationships within large data collections.
Built with performance and ease of use in mind, fp_growth provides a comprehensive, scalable, and parallelized solution for pattern mining in Dart and Flutter applications.
- FP-Growth Algorithm: A complete and optimized implementation of the Frequent Pattern Growth algorithm.
- FP-Tree Construction: Efficiently builds a compressed FP-Tree to represent transactional data.
- Header Table: Utilizes a header table for quick access and traversal of item nodes within the tree.
- Optimized Mining: Features a recursive mining approach with dynamic pruning and a single-path optimization for faster pattern discovery.
- Association Rule Generation: Extracts all possible association rules from frequent itemsets.
- Calculates key metrics:
Support,Confidence,Lift,Leverage, andConviction.
- Calculates key metrics:
- Parallel Processing: Harnesses the power of multiple CPU cores by using a fixed-size Isolate Pool to parallelize the mining process, efficiently speeding up analysis on large datasets without the overhead of spawning new isolates for every task.
- Memory-Optimized Two-Pass Architecture: Built to handle massive datasets. The algorithm processes transactions in two efficient passes, calculating frequencies first and then building the FP-Tree, without holding the entire transaction list in memory during the recursive mining process. This design drastically reduces peak memory usage.
- Efficient Data Structures: Employs internal integer mapping for items and uses weighted conditional FP-Trees to dramatically reduce memory usage and improve processing speed during recursive mining steps.
- CSV Data Adapter: Easily load transactional data directly from CSV files using modern stream-based parsers.
- Data Exporters: Export frequent itemsets and association rules to
JSON,CSV, or formattedText. - Command-Line Interface (CLI): A powerful and user-friendly CLI tool for performing analysis directly from your terminal, with support for parallelism, large files, and multiple output formats.
-
Add this to your package's
pubspec.yamlfile:dependencies: fp_growth: ^1.0.3 # Replace with the latest version
-
Install it from your terminal:
dart pub get
or for Flutter projects:
flutter pub get
This library provides a two-step process for analysis:
- Mine Frequent Itemsets: Discover which groups of items appear together frequently.
- Generate Association Rules: Create rules (
{A} => {B}) from those itemsets to uncover actionable insights.
First, find frequent itemsets from your data source.
Which mining API should I use?
- In-memory
List? ==> UsefpGrowth.mineFromList()(easiest).- CSV file? ==> Use
fpGrowth.mineFromCsv()(recommended for large files).- Database or custom source? ==> Use
fpGrowth.mine()with a stream provider (advanced).
import 'package:fp_growth/fp_growth.dart';
final transactions = [
['bread', 'milk'],
['bread', 'diaper', 'beer', 'eggs'],
['milk', 'diaper', 'beer', 'cola'],
];
// Instantiate FPGrowth and find itemsets with a minimum support of 2.
final fpGrowth = FPGrowth<String>(minSupport: 2);
final (frequentItemsets, totalTransactions) = await fpGrowth.mineFromList(transactions);
// frequentItemsets is a Map<List<String>, int>
// totalTransactions is an intOnce you have the frequent itemsets, you can generate association rules. The RuleGenerator class takes the frequent itemsets and the total transaction count to calculate metrics like confidence and lift.
import 'package:fp_growth/fp_growth.dart';
// (Continuing from the previous example...)
// 1. Setup the RuleGenerator
final generator = RuleGenerator<String>(
minConfidence: 0.7, // 70%
frequentItemsets: frequentItemsets,
totalTransactions: totalTransactions,
);
// 2. Generate the rules
final rules = generator.generateRules();
// 3. Print the rules and their metrics
for (final rule in rules) {
print(rule.formatWithMetrics());
// Example Output:
// {beer} => {diaper} [sup: 0.667, conf: 1.000, lift: 1.50, lev: 0.222, conv: β]
}For large files, mineFromCsv is the most memory-efficient option. It requires importing package:fp_growth/fp_growth_io.dart and is not available on the web.
import 'dart:io';
import 'package:fp_growth/fp_growth_io.dart'; // Note the IO-specific import!
Future<void> processLargeFile(String filePath) async {
final fpGrowth = FPGrowth<String>(
minSupport: 70,
parallelism: Platform.numberOfProcessors,
);
final (itemsets, count) = await fpGrowth.mineFromCsv(filePath);
// Now you can generate rules with the `itemsets` and `count`.
print('Found ${itemsets.length} frequent itemsets in $count transactions.');
}For databases or other custom sources, use the core mine method with a stream provider function (Stream<List<T>> Function()). This function must return a new stream each time it's called.
import 'dart:async';
import 'package:fp_growth/fp_growth.dart';
Future<void> useCustomStream() async {
Stream<List<String>> streamProvider() => Stream.fromIterable([
['a', 'b', 'c'], ['a', 'b'], ['b', 'c'], ['a', 'c'],
]);
final fpGrowth = FPGrowth<String>(minSupport: 2);
final (itemsets, count) = await fpGrowth.mine(streamProvider);
// Now you can generate rules with the `itemsets` and `count`.
print('Found ${itemsets.length} itemsets in $count transactions.');
}The fp_growth package includes a command-line interface (CLI) tool for quick analysis of CSV files without writing any Dart code. It's designed to handle large files by streaming data.
Create a CSV file (e.g., data.csv) where each line represents a transaction, and items are comma-separated.
bread,milk
bread,diaper,beer,eggs
milk,diaper,beer,cola
Execute the CLI tool using dart run. You can specify the minimum support, confidence, and an output file.
# Run analysis and print to console
dart run fp_growth --input data.csv --minSupport 0.6 --minConfidence 0.7
# Save results to a JSON file
dart run fp_growth -i data.csv -s 3 -c 0.7 -o results.json -f json
# Save results to a CSV file
dart run fp_growth -i data.csv -s 3 -c 0.7 --output-file results.csv --output-format csv| Flag | Abbreviation | Description | Default |
|---|---|---|---|
--input |
-i |
(Mandatory) Path to the input CSV file. | |
--minSupport |
-s |
Minimum support as a percentage (0.05) or an absolute count (5). |
0.05 |
--minConfidence |
-c |
Minimum confidence threshold for association rules. | 0.7 |
--output-file |
-o |
Path to an output file to save results. | null |
--output-format |
-f |
Output format (json or csv). Only used if output-file is specified. |
json |
--log-level |
Set the logging level (debug, info, warning, error, none). |
info |
|
--parallelism |
-p |
Number of isolates to use for parallel processing. | 1 |
The fp_growth library is optimized for both speed and memory efficiency. The following benchmarks were run on an AMD Ryzenβ’ 7 5800H (16 Threads) with a dataset of 1,000,000 transactions and a minimum support of 0.05. The results highlight the performance improvements of the new implementation (v2.0.1) compared to the older version (v1.0.2).
Old Benchmark (v1.0.2):
Averaged Execution Time: ~2.73 seconds.
| API Method | Averaged Execution Time | Speed vs. v1.0.2 | Memory Usage (Delta) | Notes |
|---|---|---|---|---|
In-Memory (mineFromList) |
1.46 s | 1.87Γ faster | +28.4 MB | Fastest execution, requires full dataset in RAM. |
CSV Streaming (mineFromCsv) |
2.32 s | 1.18Γ faster | -0.70 MB | Minimal memory footprint, ideal for very large files. |
Custom Stream (mine) |
1.82 s | 1.50Γ faster | Not measured | Flexible streaming for custom data sources. |
Contributions are welcome! Hereβs how to get started:
- Fork the repository.
- Create a new branch:
git checkout -b feature/YourFeature - Commit your changes:
git commit -m "Add amazing feature" - Push to your branch:
git push origin feature/YourFeature - Open a pull request.
π‘ Please read our Contributing Guidelines and open an issue first for major feature ideas or changes.
This project is licensed under the GPL-3.0 License. See the LICENSE file for full details.
Made with β€οΈ by MostafaSensei106