This project evaluates evaluation methods for automatic summarization on long financial documents from the 2023 Financial Narrative Summarization (FNS) task. It uses three datasets of annual reports and their gold summaries:
- English
- Greek
- Spanish
Due to access limitations, only the Greek dataset is publicly available, so a script has been added to download it. To get the Greek dataset, run the data collection script.
You can get statistics for all the datasets (the original datasets and the generated candidate summaries) and their texts by running the stats scripts. The text statistics extracted are:
- spaCy token count
- spaCy sentence count
- BERT token count
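For a rough picture of how these could be computed, here is a minimal sketch (assuming spaCy with a per-language model and a HuggingFace BERT tokenizer; the model names below are placeholders, and the project's own tokenization handling lives in `src/modules/tokenizer.py`):

```python
# Minimal sketch (not the project's implementation) of the three statistics.
import spacy
from transformers import AutoTokenizer

nlp = spacy.load("en_core_web_sm")                         # assumed English model
bert = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint

def text_stats(text: str) -> dict:
    doc = nlp(text)
    return {
        "spacy_tokens": len(doc),                 # spaCy token count
        "spacy_sentences": len(list(doc.sents)),  # spaCy sentence count
        "bert_tokens": len(bert.tokenize(text)),  # BERT token count
    }

print(text_stats("Revenue grew by 12% in 2023. Costs fell."))
```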
To create noisy summaries from existing ones, run the summary corruption script. To evaluate your summaries, run the summary evaluation script.
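For intuition about what "noisy" means here, corruption can be as simple as the toy function below (purely illustrative; the actual strategies live in `src/modules/summary_corruptor.py` and may differ):

```python
# Toy noise insertion (illustrative only): randomly duplicate words.
import random

def corrupt(summary: str, noise_rate: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    noisy = []
    for word in summary.split():
        noisy.append(word)
        if rng.random() < noise_rate:  # duplicate roughly 10% of the words
            noisy.append(word)
    return " ".join(noisy)

print(corrupt("Net profit rose sharply during the fiscal year."))
```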
```
thesis/
├── data/                 # Source documents, gold summaries, candidate summaries (generated by pipelines/generate.py)
├── evaluation_methods/   # Evaluation methods
├── notebooks/            # Analyses of the results
├── results/              # Evaluation results from pipelines/evaluate.py
├── samples/              # Example untracked files
├── src/
│   ├── modules/
│   │   ├── data_collector.py     # Data collection
│   │   ├── summary_corruptor.py  # Noise insertion
│   │   ├── summary_generator.py  # Candidate summaries generator
│   │   ├── summary_evaluator.py  # Evaluator
│   │   └── tokenizer.py          # Tokenization handling
│   ├── pipelines/
│   │   ├── collect.py            # Data collection pipeline
│   │   ├── generate.py           # Candidate summaries generation pipeline
│   │   └── evaluate.py           # Evaluation pipeline
│   └── utils/                    # Helper functions for the main modules
│       ├── summary_corruptor_utils.py
│       ├── summary_evaluator_utils.py
│       └── visualization.py
├── main.py
└── README.md
```

This project uses uv for its dependencies. Follow the official installation instructions depending on your OS.
To set up the project:

- Create a `data/` directory at the root. If you don't have a dataset available, you can run the data collection script to get a dataset of Greek annual reports and their gold summaries.
- Create a `.env` file at the root based on the `samples/sample.env` file.
- Create a `conf/config.yaml` file inside the `src/` directory.
To generate and/or evaluate candidate summaries, configure the language variables in the `.env` file:

- `LANGUAGE`: `English`, `Greek`, or `Spanish`
- `SUMMARY_VER`:
  - English: `_1`
  - Greek: `_2`
  - Spanish: `_GS1`
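For example, a `.env` set up for the Greek dataset might look like the following (assuming the usual `KEY=value` dotenv format; `samples/sample.env` is the authoritative template):

```
LANGUAGE=Greek
SUMMARY_VER=_2
```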
The metrics used in this project are:
| Metric type | Metric name | How to set up |
|---|---|---|
| N-gram-based | ROUGE-1, ROUGE-2 | Comes with the project |
| N-gram-graph-based | AutoSummENG, MeMoG, NPowER | Contact G. Giannakopoulos |
| Embeddings-based | BERTScore | Comes with the project |
| Embeddings-based | BARTScore | Clone repository |
| Model-based | BLEURT | Clone repository |
| Model-based | FactCC | Model is used via HuggingFace |
| Model-based | LongDocFACTScore | Comes with the project |
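As a quick sanity check of the metrics that come with the project, the ROUGE and BERTScore variants can be exercised roughly as follows (an illustrative sketch assuming the `rouge-score` and `bert-score` packages; the project's own evaluator in `src/modules/summary_evaluator.py` may configure them differently):

```python
# Illustrative only: score one candidate against one reference.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

candidate = "The company reported higher annual revenue."
reference = "Annual revenue increased, the company reported."

# ROUGE-1 / ROUGE-2 (n-gram overlap)
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2"], use_stemmer=True)
print(scorer.score(reference, candidate))

# BERTScore (embedding similarity); returns precision, recall, F1 tensors
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(F1.mean().item())
```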
Create an `evaluation_methods` directory at the root:

```
cd thesis
mkdir evaluation_methods
cd evaluation_methods
```

Clone the repositories inside the `evaluation_methods` directory, following the instructions of each repository:
```
git clone <the-eval-metric-repo>
```

To install the project itself:

```
git clone https://github.com/stefsyrsiri/thesis.git
cd thesis
uv sync --locked
```

To use the scripts, ensure that all the requirements are met.
```
# Run the pipeline with selected steps:
uv run main.py [--collect] [--generate] [--evaluate] [--all]
```

To collect the Greek annual reports dataset, run the main script with the `--collect` flag:
```
uv run main.py --collect
```

To merge all the datasets (all languages, all document types) and store them in a single place, run the following:
```
uv run main.py --merge-datasets
```

To get the text statistics for the merged dataset, run the following:
```
uv run main.py --stats
```

To create noisy summaries, run the main script with the `--generate` flag:
```
uv run main.py --generate
```

You can optionally use the `--truncate` flag to truncate the summaries at 512 tokens before applying the noise. This is useful if you need to evaluate summaries with metrics that have token input limits.
Note: Even with truncation, the exact length of the final text cannot be predetermined; since the noise is inserted after truncation, the text will exceed the 512-token limit and be truncated automatically by the metric. However, truncating before noise insertion limits how much the metric cuts off at inference time, making the results more reliable.
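The order of operations is the point here. A minimal sketch of the pre-truncation step (hypothetical helper name, assuming a HuggingFace BERT tokenizer; the project's corruption module handles this internally):

```python
# Sketch only: cut a summary to 512 BERT tokens before noise insertion.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint

def truncate_to_512(text: str) -> str:
    ids = tok.encode(text, add_special_tokens=False)[:512]
    return tok.decode(ids)

truncated = truncate_to_512("...")  # a gold summary goes here
# Noise is inserted after this point, so the final text can still drift
# past 512 tokens and be clipped by the metric at inference time.
```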
To evaluate summaries, run the main script with the `--evaluate` flag. When running the evaluation script, you need to specify whether you're going to use CPU-bound or GPU-bound metrics by using either the `--cpu` or `--gpu` flag. If you want to use a reference-free metric, such as LongDocFACTScore, you need to add the `--no-refs` flag.

Note: LongDocFACTScore is GPU-bound because it depends on BARTScore, so you need both the `--gpu` and `--no-refs` flags.
```
uv run main.py --evaluate
```

Optional flags:

- `--all`: run all the steps (collect, generate, evaluate)
- `--subset`: run your selected step for `[subset]` documents, where `subset` is a positive integer
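For orientation, a minimal `argparse` skeleton consistent with the flags above could look like this (a sketch under assumptions, not the actual `main.py` wiring):

```python
# Sketch of the CLI surface described above (see main.py for the real wiring).
import argparse

parser = argparse.ArgumentParser(description="Summarization evaluation pipeline")
parser.add_argument("--collect", action="store_true", help="collect the Greek dataset")
parser.add_argument("--generate", action="store_true", help="generate noisy candidate summaries")
parser.add_argument("--evaluate", action="store_true", help="evaluate candidate summaries")
parser.add_argument("--all", action="store_true", help="run collect, generate, and evaluate")
parser.add_argument("--merge-datasets", action="store_true", help="merge all datasets in one place")
parser.add_argument("--stats", action="store_true", help="text statistics for the merged dataset")
parser.add_argument("--truncate", action="store_true", help="truncate to 512 tokens before noise")
parser.add_argument("--cpu", action="store_true", help="use CPU-bound metrics")
parser.add_argument("--gpu", action="store_true", help="use GPU-bound metrics")
parser.add_argument("--no-refs", action="store_true", help="use reference-free metrics")
parser.add_argument("--subset", type=int, help="run on this many documents")
args = parser.parse_args()
```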
Examples:

```
# Collect (Greek) data
uv run main.py --collect

# Generate noisy summaries
uv run main.py --generate --truncate

# Evaluate summaries
uv run main.py --evaluate --gpu --subset 10

# Run all steps: collect, generate, evaluate
uv run main.py --all
```