
feat: add benchmark events as sources #8

Open
bruno-dasilva wants to merge 6 commits into beyond-all-reason:main from bruno-dasilva:main


@bruno-dasilva (Contributor) commented Apr 15, 2026

Purpose

Fixes #7. The goal is to take the "complex events" in teiserver that match system:benchmark and make them available as data for dashboarding.

Privacy

I'm not sure what our stance on this is, but I've defaulted to excluding user_id from the export (it's not needed at all) so that people can't identify the hardware of specific users.
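A minimal sketch of the kind of export described above, using Python's built-in `sqlite3` as a stand-in for Postgres. The table and column names (`event_types`, `complex_events`, `type_id`) are hypothetical and not taken from the real teiserver schema; the point is only that the event-type filter keeps `system:benchmark` rows and that `user_id` is never selected.

```python
import sqlite3

# Hypothetical schema standing in for teiserver's complex-event tables;
# the real table and column names may differ.
SCHEMA = """
CREATE TABLE event_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE complex_events (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    type_id INTEGER NOT NULL REFERENCES event_types (id),
    timestamp TEXT NOT NULL,
    value TEXT NOT NULL  -- JSON payload stored as text
);
"""

# Keep only system:benchmark events; user_id is deliberately not selected.
EXPORT_SQL = """
SELECT e.id, e.timestamp, e.value
FROM complex_events AS e
JOIN event_types AS t ON t.id = e.type_id
WHERE t.name = 'system:benchmark'
ORDER BY e.id
"""


def export_benchmark_events(conn: sqlite3.Connection) -> list[tuple]:
    """Return (id, timestamp, value) rows for benchmark events only."""
    return conn.execute(EXPORT_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Type 2 is a made-up non-benchmark event type used for contrast.
    conn.executemany("INSERT INTO event_types VALUES (?, ?)",
                     [(1, "system:benchmark"), (2, "example:other")])
    conn.executemany(
        "INSERT INTO complex_events VALUES (?, ?, ?, ?, ?)",
        [(10, 42, 1, "2026-04-15T12:00:00Z", '{"display": "3840x1600"}'),
         (11, 42, 2, "2026-04-15T12:01:00Z", "{}")],
    )
    for row in export_benchmark_events(conn):
        print(row)
```

Because the exclusion happens in the projection rather than in a post-processing step, the identifier never leaves the database in the first place.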

An example payload in the value column is this:

{
"Sim": {"mean": 13.961731, "count": 1800, "total": 25131.1152, "spread": 1.27117431, "percentiles": {...}}, 
"cpu": "AMD Ryzen 9 7950X3D 16-Core Processor          ; 64632MB RAM, 73336MB pagefile", 
"gpu": "NVIDIA GeForce RTX 5070 Ti/PCIe/SSE2", 
"Draw": {"mean": 9.84391785, "count": 2388, "total": 23507.2754, "spread": 3.77889252, "percentiles": {...}}, 
"Update": {"mean": 1.88539255, "count": 2388, "total": 4502.31738, "spread": 0.9580617, "percentiles": {...}}, 
"display": "3840x1600", 
"mapName": "Starwatcher 1.0", 
"gameName": "Beyond All Reason test-29918-105c4e6", 
"engineVersion": "2025.06.19", 
"benchmarkcommand": "luarules fightertest corak armpw 650 10 2040"
}

which, as you can tell, contains no PII, so it should be safe to export anonymously.
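To make the payload's shape concrete, here is a short Python sketch that loads a trimmed copy of the example above (percentile maps dropped) and reads the `Sim`, `Draw` and `Update` timing sections. It assumes `mean` is `total / count`, which holds for all three sections of this example (for `Sim`, 25131.1152 / 1800 ≈ 13.9617). The units of the timings are not stated in the payload, so none are assumed.

```python
import json
import math

# Trimmed copy of the example payload (percentile maps omitted).
PAYLOAD = """{
  "Sim": {"mean": 13.961731, "count": 1800, "total": 25131.1152, "spread": 1.27117431},
  "cpu": "AMD Ryzen 9 7950X3D 16-Core Processor          ; 64632MB RAM, 73336MB pagefile",
  "gpu": "NVIDIA GeForce RTX 5070 Ti/PCIe/SSE2",
  "Draw": {"mean": 9.84391785, "count": 2388, "total": 23507.2754, "spread": 3.77889252},
  "Update": {"mean": 1.88539255, "count": 2388, "total": 4502.31738, "spread": 0.9580617},
  "display": "3840x1600",
  "mapName": "Starwatcher 1.0",
  "gameName": "Beyond All Reason test-29918-105c4e6",
  "engineVersion": "2025.06.19",
  "benchmarkcommand": "luarules fightertest corak armpw 650 10 2040"
}"""

TIMING_SECTIONS = ("Sim", "Draw", "Update")


def summarize(payload: dict) -> dict[str, float]:
    """Return each timing section's mean after checking it against total / count."""
    means = {}
    for name in TIMING_SECTIONS:
        section = payload[name]
        recomputed = section["total"] / section["count"]
        # Assumption: "mean" is the arithmetic mean of the recorded samples.
        if not math.isclose(recomputed, section["mean"], rel_tol=1e-6):
            raise ValueError(f"{name}: mean does not match total / count")
        means[name] = section["mean"]
    return means


event = json.loads(PAYLOAD)
print(summarize(event))
print(event["gpu"], event["display"])
```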

Next steps

A follow-up PR will add the intermediate and mart models, but I am holding off on that until a sample of data has been exported for the first time (so I can test it locally).

AI Disclosure

I generated these queries with Claude Code, but they look right to me (they're simple enough).

@bruno-dasilva (Contributor, Author)

@p2004a PTAL when you have some time!

Review comment threads on scripts/export_prod_data_source.sql (two, both outdated)

@NortySpock NortySpock left a comment

I am not qualified to answer "does this do what Bruno expects?" or "is this safe to export?"

I did spot that it references a table that does not exist and an event type that is not in my local development environment, and I can suggest simplifying the output to a single file using UNION ALL. So have another go, but we probably need someone with more experience to tell us what the event type is and whether this is safe.

EDIT: this would possibly make sense to add to dbt sources (and, I suppose, dbt exposures), but I haven't gotten that far in the dbt project yet.
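The UNION ALL suggestion can be sketched as follows, again with `sqlite3` standing in for Postgres. The two table names (`user_benchmark_events`, `anon_benchmark_events`) are hypothetical placeholders for whatever the script actually exports; the `is_anon` flag mirrors the column used in the dbt model suggested later in this thread. `UNION ALL` (rather than `UNION`) keeps every row and skips the de-duplication pass.

```python
import sqlite3

# Hypothetical table names standing in for the two separate exports.
# UNION ALL appends the second result set to the first without removing
# duplicates, so both sources land in one result (and one output file).
# SQLite has no boolean type, so is_anon is 0/1 here.
COMBINED_SQL = """
SELECT id, timestamp, value, 0 AS is_anon FROM user_benchmark_events
UNION ALL
SELECT id, timestamp, value, 1 AS is_anon FROM anon_benchmark_events
ORDER BY timestamp
"""


def export_combined(conn: sqlite3.Connection) -> list[tuple]:
    """Return rows from both sources, tagged with an is_anon flag."""
    return conn.execute(COMBINED_SQL).fetchall()


conn = sqlite3.connect(":memory:")
for table in ("user_benchmark_events", "anon_benchmark_events"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER, timestamp TEXT, value TEXT)")
conn.execute("INSERT INTO user_benchmark_events VALUES (1, '2026-04-15T10:00:00Z', '{}')")
conn.execute("INSERT INTO anon_benchmark_events VALUES (7, '2026-04-15T09:00:00Z', '{}')")
for row in export_combined(conn):
    print(row)
```

In Postgres the flag would more naturally be written with the boolean literals `false` and `true`.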

Review comment threads on scripts/export_prod_data_source.sql (three, two of them outdated)
@NortySpock

@bruno-dasilva This is a start (you extracted the data from Postgres), but to get it materialized by dbt (and I think only tables in the marts are materialized and served up), I think (reading the documentation in a hurry) you would need to (1) add the name of the dataset you just exported, `teiserver_benchmark_events`, to the sources list at the bottom of

and then (2) create a SQL file named `what_you_want_the_dataset_to_be_called.sql` in the `models/marts` folder containing `select id as benchmark_id, timestamp, value, is_anon from {{ source('pgdumps', 'teiserver_benchmark_events') }}`.

The documentation has a dataflow diagram if you click the little blue button on the right side:
https://beyond-all-reason.github.io/data-processing/#!/overview?g_v=1

@bruno-dasilva (Contributor, Author)

@NortySpock I believe you're right, as I noticed the other sources are not available in the output bucket for download.

Your suggestion is the easiest way to just pass the data through, which would be a good start for now, I suppose.

@p2004a (Collaborator) commented May 1, 2026

I haven't looked at it, because I have concerns about the sensitivity of this data and about simply making it publicly available to the whole world. Exporting into private buckets for BAR-internal analytics would not be a concern.

Just stripping user_id does not make it anonymous. I do not have a good enough understanding of whether what is included here is equivalent to what can be extracted from replay data.
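The point that dropping `user_id` alone does not anonymize the data can be made concrete with a k-anonymity check: treat fields such as `cpu`, `gpu` and `display` as quasi-identifiers and count how many exported events share each combination. A combination seen only once (k = 1) singles out one machine, and the `cpu` string in the example payload even includes exact RAM and pagefile sizes. This is a hypothetical sketch; the choice of quasi-identifier fields and any acceptable threshold for k are assumptions, not project policy, and the sample rows are made up.

```python
from collections import Counter

# Hypothetical quasi-identifiers: fields that, combined, may point at one
# specific machine even after user_id has been removed.
QUASI_IDENTIFIERS = ("cpu", "gpu", "display")


def group_sizes(events: list[dict], keys=QUASI_IDENTIFIERS) -> Counter:
    """Count how many events share each combination of the given keys."""
    return Counter(tuple(event.get(key) for key in keys) for event in events)


def k_anonymity(events: list[dict], keys=QUASI_IDENTIFIERS) -> int:
    """Return k, the size of the smallest group (0 when there are no events)."""
    sizes = group_sizes(events, keys)
    return min(sizes.values()) if sizes else 0


def singled_out(events: list[dict], keys=QUASI_IDENTIFIERS) -> list[tuple]:
    """Return the key combinations that occur exactly once (k = 1)."""
    return [combo for combo, n in group_sizes(events, keys).items() if n == 1]


# Made-up sample rows, not real data.
sample = [
    {"cpu": "CPU A; 32768MB RAM", "gpu": "GPU X", "display": "1920x1080"},
    {"cpu": "CPU A; 32768MB RAM", "gpu": "GPU X", "display": "1920x1080"},
    {"cpu": "CPU B; 65536MB RAM", "gpu": "GPU Y", "display": "3840x1600"},
]
print(k_anonymity(sample))
print(singled_out(sample))
```

If replays already record the same hardware strings alongside usernames, any combination that comes out with k = 1 could in principle be joined back to a player.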

@bruno-dasilva (Contributor, Author)

@p2004a Two things:

  1. Replays currently log users' hardware (and their usernames), so it's already available to the whole world. I know this because I downloaded the last year of replays and am currently analyzing them. That doesn't mean it's a good precedent to follow.
  2. I don't need this to be available to the whole world; it's not very useful outside of BAR developers, so I'm totally okay with it being turned into something for private analytics.
    • Do you have any suggestions on how to move forward with that? Extend the dbt repo, but with a set of marts that go to a private destination?

@bruno-dasilva (Contributor, Author) commented May 1, 2026

Please check out 9b7fb12 (the commit prior to this comment) for a very simple way of segregating the export into an internal/private bucket. We could use GCS for easier access control or Cloudflare for lower cost.

@bruno-dasilva (Contributor, Author)

xref: an issue I just found, beyond-all-reason/Beyond-All-Reason#5173



Development

Successfully merging this pull request may close these issues.

Proposal: adding benchmark stats to source files

3 participants