feat: add benchmark events as sources #8
bruno-dasilva wants to merge 6 commits into beyond-all-reason:main
# Purpose
See beyond-all-reason#7. The goal is to take the "complex events" in teiserver that match system:benchmark and make them available as data for dashboarding.

## Privacy
I'm not sure of our stance on this, but I've defaulted to **excluding** user_id from the export (it's not needed at all) so people can't identify the hardware of specific users.

An example payload in the `value` column is this: `{"Sim": {"mean": 13.961731, "count": 1800, "total": 25131.1152, "spread": 1.27117431, "percentiles": {"0": 9.9630003, "1": 11.0419998, "2": 11.2279997, "5": 11.6110001, "10": 12.0139999, "20": 12.5120001, "35": 13.2309999, "50": 13.8470001, "65": 14.46, "80": 15.1219997, "90": 15.9379997, "95": 16.7099991, "98": 17.7919998, "99": 18.5750008, "100": 43.8349991}}, "cpu": "AMD Ryzen 9 7950X3D 16-Core Processor ; 64632MB RAM, 73336MB pagefile", "gpu": "NVIDIA GeForce RTX 5070 Ti/PCIe/SSE2", "Draw": {"mean": 9.84391785, "count": 2388, "total": 23507.2754, "spread": 3.77889252, "percentiles": {"0": 1.70700002, "1": 1.74100006, "2": 1.75399995, "5": 1.85000002, "10": 2.12700009, "20": 3.73200011, "35": 8.6960001, "50": 10.9720001, "65": 12.3769999, "80": 13.1339998, "90": 14.1190004, "95": 16.1040001, "98": 18.6919994, "99": 20.1100006, "100": 158.367004}}, "Update": {"mean": 1.88539255, "count": 2388, "total": 4502.31738, "spread": 0.9580617, "percentiles": {"0": 0.214, "1": 0.227, "2": 0.23100001, "5": 0.245, "10": 0.29499999, "20": 0.588, "35": 1.079, "50": 2.43099999, "65": 2.54900002, "80": 2.67600012, "90": 2.83500004, "95": 2.96399999, "98": 3.16599989, "99": 3.61100006, "100": 8.98400021}}, "display": "3840x1600", "mapName": "Starwatcher 1.0", "gameName": "Beyond All Reason test-29918-105c4e6", "engineVersion": "2025.06.19", "benchmarkcommand": "luarules fightertest corak armpw 650 10 2040"}` which, as you can tell, has no PII. So it should be safe to export here anonymously.

## Next steps
A follow-up PR will be created to make intermediate/marts, but I am holding off on that until a sample of data is exported for the first time (so I can test it locally).
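To make the shape of the `value` payload easier to reason about for dashboarding, here is a minimal Python sketch that flattens one payload into a single row. It is only an illustration: the payload is an abbreviated copy of the example above (two percentiles per timer), and `flatten_benchmark` plus the output column names such as `sim_p50` are hypothetical, not part of this PR or of teiserver.

```python
import json

# Abbreviated copy of the example payload; only the 50th and 99th percentiles are kept.
payload_text = """
{"Sim": {"mean": 13.961731, "count": 1800, "percentiles": {"50": 13.8470001, "99": 18.5750008}},
 "Draw": {"mean": 9.84391785, "count": 2388, "percentiles": {"50": 10.9720001, "99": 20.1100006}},
 "Update": {"mean": 1.88539255, "count": 2388, "percentiles": {"50": 2.43099999, "99": 3.61100006}},
 "cpu": "AMD Ryzen 9 7950X3D 16-Core Processor ; 64632MB RAM, 73336MB pagefile",
 "gpu": "NVIDIA GeForce RTX 5070 Ti/PCIe/SSE2",
 "display": "3840x1600",
 "engineVersion": "2025.06.19"}
"""

# The three timer sections that appear in the example payload.
TIMERS = ("Sim", "Draw", "Update")


def flatten_benchmark(value: dict) -> dict:
    """Flatten one benchmark payload into a single flat row (hypothetical column names)."""
    row = {key: value.get(key) for key in ("cpu", "gpu", "display", "engineVersion")}
    for timer in TIMERS:
        stats = value.get(timer, {})
        percentiles = stats.get("percentiles", {})
        prefix = timer.lower()
        row[f"{prefix}_mean"] = stats.get("mean")
        row[f"{prefix}_p50"] = percentiles.get("50")
        row[f"{prefix}_p99"] = percentiles.get("99")
    return row


row = flatten_benchmark(json.loads(payload_text))
```

Missing timers or percentiles come back as `None` rather than raising, which seems like the safer default until a real sample of exported data is available to check against.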
@p2004a PTAL when you have some time!
I am not qualified to answer "does this do what Bruno expects?" or "is this safe to export?".
I did spot it referencing a table that does not exist and an event type that is not in my local development environment, and I can suggest simplifying the output to a single file using UNION ALL. So have another go, but we probably need someone with more experience to tell us what the event type is and whether this is safe.
EDIT: possibly this would make sense to add to dbt sources (and, I suppose, dbt exposures) but I haven't gotten that far in the dbt project yet.
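As a rough illustration of the UNION ALL suggestion, the sketch below combines two event tables into one result set using Python's built-in `sqlite3`. The table names `client_events` and `anon_events`, their columns, and the sample rows are hypothetical stand-ins, not the real teiserver schema; the `is_anon` flag is an assumed way to keep the two origins distinguishable in a single file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in tables; the real teiserver tables may differ.
conn.executescript("""
CREATE TABLE client_events (id INTEGER, user_id INTEGER, timestamp TEXT, value TEXT);
CREATE TABLE anon_events (id INTEGER, timestamp TEXT, value TEXT);
INSERT INTO client_events VALUES (1, 42, '2025-07-01T10:00:00', '{"display": "3840x1600"}');
INSERT INTO anon_events VALUES (1, '2025-07-01T11:00:00', '{"display": "1920x1080"}');
""")

# One query, one output: both branches expose the same columns and omit user_id.
UNION_QUERY = """
SELECT id, timestamp, value, 0 AS is_anon FROM client_events
UNION ALL
SELECT id, timestamp, value, 1 AS is_anon FROM anon_events
ORDER BY timestamp
"""

cursor = conn.execute(UNION_QUERY)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

UNION ALL keeps every row from both branches without deduplicating, which is what an export wants here: two benchmark runs with identical payloads are still two runs.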
@bruno-dasilva This is a start (you extracted the data from postgres), but to get it materialized by dbt (and I think only tables in the mart are materialized and served up), I think (reading the documentation in a hurry) you would need to (1) add the name of the dataset you just exported, and then add a model such as `select id as benchmark_id, timestamp, value, is_anon FROM {{ source('pgdumps', 'teiserver_benchmark_events') }}` in the models/marts folder.
The documentation has a dataflow diagram if you click on the little blue button on the right side.
@NortySpock I believe you're right, as I noticed the other sources are not available in the output bucket for download. Your suggestion is the easiest way to just pass the data through, which would be a good start for now, I suppose.
I've not looked at it, because I have concerns about the sensitivity of this data and about just making it publicly available to the whole world. Exporting into private buckets for bar-internal analytics would not be a concern. Just stripping user_id does not make it anonymous. I do not have a good enough understanding of whether what is included here is equivalent to what can be extracted from replay data or not.
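One way to probe the concern that stripping user_id is not the same as anonymizing is to count how many exported rows share the same hardware combination. The Python sketch below is an assumption-laden illustration, not something proposed in this thread: the rows are made up (only the first reuses the strings from the example payload), and treating `cpu`, `gpu`, and `display` together as a fingerprint is a hypothetical choice.

```python
from collections import Counter

# Hypothetical exported rows: user_id is already stripped, hardware fields remain.
rows = [
    {
        "cpu": "AMD Ryzen 9 7950X3D 16-Core Processor ; 64632MB RAM, 73336MB pagefile",
        "gpu": "NVIDIA GeForce RTX 5070 Ti/PCIe/SSE2",
        "display": "3840x1600",
    },
    {"cpu": "Example CPU A ; 16000MB RAM", "gpu": "Example GPU A", "display": "1920x1080"},
    {"cpu": "Example CPU A ; 16000MB RAM", "gpu": "Example GPU A", "display": "1920x1080"},
]


def fingerprint(row: dict) -> tuple:
    """Combine the hardware fields into one tuple that acts as a quasi-identifier."""
    return (row["cpu"], row["gpu"], row["display"])


# Count how many rows share each hardware combination.
counts = Counter(fingerprint(r) for r in rows)

# Rows whose combination appears exactly once are the ones most at risk of
# being linked back to a specific person through other data.
unique_rows = [r for r in rows if counts[fingerprint(r)] == 1]
```

If a large share of real rows turned out to be unique under a check like this, that would support keeping the export in a private bucket; if most combinations are shared by many rows, the risk is lower, though not necessarily zero.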
@p2004a Two things:
Please check out 9b7fb12 (the commit prior to this comment): it is a very simple way of segregating to an internal/private bucket. We could use GCS for easier access control or Cloudflare for lower cost.
xref this issue I just found: beyond-all-reason/Beyond-All-Reason#5173
Purpose
Fixes #7. The goal is to take the "complex events" in teiserver that match system:benchmark and make them available as data for dashboarding.
Privacy
I'm not sure our stance on this but I've defaulted to excluding user_id from the export (it's not needed at all) so people can't identify the hardware of specific users.
An example payload in the value column is this: { "Sim": {"mean": 13.961731, "count": 1800, "total": 25131.1152, "spread": 1.27117431, "percentiles": {...}}, "cpu": "AMD Ryzen 9 7950X3D 16-Core Processor ; 64632MB RAM, 73336MB pagefile", "gpu": "NVIDIA GeForce RTX 5070 Ti/PCIe/SSE2", "Draw": {"mean": 9.84391785, "count": 2388, "total": 23507.2754, "spread": 3.77889252, "percentiles": {...}}, "Update": {"mean": 1.88539255, "count": 2388, "total": 4502.31738, "spread": 0.9580617, "percentiles": {...}}, "display": "3840x1600", "mapName": "Starwatcher 1.0", "gameName": "Beyond All Reason test-29918-105c4e6", "engineVersion": "2025.06.19", "benchmarkcommand": "luarules fightertest corak armpw 650 10 2040" } which, as you can tell, has no PII. So it should be safe to export here anonymously.
Next steps
A follow up PR will be created to make intermediate/marts but I am holding off on that until a sample of data is exported for the first time (so I can test it locally).
AI Disclosure
Generated these queries with Claude Code, but they look right to me (they're simple enough).