Run & track experiments on Hugging Face infra #520

@lewtun

Hi folks 👋 I made an end-to-end example of how to launch and track parameter-golf training experiments on Hugging Face infra using HF Jobs and Buckets, with live metrics via Trackio: https://github.com/lewtun/parameter-golf/tree/hf-jobs-example/examples/hf_jobs

The setup is just two files:

  • launch_job.py runs locally: it creates a bucket, submits the training job, and streams logs back to your terminal.
  • train_job.py runs on the Hub as a uv script: it installs dependencies, downloads the FineWeb data, logs metrics to Trackio, and uploads results (logs, metrics, model artifact) to your bucket.
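
For anyone who hasn't used uv scripts before: such a script declares its dependencies in an inline metadata comment block (PEP 723), which is how a single file can set up its own environment when started with `uv run`. Below is a minimal sketch of that shape, not the actual train_job.py. The `fake_train_step` helper, the metric name and the output file name are made up for illustration, and a local JSONL file stands in for both Trackio logging and the bucket upload.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///

# The block above is PEP 723 inline metadata; `uv run` reads it to build the
# environment. It is empty here because this sketch only uses the stdlib.
import json
import math
from pathlib import Path


def fake_train_step(step: int) -> float:
    """Hypothetical stand-in for a real training step; returns a decaying loss."""
    return 2.0 * math.exp(-0.1 * step) + 1.0


def run(num_steps: int, out_path: Path) -> list[dict]:
    """Write one metrics record per step to a JSONL file and return the records."""
    records = []
    with out_path.open("w") as f:
        for step in range(num_steps):
            record = {"step": step, "train_loss": fake_train_step(step)}
            f.write(json.dumps(record) + "\n")
            records.append(record)
    return records


if __name__ == "__main__":
    records = run(num_steps=5, out_path=Path("metrics.jsonl"))
    print(f"wrote {len(records)} records, final loss {records[-1]['train_loss']:.3f}")
```

In a real job the JSONL writer would be swapped for the metrics tracker and the file would be uploaded to the bucket at the end of the run.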

A typical launch looks like:

python launch_job.py --name baseline-9L --hardware h200x8
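
To make the shape of that command concrete, here is a rough sketch of how a launcher could parse the two flags shown above. It is not the code from the linked example: the `JobSpec` dataclass, the `parse_args` helper and the `runs/<name>` output layout are assumptions for illustration, and the actual bucket creation and job submission calls are deliberately left out.

```python
import argparse
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Hypothetical container for what a launcher would submit."""
    name: str
    hardware: str
    script: str = "train_job.py"

    @property
    def output_prefix(self) -> str:
        # Assumed layout: one folder per run name inside the bucket.
        return f"runs/{self.name}"


def parse_args(argv=None) -> JobSpec:
    """Parse the --name and --hardware flags into a JobSpec."""
    parser = argparse.ArgumentParser(description="Launch a training job (sketch).")
    parser.add_argument("--name", required=True, help="Run name, e.g. baseline-9L")
    parser.add_argument("--hardware", required=True, help="Hardware flavor, e.g. h200x8")
    args = parser.parse_args(argv)
    return JobSpec(name=args.name, hardware=args.hardware)


# Usage: pass the flags explicitly instead of reading sys.argv.
spec = parse_args(["--name", "baseline-9L", "--hardware", "h200x8"])
print(f"would submit {spec.script} as '{spec.name}' on {spec.hardware}")
```

Keeping the parsed flags in a small dataclass makes it easy to derive per-run paths from the run name in one place.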

And the outputs look something like this:

It also comes with instructions to equip your favourite agent with skills for managing the infra, so you can launch autoresearch experiments to your heart's content. I hope you find it useful in your golfing 🏌️‍♂️!
