andrlime/macaw

Macaw: lightweight monitoring over UDP

[[very_work_in_progress]]

Macaw is a lightweight runtime profiler for long-running research jobs. Take this deliberately silly example:

import macaw
import numpy as np

def very_important_research(n_iterations=200):
    # track() wraps the iterable and reports progress on every iteration.
    for i in macaw.track(range(n_iterations), total=n_iterations, desc="doing nothing"):
        with macaw.stage("generate_data"):
            data = np.random.randn(65536) * i
            macaw.log.info(f"generated {len(data)} numbers")

        with macaw.stage("fourier_transform"):
            ft = np.fft.fft(data)
            ft = np.fft.fftshift(ft)
            macaw.log.info("transformed data into different data")

    macaw.log.info("done")

if __name__ == "__main__":
    very_important_research()

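The output may then look something like this (illustrative only: the crop and analysis stages and the metrics line are not produced by the snippet above):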

[macaw] job started: very_important_research.py (pid: 48291, job_id: a3f2-...)

iter   1/200  [          ]   0%  eta: --:--
  → generate_data        0.001s
  → fourier_transform    0.043s
      → numpy.fft.fft       0.038s  shape=(65536,) dtype=complex128
      → numpy.fft.fftshift  0.004s  shape=(65536,)
  → crop                 0.001s
  → analysis             0.002s
  metrics: mean=2.31  is_big=False  confidence=0.71  vibes=bad

...

[INFO] done

[macaw] job finished: very_important_research.py
  elapsed:     14s
  exit code:   0
  peak memory: 200.0 MB
  iterations:  200/200

  stage summary:
  fourier_transform    avg: 0.043s  min: 0.038s  max: 0.089s  ± 0.004s
    numpy.fft.fft      avg: 0.038s  min: 0.034s  max: 0.081s  ± 0.003s
    numpy.fft.fftshift avg: 0.004s  min: 0.003s  max: 0.006s  ± 0.001s
  generate_data        avg: 0.001s  min: 0.001s  max: 0.002s  ± 0.000s
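
The per-stage numbers in a summary like the one above can be derived from a plain list of recorded durations. The following is a minimal sketch, not Macaw's actual implementation: `summarize_stage` and `format_summary` are hypothetical helpers, and the sketch assumes the `±` column is a sample standard deviation, which this README does not state.

```python
import statistics

def summarize_stage(durations):
    """Return avg/min/max/std for a list of stage durations in seconds.

    Hypothetical helper; assumes '±' means sample standard deviation.
    """
    if not durations:
        raise ValueError("need at least one duration")
    return {
        "avg": statistics.fmean(durations),
        "min": min(durations),
        "max": max(durations),
        # stdev() needs at least two samples; report 0.0 for a single one.
        "std": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }

def format_summary(name, durations):
    """Render one summary line in roughly the layout shown above."""
    s = summarize_stage(durations)
    return (f"{name:<20} avg: {s['avg']:.3f}s  min: {s['min']:.3f}s  "
            f"max: {s['max']:.3f}s  ± {s['std']:.3f}s")
```

Calling `format_summary("generate_data", durations)` with the durations collected for that stage yields one line of the summary.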

The SDK streams these events over UDP to a receiver, which relays them to a frontend using Server-Sent Events (SSE). The frontend can then display the information in a dashboard.
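
To make the SSE leg concrete, here is a small sketch of how one event could be framed as an SSE message. The real receiver is the Rust server listed under Structure; this Python version is only illustrative, and `to_sse`, the event type name, and the JSON payload shape are all assumptions. The framing itself (an `event:` line, one or more `data:` lines, then a blank line) follows the standard SSE text format.

```python
import json

def to_sse(event, event_type="message"):
    """Frame a dict as one Server-Sent Events message (hypothetical helper)."""
    payload = json.dumps(event)
    lines = [f"event: {event_type}"]
    # Each line of the payload needs its own "data:" prefix.
    lines += [f"data: {line}" for line in payload.splitlines()]
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"
```

A browser frontend could consume such a stream with the standard `EventSource` API.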

Macaw times stages and substages that are defined at runtime, and it can also wrap common library functions so that their calls show up as substages. Instead of perf-like verbose timing, we get a high-level profile covering only the functions we care about and the stages we define ourselves.

UDP was chosen because dropping an individual event doesn't matter: another progress update will be sent soon anyway, so the overhead of TCP doesn't seem worth it.
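
The fire-and-forget idea can be sketched with the standard `socket` module. The JSON wire format, the helper names `encode_event` and `send_event`, and the choice to swallow send errors are assumptions, not a description of Macaw's protocol.

```python
import json
import socket

def encode_event(event):
    """Serialize one event to bytes (assumed JSON wire format)."""
    return json.dumps(event).encode("utf-8")

def send_event(sock, addr, event):
    """Send one event as a single datagram.

    Returns False instead of raising on failure, so that a monitoring
    hiccup never interrupts the job being monitored.
    """
    try:
        sock.sendto(encode_event(event), addr)
        return True
    except OSError:
        return False
```

A caller would create the socket once with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and reuse it for every event. There is no connection setup, acknowledgement, or retry, which is the trade-off described above.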

A more concrete use case might be:

with macaw.stage("generate_data"):
    for i in macaw.track(range(1000000), total=1000000, desc="generating data"):
        generate_data(...)

with macaw.stage("consume_data"):
    ...

This makes it possible to monitor data generation without having to SSH into the box running the script, dig manually through a log file, or do something else annoying.
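
For intuition, a progress-tracking wrapper of this kind can be written as a generator. The sketch below is an assumption about how such a wrapper might behave, not Macaw's `track`: the `report` and `clock` parameters are hypothetical, and the ETA is a simple linear extrapolation from the average time per completed item.

```python
import time

def track(iterable, total=None, desc="", report=print, clock=time.monotonic):
    """Yield items unchanged and report progress after each one is processed."""
    start = clock()
    for done, item in enumerate(iterable, start=1):
        yield item
        # Execution resumes here once the caller's loop body has finished.
        elapsed = clock() - start
        eta = elapsed / done * (total - done) if total else None
        report({"desc": desc, "done": done, "total": total, "eta_s": eta})
```

In a real SDK the `report` callback would presumably hand the event to the UDP sender rather than print it.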

Structure

/server - Rust-based UDP backend
/frontend - React-based frontend
/sdk/py - Python library
/sdk/jl - Julia library

About

SSE-based UDP monitoring for research scripts
