Adding batching and multi-histogram support by perrymcmanis144 · Pull Request #12 · mozilla/mozfun-local

perrymcmanis144 · 2023-01-20T22:37:42Z

My queries are getting queued, so I can't really test any more. I thought I had a code regression, but no, it's BQ making me wait.

This code can take multiple columns and runs samples 10% at a time. It turns out that this seems to improve stability quite a bit: 10 columns at 10% would finish, while 1 column at 100% would not. I mostly used the most populous columns I could find. I think this is a real path towards being able to do 100% of histograms.

results = glam_style_histogram(
    [
        "wr_renderer_time",
        "dns_native_queuing",
        "gc_ms",
        "ssl_time_until_ready",
        "dns_native_lookup_time",
        "cycle_collector_max_pause",
        "http_kbread_per_conn2",
        "gc_pretenure_count_2",
        "network_cache_v2_miss_time_ms",
        "input_event_response_ms",
    ],
    False,
    "2023-01-08",
    limit=None,
    batch_size=None,
    table="mozdata.telemetry.main",
)
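To illustrate the "10% at a time" idea described above, here is a minimal sketch of how the percent range could be split into batches. This is not the code in the PR: `batch_bounds` and its `batch_pct` parameter are hypothetical names, and how each bound pair is turned into a BigQuery filter is left out because the thread does not show it.

```python
def batch_bounds(batch_pct):
    """Yield (low, high) percent bounds that cover 0-100 in steps of batch_pct.

    Hypothetical helper for illustration; not part of mozfun-local.
    """
    if not 0 < batch_pct <= 100:
        raise ValueError("batch_pct must be in (0, 100]")
    low = 0
    while low < 100:
        # Clamp the final batch so the bounds never run past 100%.
        high = min(low + batch_pct, 100)
        yield low, high
        low = high


bounds = list(batch_bounds(10))
print(bounds)
```

Each pair would drive one query, so only one slice of the data is pulled down at a time.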

edugfilho · 2023-01-20T23:07:59Z

neat. How long did the run that finished take?

perrymcmanis144 · 2023-01-23T14:48:00Z

> neat. How long did the run that finished take?

60 minutes on my laptop. Unfortunately, batching has some weird interactions because of needing to pull data down. I think it should be much faster on a VM if you don't have the same queuing issue I still appear to be suffering from.

For example, if we return to our small histogram wr_renderer_time, with 10% repeat sampling (so 10, 20, ..., 100%) it takes 6 minutes. This appears to be mostly waiting for BQ to get through the queue and actually start working, since a run with no sampling is also taking much longer than it should even though the post-query section runs at normal speed. Testing suggests that 10% sampling may increase runtime significantly anyway, though.

I am going to set sampling to 20%. Would you be able to run with this pct branch?

Increasing that rate shows a noticeable improvement, probably due to less queuing, and I think your VM should be able to handle 20%. I was able to churn through all 10 columns at 100%; it just took a long time. But there was no OOM, which I was hitting before and which I know you hit without sampling. Ideally we push this number even higher, or we increase the number of columns we process at a single time.
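As a rough sketch of why batching can avoid the OOM: if each batch comes back as per-bucket counts, the partial results can be summed as they arrive, so only one batch has to be held in memory at a time. This assumes each batch yields a bucket-to-count mapping; the actual result structure in this branch is not shown in the thread, and `merge_histograms` is a hypothetical name.

```python
from collections import Counter


def merge_histograms(partials):
    """Sum per-bucket counts across batches.

    `partials` is any iterable of {bucket: count} mappings (an assumed
    shape, not necessarily what glam_style_histogram returns).
    """
    total = Counter()
    for partial in partials:
        # Counter.update adds counts instead of overwriting them.
        total.update(partial)
    return dict(total)


merged = merge_histograms([{1: 2, 5: 1}, {1: 3, 8: 4}])
print(merged)
```

Passing a generator of batch results keeps memory bounded by the largest single batch plus the running total.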

perrymcmanis144 · 2023-01-23T14:55:43Z

Also, would it be possible to test on one of the memory-optimized VMs? It looks like Google has some options with close to a TB of memory for a similar spot price (e.g. m1-ultramem-40), and I'd very much like to know whether this is really an OOM problem or whether we are, for example, exceeding some max size such that it would never finish irrespective of how much memory the machine has.

edugfilho · 2023-01-25T18:57:00Z

Just updating this thread: I've been testing on memory-optimized VMs but found a performance regression, apparently due to the version of the google-cloud-bigquery Python library. Force-updating that library normally fixes the issue, but somehow that isn't happening in the environment on the memory-optimized VM. In the meantime, on another VM I was able to update the library and fix the performance issue, but all of a sudden I started having permission issues with the source table.
I will update this thread with a doc of execution times as soon as I am able to get a stable environment to test in.
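One way to confirm which library version an environment actually resolves is to ask the interpreter directly. A minimal sketch using the standard library's `importlib.metadata` is below; `installed_version` is a hypothetical helper name, and nothing here reflects which versions were involved in the regression.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this with the same interpreter the job uses, not just the shell's pip.
print(installed_version("google-cloud-bigquery"))
```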

perrymcmanis144 added 7 commits January 19, 2023 11:14

small improvement to glam functions

f297825

added ability to run multiple columns at a time

26b45b4

Merge branch 'main' into pct

7ef301c

implemented batching for large histograms

bdb4286

renamed sample to more logical batch

9b79776

renamed sample to more logical batch

8b81731

renamed sample to more logical batch

8ee5b1b

put it back to sample rate

e01cd12

perrymcmanis144 added 2 commits January 23, 2023 09:48

increased sampling rate to 20 percent

e300f8a

increased sampling rate to 20 percent

fdbaca0

perrymcmanis144 added 2 commits January 23, 2023 12:56

added debugging print statements

0d9e51a

don't move data around in memory unnecessarily

6876e82

perrymcmanis144 added 4 commits January 25, 2023 16:49

re-added concurrency

cccce39

changed organization a bit

a253aac

improve concurrency a bit

e5a5cf9

concurrency improvements

cdc20f8

perrymcmanis144 wants to merge 16 commits into main from pct

2 participants