Workloads

Maycon Viana Bordin edited this page Mar 24, 2022 · 3 revisions

Applications and Datasets

| Application Name | Prefix | Sample Data | Download | Dataset |
| --- | --- | --- | --- | --- |
| ads-analytics | ad | ad-clicks.dat | download | KDD Cup 2012 (12GB) |
| bargain-index | bi | stocks.csv | download | Kaggle Stock Market Dataset of NASDAQ (3GB); Yahoo Finance; Google Finance |
| click-analytics | ca | click-stream.json | | 1998 WorldCup (104GB) |
| fraud-detection | fd | credit-card.dat | | |
| linear-road | lr | | | |
| log-processing | lp | http-server.log | | 1998 WorldCup (104GB) |
| machine-outlier | mo | cluster-traces.csv | download | Google Cluster Traces (36GB); Alibaba Machine Usage (8.4GB) |
| reinforcement-learner | rl | | | |
| sentiment-analysis | sa | | | Twitter Streaming |
| spam-filter | sf | enron.json | download | TREC 2007 (547MB, labeled); SPAM Archive (~1.2GB, spam); Enron Email Dataset (2.6GB, raw); Enron Spam Dataset (50MB, labeled) |
| spike-detection | sd | sensors.dat | | Intel Berkeley Research Lab (150MB) |
| traffic-monitoring | tm | taxi-traces.csv | | Beijing Taxi Traces; Shapefile from: https://download.bbbike.org/osm/bbbike/ |
| trending-topics | tt | | | Twitter Streaming |
| voipstream | vs | | | |
| word-count | wc | books.dat | download | Project Gutenberg (~8GB) |
| smart-grid | sg | smart-grid.csv | download | DEBS 2014 Grand Challenge (3.2GB) |
