This repository contains Python and R code to transform and analyze harmonic progressions in the McGill Billboard dataset (BB), including:
- `assembleClusterResultsTable.R` takes the output of each cluster solution and merges it with individual song metadata, producing a single table of song metadata and the cluster assignment from each solution (1-15 clusters) in `song_metadata_and_clusters.csv`.
- `billboard-2.0-index.csv` is an index of all songs in the corpus.
- `chord_by_chord.csv` contains a tidy table of all chords in the corpus, in order within each song, with original chord data from BB and key-oriented harmonic function notation from `parse.py`.
- `cluster_summary_tables.md` contains markdown-formatted tables of normalized average transitional probability values for each cluster (output of `normalize_tables.py`).
- `clusters_to_tables.R` takes the transition probability analysis for each song in a cluster (from `/cluster_tables/`) and creates a table of average probabilities for each cluster in each solution (in `/cluster_summaries/`).
- `kmeans_cluster.py` performs a K-means cluster analysis on the output of `transitionprob.py`, for cardinalities of 1 to 15. The script contains a seed value so you can reproduce our results exactly. Remove the seed value to obtain slightly different (random) results.
- `normalize_tables.py` normalizes transitional probability averages for each cluster summary so that rows sum to 1, then writes tables in markdown format to `cluster_summary_tables.md`.
- `parse.py` parses BB and transforms the absolute chord notation into key-oriented, functional notation (Roman numerals identifying the chord root in relation to the tonic pitch of the key, with chord quality removed).
- `readdata.py` defines file reading and parsing functions used by `transitionprob.py`.
- `solutions_to_tables.py` merges song transition probability data with cluster analysis results, and outputs a table for each cluster containing transitional probability data for each song in that cluster to `/cluster_tables/`.
- `song_metadata_and_clusters.csv` is a table containing song metadata and cluster assignments for each solution (1-15 clusters).
- `song_metadata_and_cluster_names.csv` is a table containing song metadata and cluster names ("authentic", "plagal", "doo-wop", "blues", etc.) for each solution (1-15 clusters).
- `song_metadata.csv` contains metadata for each song, extracted from BB source files.
- `songbysongtransprob.csv` contains transitional probability analyses for each song in the corpus.
- `transitionprob.py` calculates the probability of occurrence of chord-to-chord transitions in each song and outputs a table with results for each song in BB.
- `visualizations.R` loads cluster-average transitional probability data for each cluster from `/cluster_summaries/` and generates a visualization of that table in `/plots/`.

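
To give a rough idea of what the transition-probability and normalization steps compute, here is a minimal sketch in plain Python. It is not the code in `transitionprob.py` or `normalize_tables.py`: the function names are hypothetical, the input is a made-up list of Roman numerals, and it assumes each transition's probability is its share of all transitions in the song.

```python
from collections import Counter, defaultdict


def transition_probabilities(chords):
    """Share of each chord-to-chord transition among all transitions in a song.

    Assumption: probability = count of (from, to) pair / total number of pairs.
    """
    counts = Counter(zip(chords, chords[1:]))  # consecutive chord pairs
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def normalize_rows(table):
    """Rescale each row (one row per origin chord) so its values sum to 1."""
    rows = defaultdict(dict)
    for (src, dst), value in table.items():
        rows[src][dst] = value
    normalized = {}
    for src, row in rows.items():
        row_sum = sum(row.values())
        if row_sum > 0:  # skip empty rows to avoid dividing by zero
            normalized[src] = {dst: v / row_sum for dst, v in row.items()}
    return normalized


# Hypothetical song, already in key-oriented notation.
song = ["I", "IV", "V", "I", "IV", "I"]
probs = transition_probabilities(song)
rows = normalize_rows(probs)
for src, row in rows.items():
    print(src, row)
```

After row normalization, each row can be read as "given this chord, how likely is each following chord", which is the form the cluster summary tables take.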
BB data can be downloaded from the McGill Billboard dataset website directly. Outputs of all parsing, machine learning, and analysis scripts are already contained in this repository.
To run these scripts yourself, download BB, place the data in the same root folder as these scripts, and move billboard-2.0-index.csv to the BB data folder. Then run the scripts in the following order:

1. `parse.py` to parse and transform the data into key-oriented chord information
2. `transitionprob.py` to analyze the chord-to-chord transitional probabilities for each song
3. `kmeans_cluster.py` to run the cluster analysis algorithm
4. `assembleClusterResultsTable.R` to create a list of songs, metadata, and cluster assignments for each solution
5. `solutions_to_tables.py` to assemble a table of song data for each cluster in each solution
6. `clusters_to_tables.R` to generate cluster-wide average probability tables for each cluster in each solution
7. `normalize_tables.py` to normalize tables and write to markdown
8. `visualizations.R` to generate visualizations
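
If you would rather not run each script by hand, the run order above could be wrapped in a small driver along these lines. This is only a sketch and not part of the repository: `run_pipeline` and `command_for` are hypothetical names, and it assumes the scripts take no command-line arguments, that `Rscript` is on your PATH, that you run it from the repository root, and that the Python scripts work with the interpreter running the driver.

```python
import subprocess
import sys

# Run order as listed above.
PIPELINE = [
    "parse.py",
    "transitionprob.py",
    "kmeans_cluster.py",
    "assembleClusterResultsTable.R",
    "solutions_to_tables.py",
    "clusters_to_tables.R",
    "normalize_tables.py",
    "visualizations.R",
]


def command_for(script):
    """Pick an interpreter from the file extension (assumption: .R -> Rscript)."""
    if script.endswith(".R"):
        return ["Rscript", script]
    return [sys.executable, script]


def run_pipeline(scripts=PIPELINE, dry_run=True):
    """Run each script in order; with dry_run=True, only print the commands."""
    commands = [command_for(s) for s in scripts]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

The default is a dry run so you can check the commands first; call `run_pipeline(dry_run=False)` to actually execute them.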