<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Real Valued Data and the Normal Inverse-Wishart Distribution</title>
<link rel="stylesheet" href="_static/basic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/bootswatch-3.3.4/lumen/bootstrap.min.css" type="text/css" />
<link rel="stylesheet" href="_static/bootstrap-sphinx.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '0.1.0',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="_static/js/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="_static/js/jquery-fix.js"></script>
<script type="text/javascript" src="_static/bootstrap-3.3.4/js/bootstrap.min.js"></script>
<script type="text/javascript" src="_static/bootstrap-sphinx.js"></script>
<link rel="top" title="None" href="index.html" />
<link rel="up" title="Tutorials" href="docs.html" />
<link rel="next" title="Univariate Data with the Normal Inverse Chi-Square Distribution" href="nic.html" />
<link rel="prev" title="Categorical Data and the Dirichlet Discrete Distribution" href="dd.html" />
<meta charset='utf-8'>
<meta http-equiv='X-UA-Compatible' content='IE=edge,chrome=1'>
<meta name='viewport' content='width=device-width, initial-scale=1.0, maximum-scale=1'>
<meta name="apple-mobile-web-app-capable" content="yes">
</head>
<body role="document">
<div id="navbar" class="navbar navbar-inverse navbar-default navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<!-- .btn-navbar is used as the toggle for collapsed navbar content -->
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".nav-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">
datamicroscopes</a>
<span class="navbar-text navbar-version pull-left"><b>0.1</b></span>
</div>
<div class="collapse navbar-collapse nav-collapse">
<ul class="nav navbar-nav">
<li><a href="https://github.com/datamicroscopes">GitHub</a></li>
<li><a href="https://qadium.com/">Qadium</a></li>
<li class="dropdown globaltoc-container">
<a role="button"
id="dLabelGlobalToc"
data-toggle="dropdown"
data-target="#"
href="index.html">Site <b class="caret"></b></a>
<ul class="dropdown-menu globaltoc"
role="menu"
aria-labelledby="dLabelGlobalToc"><ul>
<li class="toctree-l1"><a class="reference internal" href="intro.html">Discovering structure in your data: an overview of clustering</a></li>
<li class="toctree-l1"><a class="reference internal" href="ncluster.html">Finding the number of clusters with the Dirichlet Process</a></li>
<li class="toctree-l1"><a class="reference internal" href="enron_blog.html">Network Modeling with the Infinite Relational Model</a></li>
<li class="toctree-l1"><a class="reference internal" href="topic.html">Bayesian Nonparametric Topic Modeling with the Daily Kos</a></li>
</ul>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="datatypes.html">Datatypes and Bayesian Nonparametric Models</a></li>
<li class="toctree-l1"><a class="reference internal" href="bb.html">Binary Data with the Beta Bernoulli Distribution</a></li>
<li class="toctree-l1"><a class="reference internal" href="dd.html">Categorical Data and the Dirichlet Discrete Distribution</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="">Real Valued Data and the Normal Inverse-Wishart Distribution</a></li>
<li class="toctree-l1"><a class="reference internal" href="nic.html">Univariate Data with the Normal Inverse Chi-Square Distribution</a></li>
<li class="toctree-l1"><a class="reference internal" href="gamma_poisson.html">Count Data and Ordinal Data with the Gamma-Poisson Distribution</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="gauss2d.html">Inferring Gaussians with the Dirichlet Process Mixture Model</a></li>
<li class="toctree-l1"><a class="reference internal" href="mnist_predictions.html">Digit recognition with the MNIST dataset</a></li>
<li class="toctree-l1"><a class="reference internal" href="enron_email.html">Clustering the Enron e-mail corpus using the Infinite Relational Model</a></li>
<li class="toctree-l1"><a class="reference internal" href="hdp.html">Learning Topics in The Daily Kos with the Hierarchical Dirichlet Process</a></li>
</ul>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="docs.html">Tutorials</a><ul>
<li class="toctree-l2"><a class="reference internal" href="intro.html">Discovering structure in your data: an overview of clustering</a></li>
<li class="toctree-l2"><a class="reference internal" href="ncluster.html">Finding the number of clusters with the Dirichlet Process</a></li>
<li class="toctree-l2"><a class="reference internal" href="enron_blog.html">Network Modeling with the Infinite Relational Model</a></li>
<li class="toctree-l2"><a class="reference internal" href="topic.html">Bayesian Nonparametric Topic Modeling with the Daily Kos</a></li>
</ul>
</li>
<li class="toctree-l1 current"><a class="reference internal" href="docs.html#datatypes-and-likelihood-models-in-datamicroscopes">Datatypes and likelihood models in datamicroscopes</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="datatypes.html">Datatypes and Bayesian Nonparametric Models</a></li>
<li class="toctree-l2"><a class="reference internal" href="bb.html">Binary Data with the Beta Bernoulli Distribution</a></li>
<li class="toctree-l2"><a class="reference internal" href="dd.html">Categorical Data and the Dirichlet Discrete Distribution</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="">Real Valued Data and the Normal Inverse-Wishart Distribution</a></li>
<li class="toctree-l2"><a class="reference internal" href="nic.html">Univariate Data with the Normal Inverse Chi-Square Distribution</a></li>
<li class="toctree-l2"><a class="reference internal" href="gamma_poisson.html">Count Data and Ordinal Data with the Gamma-Poisson Distribution</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="docs.html#examples">Examples</a><ul>
<li class="toctree-l2"><a class="reference internal" href="gauss2d.html">Inferring Gaussians with the Dirichlet Process Mixture Model</a></li>
<li class="toctree-l2"><a class="reference internal" href="mnist_predictions.html">Digit recognition with the MNIST dataset</a></li>
<li class="toctree-l2"><a class="reference internal" href="enron_email.html">Clustering the Enron e-mail corpus using the Infinite Relational Model</a></li>
<li class="toctree-l2"><a class="reference internal" href="hdp.html">Learning Topics in The Daily Kos with the Hierarchical Dirichlet Process</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="api.html">API Reference</a><ul>
<li class="toctree-l2"><a class="reference internal" href="microscopes.common.dataview.html">dataviews</a></li>
<li class="toctree-l2"><a class="reference internal" href="microscopes.common.util.html">util</a></li>
<li class="toctree-l2"><a class="reference internal" href="microscopes.common.random.html">microscopes.common.random</a></li>
<li class="toctree-l2"><a class="reference internal" href="microscopes.common.query.html">query</a></li>
<li class="toctree-l2"><a class="reference internal" href="microscopes.common.validator.html">microscopes.common.validator</a></li>
<li class="toctree-l2"><a class="reference internal" href="microscopes.kernels.parallel.html">parallel</a></li>
<li class="toctree-l2"><a class="reference internal" href="microscopes.mixture.html">mixturemodel</a></li>
<li class="toctree-l2"><a class="reference internal" href="microscopes.irm.html">irm</a></li>
<li class="toctree-l2"><a class="reference internal" href="microscopes.kernels.html">kernels</a></li>
<li class="toctree-l2"><a class="reference internal" href="api.html#indices-and-tables">Indices and tables</a></li>
</ul>
</li>
</ul>
</ul>
</li>
<li class="dropdown">
<a role="button"
id="dLabelLocalToc"
data-toggle="dropdown"
data-target="#"
href="#">Contents <b class="caret"></b></a>
<ul class="dropdown-menu localtoc"
role="menu"
aria-labelledby="dLabelLocalToc"><ul>
<li><a class="reference internal" href="#">Real Valued Data and the Normal Inverse-Wishart Distribution</a></li>
</ul>
</ul>
</li>
<li class="hidden-sm">
<div id="sourcelink">
<a href="_sources/niw.txt"
rel="nofollow">Source</a>
</div></li>
</ul>
<form class="navbar-form navbar-right" action="search.html" method="get">
<div class="form-group">
<input type="text" name="q" class="form-control" placeholder="Search" />
</div>
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
</div>
<div class="container">
<div class="row">
<div class="col-md-12">
<div class="section" id="real-valued-data-and-the-normal-inverse-wishart-distribution">
<h1>Real Valued Data and the Normal Inverse-Wishart Distribution<a class="headerlink" href="#real-valued-data-and-the-normal-inverse-wishart-distribution" title="Permalink to this headline">¶</a></h1>
<hr class="docutils" />
<p>One of the most common forms of data is real-valued data.</p>
<p>Let’s set up our environment and consider an example dataset.</p>
<div class="code python highlight-python"><div class="highlight"><pre>import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_context('talk')
sns.set_style('darkgrid')
</pre></div>
</div>
<p>The <a class="reference external" href="https://archive.ics.uci.edu/ml/datasets/Iris">Iris Flower
Dataset</a> is a standard
machine learning data set dating back to the 1930s. It contains
measurements from 150 flowers, 50 from each of the following species:</p>
<ul class="simple">
<li>Iris Setosa</li>
<li>Iris Versicolor</li>
<li>Iris Virginica</li>
</ul>
<div class="code python highlight-python"><div class="highlight"><pre><span class="n">iris</span> <span class="o">=</span> <span class="n">sns</span><span class="o">.</span><span class="n">load_dataset</span><span class="p">(</span><span class="s">'iris'</span><span class="p">)</span>
<span class="n">iris</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
</pre></div>
</div>
<div style="max-height:1000px;max-width:1500px;overflow:auto;">
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>sepal_length</th>
<th>sepal_width</th>
<th>petal_length</th>
<th>petal_width</th>
<th>species</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>5.1</td>
<td>3.5</td>
<td>1.4</td>
<td>0.2</td>
<td>setosa</td>
</tr>
<tr>
<th>1</th>
<td>4.9</td>
<td>3.0</td>
<td>1.4</td>
<td>0.2</td>
<td>setosa</td>
</tr>
<tr>
<th>2</th>
<td>4.7</td>
<td>3.2</td>
<td>1.3</td>
<td>0.2</td>
<td>setosa</td>
</tr>
<tr>
<th>3</th>
<td>4.6</td>
<td>3.1</td>
<td>1.5</td>
<td>0.2</td>
<td>setosa</td>
</tr>
<tr>
<th>4</th>
<td>5.0</td>
<td>3.6</td>
<td>1.4</td>
<td>0.2</td>
<td>setosa</td>
</tr>
</tbody>
</table>
</div><p>In the case of the <code class="docutils literal"><span class="pre">iris</span></code> dataset, plotting the data shows that
individual species exhibit a typical range of measurements.</p>
<div class="code python highlight-python"><div class="highlight"><pre><span class="n">irisplot</span> <span class="o">=</span> <span class="n">sns</span><span class="o">.</span><span class="n">pairplot</span><span class="p">(</span><span class="n">iris</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="s">"species"</span><span class="p">,</span> <span class="n">palette</span><span class="o">=</span><span class="s">'Set2'</span><span class="p">,</span> <span class="n">diag_kind</span><span class="o">=</span><span class="s">"kde"</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mf">2.5</span><span class="p">)</span>
<span class="n">irisplot</span><span class="o">.</span><span class="n">fig</span><span class="o">.</span><span class="n">suptitle</span><span class="p">(</span><span class="s">'Scatter Plots and Kernel Density Estimate of Iris Data by Species'</span><span class="p">,</span> <span class="n">fontsize</span> <span class="o">=</span> <span class="mi">18</span><span class="p">)</span>
<span class="n">irisplot</span><span class="o">.</span><span class="n">fig</span><span class="o">.</span><span class="n">subplots_adjust</span><span class="p">(</span><span class="n">top</span><span class="o">=.</span><span class="mi">9</span><span class="p">)</span>
</pre></div>
</div>
<img alt="_images/normal-inverse-wishart_5_0.png" src="_images/normal-inverse-wishart_5_0.png" />
<p>If we wanted to learn the typical measurements of each underlying
species, we would use these real-valued measurements and make
assumptions about the structure of the data.</p>
<p>In practice, real-valued data is commonly assumed to follow a normal
(Gaussian) distribution.</p>
<p>We could assume that, conditioned on <code class="docutils literal"><span class="pre">species</span></code>, the measurement data
follows a multivariate normal distribution:</p>
<div class="math">
\[\mathbf{x} \mid species=s \sim \mathcal{N}(\mu_{s},\Sigma_{s})\]</div>
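<p>As a rough illustration of this assumption, the sketch below draws
measurements from one multivariate normal per group and recovers each
group's mean and covariance from the samples. The group names and
parameter values are made up for the example and are not the real iris
estimates.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (mean, covariance) pairs for two made-up "species".
# These are illustrative values, not estimates from the iris data.
params = {
    "a": (np.array([5.0, 3.4]), np.array([[0.12, 0.10], [0.10, 0.14]])),
    "b": (np.array([5.9, 2.8]), np.array([[0.27, 0.09], [0.09, 0.10]])),
}

estimates = {}
for species, (mu, sigma) in params.items():
    # Draw measurements x | species = s ~ N(mu_s, Sigma_s).
    x = rng.multivariate_normal(mu, sigma, size=5000)
    # Recover the parameters with the sample mean and sample covariance.
    estimates[species] = (x.mean(axis=0), np.cov(x, rowvar=False))
```

<p>With enough samples per group, the estimated means and covariances
should land close to the values used to generate the data.</p>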
<p>The normal inverse-Wishart distribution allows us to learn the
underlying parameters of each normal distribution: its mean
<span class="math">\(\mu_s\)</span> and its covariance <span class="math">\(\Sigma_s\)</span>. Since the normal
inverse-Wishart is the conjugate prior of the multivariate normal with
unknown mean and covariance, placing a normal inverse-Wishart prior on
<span class="math">\((\mu_s, \Sigma_s)\)</span> yields a posterior that is also normal
inverse-Wishart. This allows us to infer the distribution over values of
<span class="math">\(\mu_s\)</span> and <span class="math">\(\Sigma_{s}\)</span> when we define our model.</p>
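<p>Conjugacy means the posterior parameters have a closed form. The
sketch below applies the standard normal inverse-Wishart update (as
given in Murphy 2007) with plain numpy. The function name
<code class="docutils literal"><span class="pre">niw_posterior</span></code> and the hyperparameter names are chosen for this
illustration and are not part of the datamicroscopes API.</p>

```python
import numpy as np

def niw_posterior(x, mu0, kappa0, nu0, psi0):
    """Return posterior NIW parameters (mu_n, kappa_n, nu_n, psi_n).

    Hypothetical helper for illustration; x has one observation per row.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    xbar = x.mean(axis=0)
    # Scatter matrix of the data around the sample mean.
    centered = x - xbar
    scatter = centered.T @ centered
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    # Posterior mean is a weighted average of the prior mean and sample mean.
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    # Extra term accounts for disagreement between prior mean and sample mean.
    diff = (xbar - mu0).reshape(-1, 1)
    psi_n = psi0 + scatter + (kappa0 * n / kappa_n) * (diff @ diff.T)
    return mu_n, kappa_n, nu_n, psi_n

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mu_n, kappa_n, nu_n, psi_n = niw_posterior(
    x, mu0=np.zeros(2), kappa0=1.0, nu0=4.0, psi0=np.eye(2)
)
```

<p>The returned parameters define another normal inverse-Wishart
distribution, so the same function can be applied again as more
observations arrive.</p>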
<p>Note that if we have only one real-valued variable, the normal
inverse-Wishart distribution is often referred to as the normal
inverse-gamma distribution. In this case, we learn the scalar-valued
mean <span class="math">\(\mu\)</span> and variance <span class="math">\(\sigma^2\)</span> for each inferred
cluster.</p>
<p>Univariate real data, however, should be modeled with our normal
inverse-chi-squared distribution, which is optimized for inferring
univariate parameters.</p>
<p>See <a class="reference external" href="http://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf">Murphy
2007</a> for
derivations of our normal likelihood models.</p>
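<p>For the univariate case, the corresponding closed-form update can be
sketched in a few lines of plain Python. This follows the normal
inverse-chi-squared update in Murphy 2007. The function name
<code class="docutils literal"><span class="pre">nix_posterior</span></code> and its argument names are assumptions made for
this example, not the library's interface.</p>

```python
def nix_posterior(xs, mu0, kappa0, nu0, sigmasq0):
    """Return posterior NIX parameters (mu_n, kappa_n, nu_n, sigmasq_n).

    Hypothetical helper for illustration; xs is a sequence of scalars.
    """
    n = len(xs)
    xbar = sum(xs) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((x - xbar) ** 2 for x in xs)
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    # Combine prior scale, data spread, and prior/sample mean disagreement.
    sigmasq_n = (
        nu0 * sigmasq0 + ss + (n * kappa0 / kappa_n) * (mu0 - xbar) ** 2
    ) / nu_n
    return mu_n, kappa_n, nu_n, sigmasq_n

mu_n, kappa_n, nu_n, sigmasq_n = nix_posterior(
    [1.0, 2.0, 3.0], mu0=0.0, kappa0=1.0, nu0=1.0, sigmasq0=1.0
)
```

<p>Only scalar sums are needed here, with no matrix operations, which
is one reason a dedicated univariate model can be cheaper than the
general multivariate one.</p>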
<hr class="docutils" />
<p>To specify a model with a multivariate normal likelihood and a normal
inverse-Wishart prior, we import our likelihood model:</p>
<div class="code python highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">microscopes.models</span> <span class="kn">import</span> <span class="n">niw</span> <span class="k">as</span> <span class="n">normal_inverse_wishart</span>
</pre></div>
</div>
</div>
</div>
</div>
</div>
<center> Datamicroscopes is developed by <a href="http://www.qadium.com">Qadium</a>, with funding from the <a href="http://www.darpa.mil">DARPA</a> <a href="http://www.darpa.mil/program/xdata">XDATA</a> program. Copyright Qadium 2015. </center>
</body>
</html>