Issue with large dataset #119

@PuncocharM

Description

I want to run Bayesian optimization (BO) on a large dataset. For this reason I am using SVGP with minibatching; however, it still fails when constructing the EI acquisition. A minimal example:

import gpflow
import numpy as np
from gpflowopt.acquisition import ExpectedImprovement

n = 200000
d = 50
X_train = np.random.randn(n, d)
y_train = np.random.randn(n, 1)

# SVGP over all d input dimensions, with 500 inducing points taken from the
# training data and a minibatch size of 500
sgp = gpflow.svgp.SVGP(X_train, y_train, gpflow.kernels.Matern52(d),
                       gpflow.likelihoods.Gaussian(), X_train[:500, :].copy(),
                       minibatch_size=500)
ei = ExpectedImprovement(sgp)

This gives an error on the last line:

InvalidArgumentError: Input to reshape is a tensor with 40000000000 values, but the requested shape has 1345294336
[[node Reshape_7 (defined at /home/puncochar/miniconda3/envs/mgr-work/lib/python3.6/site-packages/gpflowopt/transforms.py:138) ]]

This is basically an integer overflow: the full covariance built there has N * N * D = 200000 * 200000 * 1 = 40,000,000,000 entries, and that product wraps around in int32 shape arithmetic (40000000000 mod 2^32 = 1345294336, exactly the requested shape in the message).
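
The wraparound arithmetic can be checked in plain Python (the values below are just the example's n and d, nothing from the library):

n, D = 200000, 1
full = n * n * D     # 40000000000 entries in the full N x N (x D) covariance
print(full % 2**32)  # 1345294336 -- the bogus requested shape from the error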

The error points to line 138 in transforms.py, which contains Yvar = tf.reshape(Yvar, [N * N, D]).

I figured this is unnecessary: by passing full_cov=False from build_predict(...) in scaling.py, we can simplify the linear transformation of the variance for diagonal matrices without constructing the full covariance matrix:

# Solve (A^T)^2 X^T = Yvar^T column-wise instead of materializing the full
# N x N covariance; self.A is the scaling matrix of the LinearTransform
L = tf.cholesky(tf.square(tf.transpose(self.A)))
XT = tf.cholesky_solve(L, tf.transpose(Yvar))
return tf.transpose(XT)
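
To see why this is equivalent in the diagonal case, here is a minimal standalone NumPy sketch (the shapes and the diagonal A are my own illustration, not code from the library): Cholesky-solving (A^T)^2 X^T = Yvar^T amounts to dividing each variance column by the squared output scale.

import numpy as np

N, D = 5, 3
A = np.diag(np.random.rand(D) + 0.5)  # hypothetical diagonal output scaling
Yvar = np.random.rand(N, D)           # per-point diagonal variances (full_cov=False)

# proposed path, mirroring tf.cholesky / tf.cholesky_solve above
L = np.linalg.cholesky(np.square(A.T))
out = np.linalg.solve(L @ L.T, Yvar.T).T

# for diagonal A this equals element-wise division by the squared scales
assert np.allclose(out, Yvar / np.diag(A)**2)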

After this change, the example above works and I am subsequently able to run the optimization without any errors.
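
For completeness, running the optimization looks something like this; the domain bounds and the objective fx are placeholders for illustration, not part of the fix:

import gpflowopt
from gpflowopt.domain import ContinuousParameter

# hypothetical box domain over all d inputs
domain = np.sum([ContinuousParameter('x{0}'.format(i), -3, 3) for i in range(d)])

def fx(X):
    # placeholder objective; gpflowopt expects an (n, 1) return value
    return np.sum(np.square(X), axis=1, keepdims=True)

optimizer = gpflowopt.BayesianOptimizer(domain, ei)
result = optimizer.optimize(fx, n_iter=10)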

If you like, I can make a PR fixing this, since more people could run into it.
