Issue with large dataset #119

@PuncocharM

Description

I want to run Bayesian optimization (BO) on a large dataset. For this reason I am using SVGP with minibatching; however, it still fails when constructing the EI acquisition. A minimal example:

import gpflow
import numpy as np
from gpflowopt.acquisition import ExpectedImprovement

n = 200000
d = 50
X_train = np.random.randn(n, d)
y_train = np.random.randn(n, 1)

# SVGP over all d input dimensions, with 500 inducing points taken from the
# training data and a minibatch size of 500
sgp = gpflow.svgp.SVGP(X_train, y_train, gpflow.kernels.Matern52(d),
                       gpflow.likelihoods.Gaussian(), X_train[:500, :].copy(),
                       minibatch_size=500)
ei = ExpectedImprovement(sgp)

This gives an error on the last line:

InvalidArgumentError: Input to reshape is a tensor with 40000000000 values, but the requested shape has 1345294336
[[node Reshape_7 (defined at /home/puncochar/miniconda3/envs/mgr-work/lib/python3.6/site-packages/gpflowopt/transforms.py:138) ]]

This is basically an integer overflow: the full covariance built there has N * N * D = 200000 * 200000 * 1 = 40,000,000,000 entries, and that product wraps around in int32 shape arithmetic (40000000000 mod 2^32 = 1345294336, exactly the requested shape in the message).
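
The wraparound arithmetic can be checked in plain Python (the values below are just the example's n and d, nothing from the library):

n, D = 200000, 1
full = n * n * D     # 40000000000 entries in the full N x N (x D) covariance
print(full % 2**32)  # 1345294336 -- the bogus requested shape from the error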

The error points to line 138 in transforms.py, which contains Yvar = tf.reshape(Yvar, [N * N, D]).

I figured this is unnecessary: by passing full_cov=False from build_predict(...) in scaling.py, we can simplify the linear transformation of the variance for diagonal matrices without constructing the full covariance matrix:

# Solve (A^T)^2 X^T = Yvar^T column-wise instead of materializing the full
# N x N covariance; self.A is the scaling matrix of the LinearTransform
L = tf.cholesky(tf.square(tf.transpose(self.A)))
XT = tf.cholesky_solve(L, tf.transpose(Yvar))
return tf.transpose(XT)
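
To see why this is equivalent in the diagonal case, here is a minimal standalone NumPy sketch (the shapes and the diagonal A are my own illustration, not code from the library): Cholesky-solving (A^T)^2 X^T = Yvar^T amounts to dividing each variance column by the squared output scale.

import numpy as np

N, D = 5, 3
A = np.diag(np.random.rand(D) + 0.5)  # hypothetical diagonal output scaling
Yvar = np.random.rand(N, D)           # per-point diagonal variances (full_cov=False)

# proposed path, mirroring tf.cholesky / tf.cholesky_solve above
L = np.linalg.cholesky(np.square(A.T))
out = np.linalg.solve(L @ L.T, Yvar.T).T

# for diagonal A this equals element-wise division by the squared scales
assert np.allclose(out, Yvar / np.diag(A)**2)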

After this change, the example above works and I am subsequently able to run the optimization without any errors.
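
For completeness, running the optimization looks something like this; the domain bounds and the objective fx are placeholders for illustration, not part of the fix:

import gpflowopt
from gpflowopt.domain import ContinuousParameter

# hypothetical box domain over all d inputs
domain = np.sum([ContinuousParameter('x{0}'.format(i), -3, 3) for i in range(d)])

def fx(X):
    # placeholder objective; gpflowopt expects an (n, 1) return value
    return np.sum(np.square(X), axis=1, keepdims=True)

optimizer = gpflowopt.BayesianOptimizer(domain, ei)
result = optimizer.optimize(fx, n_iter=10)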

If you like, I can make a PR fixing this, since more people could run into it.
