



This release includes many major speed improvements, especially to Kronecker-factorized multi-output models.
  Performance improvements
  - Major speed improvements for Kronecker product multitask models (1355, 1430, 1440, 1469, 1477)
  - Unwhitened VI speed improvements (1487)
  - SGPR speed improvements (1493)
  - Large scale exact GP speed improvements (1495)
  - Random Fourier feature speed improvements (1446, 1493)
  New Features
  - Dirichlet classification likelihood (1484), based on Milios et al. (NeurIPS 2018)
  - MultivariateNormal objects have a `base_sample_shape` attribute for low-rank/degenerate distributions (1502)
  New documentation
  - Tutorial for designing your own kernels (1421)
  Debugging utilities
  - Better naming conventions for AdditiveKernel and ProductKernel (1488)
  - `gpytorch.settings.verbose_linalg` context manager for seeing what linalg routines are run (1489)
  - Unit test improvements (1430, 1437)
  Bug Fixes
  - `inverse_transform` is applied to the initial values of constraints (1482)
  - `psd_safe_cholesky` obeys `cholesky_jitter` settings (1476)
  - Fix scaling issue with priors on variational models (1485)
  Breaking changes
  - `MultitaskGaussianLikelihoodKronecker` (deprecated) is fully incorporated in `MultitaskGaussianLikelihood` (1471)
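The Dirichlet classification likelihood (1484) turns class labels into regression targets with per-point noise, following Milios et al. (NeurIPS 2018). The pure-Python sketch below shows that label transformation as described in the paper: each class receives a Dirichlet concentration, and each Gamma marginal is moment-matched by a lognormal distribution. It is not GPyTorch's implementation. The function name `dirichlet_targets` and the default `alpha_epsilon=0.01` are assumptions chosen for illustration.

```python
import math


def dirichlet_targets(label, num_classes, alpha_epsilon=0.01):
    """Hypothetical helper: map one class label to per-class targets and noise variances.

    Each class gets a concentration alpha (alpha_epsilon for the other classes,
    1 + alpha_epsilon for the observed class). A lognormal with the same mean and
    variance as Gamma(alpha, 1) has variance parameter log(1/alpha + 1) and
    mean parameter log(alpha) - sigma2 / 2.
    """
    targets, noise_vars = [], []
    for c in range(num_classes):
        alpha = alpha_epsilon + (1.0 if c == label else 0.0)
        sigma2 = math.log(1.0 / alpha + 1.0)            # lognormal variance parameter
        targets.append(math.log(alpha) - sigma2 / 2.0)  # lognormal mean parameter
        noise_vars.append(sigma2)
    return targets, noise_vars


y, s2 = dirichlet_targets(label=1, num_classes=3)
```

The observed class gets the largest target and the smallest noise variance, so heteroskedastic GP regression on these targets (one output per class) can be used for classification.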


  - Spectral mixture kernels work with SKI (1392)
  - Natural gradient descent is compatible with batch-mode GPs (1416)
  - Fix prior mean in whitened SVGP (1427)
  - RBFKernelGrad has no more in-place operations (1389)
  - Fixes to ConstantDiagLazyTensor (1381, 1385)
  - Include example notebook for multitask Deep GPs (1410)
  - Documentation updates (1408, 1434, 1385, 1393)
  - KroneckerProductLazyTensors use root decompositions of children (1394)
  - SGPR now uses Woodbury formula and matrix determinant lemma (1356)
  - Delta distributions have an `arg_constraints` attribute (1422)
  - Cholesky factorization now takes optional diagonal noise argument (1377)
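SGPR now uses the Woodbury formula and the matrix determinant lemma (1356). For a low-rank-plus-diagonal covariance D + U U^T, where U is n × k, the lemma gives log det(D + U U^T) = log det(I_k + U^T D^-1 U) + log det(D). Only a k × k matrix then has to be factorized. The pure-Python sketch below checks this identity on a small example. It illustrates the algebra only and is not GPyTorch's SGPR code. The helper names are hypothetical.

```python
import math


def logdet(a):
    """Log-determinant of a small symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][m] * chol[j][m] for m in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def logdet_lowrank_plus_diag(u, d):
    """log det(diag(d) + U U^T) via the matrix determinant lemma.

    Only the k x k capacitance matrix I_k + U^T diag(d)^-1 U is factorized,
    where k is the number of columns of U.
    """
    n, k = len(u), len(u[0])
    cap = [[(1.0 if p == q else 0.0) + sum(u[i][p] * u[i][q] / d[i] for i in range(n))
            for q in range(k)]
           for p in range(k)]
    return logdet(cap) + sum(math.log(di) for di in d)


# Small example: n = 4 points, rank k = 2, homoskedastic noise 0.1.
u = [[1.0, 0.5], [0.2, -0.3], [0.7, 0.1], [-0.4, 0.9]]
d = [0.1] * 4
full = [[sum(u[i][p] * u[j][p] for p in range(2)) + (d[i] if i == j else 0.0)
         for j in range(4)]
        for i in range(4)]
print(logdet(full), logdet_lowrank_plus_diag(u, d))  # agree up to round-off
```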


This release primarily focuses on performance improvements, and adds contour integral quadrature based variational models.
  Major Features
  Variational models with contour integral quadrature
  - Add an MVM-based approach to whitened variational inference (1372)
  - This is based on the work in [Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization](
  Minor Features
  Performance improvements
  - Kronecker product models compute a deterministic logdet (faster than the Lanczos-based logdet) (1332)
  - Improve efficiency of `KroneckerProductLazyTensor` symeig method (1338)
  - Improve SGPR efficiency (1356)
  Other improvements
  - `SpectralMixtureKernel` accepts arbitrary batch shapes (1350)
  - Variational models pass around arbitrary `**kwargs` to the `forward` method (1339)
  - `gpytorch.settings` context managers keep track of their default value (1347)
  - Kernel objects can be pickled (1336)
  Bug Fixes
  - Fix `requires_grad` checks in `gpytorch.inv_matmul` (1322)
  - Fix reshaping bug for batch independent multi-output GPs (1368)
  - `ZeroMean` accepts a `batch_shape` argument (1371)
  - Various doc fixes/improvements (1327, 1343, 1315, 1373)
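Kronecker product models can compute the log determinant deterministically (1332) because, for A of size n × n and B of size m × m, det(A ⊗ B) = det(A)^m · det(B)^n. It follows that log det(A ⊗ B) = m · log det(A) + n · log det(B). The pure-Python sketch below verifies this identity for a pure Kronecker product. It does not show how GPyTorch handles added observation noise, and the helper names are hypothetical.

```python
import math


def logdet_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][m] * chol[j][m] for m in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def kron(a, b):
    """Kronecker product of A (n x n) and B (m x m) as an nm x nm nested list."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def kron_logdet(a, b):
    """log det(A kron B) = m * log det(A) + n * log det(B), without forming A kron B."""
    n, m = len(a), len(b)
    return m * logdet_spd(a) + n * logdet_spd(b)


a = [[2.0, 0.3], [0.3, 1.0]]
b = [[1.5, 0.2, 0.1], [0.2, 1.2, 0.0], [0.1, 0.0, 0.8]]
dense = logdet_spd(kron(a, b))  # factorizes a 6 x 6 matrix
fast = kron_logdet(a, b)        # factorizes a 2 x 2 and a 3 x 3 matrix
```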


This release includes the following fixes:
  - Fix caching issues with variational GPs (1274, 1311)
  - Ensure that constraint bounds are properly cast to floating point types (1307)
  - Fix bug with broadcasting multitask multivariate normal shapes (1312)
  - Bypass KeOps for small/rectangular kernels (1319)
  - Fix issues with `eigenvectors=False` in `LazyTensor.symeig` (1283)
  - Fix issues with fixed-noise LazyTensor preconditioner (1299)
  - Doc fixes (1275, 1301)


Major Features
  New variational and approximate models
  This release adds a number of new features for approximate GP models:
  - Linear model of coregionalization for variational multitask GPs (1180)
  - Deep Sigma Point Process models (1193)
  - Mean-field decoupled (MFD) models from "Parametric Gaussian Process Regressors" (Jankowiak et al., 2020) (1179)
  - Implement natural gradient descent (1258)
  - Additional non-conjugate likelihoods (Beta, StudentT, Laplace) (1211)
  New kernels
  We have just added a number of new specialty kernels:
  - `gpytorch.kernels.GaussianSymmetrizedKLKernel` for performing regression with uncertain inputs (1186)
  - `gpytorch.kernels.RFFKernel` (random Fourier features kernel) (1172, 1233)
  - `gpytorch.kernels.SpectralDeltaKernel` (a parametric kernel for patterns/extrapolation) (1231)
  More scalable sampling
  - Large-scale sampling with contour integral quadrature from Pleiss et al., 2020 (1194)
  Minor features
  - Ability to set amount of jitter added when performing Cholesky factorizations (1136)
  - Improve scalability of KroneckerProductLazyTensor (1199, 1208)
  - Improve speed of preconditioner (1224)
  - Add symeig and svd methods to LazyTensors (1105)
  - Add TriangularLazyTensor for Cholesky methods (1102)
  Bug fixes
  - Fix initialization code for `gpytorch.kernels.SpectralMixtureKernel` (1171)
  - Fix bugs with LazyTensor addition (1174)
  - Fix issue with loading smoothed box priors (1195)
  - Throw warning when variances are not positive, check for valid correlation matrices (1237, 1241, 1245)
  - Fix sampling issues with Pyro integration (1238)
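Random Fourier features (1172, 1233) approximate a stationary kernel with an explicit, finite feature map. For the RBF kernel with lengthscale ℓ, frequencies are drawn from N(0, 1/ℓ²) and phases from U(0, 2π). The inner product z(x)^T z(x') is then an unbiased estimate of k(x, x'). The pure-Python sketch below uses this cosine-with-random-phase construction for scalar inputs. GPyTorch's `RFFKernel` may build its features differently, and the function names are hypothetical.

```python
import math
import random


def sample_rff(num_features, lengthscale, seed=0):
    """Draw frequencies from the RBF spectral density N(0, 1/lengthscale^2) and uniform phases."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return weights, phases


def rff_features(x, weights, phases):
    """Feature map z(x) = sqrt(2/D) * cos(w_i * x + b_i) for a scalar input x."""
    scale = math.sqrt(2.0 / len(weights))
    return [scale * math.cos(w * x + b) for w, b in zip(weights, phases)]


def rbf(x1, x2, lengthscale):
    """Exact RBF kernel exp(-(x1 - x2)^2 / (2 lengthscale^2))."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * lengthscale ** 2))


w, b = sample_rff(num_features=20000, lengthscale=0.5)
approx = sum(p * q for p, q in zip(rff_features(0.3, w, b), rff_features(0.8, w, b)))
exact = rbf(0.3, 0.8, 0.5)  # the Monte Carlo error of approx shrinks like 1/sqrt(D)
```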


Major features
  - GPyTorch is compatible with PyTorch 1.5 (latest release)
  - Several bugs with task-independent multitask models are fixed (1110)
  - Task-dependent multitask models are more batch-mode compatible (1087, 1089, 1095)
  Minor features
  - `gpytorch.priors.MultivariateNormalPrior` has an expand method (1018)
  - Better broadcasting for batched inducing point models (1047)
  - `LazyTensor` repeating works with rectangular matrices (1068)
  - `gpytorch.kernels.ScaleKernel` inherits the `active_dims` property from its base kernel (1072)
  - Fully-bayesian models can be saved (1076)
  Bug Fixes
  - `gpytorch.kernels.PeriodicKernel` is batch-mode compatible (1012)
  - Fix `gpytorch.priors.MultivariateNormalPrior` expand method (1018)
  - Fix indexing issues with `LazyTensors` (1029)
  - Fix constants with `gpytorch.mlls.GammaRobustVariationalELBO` (1038, 1053)
  - Prevent doubly-computing derivatives of kernel inputs (1042)
  - Fix initialization issues with `gpytorch.kernels.SpectralMixtureKernel` (1052)
  - Fix stability of `gpytorch.variational.DeltaVariationalStrategy`


Major New Features and Improvements
  Each feature in this section comes with a new example notebook and documentation on how to use it. Check the new docs!
  - Added support for deep Gaussian processes (564).
  - KeOps integration has been added. With KeOps installed, replace certain `gpytorch.kernels.SomeKernel` with `gpytorch.kernels.keops.SomeKernel` to run exact GPs on 100,000+ data points (812).
  - Variational inference has undergone significant internal refactoring! All old variational objects should still function, but many are deprecated. (903).
  - Our integration with Pyro has been completely overhauled and is now much improved. For examples of interesting GP + Pyro models, see our new examples (903).
  - Our example notebooks have been completely reorganized, and our documentation surrounding them has been rewritten to hopefully provide a better tutorial to GPyTorch (954).
  - Added support for fully Bayesian GP modelling via NUTS (918).
  Minor New Features and Improvements
  - `GridKernel` and `GridInterpolationKernel` now support rectangular grids (888).
  - Added cylindrical kernel (577).
  - Added polynomial kernel (668).
  - Added tutorials on basic usage (hyperparameters, saving/loading, etc) (685).
  - `get_fantasy_model` now supports batched models (693).
  - Added a `prior_mode` context manager that causes GP models to evaluate in prior mode (707).
  - Added linear mean (676).
  - Added horseshoe prior (719).
  - Added polynomial kernel with derivatives (783).
  - Fantasy model computations now use QR for solving least squares problems, improving numerical stability (790).
  - All legacy functions have been removed, in favor of new function format in PyTorch (799).
  - Added Newton Girard kernel (821).
  - GP predictions now automatically clear caches when backpropagating through them. Previously, if you wanted to train through a GP in eval mode, you had to clear the caches manually by toggling the GP back to train mode and then to eval mode again. This is no longer necessary (916).
  - Added rational quadratic kernel (330)
  - Switch to using `torch.cholesky_solve` and `torch.logdet` now that they support batch mode / backwards (880)
  - Better / less redundant parameterization for correlation matrices e.g. in `IndexKernel` (912).
  - Kernels now define `__getitem__`, which allows slicing batch dimensions (782).
  - Performance improvements in the small data regime, e.g. n < 2000 (926).
  - Increased the size of kernel matrix for which Cholesky is the default solve strategy to n=800 (946).
  - Added an option for manually specifying a different preconditioner for `AddedDiagLazyTensor` (930).
  - Added precommit hooks that enforce code style (927).
  - Lengthscales have been refactored, and kernels have an `is_stationary` attribute (925).
  - All of our example notebooks now get smoke tested by our CI.
  - Added a `deterministic_probes` setting that causes our MLL computation to be fully deterministic when using CG+Lanczos, which improves L-BFGS convergence (929).
  - The use of the Woodbury formula for preconditioner computations is now fully replaced by QR, which improves numerical stability (968).
  Bug fixes
  - Fix a type error when calling `backward` on `gpytorch.functions.logdet` (711).
  - Variational models now properly skip posterior variance calculations if the `skip_posterior_variances` context is active (741).
  - Fixed an issue with `diag` mode for `PeriodicKernel` (761).
  - Stability improvements for `inv_softplus` and `inv_sigmoid` (776).
  - Fix incorrect size handling in `InterpolatedLazyTensor` for rectangular matrices (906)
  - Fix indexing in `IndexKernel` for batch mode (911).
  - Fixed an issue where slicing batch mode lazy covariance matrices resulted in incorrect behavior (782).
  - Cholesky gives a better error when there are NaNs (944).
  - Use `psd_safe_cholesky` in prediction strategies rather than `torch.cholesky` (956).
  - An error is now raised if Cholesky is used with KeOps, which is not supported (959).
  - Fixed a bug where NaNs could occur during interpolation (971).
  - Fix MLL computation for heteroskedastic noise models (870).
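The rational quadratic kernel added in this release (330) has the standard form k(x, x') = (1 + r² / (2αℓ²))^(-α), with r = |x - x'|, lengthscale ℓ, and shape parameter α. It is a scale mixture of RBF kernels and approaches the RBF kernel as α grows. The scalar sketch below implements that textbook formula for illustration. It does not reproduce GPyTorch's parameterization or constraints, and the function names are hypothetical.

```python
import math


def rational_quadratic(x1, x2, lengthscale=1.0, alpha=1.0):
    """k(x, x') = (1 + r^2 / (2 * alpha * lengthscale^2)) ** (-alpha), with r = |x - x'|."""
    r2 = (x1 - x2) ** 2
    return (1.0 + r2 / (2.0 * alpha * lengthscale ** 2)) ** (-alpha)


def rbf(x1, x2, lengthscale=1.0):
    """RBF kernel, the large-alpha limit of the rational quadratic kernel."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * lengthscale ** 2))


# Small alpha gives heavier tails than the RBF kernel; large alpha recovers it.
heavy_tail = rational_quadratic(0.0, 2.0, alpha=0.5)
near_rbf = rational_quadratic(0.0, 1.5, alpha=1e6)
```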


A full list of bug fixes and features will be out with the 0.4 release.


This release addresses breaking changes in the recent PyTorch 1.2 release. Currently, GPyTorch will run on either PyTorch 1.1 or PyTorch 1.2.
  A full list of new features and bug fixes will be coming soon in a GPyTorch 0.4 release.




New Features
  - Implement kernel checkpointing, allowing exact GPs on up to 1M data points with multiple GPUs (499)
  - GPyTorch now supports hard parameter constraints (e.g. bounds) via the `register_constraint` method on `Module` (596)
  - All GPyTorch objects now support multiple batch dimensions. In addition to training `b` GPs simultaneously, you can now train a `b1 x b2` matrix of GPs simultaneously if you so choose (492, 589, 627)
  - `RBFKernelGrad` now supports ARD (602)
  - `FixedNoiseGaussianLikelihood` offers a better interface for dealing with known observation noise values. `WhiteNoiseKernel` is now hard deprecated (593)
  - `InvMatmul`, `InvQuadLogDet` and `InvQuad` are now twice differentiable (603)
  - `Likelihood` has been redesigned. See the new documentation for details if you are creating custom likelihoods (591)
  - Better support for more flexible Pyro models. You can now define likelihoods of the form `p(y|f, z)` where `f` is a GP and `z` are arbitrary latent variables learned by Pyro (591).
  - Parameters can now be recursively initialized with full names, e.g. `model.initialize(**{"covar_module.base_kernel.lengthscale": 1., "covar_module.outputscale": 1.})` (484)
  - Added `ModelList` and `LikelihoodList` for training multiple GPs when batch mode can't be used. See the example notebooks (471)
  Performance and stability improvements
  - CG termination is now more tolerance-based and will much more rarely terminate without returning good solves. If it does, a warning is raised that includes suggested courses of action (569)
  - In non-ARD mode, RBFKernel and MaternKernel use custom backward implementations for performance (517)
  - Up to a 3x performance improvement in the regime where the test set is very small (615)
  - The noise parameter in `GaussianLikelihood` now has a default lower bound, similar to sklearn (596)
  - `psd_safe_cholesky` now adds successively increasing amounts of jitter rather than only once (610)
  - Variational inference initialization now uses `psd_safe_cholesky` rather than `torch.cholesky` to initialize with the prior (610)
  - The pivoted Cholesky preconditioner now uses a QR decomposition for its solve rather than the Woodbury formula for speed and stability (617)
  - GPyTorch now uses Cholesky for solves with very small matrices rather than CG, resulting in reduced overhead for that setting (586)
  - Cholesky can additionally be turned on manually for help debugging (586)
  - Kernel distance computations now use `torch.cdist` when on PyTorch 1.1.0 in the non-batch setting (642)
  - CUDA unit tests now default to using the least used available GPU when run (515)
  - `MultiDeviceKernel` is now much faster (491)
  Bug Fixes
  - Fixed an issue with variational covariances at test time (638)
  - Fixed an issue where the training covariance wasn't being detached for variance computations, occasionally resulting in backward errors (566)
  - Fixed an issue where `active_dims` in kernels was being applied twice (576)
  - Fixes and stability improvements for `MultiDeviceKernel` (560)
  - Fixed an issue where `fast_pred_var` was failing for single training inputs (574)
  - Fixed an issue when initializing parameter values with non-tensor values (630)
  - Fixed an issue with handling the preconditioner log determinant value for MLL computation (634)
  - Fixed an issue where `prior_dist` was being cached for VI, which was problematic for pyro models (599)
  - Fixed a number of issues with `LinearKernel`, including one where the variance could go negative (584)
  - Fixed a bug where training inputs couldn't be set with `set_train_data` if they are currently `None` (565)
  - Fixed a number of bugs in `MultitaskMultivariateNormal` (545, 553)
  - Fixed an indexing bug in `batch_symeig` (547)
  - Fixed an issue where `MultitaskMultivariateNormal` wasn't interleaving rows correctly (540)
  - GPyTorch now fully targets Python 3.6, and we've begun to include static type hints (581)
  - Parameters in GPyTorch no longer have default singleton batch dimensions. For example, the default shape of `lengthscale` is now `torch.Size([1])` rather than `torch.Size([1, 1])` (605)
  - `` now includes optional dependencies, reads requirements from `requirements.txt`, and does not require `torch` if `pytorch-nightly` is installed (495)
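`psd_safe_cholesky` now adds successively increasing amounts of jitter (610). The pure-Python sketch below shows that retry pattern. It first attempts a plain Cholesky factorization. On failure, it adds a growing multiple of the identity until the factorization succeeds or the attempts run out. Several details are assumptions for illustration, not GPyTorch's defaults or API: the starting jitter of 1e-6, the factor of 10 between attempts, `max_tries=3`, and the names `safe_cholesky` and `JitterExhaustedError`.

```python
import math


class JitterExhaustedError(ValueError):
    """Hypothetical error: Cholesky failed even after the largest jitter was added."""


def cholesky(a):
    """Plain Cholesky factorization; raises ValueError on a non-positive pivot."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][m] * chol[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return chol


def safe_cholesky(a, jitter=1e-6, max_tries=3):
    """Return (L, added_jitter), retrying with 10x more diagonal jitter after each failure."""
    try:
        return cholesky(a), 0.0
    except ValueError:
        pass
    n = len(a)
    for attempt in range(max_tries):
        eps = jitter * 10 ** attempt
        jittered = [[a[i][j] + (eps if i == j else 0.0) for j in range(n)] for i in range(n)]
        try:
            return cholesky(jittered), eps
        except ValueError:
            continue  # still not positive definite: increase the jitter
    raise JitterExhaustedError("matrix is not positive definite even with added jitter")


# A singular (rank-1) PSD matrix fails plain Cholesky but succeeds with jitter.
chol, added = safe_cholesky([[1.0, 1.0], [1.0, 1.0]])
```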


You can install GPyTorch via Anaconda (463)
  Speed and stability
  - Kernel distances use the JIT for fast computations (464)
  - LinearCG uses the JIT for fast computations (464)
  - Improve the stability of computing kernel distances (455)
  Variational inference improvements
  - Sped up variational models by batching all matrix solves in one call (454)
  - Can use the same set of inducing points for batch variational GPs (445)
  - Whitened variational inference for improved convergence (493)
  - Variational log likelihoods for BernoulliLikelihood are computed with quadrature (473)
  Multi-GPU Gaussian processes
  - Can train and test GPs by dividing the kernel onto multiple GPUs (450)
  GPs with derivatives
  - Can define RBFKernels for observations and their derivatives (462)
  - LazyTensors can broadcast matrix multiplication (459)
  - Can use `` sign for matrix multiplication with LazyTensors
  - Convenience methods for training/testing multiple GPs in a list (471)
  - Added a `gpytorch.settings.fast_computations` feature to (optionally) use Cholesky-based inference (456)
  - Distributions define event shapes (469)
  - Can recursively initialize parameters on GP modules (484)
  - Can initialize `noise` in GaussianLikelihood (479)
  - Fixed bugs in SGPR kernel (487)
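LinearCG is GPyTorch's conjugate gradients solver for the linear systems that arise in GP inference. The pure-Python sketch below shows the basic algorithm for a symmetric positive-definite matrix. It stops on a relative-residual tolerance and warns when the iteration budget runs out before convergence. It omits preconditioning, batching, and the JIT compilation mentioned above. The name `conjugate_gradients` and its defaults are hypothetical.

```python
import math
import warnings


def conjugate_gradients(a, b, tol=1e-8, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A given as nested lists.

    Stops once the relative residual ||b - A x|| / ||b|| falls below `tol`.
    If `max_iter` iterations pass first, warns and returns the current iterate.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        if math.sqrt(rs) / b_norm < tol:
            return x
        ap = matvec(p)
        step = rs / dot(p, ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]  # new A-conjugate direction
        rs = rs_new
    if math.sqrt(rs) / b_norm >= tol:
        warnings.warn(f"CG stopped after {max_iter} iterations with relative residual "
                      f"{math.sqrt(rs) / b_norm:.1e}; try more iterations or a preconditioner")
    return x


x = conjugate_gradients([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```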


  - Batch GPs, which previously were a feature, are now well-documented and much more stable [(see docs)](
  - Can add "fantasy observations" to models.
  - Option for exact marginal log likelihood and sampling computations (this is slower, but potentially useful for debugging) (`gpytorch.settings.fast_computations`)
  Bug fixes
  - Easier usage of batch GPs
  - Reduce bugs in [additive regression models](




Stability of hyperparameters
  - Hyperparameters that are constrained to be positive (e.g. variance, lengthscale) are now parameterized through the softplus function (`log(1 + e^x)`) rather than through the log function
  - This dramatically improves the numerical stability and optimization of hyperparameters
  - Old models that were trained with `log` parameters will still work, but this is deprecated.
  - Inference now handles certain numerical floating point round-off errors more gracefully.
  Various stability improvements to variational inference
  Other changes
  - `GridKernel` can be used for data that lies on a perfect grid.
  - New preconditioner for LazyTensors.
  - Use batched cholesky functions for improved performance (requires updating PyTorch)
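The softplus reparameterization described in the stability notes above maps an unconstrained raw value, which the optimizer updates, to a positive hyperparameter. The pure-Python sketch below writes softplus and its inverse in numerically stable forms using `math.log1p` and `math.expm1`. It illustrates the transform only, not GPyTorch's constraint classes, and the function names are hypothetical.

```python
import math


def softplus(x):
    """softplus(x) = log(1 + e^x), rewritten as max(x, 0) + log1p(e^-|x|) to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def inv_softplus(y):
    """Inverse of softplus for y > 0: log(e^y - 1) = y + log(1 - e^-y)."""
    return y + math.log(-math.expm1(-y))


raw_lengthscale = inv_softplus(0.5)      # unconstrained value seen by the optimizer
lengthscale = softplus(raw_lengthscale)  # always positive; recovers 0.5
```

Softplus grows only linearly for large raw values, and its derivative (the logistic sigmoid) is bounded by 1, whereas the derivative of the exponential map used with log parameterization is unbounded.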


New features
  - Implement diagonal correction for basic variational inference, improving predictive variance estimates. This is on by default.
  - `LazyTensor._quad_form_derivative` now has a default implementation! While custom implementations are likely to still be faster in many cases, this means that it is no longer required to implement a custom `_quad_form_derivative` when implementing a new `LazyTensor` subclass.
  Bug fixes
  - Fix a number of critical bugs for the new variational inference.
  - Do some hyperparameter tuning for the SV-DKL example notebook, and include fancier NN features like batch normalization.
  - Made it more likely that operations internally preserve the ability to perform preconditioning for linear solves and log determinants. This may have a positive impact on model performance in some cases.


Variational inference has been refactored
  - Easier to experiment with different variational approximations
  - Massive performance improvement for [SV-DKL](
  Experimental Pyro integration for variational inference
  - See the [example Pyro notebooks](
  Lots of tiny bug fixes
  (Too many to name, but everything should be better 😬)


  Alpha release
  We strongly encourage you to check out our beta release for lots of improvements!
  However, if you still need an old version, or need to use PyTorch 0.4, you can install this release.


Beta release
  GPyTorch is now available on pip! `pip install gpytorch`.
  **Important!** This release requires the preview build of PyTorch (>= 1.0). You should either build from source or install **pytorch-nightly**. See [the PyTorch docs]( for specific installation instructions.
  If you were previously using GPyTorch, see [the migration guide]( to help you move over.
  What's new
  - Batch mode: it is possible to train multiple GPs simultaneously
  - Improved multitask models
  Breaking changes
  - `gpytorch.random_variables` have been replaced by `gpytorch.distributions`. These build upon PyTorch distributions.
  - `gpytorch.random_variables.GaussianRandomVariable` -> `gpytorch.distributions.MultivariateNormal`.
  - `gpytorch.random_variables.MultitaskGaussianRandomVariable` -> `gpytorch.distributions.MultitaskMultivariateNormal`.
  - `gpytorch.utils.scale_to_bounds` is now `gpytorch.utils.grid.scale_to_bounds`
  - `GridInterpolationKernel`, `GridKernel`, `InducingPointKernel` - the attribute `base_kernel_module` has become `base_kernel` (for consistency)
  - `AdditiveGridInterpolationKernel` no longer exists. Now use `AdditiveStructureKernel(GridInterpolationKernel(...))`.
  - `MultiplicativeGridInterpolationKernel` no longer exists. Now use `ProductStructureKernel(GridInterpolationKernel(...))`.
  Attributes (`n_*` -> `num_*`)
  - IndexKernel: n_tasks -> num_tasks
  - LCMKernel: n_tasks -> num_tasks
  - MultitaskKernel: n_tasks -> num_tasks
  - MultitaskGaussianLikelihood: n_tasks -> num_tasks
  - SoftmaxLikelihood: n_features -> num_features
  - MultitaskMean: n_tasks -> num_tasks
  - VariationalMarginalLogLikelihood: n_data -> num_data
  - SpectralMixtureKernel: n_dimensions -> ard_num_dims, n_mixtures -> num_mixtures
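When porting code written against the alpha release, the keyword renames listed above can be applied mechanically, for example to constructor arguments stored in a config file. The helper below is a hypothetical sketch and is not part of GPyTorch. It covers only the renames listed in this release.

```python
# Hypothetical migration table built from the renames listed above.
RENAMED_KWARGS = {
    "IndexKernel": {"n_tasks": "num_tasks"},
    "LCMKernel": {"n_tasks": "num_tasks"},
    "MultitaskKernel": {"n_tasks": "num_tasks"},
    "MultitaskGaussianLikelihood": {"n_tasks": "num_tasks"},
    "SoftmaxLikelihood": {"n_features": "num_features"},
    "MultitaskMean": {"n_tasks": "num_tasks"},
    "VariationalMarginalLogLikelihood": {"n_data": "num_data"},
    "SpectralMixtureKernel": {"n_dimensions": "ard_num_dims", "n_mixtures": "num_mixtures"},
}


def migrate_kwargs(class_name, kwargs):
    """Return a copy of `kwargs` with old `n_*` names replaced by their new names."""
    renames = RENAMED_KWARGS.get(class_name, {})
    return {renames.get(key, key): value for key, value in kwargs.items()}


new_kwargs = migrate_kwargs("SpectralMixtureKernel", {"n_mixtures": 4, "n_dimensions": 2})
```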