	r"""
	The ``distributions`` package contains parameterizable probability distributions
	and sampling functions. This allows the construction of stochastic computation
	graphs and stochastic gradient estimators for optimization. This package
	generally follows the design of the `TensorFlow Distributions`_ package.

	.. _`TensorFlow Distributions`:
	    https://arxiv.org/abs/1711.10604

	It is not possible to directly backpropagate through random samples. However,
	there are two main methods for creating surrogate functions that can be
	backpropagated through. These are the score function estimator/likelihood ratio
	estimator/REINFORCE and the pathwise derivative estimator. REINFORCE is commonly
	seen as the basis for policy gradient methods in reinforcement learning, and the
	pathwise derivative estimator is commonly seen in the reparameterization trick
	in variational autoencoders. Whilst the score function only requires the value
	of samples :math:`f(x)`, the pathwise derivative requires the derivative
	:math:`f'(x)`. The next sections discuss these two in a reinforcement learning
	example. For more details see
	`Gradient Estimation Using Stochastic Computation Graphs`_ .

	.. _`Gradient Estimation Using Stochastic Computation Graphs`:
	    https://arxiv.org/abs/1506.05254
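
As a rough illustration of the difference between the two estimators, the sketch below estimates the gradient of an expectation with respect to the mean of a ``Normal`` in both ways. The objective ``f``, the value of ``mu`` and the sample count ``n`` are assumptions chosen for illustration only; for this ``f`` the exact gradient is ``2 * mu``.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)

# Assumed toy setup: d/dmu E_{x ~ Normal(mu, 1)}[f(x)] with f(x) = x ** 2.
# Analytically E[x ** 2] = mu ** 2 + 1, so the exact gradient is 2 * mu.
mu = torch.tensor(1.5, requires_grad=True)
n = 100_000


def f(x):
    return x ** 2


dist = Normal(mu, 1.0)

# Score function estimator: needs only the values f(x) and log_prob(x).
x = dist.sample((n,))  # sample() does not track gradients
surrogate = (f(x) * dist.log_prob(x)).mean()
(score_grad,) = torch.autograd.grad(surrogate, mu)

# Pathwise derivative estimator: differentiates through f and the sample itself.
x = dist.rsample((n,))  # rsample() keeps the dependence on mu
(path_grad,) = torch.autograd.grad(f(x).mean(), mu)

exact_grad = 2 * mu.item()
```

Both estimates should land near ``exact_grad``. For this objective the score function estimate is typically the noisier of the two, which is one reason the pathwise estimator is often preferred when ``f`` is differentiable.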

	Score function
	^^^^^^^^^^^^^^

	When the probability density function is differentiable with respect to its
	parameters, we only need :meth:`~torch.distributions.Distribution.sample` and
	:meth:`~torch.distributions.Distribution.log_prob` to implement REINFORCE:

	.. math::

	    \Delta\theta = \alpha r \frac{\partial\log p(a|\pi^\theta(s))}{\partial\theta}

	where :math:`\theta` are the parameters, :math:`\alpha` is the learning rate,
	:math:`r` is the reward and :math:`p(a|\pi^\theta(s))` is the probability of
	taking action :math:`a` in state :math:`s` given policy :math:`\pi^\theta`.

	In practice we would sample an action from the output of a network, apply this
	action in an environment, and then use ``log_prob`` to construct an equivalent
	loss function. Note that we use a negative because optimizers use gradient
	descent, whilst the rule above assumes gradient ascent. With a categorical
	policy, the code for implementing REINFORCE would be as follows::

	    probs = policy_network(state)
	    # Note that this is equivalent to what used to be called multinomial
	    m = Categorical(probs)
	    action = m.sample()
	    next_state, reward = env.step(action)
	    loss = -m.log_prob(action) * reward
	    loss.backward()
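
The snippet above relies on a ``policy_network`` and an ``env`` that are not defined here. A minimal self-contained sketch follows, assuming a three-armed bandit in which learnable ``logits`` stand in for the network output and a fixed ``arm_rewards`` table stands in for ``env.step``. Both names, the learning rate and the step count are hypothetical choices for illustration.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical stand-ins: `logits` replaces policy_network(state) and
# `arm_rewards` replaces env.step(action). Neither is part of the library.
logits = torch.zeros(3, requires_grad=True)
arm_rewards = torch.tensor([0.0, 1.0, 0.2])
optimizer = torch.optim.SGD([logits], lr=0.1)

for _ in range(500):
    m = Categorical(logits=logits)
    action = m.sample()  # the sample itself carries no gradient
    reward = arm_rewards[action]  # treated as a constant by autograd
    # Negated because the optimizer performs gradient descent.
    loss = -m.log_prob(action) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

final_probs = torch.softmax(logits, dim=0).detach()
```

After training, ``final_probs`` should put most of its mass on the arm with the highest reward. No baseline is subtracted from the reward here, so the updates are noisy; that simplification is deliberate.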

	Pathwise derivative
	^^^^^^^^^^^^^^^^^^^

	The other way to implement these stochastic/policy gradients would be to use the
	reparameterization trick from the
	:meth:`~torch.distributions.Distribution.rsample` method, where the
	parameterized random variable can be constructed via a parameterized
	deterministic function of a parameter-free random variable. The reparameterized
	sample therefore becomes differentiable. The code for implementing the pathwise
	derivative would be as follows::

	    params = policy_network(state)
	    m = Normal(*params)
	    # Any distribution with .has_rsample == True could work based on the application
	    action = m.rsample()
	    next_state, reward = env.step(action)  # Assuming that reward is differentiable
	    loss = -reward
	    loss.backward()
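
As with the score function example, ``policy_network`` and ``env`` are placeholders. A minimal runnable sketch follows, assuming two learnable scalars ``loc`` and ``log_scale`` in place of the network output and a hypothetical differentiable ``reward_fn`` in place of the environment.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)

# Hypothetical stand-ins: `loc` and `log_scale` replace policy_network(state);
# `reward_fn` replaces env.step(action) and is assumed to be differentiable.
loc = torch.tensor(0.0, requires_grad=True)
log_scale = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([loc, log_scale], lr=0.05)


def reward_fn(action):
    # Assumed reward for illustration, largest at action == 2.0.
    return -(action - 2.0) ** 2


for _ in range(500):
    m = Normal(loc, log_scale.exp())  # exp() keeps the scale positive
    action = m.rsample()  # loc + scale * eps, with eps drawn from Normal(0, 1)
    loss = -reward_fn(action)
    optimizer.zero_grad()
    loss.backward()  # gradients reach loc and log_scale through the sample
    optimizer.step()
```

Because the gradient flows through ``action``, ``loc`` should move toward the reward maximum and the scale should shrink. Replacing ``rsample`` with ``sample`` would cut that path and leave both parameters without gradients.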
	"""

	from .bernoulli import Bernoulli
	from .beta import Beta
	from .binomial import Binomial
	from .categorical import Categorical
	from .cauchy import Cauchy
	from .chi2 import Chi2
	from .constraint_registry import biject_to, transform_to
	from .continuous_bernoulli import ContinuousBernoulli
	from .dirichlet import Dirichlet
	from .distribution import Distribution
	from .exp_family import ExponentialFamily
	from .exponential import Exponential
	from .fishersnedecor import FisherSnedecor
	from .gamma import Gamma
	from .geometric import Geometric
	from .gumbel import Gumbel
	from .half_cauchy import HalfCauchy
	from .half_normal import HalfNormal
	from .independent import Independent
	from .inverse_gamma import InverseGamma
	from .kl import _add_kl_info, kl_divergence, register_kl
	from .kumaraswamy import Kumaraswamy
	from .laplace import Laplace
	from .lkj_cholesky import LKJCholesky
	from .log_normal import LogNormal
	from .logistic_normal import LogisticNormal
	from .lowrank_multivariate_normal import LowRankMultivariateNormal
	from .mixture_same_family import MixtureSameFamily
	from .multinomial import Multinomial
	from .multivariate_normal import MultivariateNormal
	from .negative_binomial import NegativeBinomial
	from .normal import Normal
	from .one_hot_categorical import OneHotCategorical, OneHotCategoricalStraightThrough
	from .pareto import Pareto
	from .poisson import Poisson
	from .relaxed_bernoulli import RelaxedBernoulli
	from .relaxed_categorical import RelaxedOneHotCategorical
	from .studentT import StudentT
	from .transformed_distribution import TransformedDistribution
	from .transforms import * # noqa: F403
	from . import transforms
	from .uniform import Uniform
	from .von_mises import VonMises
	from .weibull import Weibull
	from .wishart import Wishart

	_add_kl_info()
	del _add_kl_info

	__all__ = [
	    "Bernoulli",
	    "Beta",
	    "Binomial",
	    "Categorical",
	    "Cauchy",
	    "Chi2",
	    "ContinuousBernoulli",
	    "Dirichlet",
	    "Distribution",
	    "Exponential",
	    "ExponentialFamily",
	    "FisherSnedecor",
	    "Gamma",
	    "Geometric",
	    "Gumbel",
	    "HalfCauchy",
	    "HalfNormal",
	    "Independent",
	    "InverseGamma",
	    "Kumaraswamy",
	    "LKJCholesky",
	    "Laplace",
	    "LogNormal",
	    "LogisticNormal",
	    "LowRankMultivariateNormal",
	    "MixtureSameFamily",
	    "Multinomial",
	    "MultivariateNormal",
	    "NegativeBinomial",
	    "Normal",
	    "OneHotCategorical",
	    "OneHotCategoricalStraightThrough",
	    "Pareto",
	    "RelaxedBernoulli",
	    "RelaxedOneHotCategorical",
	    "StudentT",
	    "Poisson",
	    "Uniform",
	    "VonMises",
	    "Weibull",
	    "Wishart",
	    "TransformedDistribution",
	    "biject_to",
	    "kl_divergence",
	    "register_kl",
	    "transform_to",
	]
	__all__.extend(transforms.__all__)