How PyTorch initializes convolution layers (kaiming_uniform_ explained)

Updated: 2023-09-08 10:58:52 Author: 两只蜡笔的小新

This article walks through how PyTorch initializes the parameters of its convolution layers by default, with a close look at kaiming_uniform_. Corrections and additions are welcome.

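1. The convolution classes in PyTorch

In the PyCharm IDE, Ctrl+click on torch.nn.Conv2d to jump to the source of PyTorch's convolution modules (conv.py).

The classes most commonly used to build networks are declared there as follows: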

class _ConvNd(Module):
class Conv1d(_ConvNd):
class Conv2d(_ConvNd):
class Conv3d(_ConvNd):
class _ConvTransposeNd(_ConvNd):
class ConvTranspose1d(_ConvTransposeNd):
class ConvTranspose2d(_ConvTransposeNd):
class ConvTranspose3d(_ConvTransposeNd):

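As the list shows, the commonly used convolution classes all derive, directly or indirectly, from

class _ConvNd(Module):

Opening class Conv2d(_ConvNd): reveals no parameter initialization code of its own, so the initialization logic presumably lives in the parent class _ConvNd.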
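This guess can be checked without reading the source. The following is a minimal sketch, assuming a PyTorch installation in which the private class _ConvNd is still importable from torch.nn.modules.conv; private names can move between releases.

```python
import torch.nn as nn
from torch.nn.modules.conv import _ConvNd  # private class; location is an assumption

# Conv2d inherits from _ConvNd ...
is_subclass = issubclass(nn.Conv2d, _ConvNd)

# ... and reset_parameters is defined on the parent, not on Conv2d itself.
defined_in_conv2d = "reset_parameters" in vars(nn.Conv2d)
defined_in_parent = "reset_parameters" in vars(_ConvNd)

print(is_subclass, defined_in_conv2d, defined_in_parent)
```

vars(cls) only lists attributes defined directly on that class, which is why it distinguishes an inherited method from one the subclass defines itself.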

2. The parent class of PyTorch's convolution modules

Below is the source of the parent class _ConvNd. The method that initializes the parameters is

def reset_parameters(self) -> None:

class _ConvNd(Module):
    __constants__ = ['stride', 'padding', 'dilation', 'groups',
                     'padding_mode', 'output_padding', 'in_channels',
                     'out_channels', 'kernel_size']
    __annotations__ = {'bias': Optional[torch.Tensor]}
    def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]) -> Tensor:
        ...
    _in_channels: int
    out_channels: int
    kernel_size: Tuple[int, ...]
    stride: Tuple[int, ...]
    padding: Tuple[int, ...]
    dilation: Tuple[int, ...]
    transposed: bool
    output_padding: Tuple[int, ...]
    groups: int
    padding_mode: str
    weight: Tensor
    bias: Optional[Tensor]
    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Tuple[int, ...],
                 stride: Tuple[int, ...],
                 padding: Tuple[int, ...],
                 dilation: Tuple[int, ...],
                 transposed: bool,
                 output_padding: Tuple[int, ...],
                 groups: int,
                 bias: bool,
                 padding_mode: str) -> None:
        super(_ConvNd, self).__init__()
        if in_channels % groups != 0:
            raise ValueError('in_channels must be divisible by groups')
        if out_channels % groups != 0:
            raise ValueError('out_channels must be divisible by groups')
        valid_padding_modes = {'zeros', 'reflect', 'replicate', 'circular'}
        if padding_mode not in valid_padding_modes:
            raise ValueError("padding_mode must be one of {}, but got padding_mode='{}'".format(
                valid_padding_modes, padding_mode))
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.transposed = transposed
        self.output_padding = output_padding
        self.groups = groups
        self.padding_mode = padding_mode
        # `_reversed_padding_repeated_twice` is the padding to be passed to
        # `F.pad` if needed (e.g., for non-zero padding types that are
        # implemented as two ops: padding + conv). `F.pad` accepts paddings in
        # reverse order than the dimension.
        self._reversed_padding_repeated_twice = _reverse_repeat_tuple(self.padding, 2)
        if transposed:
            self.weight = Parameter(torch.Tensor(
                in_channels, out_channels // groups, *kernel_size))
        else:
            self.weight = Parameter(torch.Tensor(
                out_channels, in_channels // groups, *kernel_size))
        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()
    def reset_parameters(self) -> None:
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)
    def extra_repr(self):
        s = ('{in_channels}, {out_channels}, kernel_size={kernel_size}'
             ', stride={stride}')
        if self.padding != (0,) * len(self.padding):
            s += ', padding={padding}'
        if self.dilation != (1,) * len(self.dilation):
            s += ', dilation={dilation}'
        if self.output_padding != (0,) * len(self.output_padding):
            s += ', output_padding={output_padding}'
        if self.groups != 1:
            s += ', groups={groups}'
        if self.bias is None:
            s += ', bias=False'
        if self.padding_mode != 'zeros':
            s += ', padding_mode={padding_mode}'
        return s.format(**self.__dict__)
    def __setstate__(self, state):
        super(_ConvNd, self).__setstate__(state)
        if not hasattr(self, 'padding_mode'):
            self.padding_mode = 'zeros'

3. def reset_parameters(self) -> None

The default initialization for convolution layers:

    def reset_parameters(self) -> None:
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

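The weights in this class are initialized with Kaiming initialization.

Kaiming He, a computer vision researcher, proposed this initialization method for ReLU-family activations. By default PyTorch initializes convolution weights with the Kaiming uniform variant (kaiming_uniform_), not the normal variant. The documentation of kaiming_uniform_ reads:

Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from U(−bound, bound) where

bound = gain × √(3 / fan_mode)

Also known as He initialization.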

3.1 Initializing the kernel weights

init.kaiming_uniform_(self.weight, a=math.sqrt(5))

The source of init.kaiming_uniform_ is as follows:

 
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    r"""Fills the input `Tensor` with values according to the method
    described in `Delving deep into rectifiers: Surpassing human-level
    performance on ImageNet classification` - He, K. et al. (2015), using a
    uniform distribution. The resulting tensor will have values sampled from
    :math:`\mathcal{U}(-\text{bound}, \text{bound})` where
    .. math::
        \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}
    Also known as He initialization.
    Args:
        tensor: an n-dimensional `torch.Tensor`
        a: the negative slope of the rectifier used after this layer (only
            used with ``'leaky_relu'``)
        mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'``
            preserves the magnitude of the variance of the weights in the
            forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the
            backwards pass.
        nonlinearity: the non-linear function (`nn.functional` name),
            recommended to use only with ``'relu'`` or ``'leaky_relu'`` (default).
    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
    """
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)

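Spelled out in full, the default kernel initialization in PyTorch is therefore:

init.kaiming_uniform_(self.weight, a=math.sqrt(5), mode='fan_in', nonlinearity='leaky_relu')

The helper functions used inside init.kaiming_uniform_ are not analyzed in depth here, but in brief:

_calculate_correct_fan(tensor, mode): computes the layer's fan_in (number of input units) or fan_out (number of output units), depending on whether mode is 'fan_in' or 'fan_out'.
calculate_gain(nonlinearity, param): returns the recommended gain for the given nonlinearity, which is simply a number determined by the activation type.

For the default call:

_calculate_correct_fan: mode = 'fan_in', so it returns the layer's fan_in (number of input units).
calculate_gain: nonlinearity = 'leaky_relu' and param = a = math.sqrt(5), so with negative_slope = param = math.sqrt(5) the gain is:

gain = math.sqrt(2.0 / (1 + negative_slope ** 2))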
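Plugging a = math.sqrt(5) into this formula gives a noticeably simple result: the gain is √(1/3), so the factor √3 in the bound cancels and the weight bound reduces to 1 / √fan_in. The sketch below repeats the arithmetic of kaiming_uniform_ in plain Python; the value of fan_in is a hypothetical example (3 input channels, 3×3 kernel), not something taken from the article.

```python
import math

# Default argument that _ConvNd.reset_parameters passes to kaiming_uniform_.
a = math.sqrt(5)      # interpreted as the leaky_relu negative slope
fan_in = 3 * 3 * 3    # hypothetical layer: 3 input channels, 3x3 kernel

gain = math.sqrt(2.0 / (1 + a ** 2))  # sqrt(2 / 6) = sqrt(1 / 3)
std = gain / math.sqrt(fan_in)
bound = math.sqrt(3.0) * std          # sqrt(3) * sqrt(1 / 3) / sqrt(fan_in)

# The last two printed values agree up to floating-point rounding.
print(gain, bound, 1 / math.sqrt(fan_in))
```

In other words, with the default arguments the weights are drawn from U(−1/√fan_in, 1/√fan_in), the same interval that reset_parameters uses for the bias.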

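As noted earlier,

The resulting tensor will have values sampled from U(−bound, bound) where bound = gain × √(3 / fan_mode)

so the computation above yields bound.

Finally, uniform_(from=0, to=1) → Tensor fills the tensor with values sampled from a uniform distribution, here over the interval (−bound, bound).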
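The source multiplies std by √3 to obtain bound because a uniform distribution on (−bound, bound) has standard deviation bound / √3. The following sketch illustrates this with the standard library only; the bound value, sample count, and seed are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)                 # fixed seed so the run is repeatable
bound = 1 / math.sqrt(27)      # hypothetical bound for a layer with fan_in = 27

# Draw samples the same way uniform_(-bound, bound) would fill a tensor.
samples = [random.uniform(-bound, bound) for _ in range(100_000)]

mean = sum(samples) / len(samples)
empirical_std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
expected_std = bound / math.sqrt(3.0)   # std of U(-bound, bound)

print(empirical_std, expected_std)
```

The two printed values should agree closely, which confirms that choosing bound = √3 × std produces weights with the intended standard deviation.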

3.2 Initializing the bias

This part is not covered in detail; it follows directly from the weight initialization described above.

        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)
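The bias bound can be checked on a freshly constructed layer. This is a minimal sketch, assuming PyTorch is installed and that the installed version still uses the reset_parameters shown in this article; the layer sizes are arbitrary.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# fan_in for a conv weight: input channels per group times kernel elements.
fan_in = (conv.in_channels // conv.groups) * conv.kernel_size[0] * conv.kernel_size[1]
bound = 1 / math.sqrt(fan_in)

# Largest absolute value actually present in each parameter.
weight_max = conv.weight.abs().max().item()
bias_max = conv.bias.abs().max().item()

print(fan_in, bound, weight_max, bias_max)
```

Both maxima should stay at or below bound: the bias because of the explicit uniform_ call, and the weight because a = math.sqrt(5) makes the Kaiming bound equal to 1 / √fan_in as well.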

In addition:

init._calculate_fan_in_and_fan_out(self.weight)

computes the layer's fan_in (number of input units) and fan_out (number of output units).
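For a convolution weight of shape (out_channels, in_channels // groups, *kernel_size), fan_in is the second dimension times the number of kernel elements, and fan_out is the first dimension times the number of kernel elements. The function below is a pure-Python sketch of that rule, not the library's own code; the name fan_in_and_fan_out is hypothetical.

```python
import math

def fan_in_and_fan_out(weight_shape):
    """Return (fan_in, fan_out) for a weight laid out as
    (out_channels, in_channels // groups, *kernel_size)."""
    if len(weight_shape) < 2:
        raise ValueError("fan in/out needs at least 2 dimensions")
    # Number of elements in one kernel; 1 for a plain 2-D (linear) weight.
    receptive_field = math.prod(weight_shape[2:])
    fan_in = weight_shape[1] * receptive_field
    fan_out = weight_shape[0] * receptive_field
    return fan_in, fan_out

# Hypothetical Conv2d weight: 16 output channels, 3 input channels, 3x3 kernel.
fan_in, fan_out = fan_in_and_fan_out((16, 3, 3, 3))
print(fan_in, fan_out)  # 27 144
```

With fan_in = 27, the bias bound 1 / √fan_in from reset_parameters evaluates to roughly 0.19.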

Summary

The above reflects my personal experience; I hope it serves as a useful reference.
