论文"Multi-Sample Dropout for Accelerated Training and Better Generalization"提出了一种Dropout的新用法,Multi-Sample Dropout可以加快训练速度以及产生更好的结果,本文我们将使用pytorch框架进行实验.

Dropout is a simple but effective regularization technique that helps deep neural networks (DNNs) generalize better, so it is widely used in DNN-based tasks. During training, Dropout randomly drops a subset of neurons to avoid overfitting. Multi-Sample Dropout can be seen as an enhanced version of the original Dropout: it applies dropout multiple times with different masks and averages the results to obtain the final output. The overall structure is shown below:

Model structure
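As a minimal illustration of the core idea, independent of the ResNet experiment below: several independent dropout masks are applied to the same pooled feature, each masked copy is fed through the shared classifier head, and the resulting logits are averaged. A hedged sketch (the shapes and sample count here are made up for illustration):

import torch
import torch.nn as nn

feature = torch.randn(4, 512)            # a batch of pooled features
fc = nn.Linear(512, 10)                  # shared classification head
dropouts = [nn.Dropout(0.5) for _ in range(8)]

# average the logits produced by 8 different dropout masks
logits = torch.stack([fc(d(feature)) for d in dropouts]).mean(dim=0)

Because all samples share the same fc head, the extra dropout samples add very little computation on top of a single forward pass through the backbone.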

Experiments

In this section we run experiments with Multi-Sample Dropout. The dataset is CIFAR10, the model is ResNet, and the optimizer is Adam. File layout:

├── nn.py # model definition
├── progressbar.py # progress bar
├── run.py   # main program
├── tools.py # common utilities
├── trainingmonitor.py # training metrics visualization

First, we download the CIFAR10 dataset and preprocess it. The torchvision module already wraps the data loading, as shown below:

from torchvision import datasets, transforms

data = {
    'train': datasets.CIFAR10(
        root='./data', train=True, download=True,
        transform=transforms.Compose([
            # transforms.RandomCrop((32, 32), padding=4),
            # transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
        ])
    ),
    'valid': datasets.CIFAR10(
        root='./data', train=False, download=True,
        transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
        ])
    )
}
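To feed the model, these datasets would typically be wrapped in DataLoaders. A minimal sketch, assuming a batch size of 128 (the actual run.py may use different settings):

from torch.utils.data import DataLoader

loaders = {
    split: DataLoader(ds, batch_size=128, shuffle=(split == 'train'))
    for split, ds in data.items()
}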

Next, we define the model structure. The ResNet body stays unchanged; we only add the Multi-Sample Dropout module:

class ResNet(nn.Module):
    def __init__(self, ResidualBlock, num_classes, dropout_num, dropout_p):
        super(ResNet, self).__init__()
        ......
        # one Dropout layer per sample; every sample shares the same fc head
        self.dropouts = nn.ModuleList([nn.Dropout(dropout_p) for _ in range(dropout_num)])

    def forward(self, x, y=None, loss_fn=None):
        ......
        feature = F.avg_pool2d(out, 4)
        if len(self.dropouts) == 0:
            # no dropout: plain flatten + classify
            out = feature.view(feature.size(0), -1)
            out = self.fc(out)
            if loss_fn is not None:
                loss = loss_fn(out, y)
                return out, loss
            return out, None
        else:
            for i, dropout in enumerate(self.dropouts):
                if i == 0:
                    # first dropout sample
                    out = dropout(feature)
                    out = out.view(out.size(0), -1)
                    out = self.fc(out)
                    if loss_fn is not None:
                        loss = loss_fn(out, y)
                else:
                    # further samples: new dropout mask, same fc head;
                    # the loss must be computed on the logits, not the
                    # flattened features, so apply self.fc first
                    temp_out = dropout(feature)
                    temp_out = temp_out.view(temp_out.size(0), -1)
                    temp_out = self.fc(temp_out)
                    out = out + temp_out
                    if loss_fn is not None:
                        loss = loss + loss_fn(temp_out, y)
            # average logits (and losses) over all dropout samples
            out = out / len(self.dropouts)
            if loss_fn is not None:
                return out, loss / len(self.dropouts)
            return out, None

Note: this implementation differs slightly from the original paper; ours is comparatively simpler.
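For context, here is a hedged sketch of how a training step might consume the (logits, loss) pair returned by forward. The setup below (hyperparameters, ResidualBlock imported from nn.py, the loaders dict from earlier) is assumed for illustration; the actual run.py may differ:

import torch
import torch.nn as nn

# hypothetical setup, not the repo's exact configuration
model = ResNet(ResidualBlock, num_classes=10, dropout_num=8, dropout_p=0.5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for x, y in loaders['train']:
    optimizer.zero_grad()
    logits, loss = model(x, y=y, loss_fn=loss_fn)  # loss already averaged over samples
    loss.backward()
    optimizer.step()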

Run each of the following commands in turn (no dropout, standard single-sample dropout, and multi-sample dropout with 8 samples):


python run.py --dropout_num=0
python run.py --dropout_num=1
python run.py --dropout_num=8

Experimental results:

Training loss

Validation accuracy

Validation loss

Judging from the experimental results, our simplified version does bring some benefit on CIFAR10. For details, please refer to the original paper or the corresponding code.

Paper: https://arxiv.org/abs/1905.09788
Experiment code: github