Reproducing Our NeurIPS 2020 Paper

In our paper, we demonstrated applications of our framework to robustness verification and certified training. Please follow this guide to reproduce the results.

Vision Models

CIFAR-10

For CIFAR-10, we provide several sample models in examples/vision/models:

cnn_7layer_bn, DenseNet, ResNet18, ResNeXt.
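
These models are standard PyTorch modules that the library wraps for bound computation. As a quick reference, here is a minimal verification sketch; it assumes examples/vision/models exports cnn_7layer_bn with CIFAR-10 defaults and is run from examples/vision, so the import path and constructor arguments are assumptions and may differ:

import torch
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm
from models import cnn_7layer_bn  # assumed export from examples/vision/models

model = cnn_7layer_bn()                      # CNN-7+BN for CIFAR-10 (assumed default arguments)
dummy = torch.zeros(1, 3, 32, 32)            # CIFAR-10 input shape
bounded_model = BoundedModule(model, dummy)  # build the bounded computation graph
ptb = PerturbationLpNorm(norm=float("inf"), eps=8 / 255)  # L-inf ball with radius 8/255
x = BoundedTensor(dummy, ptb)
lb, ub = bounded_model.compute_bounds(x=(x,), method="backward")  # CROWN bounds on the logits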

To reproduce the state-of-the-art results of our CNN-7+BN model, run:

cd examples/vision
python cifar_training.py --batch_size 256 --lr_decay_milestones 1400 1700 --model cnn_7layer_bn

Or you can switch to the ResNeXt model:

python cifar_training.py --batch_size 256 --lr_decay_milestones 1400 1700 --model ResNeXt_cifar

To evaluate the clean error and verified error of our CNN-7+BN model (the --eps value below corresponds to 8/255):

python cifar_training.py --verify  --model cnn_7layer_bn --load saved_models/cnn_7layer_bn_cifar --eps 0.03137254901961

Or you can evaluate your own models by specifying $DIR in --load.
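
Here, the verified error is the fraction of test examples whose robustness cannot be certified, i.e. for which some margin between the true class and another class has a non-positive lower bound. The sketch below illustrates this computation; the one-vs-all specification matrix C is a standard construction and not necessarily what cifar_training.py builds internally, and bounded_model and x are assumed to be set up as in the sketch above:

import torch

def verified_error(bounded_model, x, labels, n_classes=10):
    batch = labels.size(0)
    # Margins: true-class logit minus each other-class logit
    C = torch.zeros(batch, n_classes - 1, n_classes)
    for i in range(batch):
        y = labels[i].item()
        for k, j in enumerate(c for c in range(n_classes) if c != y):
            C[i, k, y], C[i, k, j] = 1.0, -1.0
    lb, _ = bounded_model.compute_bounds(x=(x,), C=C, method="backward")
    verified = (lb > 0).all(dim=1)  # certified only if every margin lower bound is positive
    return 1.0 - verified.float().mean().item()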

If you need to train the model without loss fusion (noticeably slower), add the --no_loss_fusion flag:

python cifar_training.py --batch_size 256 --lr_decay_milestones 1400 1700 --model cnn_7layer_bn --no_loss_fusion

Tiny-ImageNet

First, we need to prepare the data:

cd examples/vision/data/tinyImageNet
bash tinyimagenet_download.sh

To reproduce our results with the WideResNet model, run:

cd examples/vision
python tinyimagenet_training.py --batch_size 100 --lr_decay_milestones 600 700 --model wide_resnet_imagenet64

To evaluate the clean error and verified error (the --eps value below corresponds to 1/255):

python tinyimagenet_training.py --verify  --model wide_resnet_imagenet64 --load $DIR --eps 0.003921568627451

MNIST

Certified training with backward mode perturbation analysis for L2 perturbation on weights:

cd examples/vision
python weights_training.py --norm 2 --bound_type CROWN-IBP --lr_decay_milestones 120 140
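
For reference, a perturbation on model weights is expressed by treating the weight itself as a perturbed quantity. The sketch below mirrors how the weight-perturbation example models declare this; it assumes auto_LiRPA exports a BoundedParameter wrapper taking the weight tensor and a perturbation object, so the exact signature is an assumption:

import torch
import torch.nn as nn
import torch.nn.functional as F
from auto_LiRPA import BoundedParameter
from auto_LiRPA.perturbations import PerturbationLpNorm

class PerturbedLinear(nn.Module):
    def __init__(self, in_features, out_features, eps):
        super().__init__()
        ptb = PerturbationLpNorm(norm=2, eps=eps)        # L2 ball around the weight
        weight = torch.randn(out_features, in_features) * 0.01
        self.weight = BoundedParameter(weight, ptb)      # assumed wrapper: weight treated as a perturbed input
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)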

To reproduce the model with a “flat” optimization landscape, trained on only 10% of the MNIST data, run:

python weights_training.py --norm 2 --ratio 0.1 --bound_type CROWN-IBP --batch_size 500 --lr_decay_milestones 3700 4000 --scheduler_opts start=200,length=3200 --opt SGD --lr 0.1

Evaluate the certified cross entropy and test accuracy:

python weights_training.py --load $DIR --norm 2  --bound_type CROWN-IBP --batch_size 500 --verify

FashionMNIST

FashionMNIST is handled similarly, with a different --data argument:

cd examples/vision
python weights_training.py --data FashionMNIST --norm 2 --ratio 0.1 --bound_type CROWN-IBP --batch_size 500 --lr_decay_milestones 3700 4000 --scheduler_opts start=200,length=3200 --opt SGD --lr 0.1 --eps 0.05

Scalability

We provide multi-GPU training and Loss Fusion in our framework to improve scalability.

All Transformer/LSTM experiments, as well as the vision models on the MNIST dataset, can be run on a single Nvidia GTX 1080Ti GPU.

Certified training on the CIFAR-10 dataset with Loss Fusion can be run on two Nvidia GTX 1080Ti GPUs with a batch size of 256. Without Loss Fusion, the batch size can only be set to 64 or lower.

Certified training on the Tiny-ImageNet dataset can only be conducted with Loss Fusion, on four Nvidia GTX 1080Ti GPUs. The batch size can be set between 100 and 256 depending on the model.
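
For context, Loss Fusion folds the cross-entropy loss into the bounded computation graph, so certified training only needs an upper bound on a single scalar loss per example rather than bounds on all logits. The plain-PyTorch sketch below only illustrates this fused model-plus-loss structure; it is not the framework's actual implementation:

import torch
import torch.nn as nn

class FusedCrossEntropy(nn.Module):
    # Wraps a classifier so that the per-example cross-entropy loss is the module output.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x, labels):
        logits = self.model(x)
        # logsumexp(logits) minus the true-class logit equals the cross-entropy per example
        return torch.logsumexp(logits, dim=-1) - logits.gather(-1, labels.unsqueeze(-1)).squeeze(-1)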

Language Models

Please follow this example to prepare the data. The commands below use two environment variables:

  • DIR: the path of the directory to save or load the trained model.

  • BUDGET: the budget for synonym-based word substitution (used only when testing certifiably trained language models; set to 1 to 6 in our paper).

To run the experiments:

LSTM

Regular training:

python train.py --dir=$DIR --num_epochs=10 --model=lstm --lr=1e-3 --dropout=0.5 --train
python train.py --dir=$DIR --model=lstm --load=$DIR/ckpt_10 --robust --method=IBP|IBP+backward|forward|forward+backward # for verification

IBP training:

python train.py --dir=$DIR --model=lstm --lr=1e-3 --robust --method=IBP --dropout=0.5 --train
python train.py --dir=$DIR --model=lstm --load=$DIR/ckpt_25 --robust --method=IBP # for verification

LiRPA training:

python train.py --dir=$DIR --model=lstm --lr=1e-3 --robust --method=IBP+backward_train --dropout=0.5 --train
python train.py --dir=$DIR --model=lstm --load=$DIR/ckpt_25 --robust --method=IBP+backward # for verification

Transformer

Regular training:

python train.py --dir=$DIR --num_epochs=2 --train
python train.py --dir=$DIR --robust --method=IBP|IBP+backward|forward|forward+backward # for verification

IBP training:

python train.py --dir=$DIR --robust --method=IBP --train
python train.py --dir=$DIR --robust --method=IBP # for verification

LiRPA training:

python train.py --dir=$DIR --robust --method=IBP+backward_train --train
python train.py --dir=$DIR --robust --method=IBP+backward # for verification
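
The --method values above name the bound propagation modes. As a quick reference, the sketch below shows the same modes invoked through the library's compute_bounds interface on a toy fully-connected model; the training scripts wire this up differently, so treat it only as an illustration:

import torch
import torch.nn as nn
from auto_LiRPA import BoundedModule, BoundedTensor
from auto_LiRPA.perturbations import PerturbationLpNorm

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))  # toy model
dummy = torch.zeros(1, 8)
bounded_model = BoundedModule(model, dummy)
x = BoundedTensor(dummy, PerturbationLpNorm(norm=float("inf"), eps=0.1))
for method in ["IBP", "IBP+backward", "forward", "forward+backward"]:
    lb, ub = bounded_model.compute_bounds(x=(x,), method=method)
    print(method, lb.squeeze().tolist(), ub.squeeze().tolist())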

Other options

You may add --budget $BUDGET to set a different budget for word substitution.