# Reproducing Our NeurIPS 2020 Paper

In our paper, we demonstrated the applications of our framework on robustness verification and certified training. Please follow this guide to reproduce the results.

## Vision Models

### CIFAR-10

For CIFAR-10, we provide some sample models in `examples/vision/models`: [cnn_7layer_bn](https://github.com/KaidiXu/auto_LiRPA/tree/HEAD/docs/src/../examples/vision/models/feedforward.py), [DenseNet](https://github.com/KaidiXu/auto_LiRPA/tree/HEAD/docs/src/../examples/vision/models/densenet.py), [ResNet18](https://github.com/KaidiXu/auto_LiRPA/tree/HEAD/docs/src/../examples/vision/models/resnet18.py), [ResNeXt](https://github.com/KaidiXu/auto_LiRPA/tree/HEAD/docs/src/../examples/vision/models/resnext.py).

To reproduce our state-of-the-art results with the CNN-7+BN model, run:

```bash
cd examples/vision
python cifar_training.py --batch_size 256 --lr_decay_milestones 1400 1700 --model cnn_7layer_bn
```

Or you can change the model to ResNeXt:

```bash
python cifar_training.py --batch_size 256 --lr_decay_milestones 1400 1700 --model ResNeXt_cifar
```

To evaluate the clean error and verified error of our CNN-7+BN model:

```bash
python cifar_training.py --verify --model cnn_7layer_bn --load saved_models/cnn_7layer_bn_cifar --eps 0.03137254901961
```

You can also evaluate your own models by passing their path as `$DIR` to `--load`.

To train the model without loss fusion (which is noticeably slower), add the `--no_loss_fusion` flag:

```bash
python cifar_training.py --batch_size 256 --lr_decay_milestones 1400 1700 --model cnn_7layer_bn --no_loss_fusion
```

### Tiny-ImageNet

First, prepare the data:

```bash
cd examples/vision/data/tinyImageNet
bash tinyimagenet_download.sh
```

To reproduce our results with the WideResNet model, run:

```bash
cd examples/vision
python tinyimagenet_training.py --batch_size 100 --lr_decay_milestones 600 700 --model wide_resnet_imagenet64
```

To evaluate the clean error and verified error:

```bash
python tinyimagenet_training.py --verify --model wide_resnet_imagenet64 --load $DIR --eps 0.003921568627451
```

### MNIST

Certified training with backward mode perturbation analysis for L2 perturbation on weights:

```bash
cd examples/vision
python weights_training.py --norm 2 --bound_type CROWN-IBP --lr_decay_milestones 120 140
```

To reproduce the model with a "flat" optimization landscape, trained on only 10% of the MNIST data, run:

```bash
python weights_training.py --norm 2 --ratio 0.1 --bound_type CROWN-IBP --batch_size 500 --lr_decay_milestones 3700 4000 --scheduler_opts start=200,length=3200 --opt SGD --lr 0.1
```

To evaluate the certified cross entropy and test accuracy:

```bash
python weights_training.py --load $DIR --norm 2 --bound_type CROWN-IBP --batch_size 500 --verify
```

### FashionMNIST

Similarly, for FashionMNIST with a different dataset argument:

```bash
cd examples/vision
python weights_training.py --data FashionMNIST --norm 2 --ratio 0.1 --bound_type CROWN-IBP --batch_size 500 --lr_decay_milestones 3700 4000 --scheduler_opts start=200,length=3200 --opt SGD --lr 0.1 --eps 0.05
```

### Scalability

We provide multi-GPU training and **Loss Fusion** in our framework to improve scalability. All experiments on Transformer/LSTM and other vision models on the MNIST dataset can be conducted on a single Nvidia GTX 1080Ti GPU. Certified training on the CIFAR-10 dataset with **Loss Fusion** can be conducted on two Nvidia GTX 1080Ti GPUs with batch size 256; by contrast, the batch size can only be set to 64 (or lower) without **Loss Fusion**. Certified training on the Tiny-ImageNet dataset can be conducted on four Nvidia GTX 1080Ti GPUs only with **Loss Fusion**; the batch size can be set to 100~256 depending on the model.
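As a minimal sketch of a two-GPU CIFAR-10 run: `CUDA_VISIBLE_DEVICES` is the standard CUDA mechanism for restricting which devices a process can see, and this assumes the training script parallelizes over all visible GPUs (the exact device-selection mechanism here is an assumption, not a verified command):

```bash
# A sketch, assuming the script uses all visible GPUs internally:
# restrict the run to two devices (IDs 0 and 1) with the fused batch size.
cd examples/vision
CUDA_VISIBLE_DEVICES=0,1 python cifar_training.py --batch_size 256 \
  --lr_decay_milestones 1400 1700 --model cnn_7layer_bn
```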
## Language Models

Please follow [this example](examples.html#certifiably-robust-language-classifier-with-transformer-and-lstm) to prepare the data. We use two environment variables:

- `DIR`: the path of the directory to save or load the trained model.
- `BUDGET`: the budget for synonym-based word substitution (for testing certifiably trained language models only; set to 1~6 in our paper).

To run the experiments:

### LSTM

Regular training:

```bash
python train.py --dir=$DIR --num_epochs=10 --model=lstm --lr=1e-3 --dropout=0.5 --train
python train.py --dir=$DIR --model=lstm --load=$DIR/ckpt_10 --robust --method=IBP|IBP+backward|forward|forward+backward # for verification
```

IBP training:

```bash
python train.py --dir=$DIR --model=lstm --lr=1e-3 --robust --method=IBP --dropout=0.5 --train
python train.py --dir=$DIR --model=lstm --load=$DIR/ckpt_25 --robust --method=IBP # for verification
```

LiRPA training:

```bash
python train.py --dir=$DIR --model=lstm --lr=1e-3 --robust --method=IBP+backward_train --dropout=0.5 --train
python train.py --dir=$DIR --model=lstm --load=$DIR/ckpt_25 --robust --method=IBP+backward # for verification
```

### Transformer

Regular training:

```bash
python train.py --dir=$DIR --num_epochs=2 --train
python train.py --dir=$DIR --robust --method=IBP|IBP+backward|forward|forward+backward # for verification
```

IBP training:

```bash
python train.py --dir=$DIR --robust --method=IBP --train
python train.py --dir=$DIR --robust --method=IBP # for verification
```

LiRPA training:

```bash
python train.py --dir=$DIR --robust --method=IBP+backward_train --train
python train.py --dir=$DIR --robust --method=IBP+backward # for verification
```

### Other options

You may add `--budget OTHER_BUDGET` to set a different budget for word substitution, as shown below.
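For example, verifying a certifiably trained Transformer under a substitution budget of 3 (the value 3 is only an illustration; our paper uses budgets from 1 to 6):

```bash
# Illustrative budget value only; the paper evaluates budgets 1~6.
python train.py --dir=$DIR --robust --method=IBP+backward --budget 3
```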