DeepLabv2 (ResNet-101) (1) re-purposes ResNet-101 for semantic segmentation using atrous convolution, (2) takes multi-scale inputs and merges the results from all scales by max-pooling, and (3) applies atrous spatial pyramid pooling (ASPP). The model has been pretrained on the MS-COCO dataset.
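The three ingredients above can be illustrated with a toy 1-D sketch. This is not the released Caffe model. The function names (atrous_conv1d, aspp_1d, max_fuse) are hypothetical, the ASPP branches are assumed to share one kernel and to be fused by summation, and the per-scale score maps are assumed to be already resized to a common length before the element-wise max.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with zero padding and same-length output.

    Kernel taps are spaced `rate` samples apart, so the field of view grows
    with `rate` while the number of weights stays fixed.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):  # taps outside the signal read as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates):
    """Toy ASPP: run parallel atrous branches at several rates and sum them.

    Assumption: one shared kernel for all branches, kept only for brevity.
    """
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) for vals in zip(*branches)]


def max_fuse(score_maps):
    """Merge score maps from different input scales by element-wise max.

    Assumption: all maps were already resized to the same length.
    """
    return [max(vals) for vals in zip(*score_maps)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    print(atrous_conv1d(impulse, [1, 1, 1], rate=2))
    print(aspp_1d(impulse, [1, 1, 1], rates=[1, 2]))
    print(max_fuse([[0.1, 0.9], [0.5, 0.2]]))
```

With rate 1 the function reduces to an ordinary convolution, which is how atrous convolution lets a classification network such as ResNet-101 produce denser feature maps without adding parameters.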
Performance
After DenseCRF post-processing, the model achieves 79.7% mean IoU on the PASCAL VOC 2012 test set.
CRF parameters: bi_w = 4, bi_xy_std = 67, bi_rgb_std = 3, pos_w = 3, pos_xy_std = 1.
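To make the roles of these parameters concrete, the sketch below evaluates the usual fully connected CRF pairwise kernel for a single pair of pixels. It assumes the conventional mapping: bi_w, bi_xy_std and bi_rgb_std are the weight, spatial standard deviation and color standard deviation of the bilateral (appearance) term, and pos_w and pos_xy_std are the weight and spatial standard deviation of the position-only (smoothness) term. The function name pairwise_kernel is hypothetical and this is not the DenseCRF code shipped with the model.

```python
import math

# CRF parameters as listed above.
BI_W, BI_XY_STD, BI_RGB_STD = 4.0, 67.0, 3.0
POS_W, POS_XY_STD = 3.0, 1.0


def pairwise_kernel(pos_i, pos_j, rgb_i, rgb_j):
    """Pairwise affinity between two pixels (assumed parameter mapping).

    pos_*: (x, y) pixel coordinates; rgb_*: (r, g, b) values in [0, 255].
    """
    d_pos = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    d_rgb = sum((a - b) ** 2 for a, b in zip(rgb_i, rgb_j))
    # Bilateral term: strong for pixels that are nearby AND similar in color.
    bilateral = BI_W * math.exp(
        -d_pos / (2 * BI_XY_STD ** 2) - d_rgb / (2 * BI_RGB_STD ** 2)
    )
    # Position-only term: encourages smoothness among immediate neighbors.
    smooth = POS_W * math.exp(-d_pos / (2 * POS_XY_STD ** 2))
    return bilateral + smooth


if __name__ == "__main__":
    same_color = (120, 80, 40)
    print(pairwise_kernel((10, 10), (11, 10), same_color, same_color))
    print(pairwise_kernel((10, 10), (11, 10), same_color, (200, 200, 200)))
```

Under this reading, the small bi_rgb_std of 3 makes the bilateral term drop off sharply across color edges, while the large bi_xy_std of 67 lets it connect similar-colored pixels that are far apart.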
Pretrained models and corresponding prototxt files
Please download from this link. Note that the provided init.caffemodel has already been pretrained on MS-COCO.