DeepLabv2 (VGG-16) employs (1) a VGG-16 network re-purposed for semantic segmentation by means of atrous convolution, (2) multi-scale inputs, with max-pooling across scales to merge the results, and (3) atrous spatial pyramid pooling (ASPP).
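As a rough illustration of two of the ingredients above, the following sketch shows a 1D atrous convolution and an element-wise max over per-scale score maps. It is a toy in pure Python, not the Caffe implementation: the function names `atrous_conv1d` and `fuse_scales_max` are hypothetical, the signal is 1D instead of a 2D feature map, and the per-scale outputs are assumed to have already been resized to a common resolution.

```python
def atrous_conv1d(signal, kernel, rate):
    """Same-size 1D atrous convolution with zero padding (odd kernel length).

    As in most deep learning frameworks, the kernel is not flipped.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):   # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def fuse_scales_max(score_maps):
    """Element-wise max over score maps that share a common length."""
    return [max(values) for values in zip(*score_maps)]


x = [1, 2, 3, 4, 5]
dense = atrous_conv1d(x, [1, 1, 1], rate=1)    # ordinary convolution
dilated = atrous_conv1d(x, [1, 1, 1], rate=2)  # same 3 weights, wider field of view
fused = fuse_scales_max([dense, dilated])      # stand-in for merging across scales
```

With `rate=2` the same three weights cover five input positions, which is how atrous convolution enlarges the field of view without adding parameters. ASPP applies several such rates in parallel to the same feature map.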
After DenseCRF post-processing, the model achieves 72.6% mean IoU on the PASCAL VOC 2012 test set.
CRF parameters: bi_w = 4, bi_xy_std = 65, bi_rgb_std = 3, pos_w = 2, pos_xy_std = 2.
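To make the role of these parameters concrete, here is a minimal sketch of the fully connected CRF pairwise kernel in the form commonly used with DeepLab: a bilateral (appearance) term plus a position-only (smoothness) term. The mapping from parameter names to terms is an assumption inferred from the names (`bi_*` for the bilateral term, `pos_*` for the smoothness term), and `pairwise_kernel` is a hypothetical helper, not part of the released code.

```python
import math

# Values taken from the text; which term each one belongs to is an assumption.
CRF_PARAMS = {
    "bi_w": 4.0,        # weight of the bilateral (appearance) kernel
    "bi_xy_std": 65.0,  # spatial std. dev. of the bilateral kernel, in pixels
    "bi_rgb_std": 3.0,  # color std. dev. of the bilateral kernel
    "pos_w": 2.0,       # weight of the position-only (smoothness) kernel
    "pos_xy_std": 2.0,  # spatial std. dev. of the smoothness kernel, in pixels
}


def pairwise_kernel(pos_i, pos_j, rgb_i, rgb_j, p=CRF_PARAMS):
    """Kernel value between two pixels given their (x, y) positions and RGB colors."""
    d_pos = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))  # squared pixel distance
    d_rgb = sum((a - b) ** 2 for a, b in zip(rgb_i, rgb_j))  # squared color distance
    bilateral = p["bi_w"] * math.exp(
        -d_pos / (2 * p["bi_xy_std"] ** 2) - d_rgb / (2 * p["bi_rgb_std"] ** 2)
    )
    smoothness = p["pos_w"] * math.exp(-d_pos / (2 * p["pos_xy_std"] ** 2))
    return bilateral + smoothness


# Neighboring pixels with the same color are coupled strongly ...
similar = pairwise_kernel((10, 10), (11, 10), (120, 80, 60), (120, 80, 60))
# ... while neighbors with very different colors keep only the smoothness term.
different = pairwise_kernel((10, 10), (11, 10), (120, 80, 60), (20, 200, 240))
```

Under this reading, the small `bi_rgb_std` makes the bilateral term very sensitive to color differences, while the large `bi_xy_std` lets it connect pixels that are far apart in the image.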
Pretrained models and corresponding prototxt files
Please download from this link. Note that the provided init.caffemodel has NOT been pretrained on MS-COCO.