# TensorFlow DeepLab Model Zoo
We provide DeepLab models pretrained on several datasets, including (1) PASCAL
VOC 2012, (2) Cityscapes, and (3) ADE20K, for reproducing our results, as well
as some checkpoints that are only pretrained on ImageNet for training your own
models.
## DeepLab models trained on PASCAL VOC 2012
Un-tar'ed directory includes:

*   a frozen inference graph (`frozen_inference_graph.pb`). All frozen
    inference graphs by default use an output stride of 8, a single eval scale
    of 1.0, and no left-right flips, unless otherwise specified. MobileNet-v2
    based models do not include the decoder module. A minimal loading sketch
    follows the list.

*   a checkpoint (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`)
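As a hedged illustration (not the official demo code), the frozen graphs above
can be loaded with TensorFlow 1.x-style APIs roughly as follows; the tensor
names `ImageTensor:0` and `SemanticPredictions:0` follow the DeepLab demo
notebook, and the file paths are placeholders:

```python
# Minimal sketch: load a frozen DeepLab graph and run it on one RGB image.
# Tensor names follow the DeepLab demo notebook; paths are placeholders.
import numpy as np
import tensorflow as tf
from PIL import Image

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
  graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
  tf.import_graph_def(graph_def, name='')

image = np.asarray(Image.open('input.jpg').convert('RGB'))
with tf.compat.v1.Session(graph=graph) as sess:
  # Input: a batch of uint8 RGB images. Output: per-pixel class indices.
  seg_map = sess.run(
      'SemanticPredictions:0',
      feed_dict={'ImageTensor:0': image[np.newaxis, ...]})[0]
print('Classes present in prediction:', np.unique(seg_map))
```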
### Model details
We provide several checkpoints that have been pretrained on the VOC 2012
train_aug set or the train_aug + trainval sets. In the former case, one could
train one's own model with a smaller batch size and frozen batch normalization
when limited GPU memory is available, since we have already fine-tuned the
batch normalization for you. In the latter case, one could directly evaluate
the checkpoints on the VOC 2012 test set or use them for demo purposes. Note
that *MobileNet-v2* based models do not employ ASPP and decoder modules, for
fast computation.
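For the former case, a hedged sketch of the relevant `train.py` flags is shown
below. The flag names follow the DeepLab codebase; the batch size and paths are
illustrative placeholders rather than prescribed values.

```
# Illustrative values only; adjust the batch size to the available GPU memory.
--model_variant=xception_65
--train_batch_size=8
--fine_tune_batch_norm=false
--tf_initial_checkpoint=${PATH_TO_CHECKPOINT}/model.ckpt
--train_logdir=${PATH_TO_TRAIN_DIR}
```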
Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder
---------------------------------- | :--------------: | :-----------------: | :---: | :-----:
mobilenetv2_dm05_coco_voc_trainaug | MobileNet-v2 <br> Depth-Multiplier = 0.5 | ImageNet <br> MS-COCO <br> VOC 2012 train_aug set | N/A | N/A
mobilenetv2_dm05_coco_voc_trainval | MobileNet-v2 <br> Depth-Multiplier = 0.5 | ImageNet <br> MS-COCO <br> VOC 2012 train_aug + trainval sets | N/A | N/A
mobilenetv2_coco_voc_trainaug | MobileNet-v2 | ImageNet <br> MS-COCO <br> VOC 2012 train_aug set | N/A | N/A
mobilenetv2_coco_voc_trainval | MobileNet-v2 | ImageNet <br> MS-COCO <br> VOC 2012 train_aug + trainval sets | N/A | N/A
xception65_coco_voc_trainaug | Xception_65 | ImageNet <br> MS-COCO <br> VOC 2012 train_aug set | [6,12,18] for OS=16 <br> [12,24,36] for OS=8 | OS = 4
xception65_coco_voc_trainval | Xception_65 | ImageNet <br> MS-COCO <br> VOC 2012 train_aug + trainval sets | [6,12,18] for OS=16 <br> [12,24,36] for OS=8 | OS = 4
In the table, **OS** denotes output stride, the ratio of input image resolution
to final feature map resolution; for example, a 513x513 input evaluated at
OS = 16 yields 33x33 feature maps.
Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Runtime (sec) | PASCAL mIOU | File Size
------------------------------------------------------------------------------------------------------------------------ | :-------: | :------------------------: | :-------------: | :------------------: | :------------: | :----------------------------: | :-------:
[mobilenetv2_dm05_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainaug_2018_10_01.tar.gz) | 16 | [1.0] | No | 0.88B | - | 70.19% (val) | 7.6MB
[mobilenetv2_dm05_coco_voc_trainval](http://download.tensorflow.org/models/deeplabv3_mnv2_dm05_pascal_trainval_2018_10_01.tar.gz) | 8 | [1.0] | No | 2.84B | - | 71.83% (test) | 7.6MB
[mobilenetv2_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz) | 16 <br> 8 | [1.0] <br> [0.5:0.25:1.75] | No <br> Yes | 2.75B <br> 152.59B | 0.1 <br> 26.9 | 75.32% (val) <br> 77.33% (val) | 23MB
[mobilenetv2_coco_voc_trainval](http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_trainval_2018_01_29.tar.gz) | 8 | [0.5:0.25:1.75] | Yes | 152.59B | 26.9 | 80.25% (**test**) | 23MB
[xception65_coco_voc_trainaug](http://download.tensorflow.org/models/deeplabv3_pascal_train_aug_2018_01_04.tar.gz) | 16 <br> 8 | [1.0] <br> [0.5:0.25:1.75] | No <br> Yes | 54.17B <br> 3055.35B | 0.7 <br> 223.2 | 82.20% (val) <br> 83.58% (val) | 439MB
[xception65_coco_voc_trainval](http://download.tensorflow.org/models/deeplabv3_pascal_trainval_2018_01_04.tar.gz) | 8 | [0.5:0.25:1.75] | Yes | 3055.35B | 223.2 | 87.80% (**test**) | 439MB
In the table, we report both the computation complexity (in terms of
Multiply-Adds and CPU runtime) and the segmentation performance (in terms of
mIOU) on the PASCAL VOC val or test set. The reported runtime is measured with
tfprof on a workstation with a CPU E5-1650 v3 @ 3.50GHz and 32GB of memory.
Note that applying multi-scale inputs and left-right flips increases the
segmentation performance but also significantly increases the computation, and
thus may not be suitable for real-time applications.
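As a concrete example, below is a hedged sketch of the `eval.py` flags
corresponding to the strongest Xception_65 rows above (eval OS = 8, eval scales
[0.5:0.25:1.75], left-right flips). The flag names follow the DeepLab codebase,
and the values are read off the table:

```
# Hedged sketch; values mirror the multi-scale rows in the table above.
--model_variant=xception_65
--atrous_rates=12
--atrous_rates=24
--atrous_rates=36
--output_stride=8
--eval_scales=0.5
--eval_scales=0.75
--eval_scales=1.0
--eval_scales=1.25
--eval_scales=1.5
--eval_scales=1.75
--add_flipped_images=true
```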
## DeepLab models trained on Cityscapes

### Model details

We provide several checkpoints that have been pretrained on the Cityscapes
train_fine set. Note that the *MobileNet-v2* based model has been pretrained on
the MS-COCO dataset and does not employ ASPP and decoder modules, for fast
computation.
Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder
------------------------------------- | :--------------: | :-------------------------------------: | :----------------------------------------------: | :-----:
mobilenetv2_coco_cityscapes_trainfine | MobileNet-v2 | ImageNet <br> MS-COCO <br> Cityscapes train_fine set | N/A | N/A
mobilenetv3_large_cityscapes_trainfine | MobileNet-v3 Large | Cityscapes train_fine set <br> (No ImageNet) | N/A | OS = 8
mobilenetv3_small_cityscapes_trainfine | MobileNet-v3 Small | Cityscapes train_fine set <br> (No ImageNet) | N/A | OS = 8
xception65_cityscapes_trainfine | Xception_65 | ImageNet <br> Cityscapes train_fine set | [6, 12, 18] for OS=16 <br> [12, 24, 36] for OS=8 | OS = 4
xception71_dpc_cityscapes_trainfine | Xception_71 | ImageNet <br> MS-COCO <br> Cityscapes train_fine set | Dense Prediction Cell | OS = 4
xception71_dpc_cityscapes_trainval | Xception_71 | ImageNet <br> MS-COCO <br> Cityscapes trainval_fine and coarse set | Dense Prediction Cell | OS = 4
In the table, **OS** denotes output stride.

Note that for the MobileNet-v3 models, we use the following additional
command-line flags:
```
--model_variant={ mobilenet_v3_large_seg | mobilenet_v3_small_seg }
--image_pooling_crop_size=769,769
--image_pooling_stride=4,5
--add_image_level_feature=1
--aspp_convs_filters=128
--aspp_with_concat_projection=0
--aspp_with_squeeze_and_excitation=1
--decoder_use_sum_merge=1
--decoder_filters=19
--decoder_output_is_logits=1
--image_se_uses_qsigmoid=1
--decoder_output_stride=8
--output_stride=32
```
Checkpoint name | Eval OS | Eval scales | Left-right Flip | Multiply-Adds | Runtime (sec) | Cityscapes mIOU | File Size
-------------------------------------------------------------------------------------------------------------------------------- | :-------: | :-------------------------: | :-------------: | :-------------------: | :------------: | :----------------------------: | :-------:
[mobilenetv2_coco_cityscapes_trainfine](http://download.tensorflow.org/models/deeplabv3_mnv2_cityscapes_train_2018_02_05.tar.gz) | 16 <br> 8 | [1.0] <br> [0.75:0.25:1.25] | No <br> Yes | 21.27B <br> 433.24B | 0.8 <br> 51.12 | 70.71% (val) <br> 73.57% (val) | 23MB
[mobilenetv3_large_cityscapes_trainfine](http://download.tensorflow.org/models/deeplab_mnv3_large_cityscapes_trainfine_2019_11_15.tar.gz) | 32 | [1.0] | No | 15.95B | 0.6 | 72.41% (val) | 17MB
[mobilenetv3_small_cityscapes_trainfine](http://download.tensorflow.org/models/deeplab_mnv3_small_cityscapes_trainfine_2019_11_15.tar.gz) | 32 | [1.0] | No | 4.63B | 0.4 | 68.99% (val) | 5MB
[xception65_cityscapes_trainfine](http://download.tensorflow.org/models/deeplabv3_cityscapes_train_2018_02_06.tar.gz) | 16 <br> 8 | [1.0] <br> [0.75:0.25:1.25] | No <br> Yes | 418.64B <br> 8677.92B | 5.0 <br> 422.8 | 78.79% (val) <br> 80.42% (val) | 439MB
[xception71_dpc_cityscapes_trainfine](http://download.tensorflow.org/models/deeplab_cityscapes_xception71_trainfine_2018_09_08.tar.gz) | 16 | [1.0] | No | 502.07B | - | 80.31% (val) | 445MB
[xception71_dpc_cityscapes_trainval](http://download.tensorflow.org/models/deeplab_cityscapes_xception71_trainvalfine_2018_09_08.tar.gz) | 8 | [0.75:0.25:2] | Yes | - | - | 82.66% (**test**) | 446MB
### EdgeTPU-DeepLab models on Cityscapes

EdgeTPU is Google's machine learning accelerator architecture for edge devices
(found in Coral devices and the Pixel 4's Neural Core). Leveraging neural
architecture search (NAS, also known as AutoML) algorithms,
[EdgeTPU-Mobilenet](https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet)
has been released; it yields higher hardware utilization and lower latency, as
well as better accuracy, than MobileNet-v2/v3. We use EdgeTPU-Mobilenet as the
backbone and provide checkpoints that have been pretrained on the Cityscapes
train_fine set. We name these EdgeTPU-DeepLab models.
Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder
-------------------- | :----------------: | :----------------: | :--: | :-----:
EdgeTPU-DeepLab | EdgeMobilenet-1.0 | ImageNet | N/A | N/A
EdgeTPU-DeepLab-slim | EdgeMobilenet-0.75 | ImageNet | N/A | N/A
For EdgeTPU-DeepLab-slim, the backbone feature extractor uses a depth
multiplier of 0.75 and aspp_convs_filters = 128. We employ neither ASPP nor
decoder modules, to further reduce the latency. We use the same train/eval
flags as for the MobileNet-v2 DeepLab model; the flags changed for the
EdgeTPU-DeepLab model are listed below.
```
--decoder_output_stride=''
--aspp_convs_filters=256
--model_variant=mobilenet_edgetpu
```
For EdgeTPU-DeepLab-slim, also include the following flags:
```
--depth_multiplier=0.75
--aspp_convs_filters=128
```
Checkpoint name | Eval OS | Eval scales | Cityscapes mIOU | Multiply-Adds | Simulator latency on Pixel 4 EdgeTPU
---------------------------------------------------------------------------------------------------- | :--------: | :---------: | :--------------------------: | :------------: | :----------------------------------:
[EdgeTPU-DeepLab](http://download.tensorflow.org/models/edgetpu-deeplab_2020_03_09.tar.gz) | 32 <br> 16 | [1.0] | 70.6% (val) <br> 74.1% (val) | 5.6B <br> 7.1B | 13.8 ms <br> 17.5 ms
[EdgeTPU-DeepLab-slim](http://download.tensorflow.org/models/edgetpu-deeplab-slim_2020_03_09.tar.gz) | 32 <br> 16 | [1.0] | 70.0% (val) <br> 73.2% (val) | 3.5B <br> 4.3B | 9.9 ms <br> 13.2 ms
## DeepLab models trained on ADE20K

### Model details

We provide some checkpoints that have been pretrained on the ADE20K training
set. Note that these models have only been pretrained on ImageNet, following
the dataset rule.
Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder | Input size
------------------------ | :--------------: | :-------------------------------: | :----------------------------------------------: | :-----: | :--------:
mobilenetv2_ade20k_train | MobileNet-v2 | ImageNet <br> ADE20K training set | N/A | OS = 4 | 257x257
xception65_ade20k_train | Xception_65 | ImageNet <br> ADE20K training set | [6, 12, 18] for OS=16 <br> [12, 24, 36] for OS=8 | OS = 4 | 513x513
The input dimensions of ADE20K images vary widely. We resize inputs so that the
longest side is 257 for MobileNet-v2 (faster inference) and 513 for Xception_65
(better performance). Note that we also include the decoder module in the
MobileNet-v2 checkpoint.
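A small sketch of this resize rule (using PIL; the file name is a placeholder,
and `INPUT_SIZE` comes from the table above):

```python
# Scale the image so its longest side matches the checkpoint's input size.
from PIL import Image

INPUT_SIZE = 513  # 257 for the MobileNet-v2 checkpoint.
image = Image.open('ade20k_scene.jpg')
ratio = INPUT_SIZE / max(image.size)
target_size = (int(ratio * image.size[0]), int(ratio * image.size[1]))
resized = image.resize(target_size, Image.LANCZOS)
```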
Checkpoint name | Eval OS | Eval scales | Left-right Flip | mIOU | Pixel-wise Accuracy | File Size
------------------------------------- | :-------: | :-------------------------: | :-------------: | :-------------------: | :-------------------: | :-------:
[mobilenetv2_ade20k_train](http://download.tensorflow.org/models/deeplabv3_mnv2_ade20k_train_2018_12_03.tar.gz) | 16 | [1.0] | No | 32.04% (val) | 75.41% (val) | 24.8MB
[xception65_ade20k_train](http://download.tensorflow.org/models/deeplabv3_xception_ade20k_train_2018_05_29.tar.gz) | 8 | [0.5:0.25:1.75] | Yes | 45.65% (val) | 82.52% (val) | 439MB
## Checkpoints pretrained on ImageNet

Un-tar'ed directory includes:

*   model checkpoint (`model.ckpt.data-00000-of-00001`, `model.ckpt.index`).

### Model details

We also provide some checkpoints that are pretrained on ImageNet and/or COCO
(as suffixed in the model name) so that one could use them for training one's
own models.
*   mobilenet_v2: We refer interested users to the TensorFlow open source
    [MobileNet-V2](https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet)
    for details.

*   xception_{41,65,71}: We adapt the original Xception model to the task of
    semantic segmentation with the following changes: (1) more layers, (2) all
    max pooling operations are replaced by strided (atrous) separable
    convolutions, and (3) extra batch normalization and ReLU are added after
    each 3x3 depthwise convolution. We provide three Xception model variants
    with different network depths.

*   resnet_v1_{50,101}_beta: We modify the original ResNet-101 [10], similar to
    PSPNet [11], by replacing the first 7x7 convolution with three 3x3
    convolutions. See resnet_v1_beta.py for more details, and the sketch below.
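As a hedged illustration of that stem change, here is a minimal sketch using
tf.keras layers (the repo's actual implementation lives in resnet_v1_beta.py
and uses slim; the 64/64/128 filter counts follow that file, and batch
normalization is omitted here for brevity):

```python
# 'Beta' root block: the original 7x7 stride-2 stem convolution is replaced
# by three consecutive 3x3 convolutions (the first with stride 2).
import tensorflow as tf

def root_block_beta(inputs):
  net = tf.keras.layers.Conv2D(64, 3, strides=2, padding='same',
                               activation='relu')(inputs)
  net = tf.keras.layers.Conv2D(64, 3, strides=1, padding='same',
                               activation='relu')(net)
  return tf.keras.layers.Conv2D(128, 3, strides=1, padding='same',
                                activation='relu')(net)
```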
Model name | File Size
-------------------------------------------------------------------------------------- | :-------:
[xception_41_imagenet](http://download.tensorflow.org/models/xception_41_2018_05_09.tar.gz) | 288MB
[xception_65_imagenet](http://download.tensorflow.org/models/deeplabv3_xception_2018_01_04.tar.gz) | 447MB
[xception_65_imagenet_coco](http://download.tensorflow.org/models/xception_65_coco_pretrained_2018_10_02.tar.gz) | 292MB
[xception_71_imagenet](http://download.tensorflow.org/models/xception_71_2018_05_09.tar.gz) | 474MB
[resnet_v1_50_beta_imagenet](http://download.tensorflow.org/models/resnet_v1_50_2018_05_04.tar.gz) | 274MB
[resnet_v1_101_beta_imagenet](http://download.tensorflow.org/models/resnet_v1_101_2018_05_04.tar.gz) | 477MB
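To train one's own model from one of these checkpoints, a hedged sketch of the
relevant `train.py` flags is shown below; `initialize_last_layer=false`
together with `last_layers_contain_logits_only=true` is the usual combination
when the target dataset has a different number of classes than the pretrained
one, and the paths are placeholders.

```
# Hedged sketch; paths are placeholders.
--model_variant=xception_65
--tf_initial_checkpoint=${PATH_TO_UNTARRED_CHECKPOINT}/model.ckpt
--initialize_last_layer=false
--last_layers_contain_logits_only=true
--train_logdir=${PATH_TO_TRAIN_DIR}
```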
## References

1.  **MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications**<br />
    Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam<br />
    [[link]](https://arxiv.org/abs/1704.04861). arXiv:1704.04861, 2017.

2.  **Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation**<br />
    Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen<br />
    [[link]](https://arxiv.org/abs/1801.04381). arXiv:1801.04381, 2018.

3.  **Xception: Deep Learning with Depthwise Separable Convolutions**<br />
    François Chollet<br />
    [[link]](https://arxiv.org/abs/1610.02357). In the Proc. of CVPR, 2017.

4.  **Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge 2017 Entry**<br />
    Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai<br />
    [[link]](http://presentations.cocodataset.org/COCO17-Detect-MSRA.pdf). ICCV COCO Challenge Workshop, 2017.

5.  **The Pascal Visual Object Classes Challenge: A Retrospective**<br />
    Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John M. Winn, Andrew Zisserman<br />
    [[link]](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/). IJCV, 2014.

6.  **Semantic Contours from Inverse Detectors**<br />
    Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, Jitendra Malik<br />
    [[link]](http://home.bharathh.info/pubs/codes/SBD/download.html). In the Proc. of ICCV, 2011.

7.  **The Cityscapes Dataset for Semantic Urban Scene Understanding**<br />
    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele<br />
    [[link]](https://www.cityscapes-dataset.com/). In the Proc. of CVPR, 2016.

8.  **Microsoft COCO: Common Objects in Context**<br />
    Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollar<br />
    [[link]](http://cocodataset.org/). In the Proc. of ECCV, 2014.

9.  **ImageNet Large Scale Visual Recognition Challenge**<br />
    Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei<br />
    [[link]](http://www.image-net.org/). IJCV, 2015.

10. **Deep Residual Learning for Image Recognition**<br />
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun<br />
    [[link]](https://arxiv.org/abs/1512.03385). In the Proc. of CVPR, 2016.

11. **Pyramid Scene Parsing Network**<br />
    Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia<br />
    [[link]](https://arxiv.org/abs/1612.01105). In the Proc. of CVPR, 2017.

12. **Scene Parsing through ADE20K Dataset**<br />
    Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba<br />
    [[link]](http://groups.csail.mit.edu/vision/datasets/ADE20K/). In the Proc. of CVPR, 2017.

13. **Searching for MobileNetV3**<br />
    Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam<br />
    [[link]](https://arxiv.org/abs/1905.02244). In the Proc. of ICCV, 2019.