# TPU compatible detection pipelines

[TOC]

The Tensorflow Object Detection API supports TPU training for some models. To
make models TPU compatible you need to make a few tweaks to the model config as
mentioned below. We also provide several sample configs that you can use as a
template.

## TPU compatibility

### Static shaped tensors

TPU training currently requires all tensors in the Tensorflow Graph to have
static shapes. However, most of the sample configs in the Object Detection API
have a few different tensors that are dynamically shaped. Fortunately, we
provide simple alternatives in the model configuration that modify these
tensors to have static shapes:

* **Image tensors with static shape** - This can be achieved either by using a
  `fixed_shape_resizer` that resizes images to a fixed spatial shape, or by
  setting `pad_to_max_dimension: true` in `keep_aspect_ratio_resizer`, which
  pads the resized images with zeros to the bottom and right. Padded image
  tensors are correctly handled internally within the model.

  ```
  image_resizer {
    fixed_shape_resizer {
      height: 640
      width: 640
    }
  }
  ```

  or

  ```
  image_resizer {
    keep_aspect_ratio_resizer {
      min_dimension: 640
      max_dimension: 640
      pad_to_max_dimension: true
    }
  }
  ```

* **Groundtruth tensors with static shape** - Images in a typical detection
  dataset have a variable number of groundtruth boxes and associated classes.
  Setting `max_number_of_boxes` to a large enough number in `train_config`
  pads the groundtruth tensors with zeros to a static shape. Padded
  groundtruth tensors are correctly handled internally within the model (see
  the padding sketch after this list).

  ```
  train_config: {
    fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
    batch_size: 64
    max_number_of_boxes: 200
    unpad_groundtruth_tensors: false
  }
  ```

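The effect of these settings can be illustrated with a small NumPy sketch of
the zero-padding the pipeline performs internally; the function names and
shapes below are illustrative, not the API's actual internals:

```python
import numpy as np

def pad_image_to_max_dimension(image, max_dimension=640):
    """Zero-pads a resized HxWxC image on the bottom and right to a fixed size."""
    height, width, channels = image.shape  # assumed <= max_dimension after resizing
    padded = np.zeros((max_dimension, max_dimension, channels), dtype=image.dtype)
    padded[:height, :width, :] = image
    return padded

def pad_groundtruth(boxes, classes, max_number_of_boxes=200):
    """Zero-pads [num_boxes, 4] boxes and [num_boxes] classes to a static size."""
    num_boxes = boxes.shape[0]  # assumed <= max_number_of_boxes
    padded_boxes = np.zeros((max_number_of_boxes, 4), dtype=boxes.dtype)
    padded_classes = np.zeros((max_number_of_boxes,), dtype=classes.dtype)
    padded_boxes[:num_boxes] = boxes
    padded_classes[:num_boxes] = classes
    # num_boxes is carried along so the padded entries can be ignored downstream.
    return padded_boxes, padded_classes, num_boxes
```
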
### TPU friendly ops

Although TPU supports a vast number of Tensorflow ops, a few used in the
Tensorflow Object Detection API are unsupported. We list such ops below and
recommend compatible substitutes.

* **Anchor sampling** - Typically we use hard example mining in standard SSD
  pipelines to balance positive and negative anchors that contribute to the
  loss. Hard example mining uses non-max suppression as a subroutine, and
  since non-max suppression is not currently supported on TPUs we cannot use
  hard example mining. Fortunately, we provide an implementation of focal loss
  that can be used instead of hard example mining. Remove `hard_example_miner`
  from the config and substitute the `weighted_sigmoid` classification loss
  with the `weighted_sigmoid_focal` loss (a sketch of the focal loss term
  appears after this list).

  ```
  loss {
    classification_loss {
      weighted_sigmoid_focal {
        alpha: 0.25
        gamma: 2.0
      }
    }
    localization_loss {
      weighted_smooth_l1 {
      }
    }
    classification_weight: 1.0
    localization_weight: 1.0
  }
  ```

* **Target matching** - The Object Detection API provides two choices for the
  matcher used in target assignment: `argmax_matcher` and `bipartite_matcher`.
  The bipartite matcher is not currently supported on TPU, so we must modify
  the configs to use `argmax_matcher`. Additionally, set
  `use_matmul_gather: true` for efficiency on TPU (a sketch of the
  matmul-based gather trick appears after this list).

  ```
  matcher {
    argmax_matcher {
      matched_threshold: 0.5
      unmatched_threshold: 0.5
      ignore_thresholds: false
      negatives_lower_than_unmatched: true
      force_match_for_each_row: true
      use_matmul_gather: true
    }
  }
  ```

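The `weighted_sigmoid_focal` loss above follows the standard focal loss
formulation, where `gamma` down-weights easy anchors and `alpha` balances
positives against negatives. A minimal NumPy sketch of the per-anchor term,
shown for illustration rather than as the library's exact implementation:

```python
import numpy as np

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Per-anchor focal loss; down-weights easy examples instead of mining hard ones."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Standard sigmoid cross entropy.
    cross_entropy = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    # p_t is the probability the model assigns to the true class of each anchor.
    p_t = targets * probs + (1 - targets) * (1 - probs)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) anchors.
    modulating_factor = (1.0 - p_t) ** gamma
    # alpha weights positive anchors, (1 - alpha) weights negatives.
    alpha_weight = targets * alpha + (1 - targets) * (1 - alpha)
    return alpha_weight * modulating_factor * cross_entropy
```
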
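`use_matmul_gather: true` swaps gather ops for an equivalent one-hot matrix
multiplication, which tends to map better onto TPU hardware. The equivalence it
relies on can be sketched as follows (illustrative only, not the API's code):

```python
import numpy as np

def matmul_gather(params, indices):
    """Gathers rows of `params` by multiplying with a one-hot selection matrix."""
    one_hot = np.eye(params.shape[0], dtype=params.dtype)[indices]  # [n, num_rows]
    return one_hot @ params

params = np.arange(12.0).reshape(4, 3)
indices = np.array([2, 0, 2])
# The matmul form produces the same result as plain row indexing.
assert np.array_equal(matmul_gather(params, indices), params[indices])
```
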
### TPU training hyperparameters

Object Detection training on TPU uses synchronous SGD. On a typical cloud TPU
with 8 cores we recommend batch sizes that are 8x larger than those in a GPU
config that uses asynchronous SGD. We also use far fewer training steps
(~1/100x) due to the large batch size. This necessitates careful tuning of some
other training parameters, as listed below.

* **Batch size** - Use the largest batch size that can fit on cloud TPU.

  ```
  train_config {
    batch_size: 1024
  }
  ```

* **Training steps** - Typically only tens of thousands of steps are needed.

  ```
  train_config {
    num_steps: 25000
  }
  ```

* **Batch norm decay** - Use smaller decay constants (0.97 or 0.997) since we
  take fewer training steps.

  ```
  batch_norm {
    scale: true,
    decay: 0.97,
    epsilon: 0.001,
  }
  ```

* **Learning rate** - Use a large learning rate with warmup. Scale the
  learning rate linearly with batch size (see the scaling sketch after these
  examples). See `cosine_decay_learning_rate` or `manual_step_learning_rate`
  for examples.

  ```
  learning_rate: {
    cosine_decay_learning_rate {
      learning_rate_base: .04
      total_steps: 25000
      warmup_learning_rate: .013333
      warmup_steps: 2000
    }
  }
  ```

  or

  ```
  learning_rate: {
    manual_step_learning_rate {
      warmup: true
      initial_learning_rate: .01333
      schedule {
        step: 2000
        learning_rate: 0.04
      }
      schedule {
        step: 15000
        learning_rate: 0.004
      }
    }
  }
  ```

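One way to choose `learning_rate_base` is the linear scaling rule: scale a
learning rate that works at a reference batch size by the batch-size ratio.
The baseline numbers below are hypothetical placeholders, not values taken from
any released config:

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: the learning rate grows in proportion to batch size."""
    return base_lr * new_batch_size / base_batch_size

# Hypothetical example: a GPU config trained with batch size 32 at lr 0.004,
# moved to a synchronous TPU batch of 256, scales the learning rate by 8x.
print(scale_learning_rate(base_lr=0.004, base_batch_size=32, new_batch_size=256))
# -> 0.032
```
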
## Example TPU compatible configs

We provide example config files that you can use to train your own models on
TPU:

* <a href='https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_300x300_coco14_sync.config'>ssd_mobilenet_v1_300x300</a> <br>
* <a href='https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_ppn_shared_box_predictor_300x300_coco14_sync.config'>ssd_mobilenet_v1_ppn_300x300</a> <br>
* <a href='https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync.config'>ssd_mobilenet_v1_fpn_640x640
  (mobilenet based retinanet)</a> <br>
* <a href='https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config'>ssd_resnet50_v1_fpn_640x640
  (retinanet)</a> <br>

## Supported Meta architectures

Currently, `SSDMetaArch` models are supported on TPUs. `FasterRCNNMetaArch`
support is coming soon.