Spaces:
Running
Running
 | |
 | |
# Tensorflow Object Detection API | |
Creating accurate machine learning models capable of localizing and identifying | |
multiple objects in a single image remains a core challenge in computer vision. | |
The TensorFlow Object Detection API is an open source framework built on top of | |
TensorFlow that makes it easy to construct, train and deploy object detection | |
models. At Google we’ve certainly found this codebase to be useful for our | |
computer vision needs, and we hope that you will as well. <p align="center"> | |
<img src="g3doc/img/kites_detections_output.jpg" width=676 height=450> </p> | |
Contributions to the codebase are welcome and we would love to hear back from | |
you if you find this API useful. Finally if you use the Tensorflow Object | |
Detection API for a research publication, please consider citing: | |
``` | |
"Speed/accuracy trade-offs for modern convolutional object detectors." | |
Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, | |
Song Y, Guadarrama S, Murphy K, CVPR 2017 | |
``` | |
\[[link](https://arxiv.org/abs/1611.10012)\]\[[bibtex](https://scholar.googleusercontent.com/scholar.bib?q=info:l291WsrB-hQJ:scholar.google.com/&output=citation&scisig=AAGBfm0AAAAAWUIIlnPZ_L9jxvPwcC49kDlELtaeIyU-&scisf=4&ct=citation&cd=-1&hl=en&scfhb=1)\] | |
<p align="center"> | |
<img src="g3doc/img/tf-od-api-logo.png" width=140 height=195> | |
</p> | |
## Maintainers | |
Name | GitHub | |
-------------- | --------------------------------------------- | |
Jonathan Huang | [jch1](https://github.com/jch1) | |
Vivek Rathod | [tombstone](https://github.com/tombstone) | |
Ronny Votel | [ronnyvotel](https://github.com/ronnyvotel) | |
Derek Chow | [derekjchow](https://github.com/derekjchow) | |
Chen Sun | [jesu9](https://github.com/jesu9) | |
Menglong Zhu | [dreamdragon](https://github.com/dreamdragon) | |
Alireza Fathi | [afathi3](https://github.com/afathi3) | |
Zhichao Lu | [pkulzc](https://github.com/pkulzc) | |
## Table of contents | |
Setup: | |
* <a href='g3doc/installation.md'>Installation</a><br> | |
Quick Start: | |
* <a href='object_detection_tutorial.ipynb'> | |
Quick Start: Jupyter notebook for off-the-shelf inference</a><br> | |
* <a href="g3doc/running_pets.md">Quick Start: Training a pet detector</a><br> | |
Customizing a Pipeline: | |
* <a href='g3doc/configuring_jobs.md'> | |
Configuring an object detection pipeline</a><br> | |
* <a href='g3doc/preparing_inputs.md'>Preparing inputs</a><br> | |
Running: | |
* <a href='g3doc/running_locally.md'>Running locally</a><br> | |
* <a href='g3doc/running_on_cloud.md'>Running on the cloud</a><br> | |
Extras: | |
* <a href='g3doc/detection_model_zoo.md'>Tensorflow detection model zoo</a><br> | |
* <a href='g3doc/exporting_models.md'> | |
Exporting a trained model for inference</a><br> | |
* <a href='g3doc/tpu_exporters.md'> | |
Exporting a trained model for TPU inference</a><br> | |
* <a href='g3doc/defining_your_own_model.md'> | |
Defining your own model architecture</a><br> | |
* <a href='g3doc/using_your_own_dataset.md'> | |
Bringing in your own dataset</a><br> | |
* <a href='g3doc/evaluation_protocols.md'> | |
Supported object detection evaluation protocols</a><br> | |
* <a href='g3doc/oid_inference_and_evaluation.md'> | |
Inference and evaluation on the Open Images dataset</a><br> | |
* <a href='g3doc/instance_segmentation.md'> | |
Run an instance segmentation model</a><br> | |
* <a href='g3doc/challenge_evaluation.md'> | |
Run the evaluation for the Open Images Challenge 2018/2019</a><br> | |
* <a href='g3doc/tpu_compatibility.md'> | |
TPU compatible detection pipelines</a><br> | |
* <a href='g3doc/running_on_mobile_tensorflowlite.md'> | |
Running object detection on mobile devices with TensorFlow Lite</a><br> | |
* <a href='g3doc/context_rcnn.md'> | |
Context R-CNN documentation for data preparation, training, and export</a><br> | |
## Getting Help | |
To get help with issues you may encounter using the Tensorflow Object Detection | |
API, create a new question on [StackOverflow](https://stackoverflow.com/) with | |
the tags "tensorflow" and "object-detection". | |
Please report bugs (actually broken code, not usage questions) to the | |
tensorflow/models GitHub | |
[issue tracker](https://github.com/tensorflow/models/issues), prefixing the | |
issue name with "object_detection". | |
Please check [FAQ](g3doc/faq.md) for frequently asked questions before reporting | |
an issue. | |
## Release information | |
### June 17th, 2020 | |
We have released [Context R-CNN](https://arxiv.org/abs/1912.03538), a model that | |
uses attention to incorporate contextual information images (e.g. from | |
temporally nearby frames taken by a static camera) in order to improve accuracy. | |
Importantly, these contextual images need not be labeled. | |
* When applied to a challenging wildlife detection dataset ([Snapshot Serengeti](http://lila.science/datasets/snapshot-serengeti)), | |
Context R-CNN with context from up to a month of images outperforms a | |
single-frame baseline by 17.9% mAP, and outperforms S3D (a 3d convolution | |
based baseline) by 11.2% mAP. | |
* Context R-CNN leverages temporal context from the unlabeled frames of a | |
novel camera deployment to improve performance at that camera, boosting | |
model generalizeability. | |
We have provided code for generating data with associated context | |
[here](g3doc/context_rcnn.md), and a sample config for a Context R-CNN | |
model [here](samples/configs/context_rcnn_resnet101_snapshot_serengeti_sync.config). | |
Snapshot Serengeti-trained Faster R-CNN and Context R-CNN models can be found in | |
the [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md#snapshot-serengeti-camera-trap-trained-models). | |
A colab demonstrating Context R-CNN is provided | |
[here](colab_tutorials/context_rcnn_tutorial.ipynb). | |
<b>Thanks to contributors</b>: Sara Beery, Jonathan Huang, Guanhang Wu, Vivek | |
Rathod, Ronny Votel, Zhichao Lu, David Ross, Pietro Perona, Tanya Birch, and | |
the Wildlife Insights AI Team. | |
### May 19th, 2020 | |
We have released [MobileDets](https://arxiv.org/abs/2004.14525), a set of | |
high-performance models for mobile CPUs, DSPs and EdgeTPUs. | |
* MobileDets outperform MobileNetV3+SSDLite by 1.7 mAP at comparable mobile | |
CPU inference latencies. MobileDets also outperform MobileNetV2+SSDLite by | |
1.9 mAP on mobile CPUs, 3.7 mAP on EdgeTPUs and 3.4 mAP on DSPs while | |
running equally fast. MobileDets also offer up to 2x speedup over MnasFPN on | |
EdgeTPUs and DSPs. | |
For each of the three hardware platforms we have released model definition, | |
model checkpoints trained on the COCO14 dataset and converted TFLite models in | |
fp32 and/or uint8. | |
<b>Thanks to contributors</b>: Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin | |
Akin, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen, | |
Quoc Le, Zhichao Lu. | |
### May 7th, 2020 | |
We have released a mobile model with the | |
[MnasFPN head](https://arxiv.org/abs/1912.01106). | |
* MnasFPN with MobileNet-V2 backbone is the most accurate (26.6 mAP at 183ms | |
on Pixel 1) mobile detection model we have released to date. With | |
depth-multiplier, MnasFPN with MobileNet-V2 backbone is 1.8 mAP higher than | |
MobileNet-V3-Large with SSDLite (23.8 mAP vs 22.0 mAP) at similar latency | |
(120ms) on Pixel 1. | |
We have released model definition, model checkpoints trained on the COCO14 | |
dataset and a converted TFLite model. | |
<b>Thanks to contributors</b>: Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi | |
Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc Le, Zhichao Lu, Jonathan Huang, Hao | |
Xu. | |
### Nov 13th, 2019 | |
We have released MobileNetEdgeTPU SSDLite model. | |
* SSDLite with MobileNetEdgeTPU backbone, which achieves 10% mAP higher than | |
MobileNetV2 SSDLite (24.3 mAP vs 22 mAP) on a Google Pixel4 at comparable | |
latency (6.6ms vs 6.8ms). | |
Along with the model definition, we are also releasing model checkpoints trained | |
on the COCO dataset. | |
<b>Thanks to contributors</b>: Yunyang Xiong, Bo Chen, Suyog Gupta, Hanxiao Liu, | |
Gabriel Bender, Mingxing Tan, Berkin Akin, Zhichao Lu, Quoc Le | |
### Oct 15th, 2019 | |
We have released two MobileNet V3 SSDLite models (presented in | |
[Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)). | |
* SSDLite with MobileNet-V3-Large backbone, which is 27% faster than Mobilenet | |
V2 SSDLite (119ms vs 162ms) on a Google Pixel phone CPU at the same mAP. | |
* SSDLite with MobileNet-V3-Small backbone, which is 37% faster than MnasNet | |
SSDLite reduced with depth-multiplier (43ms vs 68ms) at the same mAP. | |
Along with the model definition, we are also releasing model checkpoints trained | |
on the COCO dataset. | |
<b>Thanks to contributors</b>: Bo Chen, Zhichao Lu, Vivek Rathod, Jonathan Huang | |
### July 1st, 2019 | |
We have released an updated set of utils and an updated | |
[tutorial](g3doc/challenge_evaluation.md) for all three tracks of the | |
[Open Images Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html)! | |
The Instance Segmentation metric for | |
[Open Images V5](https://storage.googleapis.com/openimages/web/index.html) and | |
[Challenge 2019](https://storage.googleapis.com/openimages/web/challenge2019.html) | |
is part of this release. Check out | |
[the metric description](https://storage.googleapis.com/openimages/web/evaluation.html#instance_segmentation_eval) | |
on the Open Images website. | |
<b>Thanks to contributors</b>: Alina Kuznetsova, Rodrigo Benenson | |
### Feb 11, 2019 | |
We have released detection models trained on the Open Images Dataset V4 in our | |
detection model zoo, including | |
* Faster R-CNN detector with Inception Resnet V2 feature extractor | |
* SSD detector with MobileNet V2 feature extractor | |
* SSD detector with ResNet 101 FPN feature extractor (aka RetinaNet-101) | |
<b>Thanks to contributors</b>: Alina Kuznetsova, Yinxiao Li | |
### Sep 17, 2018 | |
We have released Faster R-CNN detectors with ResNet-50 / ResNet-101 feature | |
extractors trained on the | |
[iNaturalist Species Detection Dataset](https://github.com/visipedia/inat_comp/blob/master/2017/README.md#bounding-boxes). | |
The models are trained on the training split of the iNaturalist data for 4M | |
iterations, they achieve 55% and 58% mean AP@.5 over 2854 classes respectively. | |
For more details please refer to this [paper](https://arxiv.org/abs/1707.06642). | |
<b>Thanks to contributors</b>: Chen Sun | |
### July 13, 2018 | |
There are many new updates in this release, extending the functionality and | |
capability of the API: | |
* Moving from slim-based training to | |
[Estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator)-based | |
training. | |
* Support for [RetinaNet](https://arxiv.org/abs/1708.02002), and a | |
[MobileNet](https://ai.googleblog.com/2017/06/mobilenets-open-source-models-for.html) | |
adaptation of RetinaNet. | |
* A novel SSD-based architecture called the | |
[Pooling Pyramid Network](https://arxiv.org/abs/1807.03284) (PPN). | |
* Releasing several [TPU](https://cloud.google.com/tpu/)-compatible models. | |
These can be found in the `samples/configs/` directory with a comment in the | |
pipeline configuration files indicating TPU compatibility. | |
* Support for quantized training. | |
* Updated documentation for new binaries, Cloud training, and | |
[Tensorflow Lite](https://www.tensorflow.org/mobile/tflite/). | |
See also our | |
[expanded announcement blogpost](https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html) | |
and accompanying tutorial at the | |
[TensorFlow blog](https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193). | |
<b>Thanks to contributors</b>: Sara Robinson, Aakanksha Chowdhery, Derek Chow, | |
Pengchong Jin, Jonathan Huang, Vivek Rathod, Zhichao Lu, Ronny Votel | |
### June 25, 2018 | |
Additional evaluation tools for the | |
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html) | |
are out. Check out our short tutorial on data preparation and running evaluation | |
[here](g3doc/challenge_evaluation.md)! | |
<b>Thanks to contributors</b>: Alina Kuznetsova | |
### June 5, 2018 | |
We have released the implementation of evaluation metrics for both tracks of the | |
[Open Images Challenge 2018](https://storage.googleapis.com/openimages/web/challenge.html) | |
as a part of the Object Detection API - see the | |
[evaluation protocols](g3doc/evaluation_protocols.md) for more details. | |
Additionally, we have released a tool for hierarchical labels expansion for the | |
Open Images Challenge: check out | |
[oid_hierarchical_labels_expansion.py](dataset_tools/oid_hierarchical_labels_expansion.py). | |
<b>Thanks to contributors</b>: Alina Kuznetsova, Vittorio Ferrari, Jasper | |
Uijlings | |
### April 30, 2018 | |
We have released a Faster R-CNN detector with ResNet-101 feature extractor | |
trained on [AVA](https://research.google.com/ava/) v2.1. Compared with other | |
commonly used object detectors, it changes the action classification loss | |
function to per-class Sigmoid loss to handle boxes with multiple labels. The | |
model is trained on the training split of AVA v2.1 for 1.5M iterations, it | |
achieves mean AP of 11.25% over 60 classes on the validation split of AVA v2.1. | |
For more details please refer to this [paper](https://arxiv.org/abs/1705.08421). | |
<b>Thanks to contributors</b>: Chen Sun, David Ross | |
### April 2, 2018 | |
Supercharge your mobile phones with the next generation mobile object detector! | |
We are adding support for MobileNet V2 with SSDLite presented in | |
[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381). | |
This model is 35% faster than Mobilenet V1 SSD on a Google Pixel phone CPU | |
(200ms vs. 270ms) at the same accuracy. Along with the model definition, we are | |
also releasing a model checkpoint trained on the COCO dataset. | |
<b>Thanks to contributors</b>: Menglong Zhu, Mark Sandler, Zhichao Lu, Vivek | |
Rathod, Jonathan Huang | |
### February 9, 2018 | |
We now support instance segmentation!! In this API update we support a number of | |
instance segmentation models similar to those discussed in the | |
[Mask R-CNN paper](https://arxiv.org/abs/1703.06870). For further details refer | |
to [our slides](http://presentations.cocodataset.org/Places17-GMRI.pdf) from the | |
2017 Coco + Places Workshop. Refer to the section on | |
[Running an Instance Segmentation Model](g3doc/instance_segmentation.md) for | |
instructions on how to configure a model that predicts masks in addition to | |
object bounding boxes. | |
<b>Thanks to contributors</b>: Alireza Fathi, Zhichao Lu, Vivek Rathod, Ronny | |
Votel, Jonathan Huang | |
### November 17, 2017 | |
As a part of the Open Images V3 release we have released: | |
* An implementation of the Open Images evaluation metric and the | |
[protocol](g3doc/evaluation_protocols.md#open-images). | |
* Additional tools to separate inference of detection and evaluation (see | |
[this tutorial](g3doc/oid_inference_and_evaluation.md)). | |
* A new detection model trained on the Open Images V2 data release (see | |
[Open Images model](g3doc/detection_model_zoo.md#open-images-models)). | |
See more information on the | |
[Open Images website](https://github.com/openimages/dataset)! | |
<b>Thanks to contributors</b>: Stefan Popov, Alina Kuznetsova | |
### November 6, 2017 | |
We have re-released faster versions of our (pre-trained) models in the | |
<a href='g3doc/detection_model_zoo.md'>model zoo</a>. In addition to what was | |
available before, we are also adding Faster R-CNN models trained on COCO with | |
Inception V2 and Resnet-50 feature extractors, as well as a Faster R-CNN with | |
Resnet-101 model trained on the KITTI dataset. | |
<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow, Tal | |
Remez, Chen Sun. | |
### October 31, 2017 | |
We have released a new state-of-the-art model for object detection using the | |
Faster-RCNN with the | |
[NASNet-A image featurization](https://arxiv.org/abs/1707.07012). This model | |
achieves mAP of 43.1% on the test-dev validation dataset for COCO, improving on | |
the best available model in the zoo by 6% in terms of absolute mAP. | |
<b>Thanks to contributors</b>: Barret Zoph, Vijay Vasudevan, Jonathon Shlens, | |
Quoc Le | |
### August 11, 2017 | |
We have released an update to the | |
[Android Detect demo](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android) | |
which will now run models trained using the Tensorflow Object Detection API on | |
an Android device. By default, it currently runs a frozen SSD w/Mobilenet | |
detector trained on COCO, but we encourage you to try out other detection | |
models! | |
<b>Thanks to contributors</b>: Jonathan Huang, Andrew Harp | |
### June 15, 2017 | |
In addition to our base Tensorflow detection model definitions, this release | |
includes: | |
* A selection of trainable detection models, including: | |
* Single Shot Multibox Detector (SSD) with MobileNet, | |
* SSD with Inception V2, | |
* Region-Based Fully Convolutional Networks (R-FCN) with Resnet 101, | |
* Faster RCNN with Resnet 101, | |
* Faster RCNN with Inception Resnet v2 | |
* Frozen weights (trained on the COCO dataset) for each of the above models to | |
be used for out-of-the-box inference purposes. | |
* A [Jupyter notebook](colab_tutorials/object_detection_tutorial.ipynb) for | |
performing out-of-the-box inference with one of our released models | |
* Convenient [local training](g3doc/running_locally.md) scripts as well as | |
distributed training and evaluation pipelines via | |
[Google Cloud](g3doc/running_on_cloud.md). | |
<b>Thanks to contributors</b>: Jonathan Huang, Vivek Rathod, Derek Chow, Chen | |
Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, | |
Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings, Viacheslav | |
Kovalevskyi, Kevin Murphy | |