## Attention-based Extraction of Structured Information from Street View Imagery

[Papers with Code: OCR on FSNS (test)](https://paperswithcode.com/sota/optical-character-recognition-on-fsns-test?p=attention-based-extraction-of-structured) |
[arXiv:1704.03549](https://arxiv.org/abs/1704.03549) |
[TensorFlow 1.15.0](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)

*A TensorFlow model for real-world image text extraction problems.*

This folder contains the code needed to train a new Attention OCR model on the
[FSNS dataset][FSNS] to transcribe street names in France. You can also use it
to train on your own data.

More details can be found in our paper:

["Attention-based Extraction of Structured Information from Street View
Imagery"](https://arxiv.org/abs/1704.03549)

## Contacts

Authors:

* Zbigniew Wojna (zbigniewwojna@gmail.com)
* Alexander Gorban (gorban@google.com)

Maintainer: Xavier Gibert [@xavigibert](https://github.com/xavigibert)

## Requirements

1. Install the TensorFlow library ([instructions][TF]); a quick sanity check of
the installation is shown after this list. For example:

```
python3 -m venv ~/.tensorflow
source ~/.tensorflow/bin/activate
pip install --upgrade pip
pip install --upgrade tensorflow-gpu==1.15
```

2. At least 158 GB of free disk space to download the FSNS dataset:

```
cd research/attention_ocr/python/datasets
aria2c -c -j 20 -i ../../../street/python/fsns_urls.txt
cd ..
```

3. 16 GB of RAM or more; 32 GB is recommended.
4. `train.py` works with both CPU and GPU, though a GPU is preferable. It has been tested with a Titan X and with a GTX 980.

[TF]: https://www.tensorflow.org/install/
[FSNS]: https://github.com/tensorflow/models/tree/master/research/street
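
To confirm the installation before training, here is a minimal sanity-check sketch assuming the TensorFlow 1.15 setup above (the script name is hypothetical):

```
# check_install.py (hypothetical helper): print the TF version and GPU visibility.
import tensorflow as tf

print(tf.__version__)               # expect 1.15.x
print(tf.test.is_gpu_available())   # True if a usable CUDA GPU was found
```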

## How to use this code

To run all unit tests:

```
cd research/attention_ocr/python
find . -name "*_test.py" -printf '%P\n' | xargs python3 -m unittest
```

To train from scratch:

```
python train.py
```

To train a model using pre-trained Inception weights as initialization:

```
wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
tar xf inception_v3_2016_08_28.tar.gz
python train.py --checkpoint_inception=./inception_v3.ckpt
```

To fine-tune the Attention OCR model using a checkpoint:

```
wget http://download.tensorflow.org/models/attention_ocr_2017_08_09.tar.gz
tar xf attention_ocr_2017_08_09.tar.gz
python train.py --checkpoint=model.ckpt-399731
```

## How to use your own image data to train the model

You need to define a new dataset. There are two options:

1. Store data in the same format as the FSNS dataset and reuse the
[python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py)
module. For example, create a file `datasets/newtextdataset.py`:

```
import fsns

DEFAULT_DATASET_DIR = 'path/to/the/dataset'

DEFAULT_CONFIG = {
    'name': 'MYDATASET',
    'splits': {
        'train': {
            'size': 123,
            'pattern': 'tfexample_train*'
        },
        'test': {
            'size': 123,
            'pattern': 'tfexample_test*'
        }
    },
    'charset_filename': 'charset_size.txt',
    'image_shape': (150, 600, 3),
    'num_of_views': 4,
    'max_sequence_length': 37,
    'null_code': 42,
    'items_to_descriptions': {
        'image': 'A [150 x 600 x 3] color image.',
        'label': 'Characters codes.',
        'text': 'A unicode string.',
        'length': 'A length of the encoded text.',
        'num_of_views': 'A number of different views stored within the image.'
    }
}


def get_split(split_name, dataset_dir=None, config=None):
  if not dataset_dir:
    dataset_dir = DEFAULT_DATASET_DIR
  if not config:
    config = DEFAULT_CONFIG
  return fsns.get_split(split_name, dataset_dir, config)
```

You will also need to add an import for the new module in `datasets/__init__.py`
(a registration sketch follows this list) and specify the dataset name on the
command line:

```
python train.py --dataset_name=newtextdataset
```

Please note that `eval.py` will also require the same flag.

To learn how to store data in the FSNS format, please refer to
https://stackoverflow.com/a/44461910/743658; a minimal writer sketch is also
shown after this list.

2. Define a new dataset format. The model needs the following data to train:

- images: input images, shape [batch_size x H x W x 3];
- labels: ground truth label ids, shape [batch_size x seq_length];
- labels_one_hot: labels in one-hot encoding, shape [batch_size x seq_length x num_char_classes].

Refer to [python/data_provider.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/data_provider.py#L33)
for more details. You can use [python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py)
as an example.
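
For option 1, a minimal sketch of the registration step, assuming `datasets/__init__.py` follows the same import pattern used for the bundled `fsns` module:

```
# datasets/__init__.py -- make the new dataset module importable by name.
from datasets import fsns
from datasets import newtextdataset

__all__ = ['fsns', 'newtextdataset']
```

And a hedged sketch of writing one FSNS-format `tf.Example`, based on the feature names read by `datasets/fsns.py`; the helper names and the toy values below are illustrative, so verify the exact fields against `fsns.py` and the Stack Overflow answer referenced above:

```
import tensorflow as tf


def _int64_feature(values):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def _bytes_feature(values):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))


def make_example(png_bytes, char_ids_padded, char_ids_unpadded, width, text):
  """Builds one example; char_ids_padded is padded with null_code up to
  max_sequence_length."""
  return tf.train.Example(features=tf.train.Features(feature={
      'image/format': _bytes_feature([b'png']),
      'image/encoded': _bytes_feature([png_bytes]),
      'image/class': _int64_feature(char_ids_padded),
      'image/unpadded_class': _int64_feature(char_ids_unpadded),
      'image/width': _int64_feature([width]),
      'image/orig_width': _int64_feature([width]),
      'image/text': _bytes_feature([text.encode('utf-8')]),
  }))


# Toy usage: write one example to a shard matching the 'tfexample_train*' pattern.
with tf.io.TFRecordWriter('tfexample_train-00000-of-00001') as writer:
  png = open('some_image.png', 'rb').read()
  example = make_example(png, [5, 7] + [42] * 35, [5, 7], 600, u'ab')
  writer.write(example.SerializeToString())
```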

## How to use a pre-trained model

The inference part has not been released yet, but it is pretty straightforward
to implement in Python or C++.

The recommended way is to use the [Serving infrastructure][serving].

Alternatively you can (a combined sketch of these steps follows the list):

1. define a placeholder for images (or use a numpy array directly);
2. [create a graph](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/eval.py#L60):

```
endpoints = model.create_base(images_placeholder, labels_one_hot=None)
```

3. [load a pre-trained model](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/model.py#L494);
4. run computations through the graph:

```
predictions = sess.run(endpoints.predicted_chars,
                       feed_dict={images_placeholder: images_actual_data})
```

5. convert character IDs (predictions) to UTF-8 using the provided charset file.
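
A rough end-to-end sketch of the five steps above, modeled loosely on `eval.py`; the direct `model.Model(...)` construction, the checkpoint path, and the charset decoding at the end are assumptions, so adapt them to your setup:

```
import numpy as np
import tensorflow as tf

import datasets
import model as attention_ocr_model

# Dataset metadata supplies the character set and model dimensions.
dataset = datasets.fsns.get_split('test')
ocr_model = attention_ocr_model.Model(num_char_classes=dataset.num_char_classes,
                                      seq_length=dataset.max_sequence_length,
                                      num_views=dataset.num_of_views,
                                      null_code=dataset.null_code)

# 1. Placeholder for a batch of input images (FSNS shape: 150 x 600 x 3).
images_placeholder = tf.placeholder(tf.float32, shape=[1, 150, 600, 3])
# 2. Build the inference graph.
endpoints = ocr_model.create_base(images_placeholder, labels_one_hot=None)

# Replace with real, preprocessed pixel data.
images_actual_data = np.zeros([1, 150, 600, 3], dtype=np.float32)

with tf.Session() as sess:
  # 3. Load the pre-trained weights.
  tf.train.Saver().restore(sess, 'model.ckpt-399731')
  # 4. Run the graph.
  predictions = sess.run(endpoints.predicted_chars,
                         feed_dict={images_placeholder: images_actual_data})
  # 5. Map character ids back to text with the dataset charset.
  print(''.join(dataset.charset[char_id] for char_id in predictions[0]))
```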

Please note that tensor names may change over time and old stored checkpoints
can become unloadable. In many cases such backward-incompatible changes can be
fixed with a [string substitution][1] to update the checkpoint itself, or by
using a custom var_list with [assign_from_checkpoint_fn][2] (sketched below).
For anything other than a one-time experiment, please use [TensorFlow
Serving][serving].

[1]: https://github.com/tensorflow/tensorflow/blob/aaf7adc/tensorflow/contrib/rnn/python/tools/checkpoint_convert.py
[2]: https://www.tensorflow.org/api_docs/python/tf/contrib/framework/assign_from_checkpoint_fn
[serving]: https://tensorflow.github.io/serving/serving_basic
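
A minimal sketch of the var_list approach, assuming the current graph has already been built; the old and new variable names here are purely illustrative:

```
import tensorflow as tf
from tensorflow.contrib import framework as contrib_framework

# Map names as stored in the old checkpoint to the variables that replaced
# them in the current graph.
graph_vars = {v.op.name: v for v in tf.global_variables()}
var_list = {
    'AttentionOcr_v1/some/old_name': graph_vars['AttentionOcr_v1/some/new_name'],
}
init_fn = contrib_framework.assign_from_checkpoint_fn(
    'model.ckpt-399731', var_list, ignore_missing_vars=True)

with tf.Session() as sess:
  init_fn(sess)  # restores only the variables listed in var_list
```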

## Disclaimer

This code is a modified version of the internal model we used for our paper.
Currently it reaches 83.79% full sequence accuracy after 400k steps of training.
The main differences between this version and the version used in the paper are:

* for the paper we used distributed training with 50 GPU (K80) workers and
  asynchronous updates, while the provided checkpoint was created with this code
  after ~6 days of training on a single GPU (Titan X), reaching 81% after 24
  hours of training;
* the coordinate encoding is disabled by default.