---
license: apache-2.0
language: en
tags:
- text-to-image
- design
pipeline_tag: text-to-image
---
# Type-R official repository
This repository contains model weights and data resources used in the **Type-R** project. The dataset is designed to support text-to-image generation, OCR, text erasing, editing, and evaluation pipelines used in the Type-R system.
## Directory structure
⚠️ The code in the [repository](https://github.com/CyberAgentAILab/Type-R) is designed to operate directly using this structure.
<pre>
resources/
├── weight/
│   ├── ocr/                          # OCR-related model weights
│   │   ├── solo.pth                  # ⚠️ Manual download required
│   │   ├── masktextspotterv3.pth     # ⚠️ Manual download required
│   │   ├── modelscope
│   │   ├── craft
│   │   ├── clova
│   │   └── hisam_weight
│   ├── text_eraser/                  # Text erasure model weights
│   │   ├── big-lama.pt
│   │   └── garnet.pth
│   ├── text_editor/                  # Text editing model weights
│   │   ├── anytext.ckpt
│   │   └── udifftext
│   └── t2i/                          # Text-to-image model weights
│       └── (weights will be cached here)
│   ~
├── data/
│   ├── marioevalbench/               # Mario-Eval benchmark dataset
│   │   └── hfds
│   ├── arial_unicode_ms.ttf          # ⚠️ Manual download required
│   └── LiberationSans-Regular.ttf
└── prompt
    └── example.txt
</pre>
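
Because the code expects this exact layout, it can help to check the tree before running anything. The following is a minimal sketch, not part of the Type-R code base: `EXPECTED_PATHS` and `find_missing` are hypothetical names, and the path list is simply copied from the tree above. `arial_unicode_ms.ttf` is left out of the list because Liberation Sans is offered as an alternative.

```python
from pathlib import Path

# Entries copied from the directory tree, relative to resources/.
# arial_unicode_ms.ttf is omitted on purpose: it has an open-font alternative.
EXPECTED_PATHS = [
    "weight/ocr/solo.pth",
    "weight/ocr/masktextspotterv3.pth",
    "weight/ocr/modelscope",
    "weight/ocr/craft",
    "weight/ocr/clova",
    "weight/ocr/hisam_weight",
    "weight/text_eraser/big-lama.pt",
    "weight/text_eraser/garnet.pth",
    "weight/text_editor/anytext.ckpt",
    "weight/text_editor/udifftext",
    "data/marioevalbench/hfds",
    "data/LiberationSans-Regular.ttf",
    "prompt/example.txt",
]


def find_missing(root, expected=EXPECTED_PATHS):
    """Return the expected entries (files or directories) absent under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains resources/.
    missing = find_missing("resources")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print(f"  resources/{rel}")
    else:
        print("All expected entries are present.")
```

The check only tests for existence, so it does not detect truncated or corrupted downloads.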
## ⚠️ Data requiring manual download ⚠️
- `resources/weight/ocr/solo.pth`
  - Please download this weight from the official [DeepSolo](https://github.com/ViTAE-Transformer/DeepSolo) implementation. [[link](https://onedrive.live.com/?redeem=aHR0cHM6Ly8xZHJ2Lm1zL3UvcyFBaW1CZ1lWN0pqVGxnY2Q5d2k0MzJ1aXRNZ1RNLXc%5FZT1manVKYm0&cid=E534267B85818129&id=E534267B85818129%2125597&parId=E534267B85818129%2125575&o=OneUp)]
  - This weight has ViTAEv2-S as its backbone and is trained on `Synth150K+Total-Text+MLT17+IC13+IC15+TextOCR`.
- `resources/weight/ocr/masktextspotterv3.pth`
  - Please download this weight from the official [MaskTextSpotterV3](https://github.com/MhLiao/MaskTextSpotterV3) implementation. [[link](https://drive.google.com/file/d/1XQsikiNY7ILgZvmvOeUf9oPDG4fTp0zs/view)]
- `resources/data/arial_unicode_ms.ttf`
  - Since the Arial font cannot be redistributed, please obtain it through your operating system or another legal source. As an alternative, you may use an open font such as Liberation Sans (`resources/data/LiberationSans-Regular.ttf`). Note, however, that we observed a drop of 1–2 points in OCR accuracy on the Mario-Eval benchmark when using AnyText with Liberation Sans under our best configuration.
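
The font choice described in the last item can be expressed as a simple fallback. This is an illustrative sketch only: `pick_font` is a hypothetical helper, not a function from the Type-R repository, and it assumes both fonts live directly under `resources/data/` as the directory tree shows.

```python
from pathlib import Path

# File names from the directory tree; the helper itself is hypothetical.
PREFERRED_FONT = "arial_unicode_ms.ttf"
FALLBACK_FONT = "LiberationSans-Regular.ttf"


def pick_font(data_dir):
    """Return the Arial path if present, otherwise the Liberation Sans path."""
    data_dir = Path(data_dir)
    for name in (PREFERRED_FONT, FALLBACK_FONT):
        candidate = data_dir / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No usable font found in {data_dir}")
```

A call such as `pick_font("resources/data")` prefers Arial whenever it has been obtained, so the small accuracy drop noted above only applies when the fallback is taken.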
## Dataset details
- `weight/`
  - This directory contains pretrained weights used by the various modules in the Type-R pipeline.
  - **ocr/**: Models for OCR detection/recognition.
  - **text_eraser/**: Inpainting or erasure modules for removing text.
  - **text_editor/**: Models for rendering text into images.
  - **t2i/**: Large text-to-image models.
    - If the T2I model requires authentication, make sure to log in to Hugging Face (e.g., using `huggingface-cli login`) before executing the pipeline.
- `data/marioevalbench/`
  - The dataset containing prompts and reference images for evaluating Type-R.
  - **hfds/**: Includes prompts, augmented prompts, and images of the Mario-Eval Benchmark.
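
For the authentication note under **t2i/**, a quick pre-flight check can tell whether a Hugging Face token appears to be available before a long run starts. The sketch below uses only the standard library and relies on assumptions about `huggingface_hub` conventions rather than anything stated in this repository: that a token may be supplied through the `HF_TOKEN` environment variable, and that a login stores it in a file named `token` under `HF_HOME` (assumed default `~/.cache/huggingface`). `has_hf_token` is a hypothetical helper, and it ignores other settings that can relocate the cache.

```python
import os
from pathlib import Path


def has_hf_token(env=None, home=None):
    """Best-effort check for a Hugging Face token (assumed conventions)."""
    env = os.environ if env is None else env
    # Assumption: a non-empty HF_TOKEN variable is used as the token.
    if env.get("HF_TOKEN"):
        return True
    home = Path.home() if home is None else Path(home)
    # Assumption: a login writes the token to $HF_HOME/token.
    hf_home = Path(env.get("HF_HOME", home / ".cache" / "huggingface"))
    return (hf_home / "token").is_file()


if __name__ == "__main__":
    if not has_hf_token():
        print("No Hugging Face token found; log in before running the pipeline.")
```

The check does not validate the token or confirm access to a particular gated model; it only catches the case where no login has happened at all.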
## License
### Weights
- [DeepSolo](https://github.com/ViTAE-Transformer/DeepSolo): `resources/weight/ocr/solo.pth` - Licensed under [AdelaiDet](https://github.com/ViTAE-Transformer/DeepSolo/blob/main/LICENSE)
- [MaskTextSpotterV3](https://github.com/MhLiao/MaskTextSpotterV3): `resources/weight/ocr/masktextspotterv3.pth` - Licensed under [Creative Commons](https://github.com/MhLiao/MaskTextSpotterV3/blob/master/LICENSE.md)
- [Paddle](https://github.com/PaddlePaddle/PaddleOCR): `resources/weight/ocr/modelscope` - Licensed under [Apache 2.0](https://github.com/PaddlePaddle/PaddleOCR/blob/main/LICENSE)
- [CRAFT](https://github.com/clovaai/CRAFT-pytorch): `resources/weight/ocr/craft` - Licensed under [MIT License](https://github.com/clovaai/CRAFT-pytorch/blob/master/LICENSE)
- [Clova Recognition](https://github.com/clovaai/deep-text-recognition-benchmark): `resources/weight/ocr/clova` - Licensed under [Apache 2.0](https://github.com/clovaai/deep-text-recognition-benchmark/blob/master/LICENSE.md)
- [Hi-SAM](https://github.com/ymy-k/Hi-SAM): `resources/weight/ocr/hisam_weight` - Licensed under [Apache 2.0](https://github.com/ymy-k/Hi-SAM/blob/main/LICENSE)
- [Lama](https://github.com/advimman/lama): `resources/weight/text_eraser/big-lama.pt` - Licensed under [Apache 2.0](https://github.com/advimman/lama/blob/main/LICENSE)
- [Garnet](https://github.com/advimman/lama): `resources/weight/text_eraser/garnet.pth` - Licensed under [Apache 2.0](https://github.com/advimman/lama/blob/main/LICENSE)
- [AnyText](https://github.com/tyxsspa/AnyText): `resources/weight/text_editor/anytext.ckpt` - Licensed under [Apache 2.0](https://github.com/tyxsspa/AnyText/blob/main/LICENSE)
- [UDiffText](https://github.com/ZYM-PKU/UDiffText): `resources/weight/text_editor/udifftext` - Licensed under [MIT License](https://github.com/ZYM-PKU/UDiffText/blob/main/LICENSE)
### Data
- [Mario-Eval Benchmark](https://github.com/microsoft/unilm/tree/master/textdiffuser): `resources/data/marioevalbench` β Licensed under [MIT License](https://github.com/microsoft/unilm/blob/master/LICENSE)
- Arial Font: `resources/data/arial_unicode_ms.ttf` - Licensed under the [Microsoft fonts license](https://www.myfonts.com/a/font/form/microsoft-typography-licensing)
- [Liberation Sans](https://github.com/liberationfonts/liberation-fonts): `resources/data/LiberationSans-Regular.ttf` - Licensed under [OFL 1.1](https://github.com/liberationfonts/liberation-fonts/blob/main/LICENSE)