---
library_name: transformers
tags:
- video
- feature
- face
license: cc-by-nc-4.0
base_model:
- ControlNet/MARLIN
pipeline_tag: feature-extraction
---

# MARLIN: Masked Autoencoder for facial video Representation LearnINg

This repo is the official PyTorch implementation for the paper [MARLIN: Masked Autoencoder for facial video Representation LearnINg](https://openaccess.thecvf.com/content/CVPR2023/html/Cai_MARLIN_Masked_Autoencoder_for_Facial_Video_Representation_LearnINg_CVPR_2023_paper) (CVPR 2023) ([arXiv](https://arxiv.org/abs/2211.06627)).

## Use `transformers` (Hugging Face) for Feature Extraction

Requirements:

- Python
- PyTorch
- transformers
- einops

Currently, the Hugging Face model performs only direct feature extraction. It does not include any video preprocessing (e.g., face detection, cropping, or strided windowing), so these steps have to be done before calling the model.
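
Because windowing is left to the user, a longer face-cropped video has to be cut into fixed-length clips before it is passed to the model. The following is a minimal sketch using `torch.Tensor.unfold`. The helper name `make_clips`, the non-overlapping default stride, and padding short videos by repeating the last frame are illustrative assumptions, not part of MARLIN's own preprocessing. It also assumes face detection, cropping and resizing to 224x224 have already been done.

```python
import torch


def make_clips(video: torch.Tensor, clip_len: int = 16, stride: int = 16) -> torch.Tensor:
    """Cut a face-cropped video into fixed-length clips (hypothetical helper).

    video: float tensor of shape (C, T, H, W), e.g. (3, T, 224, 224).
    Returns a batch of shape (N, C, clip_len, H, W), i.e. (B, C, T, H, W).
    """
    c, t, h, w = video.shape
    if t < clip_len:
        # Assumption: pad short videos by repeating the last frame.
        pad = video[:, -1:].expand(c, clip_len - t, h, w)
        video = torch.cat([video, pad], dim=1)
    # Sliding windows over the time axis: (C, N, H, W, clip_len)
    windows = video.unfold(1, clip_len, stride)
    # Reorder into (N, C, clip_len, H, W) so each window is one batch item
    return windows.permute(1, 0, 4, 2, 3).contiguous()


if __name__ == "__main__":
    video = torch.rand(3, 40, 224, 224)  # stand-in for a preprocessed face video
    clips = make_clips(video, stride=8)  # overlapping windows
    print(clips.shape)  # torch.Size([4, 3, 16, 224, 224])
```

Each item of the returned batch has the `(C, T, H, W)` layout the model expects, so the batch can be passed to the model in place of the random tensor in the usage example. A `stride` smaller than `clip_len` gives overlapping windows. Trailing frames that do not fill a complete window are dropped.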

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "ControlNet/marlin_vit_large_ytf",  # or other variants
    trust_remote_code=True,
)
model.eval()  # inference mode for feature extraction

tensor = torch.rand([1, 3, 16, 224, 224])  # (B, C, T, H, W)
with torch.no_grad():
    output = model(tensor)  # (B, 1568, embed_dim): [1, 1568, 1024] for ViT-Large, [1, 1568, 384] for ViT-Small
```
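
The 1568 tokens in the output come from splitting each clip into spatio-temporal tubelets. The sketch below assumes 16x16 spatial patches and tubelets spanning 2 frames. These values are not stated on this card, but they are consistent with 8 x 14 x 14 = 1568 tokens for a 16x224x224 clip. It also assumes the tokens are flattened in (time, height, width) order when it regroups them into a grid. Mean pooling is shown only as one simple way to get a single vector per clip, and is not necessarily what the authors use downstream. All helper names are hypothetical.

```python
import torch

# Assumed patching parameters (inferred from the 1568-token output, not from this card).
PATCH_SIZE = 16    # spatial patch side, in pixels
TUBELET_SIZE = 2   # frames per tubelet


def num_tokens(frames: int = 16, height: int = 224, width: int = 224) -> int:
    """Number of tokens the encoder produces for one clip under the assumed patching."""
    return (frames // TUBELET_SIZE) * (height // PATCH_SIZE) * (width // PATCH_SIZE)


def to_grid(features: torch.Tensor, frames: int = 16, height: int = 224, width: int = 224) -> torch.Tensor:
    """Reshape (B, N, D) tokens into a (B, T', H', W', D) grid.

    Assumes tokens are flattened in (time, height, width) order.
    """
    b, n, d = features.shape
    t, h, w = frames // TUBELET_SIZE, height // PATCH_SIZE, width // PATCH_SIZE
    if n != t * h * w:
        raise ValueError(f"expected {t * h * w} tokens, got {n}")
    return features.reshape(b, t, h, w, d)


def clip_embedding(features: torch.Tensor) -> torch.Tensor:
    """Mean-pool token features (B, N, D) into one vector per clip (B, D)."""
    return features.mean(dim=1)


if __name__ == "__main__":
    print(num_tokens())  # 1568
    features = torch.rand(1, 1568, 1024)  # stand-in for a ViT-Large model output
    print(to_grid(features).shape)  # torch.Size([1, 8, 14, 14, 1024])
    print(clip_embedding(features).shape)  # torch.Size([1, 1024])
```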

## License

This project is under the CC BY-NC 4.0 license. See [LICENSE](LICENSE) for details.

## References

If you find this work useful for your research, please consider citing it.

```bibtex
@inproceedings{cai2022marlin,
  title = {MARLIN: Masked Autoencoder for facial video Representation LearnINg},
  author = {Cai, Zhixi and Ghosh, Shreya and Stefanov, Kalin and Dhall, Abhinav and Cai, Jianfei and Rezatofighi, Hamid and Haffari, Reza and Hayat, Munawar},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023},
  month = {June},
  pages = {1493-1504},
  doi = {10.1109/CVPR52729.2023.00150},
  publisher = {IEEE},
}
```