Spaces:
Running
Running
File size: 8,917 Bytes
873d3e4 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 |
[**Installation**](#installing-torchcodec) | [**Simple Example**](#using-torchcodec) | [**Detailed Example**](https://pytorch.org/torchcodec/stable/generated_examples/) | [**Documentation**](https://pytorch.org/torchcodec) | [**Contributing**](CONTRIBUTING.md) | [**License**](#license)
# TorchCodec
TorchCodec is a Python library for decoding video and audio data into PyTorch
tensors, on CPU and CUDA GPU. It also supports audio encoding, and video
encoding will come soon! It aims to be fast, easy to use, and well integrated
into the PyTorch ecosystem. If you want to use PyTorch to train ML models on
videos and audio, TorchCodec is how you turn these into data.
We achieve these capabilities through:
* Pythonic APIs that mirror Python and PyTorch conventions.
* Relying on [FFmpeg](https://www.ffmpeg.org/) to do the decoding and encoding.
TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a
mature library with broad coverage available on most systems. It is, however,
not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used
correctly and efficiently.
* Returning data as PyTorch tensors, ready to be fed into PyTorch transforms
or used directly to train models.
## Using TorchCodec
Here's a condensed summary of what you can do with TorchCodec. For more detailed
examples, [check out our
documentation](https://pytorch.org/torchcodec/stable/generated_examples/)!
#### Decoding
```python
from torchcodec.decoders import VideoDecoder
device = "cpu" # or e.g. "cuda" !
decoder = VideoDecoder("path/to/video.mp4", device=device)
decoder.metadata
# VideoStreamMetadata:
# num_frames: 250
# duration_seconds: 10.0
# bit_rate: 31315.0
# codec: h264
# average_fps: 25.0
# ... (truncated output)
# Simple Indexing API
decoder[0] # uint8 tensor of shape [C, H, W]
decoder[0 : -1 : 20] # uint8 stacked tensor of shape [N, C, H, W]
# Indexing, with PTS and duration info:
decoder.get_frames_at(indices=[2, 100])
# FrameBatch:
# data (shape): torch.Size([2, 3, 270, 480])
# pts_seconds: tensor([0.0667, 3.3367], dtype=torch.float64)
# duration_seconds: tensor([0.0334, 0.0334], dtype=torch.float64)
# Time-based indexing with PTS and duration info
decoder.get_frames_played_at(seconds=[0.5, 10.4])
# FrameBatch:
# data (shape): torch.Size([2, 3, 270, 480])
# pts_seconds: tensor([ 0.4671, 10.3770], dtype=torch.float64)
# duration_seconds: tensor([0.0334, 0.0334], dtype=torch.float64)
```
#### Clip sampling
```python
from torchcodec.samplers import clips_at_regular_timestamps
clips_at_regular_timestamps(
decoder,
seconds_between_clip_starts=1.5,
num_frames_per_clip=4,
seconds_between_frames=0.1
)
# FrameBatch:
# data (shape): torch.Size([9, 4, 3, 270, 480])
# pts_seconds: tensor([[ 0.0000, 0.0667, 0.1668, 0.2669],
# [ 1.4681, 1.5682, 1.6683, 1.7684],
# [ 2.9696, 3.0697, 3.1698, 3.2699],
# ... (truncated), dtype=torch.float64)
# duration_seconds: tensor([[0.0334, 0.0334, 0.0334, 0.0334],
# [0.0334, 0.0334, 0.0334, 0.0334],
# [0.0334, 0.0334, 0.0334, 0.0334],
# ... (truncated), dtype=torch.float64)
```
You can use the following snippet to generate a video with FFmpeg and tryout
TorchCodec:
```bash
fontfile=/usr/share/fonts/dejavu-sans-mono-fonts/DejaVuSansMono-Bold.ttf
output_video_file=/tmp/output_video.mp4
ffmpeg -f lavfi -i \
color=size=640x400:duration=10:rate=25:color=blue \
-vf "drawtext=fontfile=${fontfile}:fontsize=30:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:text='Frame %{frame_num}'" \
${output_video_file}
```
## Installing TorchCodec
### Installing CPU-only TorchCodec
1. Install the latest stable version of PyTorch following the
[official instructions](https://pytorch.org/get-started/locally/). For other
versions, refer to the table below for compatibility between versions of
`torch` and `torchcodec`.
2. Install FFmpeg, if it's not already installed. Linux distributions usually
come with FFmpeg pre-installed. TorchCodec supports all major FFmpeg versions
in [4, 7].
If FFmpeg is not already installed, or you need a more recent version, an
easy way to install it is to use `conda`:
```bash
conda install "ffmpeg<8"
# or
conda install "ffmpeg<8" -c conda-forge
```
3. Install TorchCodec:
```bash
pip install torchcodec
```
The following table indicates the compatibility between versions of
`torchcodec`, `torch` and Python.
| `torchcodec` | `torch` | Python |
| ------------------ | ------------------ | ------------------- |
| `main` / `nightly` | `main` / `nightly` | `>=3.10`, `<=3.13` |
| `0.6` | `2.8` | `>=3.9`, `<=3.13` |
| `0.5` | `2.7` | `>=3.9`, `<=3.13` |
| `0.4` | `2.7` | `>=3.9`, `<=3.13` |
| `0.3` | `2.7` | `>=3.9`, `<=3.13` |
| `0.2` | `2.6` | `>=3.9`, `<=3.13` |
| `0.1` | `2.5` | `>=3.9`, `<=3.12` |
| `0.0.3` | `2.4` | `>=3.8`, `<=3.12` |
### Installing CUDA-enabled TorchCodec
First, make sure you have a GPU that has NVDEC hardware that can decode the
format you want. Refer to Nvidia's GPU support matrix for more details
[here](https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new).
1. Install Pytorch corresponding to your CUDA Toolkit using the
[official instructions](https://pytorch.org/get-started/locally/). You'll
need the `libnpp` and `libnvrtc` CUDA libraries, which are usually part of
the CUDA Toolkit.
2. Install or compile FFmpeg with NVDEC support.
TorchCodec with CUDA should work with FFmpeg versions in [4, 7].
If FFmpeg is not already installed, or you need a more recent version, an
easy way to install it is to use `conda`:
```bash
conda install "ffmpeg<8"
# or
conda install "ffmpeg<8" -c conda-forge
```
If you are building FFmpeg from source you can follow Nvidia's guide to
configuring and installing FFmpeg with NVDEC support
[here](https://docs.nvidia.com/video-technologies/video-codec-sdk/12.0/ffmpeg-with-nvidia-gpu/index.html).
After installing FFmpeg make sure it has NVDEC support when you list the supported
decoders:
```bash
ffmpeg -decoders | grep -i nvidia
# This should show a line like this:
# V..... h264_cuvid Nvidia CUVID H264 decoder (codec h264)
```
To check that FFmpeg libraries work with NVDEC correctly you can decode a sample video:
```bash
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i test/resources/nasa_13013.mp4 -f null -
```
3. Install TorchCodec by passing in an `--index-url` parameter that corresponds
to your CUDA Toolkit version, example:
```bash
# This corresponds to CUDA Toolkit version 12.6. It should be the same one
# you used when you installed PyTorch (If you installed PyTorch with pip).
pip install torchcodec --index-url=https://download.pytorch.org/whl/cu126
```
Note that without passing in the `--index-url` parameter, `pip` installs
the CPU-only version of TorchCodec.
## Benchmark Results
The following was generated by running [our benchmark script](./benchmarks/decoders/generate_readme_data.py) on a lightly loaded 22-core machine with an Nvidia A100 with
5 [NVDEC decoders](https://docs.nvidia.com/video-technologies/video-codec-sdk/12.1/nvdec-application-note/index.html#).

The top row is a [Mandelbrot](https://ffmpeg.org/ffmpeg-filters.html#mandelbrot) video
generated from FFmpeg that has a resolution of 1280x720 at 60 fps and is 120 seconds long.
The bottom row is [promotional video from NASA](https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4)
that has a resolution of 960x540 at 29.7 fps and is 206 seconds long. Both videos were
encoded with libx264 and yuv420p pixel format. All decoders, except for TorchVision, used FFmpeg 6.1.2. TorchVision used FFmpeg 4.2.2.
For TorchCodec, the "approx" label means that it was using [approximate mode](https://pytorch.org/torchcodec/stable/generated_examples/approximate_mode.html)
for seeking.
## Contributing
We welcome contributions to TorchCodec! Please see our [contributing
guide](CONTRIBUTING.md) for more details.
## License
TorchCodec is released under the [BSD 3 license](./LICENSE).
However, TorchCodec may be used with code not written by Meta which may be
distributed under different licenses.
For example, if you build TorchCodec with ENABLE_CUDA=1 or use the CUDA-enabled
release of torchcodec, please review CUDA's license here:
[Nvidia licenses](https://docs.nvidia.com/cuda/eula/index.html).
|