If you'd like to learn more about our coding philosophy for models, check out our Repeat Yourself blog post.

Provide state-of-the-art models with performances as close as possible to the original models:

- We provide at least one example for each architecture which reproduces a result provided by the official authors of said architecture.
- The code is usually as close to the original code base as possible, which means some PyTorch code may not be as pytorchic as it could be as a result of being converted from TensorFlow code, and vice versa.

A few other goals:

Expose the models' internals as consistently as possible:

- We give access, using a single API, to the full hidden-states and attention weights (a short sketch follows this list).
- The preprocessing classes and base model APIs are standardized to easily switch between models.
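
For illustration, here is a minimal sketch of both points using the Auto classes; the bert-base-uncased checkpoint is only a placeholder, and other pretrained models should expose the same outputs through the same arguments:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# The checkpoint name is a placeholder; any model on the Hub works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True, output_attentions=True)

# One tensor per layer (hidden_states also includes the embedding output).
print(len(outputs.hidden_states), outputs.hidden_states[-1].shape)
print(len(outputs.attentions), outputs.attentions[-1].shape)
```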

Incorporate a subjective selection of promising tools for fine-tuning and investigating these models:

- A simple and consistent way to add new tokens to the vocabulary and embeddings for fine-tuning (see the sketch after this list).
- Simple ways to mask and prune Transformer heads.
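
As a rough sketch of these two tools (the checkpoint name and the new token strings below are placeholders chosen for the example):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Add new tokens to the vocabulary, then grow the embedding matrix to match.
tokenizer.add_tokens(["<new_token_1>", "<new_token_2>"])
model.resize_token_embeddings(len(tokenizer))

# Prune attention heads: drop heads 0 and 2 in layer 0 and head 1 in layer 2.
model.prune_heads({0: [0, 2], 2: [1]})
```

Masking heads at inference time works in a similar spirit through the head_mask argument of the model's forward pass.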

Easily switch between PyTorch, TensorFlow 2.0 and Flax, allowing training with one framework and inference with another.
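
For example, weights trained in one framework can be loaded into another for inference; the sketch below assumes checkpoint directories previously saved with save_pretrained() (the paths are placeholders), and Flax models follow the same pattern:

```python
from transformers import BertModel, TFBertModel

# Load PyTorch weights into a TensorFlow 2.0 model...
tf_model = TFBertModel.from_pretrained("path/to/pytorch_checkpoint", from_pt=True)

# ...or TensorFlow weights into a PyTorch model.
pt_model = BertModel.from_pretrained("path/to/tf_checkpoint", from_tf=True)
```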

Main concepts

The library is built around three types of classes for each model:

Model classes can be PyTorch models (torch.nn.Module), Keras models (tf.keras.Model) or JAX/Flax models (flax.linen.Module) that work with the pretrained weights provided in the library.

Configuration classes store the hyperparameters required to build a model (such as the number of layers and hidden size).