Ahmadzei's picture
update 1
57bdca5
If you'd like to learn more about our coding philosophy for models, check out our Repeat Yourself blog post.
Provide state-of-the-art models with performances as close as possible to the original models:
We provide at least one example for each architecture which reproduces a result provided by the official authors
of said architecture.
The code is usually as close to the original code base as possible which means some PyTorch code may be not as
pytorchic as it could be as a result of being converted TensorFlow code and vice versa.
A few other goals:
Expose the models' internals as consistently as possible:
We give access, using a single API, to the full hidden-states and attention weights.
The preprocessing classes and base model APIs are standardized to easily switch between models.
Incorporate a subjective selection of promising tools for fine-tuning and investigating these models:
A simple and consistent way to add new tokens to the vocabulary and embeddings for fine-tuning.
Simple ways to mask and prune Transformer heads.
Easily switch between PyTorch, TensorFlow 2.0 and Flax, allowing training with one framework and inference with another.
Main concepts
The library is built around three types of classes for each model:
Model classes can be PyTorch models (torch.nn.Module), Keras models (tf.keras.Model) or JAX/Flax models (flax.linen.Module) that work with the pretrained weights provided in the library.
Configuration classes store the hyperparameters required to build a model (such as the number of layers and hidden size).