How to Instantiate a Transformers Model Using the Transformers Library
Introduction
In the world of natural language processing (NLP), transformer models have revolutionized the way we approach tasks such as text generation, translation, and sentiment analysis. With the Transformers library from Hugging Face, instantiating a model has never been easier. In this article, we will delve into how you can create and use a pretrained model from the Hugging Face Hub using the TFAutoModel class, and we will cover the steps involved in loading configurations and weights, as well as how to train models from scratch.
What is the Transformers Library?
The Transformers library is an open-source framework that provides a wide range of state-of-the-art pretrained models for NLP tasks. It supports frameworks such as PyTorch and TensorFlow, making it versatile for different use cases. The library allows users to seamlessly load models from the Hugging Face Hub or instantiate them from local checkpoints.
Why Use Pretrained Models?
Pretrained models have been trained on large datasets and can be fine-tuned for specific tasks, reducing the time and resources required to develop a model from scratch. They provide a strong starting point and can often achieve better performance than models that are trained from scratch on small datasets.
Instantiating a Model: Step-by-Step Guide
To create a Transformers model, we focus on utilizing the TFAutoModel class. Below are the steps you need to follow for instantiation.
Step 1: Import Required Libraries
To get started, ensure you have the necessary libraries installed. Here is how to import them in your Python script:
from transformers import TFAutoModel, AutoConfig
Step 2: Selecting a Checkpoint
To instantiate a model, you need to select an appropriate checkpoint name from the Hugging Face Hub. For example, you may choose a BERT checkpoint such as bert-base-cased, or a GPT-2 or BART checkpoint. Each checkpoint corresponds to a specific architecture.
Step 3: Instantiate the Model
The TFAutoModel class automates the process of selecting the model class based on the given checkpoint. Here’s how you can do that:
model = TFAutoModel.from_pretrained('bert-base-cased')
This command downloads the configuration and weights from the specified checkpoint and automatically initializes the appropriate model class, which in this case is TFBertModel.
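The same call resolves other checkpoints to their matching model classes. Here is a minimal sketch (these are public Hub checkpoints, and the downloads are assumed to succeed in your environment):
from transformers import TFAutoModel
# The Auto class reads each checkpoint's configuration and picks the matching architecture
bert = TFAutoModel.from_pretrained('bert-base-cased')  # loads a TFBertModel
gpt2 = TFAutoModel.from_pretrained('gpt2')  # loads a TFGPT2Model
print(type(bert).__name__, type(gpt2).__name__)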
Step 4: Loading a Local Model
If you have a local path with the configuration and weight files, you can load the model as follows:
model = TFAutoModel.from_pretrained('/path/to/local/model')
This function looks for a valid configuration file and weights in the specified directory to instantiate the model.
Step 5: Understanding the Configuration
The configuration of a model serves as a blueprint detailing how to construct the model architecture. To load the configuration, you can use the AutoConfig class from the Transformers library:
config = AutoConfig.from_pretrained('bert-base-cased')
This command retrieves the configuration for the specified checkpoint, including parameters like:
- Number of layers
- Hidden size
- Vocabulary size
For bert-base-cased, the configuration indicates 12 layers, a hidden size of 768, and a vocabulary size of 28,996.
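As a quick check, these values can be read directly from the loaded configuration object; the attribute names below are the standard ones used by BERT configurations in the library:
config = AutoConfig.from_pretrained('bert-base-cased')
print(config.num_hidden_layers)  # 12
print(config.hidden_size)  # 768
print(config.vocab_size)  # 28996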
Step 6: Creating a Custom Model Architecture
You can modify the model's architecture by changing parameters in the configuration. For example, to create a BERT model with ten layers instead of 12:
config.num_hidden_layers = 10
model = TFAutoModel.from_config(config)
This flexibility enables you to experiment with different architectures quickly and easily. Note that a model built from a configuration alone (via from_config) starts with randomly initialized weights and must be trained from scratch.
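Equivalently, configuration values can be overridden with keyword arguments at load time, which avoids mutating the object afterwards. A minimal sketch:
from transformers import AutoConfig, TFAutoModel
# Override the number of layers while loading the configuration
config = AutoConfig.from_pretrained('bert-base-cased', num_hidden_layers=10)
# from_config builds the architecture with randomly initialized weights
model = TFAutoModel.from_config(config)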
Step 7: Training the Model
After initializing a model with either pretrained weights or a custom architecture, the next step is training or fine-tuning it for your specific task. Use the standard training routines from TensorFlow (Keras) or PyTorch for this step.
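Because the TensorFlow classes are Keras models, a minimal fine-tuning setup can use compile and fit. The sketch below assumes a binary classification task and uses the task-specific TFAutoModelForSequenceClassification class, which adds a classification head on top of the base model; train_dataset is a hypothetical tf.data.Dataset of tokenized inputs and labels that you have prepared separately:
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification
# Task-specific Auto class: pretrained base model plus a randomly initialized classification head
model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-cased', num_labels=2)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
model.fit(train_dataset, epochs=3)  # train_dataset: assumed tf.data.Dataset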
Step 8: Saving the Trained Model
Once the model is trained or fine-tuned, you can save it for future use by leveraging the save_pretrained method, as shown below:
model.save_pretrained('./my-bert-model')
This command saves the model to a directory named my-bert-model in the current working directory.
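On disk, the saved directory typically contains the configuration and the TensorFlow weights; the file names below are the library's usual defaults and may vary between versions:
import os
# Typically written by save_pretrained for TensorFlow models:
#   config.json - the configuration (the model's blueprint)
#   tf_model.h5 - the trained weights
print(os.listdir('./my-bert-model'))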
Step 9: Reloading the Model
To reload a previously saved model, use the following command:
model = TFAutoModel.from_pretrained('./my-bert-model')
This retrieves the model architecture and weights from the saved directory, allowing you to continue training or utilize the model directly.
Conclusion
In conclusion, instantiating a Transformers model using the Hugging Face library is a streamlined and efficient process. By using TFAutoModel, you can easily retrieve pretrained models or create custom architectures based on your specific needs. The ability to load configurations, modify them, and save trained models provides a robust interface for working with modern NLP tasks. This approach not only simplifies the coding process but also enhances the overall workflow in model development.
Transcript
How to instantiate a Transformers model? In this video we will look at how we can create and use a model from the Transformers library. As we've seen before, the TFAutoModel class allows you to instantiate a pretrained model
from any checkpoint on the Hugging Face Hub. It will pick the right model class from the library to instantiate the proper architecture and load the weights of the pretrained model inside it.
As we can see, when given a BERT checkpoint, we end up with a TFBertModel, and similarly for GPT-2 or BART. Behind the scenes, this API can take the name of a checkpoint on the Hub, in which case
it will download and cache the configuration file as well as the model weights file. You can also specify the path to a local folder that contains a valid configuration file and a model weights file.
To instantiate the pretrained model, the AutoModel API will first open the configuration file to look at the configuration class that should be used. The configuration class depends on the type of the model (BERT, GPT-2 or BART for instance).
Once it has the proper configuration class, it can instantiate that configuration, which is a blueprint to know how to create the model. It also uses this configuration class to find the proper model class, which is combined
with the loaded configuration, to load the model. This model is not yet our pretrained model as it has just been initialized with random weights.
The last step is to load the weights from the model file inside this model. To easily load the configuration of a model from any checkpoint or a folder containing the configuration file, we can use the AutoConfig class.
Like the TFAutoModel class, it will pick the right configuration class from the library. We can also use the specific class corresponding to a checkpoint, but we will need to change the code each time we want to try a different model.
As we said before, the configuration of a model is a blueprint that contains all the information necessary to create the model architecture. For instance the BERT model associated with the bert-base-cased checkpoint has 12 layers,
a hidden size of 768, and a vocabulary size of 28,996. Once we have the configuration, we can create a model that has the same architecture as our checkpoint but is randomly initialized.
We can then train it from scratch like any PyTorch module/TensorFlow model. We can also change any part of the configuration by using keyword arguments. The second snippet of code instantiates a randomly initialized BERT model with ten layers
instead of 12. Saving a model once it's trained or fine-tuned is very easy: we just have to use the save_pretrained method.
Here the model will be saved in a folder named my-bert-model inside the current working directory. Such a model can then be reloaded using the from_pretrained method.