Fine-tune GPT-J
Train GPT-J on your specific task.

Introduction

Fine-tuning is a powerful technique to create a new model that's specific to your use case. Fine-tuning lets you get more out of GPT-J by providing:
    1. Higher quality results than prompt design
    2. Ability to train on many more examples than can fit in a prompt
    3. Token savings due to shorter prompts
    4. Lower latency requests
GPT-J is pre-trained on a vast amount of text from the open internet. When given a prompt with a few examples, it can often understand what task you are trying to perform and generate a useful completion. This is called "few-shot learning".
Fine-tuning improves on few-shot learning by training on many more examples than can fit in a prompt, letting you achieve better results on a wide number of tasks. Once a model has been fine-tuned, you won't need to provide examples in the prompt anymore. This saves costs and enables lower-latency requests.
At a high level, fine-tuning involves the following steps:
    1. Prepare training data
    2. Train a new fine-tuned model
    3. Use your fine-tuned model
Whereas fine-tuning GPT-3 incurs increased usage costs, fine-tuned GPT-J models on Forefront cost the same flat rate per replica as vanilla GPT-J, and the fine-tuning process itself is free.

Prepare training data

Training data is how you teach GPT-J what you'd like it to say.
Your data must be a single text file. Each training example in your text file should generally consist of a single input and its associated output.
Company Name: Nike

Product Description: Nike Air Jordan is an American brand of basketball shoes, athletic, casual, and style clothing produced by Nike. Founded in Chicago, Air Jordan was created for Hall of Fame former basketball player Michael Jordan during his time with the Chicago Bulls.

Blog Idea: Jordan Brand History - A blog about the history of the brand.
Fine-tuning performs better with more examples. To fine-tune a model that performs better than using a high-quality prompt with vanilla GPT-J, you should provide at least one hundred high-quality examples, ideally vetted by a human knowledgeable in the given task. From there, performance tends to linearly increase with every doubling of the number of examples.
In most cases, you'll want to use some kind of separator at the end of the prompts in your dataset to make it clear to the model where each training example begins and ends. A simple separator which works well in almost all cases is <|endoftext|>.
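As an illustration, here is a minimal Python sketch of how a training file with this separator could be assembled. The field names follow the blog idea example above, and the filename and the examples structure are placeholders rather than a required format:

# Illustrative only: write (input, output) pairs into a single training text file,
# separating examples with <|endoftext|>. Adapt the field names to your own task.
examples = [
    {
        "company": "Nike",
        "description": "Nike Air Jordan is an American brand of basketball shoes...",
        "blog_idea": "Jordan Brand History - A blog about the history of the brand.",
    },
    # ... ideally at least one hundred high-quality examples
]

with open("train.txt", "w") as f:
    for ex in examples:
        f.write(f"Company Name: {ex['company']}\n\n")
        f.write(f"Product Description: {ex['description']}\n\n")
        f.write(f"Blog Idea: {ex['blog_idea']}\n")
        f.write("<|endoftext|>\n")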
When using the fine-tuned model at inference time, apply the same data formatting that was applied to the dataset during fine-tuning. This means using the same separator and formatting the prompt the same way as your training examples.
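Continuing the sketch above, an inference prompt for the same task would follow the training layout and stop right where the model should start generating. The company used here is just an illustrative placeholder:

# Illustrative only: format the inference prompt exactly like the training examples,
# ending where the completion should begin (here, after "Blog Idea:").
prompt = (
    "Company Name: Allbirds\n\n"
    "Product Description: Allbirds is a footwear company known for wool sneakers.\n\n"
    "Blog Idea:"
)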
For more information on preparing datasets, check out our guide.

Train a new fine-tuned model

Once you have sufficient training data formatted in a single text file, training can easily be executed in your dashboard.

Create a new deployment

Select 'Fine-tuned GPT-J'

Upload text file

If your text file is 100MB or greater, get in touch with our team for custom support.

Set training duration

A good rule of thumb for smaller datasets is to train 5-10 minutes for every 100KB. For text files on the order of megabytes, you'll want to train 45-60 minutes for every 10MB.
The maximum training duration is 600 minutes.
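As a rough sanity check, here is a small Python helper that applies these rules of thumb. The midpoints, thresholds, and function name are just illustrative choices, not part of the product:

# Illustrative only: suggest a training duration (in minutes) from the file size,
# using the rules of thumb above and capping at the 600-minute maximum.
import os

def suggested_minutes(path):
    size = os.path.getsize(path)
    if size < 1_000_000:  # smaller datasets: 5-10 minutes per 100KB (midpoint used)
        minutes = (size / 100_000) * 7.5
    else:                 # megabyte-scale files: 45-60 minutes per 10MB (midpoint used)
        minutes = (size / 10_000_000) * 52.5
    return min(round(minutes), 600)

print(suggested_minutes("train.txt"))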

Set number of checkpoints

A checkpoint is a saved model version that you can deploy. You'll want to set a number of checkpoints that evenly divides the training duration; for example, a 60-minute run with 6 checkpoints saves a model every 10 minutes.
The maximum number of checkpoints is 15.

Add test prompts

Test prompts are prompts that every checkpoint will automatically provide completions for so you can compare the performance of the different models. Test prompts should be pieces of text that are not found in your training text file. This allows you to see how good the model is at understanding your topic and prevents the model from regurgitating information it has seen in your training set.
You can also customize model parameters for your specific task.
The maximum number of test prompts is 15.
Once your test prompts are set, you can press 'Fine-tune' and your fine-tuned model will begin training. You may notice the estimated completion time is longer than your specified training time. This is because it takes time to load the base weights prior to training.
As checkpoints begin to appear, you can press 'View test prompts' to start comparing performance between your different checkpoints.

Use your fine-tuned model

Once you find a well-performing checkpoint, you can press 'Deploy' to make the model available via HTTP API or the Playground. A newly deployed fine-tuned model typically takes about 5 minutes to become available.
You can also control how many replicas are running to scale with increased usage. Auto-scaling will be available within the next few weeks.
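Once the deployment is live, requests look like a standard completions-style HTTP call. The sketch below is illustrative only: the URL, header, and payload fields are hypothetical placeholders, so use the endpoint and parameters shown on your deployment page instead.

# Illustrative only: the endpoint URL, header, and payload fields below are
# hypothetical placeholders; copy the real values from your deployment page.
import requests

response = requests.post(
    "https://your-deployment-url.example.com/completions",  # placeholder URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "prompt": "Company Name: Allbirds\n\nProduct Description: ...\n\nBlog Idea:",
        "max_tokens": 40,
        "temperature": 0.7,
        "stop": ["<|endoftext|>"],
    },
)
print(response.json())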