GPT-J understands and processes text by breaking it down into tokens. As a rough rule of thumb for English text, 1 token is approximately 4 characters. For example, the word “television” gets broken up into the tokens “tele”, “vis” and “ion”, while a short, common word like “dog” is a single token. Tokens are important to understand because GPT-J, like other language models, has a maximum context length measured in tokens: 2048 tokens, or roughly 1500 words. This limit covers both the text prompt and the generated response, so a longer prompt leaves fewer tokens for the output.
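Because the prompt and the response share the same 2048-token window, it can help to budget for the response before sending a prompt. The sketch below is a rough estimate only. It assumes the 4-characters-per-token heuristic described above, and `estimate_tokens` and `max_response_tokens` are hypothetical helper names, not part of any GPT-J library. For exact counts, run the prompt through GPT-J's own tokenizer instead.

```python
import math

# GPT-J's maximum context length, shared by the prompt and the generated response.
CONTEXT_LENGTH = 2048
# Rough heuristic (assumption): about 4 characters of English text per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Hypothetical helper: rough token estimate; an exact count needs GPT-J's tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def max_response_tokens(prompt: str, context_length: int = CONTEXT_LENGTH) -> int:
    """Hypothetical helper: tokens left for the response once the prompt fills part of the window."""
    return max(0, context_length - estimate_tokens(prompt))


if __name__ == "__main__":
    prompt = "Write a short poem about a dog watching television."
    print(f"~{estimate_tokens(prompt)} prompt tokens, "
          f"~{max_response_tokens(prompt)} tokens left for the response")
```

Since the heuristic can undercount (rare words like “television” split into several tokens), it is safer to leave some headroom rather than request exactly the estimated remainder.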