AI tokens: What is a token in ChatGPT, Gemini and Co.?

When it comes to generative AI, i.e. artificial intelligence that creates content, the number of tokens a model can handle often indicates both its capability and its practical limits. But what are AI tokens? What does the maximum number of tokens per entered command (“prompt”) and per generated response tell you? How should you picture the token costs that a given subscription or API usage incurs? And what is the difference between tokens and token IDs? Below I have summarized all the important facts on the topic for you.

What is an AI token at ChatGPT, Google Gemini and Co.? How many tokens does my prompt have? And what is the token ID for individual words? You can get answers to these and other questions here. (The preview image and this image were created with Microsoft Copilot.)

What are AI tokens?

Individual tokens should not be thought of as a currency that maps 1:1 to certain inputs or characters. They are approximate, rounded values, and they can vary depending on the language. The “native language” of most large AI models is English, which is why inputs in this language consume comparatively fewer tokens than inputs in other languages, such as German. Short, simple inputs and outputs also cost fewer tokens than long, complex ones.

Average token values in ChatGPT

If you want to work out how many prompts, or how complex a prompt, a certain amount of tokens allows (e.g. the amount that corresponds to a certain budget), this general description is of little use. It only helps you save tokens, because it makes clear that content entered in a shorter form requires less processing effort than more complex and detailed entries.

OpenAI therefore provides a few rules of thumb for using ChatGPT. These let you estimate the number of tokens a prompt will require in advance and get an idea of the costs incurred with the respective subscription or when using the ChatGPT API. Here are the approximate values that OpenAI specifies for ChatGPT tokens:

  • 1 token is around 4 characters in English
  • So 1 token is about 3/4 of an average English word
  • 100 tokens are around 75 words in English

There are also the following estimates for the English language:

  • One or two sentences correspond to around 30 tokens
  • One paragraph is approximately 100 tokens
  • A text with 1,500 words is around 2,048 tokens
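These rules of thumb can be turned into a quick estimator. The following is a minimal sketch of my own (not an official OpenAI tool) that simply averages the character-based and word-based heuristics from the lists above:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text, based on OpenAI's
    rules of thumb: ~4 characters per token and ~3/4 word per token."""
    by_chars = len(text) / 4               # 1 token ≈ 4 characters
    by_words = len(text.split()) * 4 / 3   # 1 word ≈ 4/3 tokens
    return round((by_chars + by_words) / 2)

print(estimate_tokens("My favorite color is red."))  # roughly 6 tokens
```

This is only a heuristic for English text; for exact counts you should use OpenAI's Tokenizer mentioned below.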

As stated, these are only estimates. They can differ, especially with longer words, but also in other languages. For a more precise calculation, OpenAI therefore offers its own web tool, the Tokenizer. It shows, for example, that the German sentence “Was sind eigentlich KI Tokens?” (“What are AI tokens?”) consists of 30 characters but only 7 tokens. Here you can try out the tokenizer for ChatGPT yourself.

The limitation of tokens for input and output

One may ask: what is the point of all this? Well, the companies developing generative AIs can use token limits to indicate how complex an AI can “think”, i.e. how much input it can take in and how extensive its possible answers can be. If an AI is limited to a few tokens, it is not very capable.

However, if the AI can accept and process a large number of tokens per input and then output numerous tokens as a response, it is considered capable. Of course, the input and output must also match in terms of content, and the output must make sense. If that is the case, the complexity of the AI increases with the possible number of tokens.

This also explains the usage costs, for example of the ChatGPT API. Requests to GPT-4 Turbo currently cost $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens. For GPT-4, it is $0.03 per 1,000 input tokens and $0.06 per 1,000 output tokens. This is how the use of chatbots and multimodal AIs can be monetized, because not every request and answer can be kept extremely short, and tasks like evaluating PDFs and answering questions about them are token-heavy.
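With these rates, the cost of a single request can be sketched as follows. Note that the price table below reflects the figures cited in this article and may well be outdated by the time you read this:

```python
# USD per 1,000 tokens, as cited in this article (may be outdated)
PRICES = {
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
    "gpt-4":       {"input": 0.03, "output": 0.06},
}

def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request; tokens are billed per 1,000."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Example: a 1,000-token prompt with a 1,000-token answer on GPT-4 Turbo
print(round(api_cost("gpt-4-turbo", 1000, 1000), 4))  # 0.04
```

Four cents per exchange sounds cheap, but with long documents and long answers the token counts, and therefore the costs, add up quickly.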

Google Gemini 1.5 with up to 1 million tokens

Recently I already showed you that Google has renamed its “Bard” AI to “Gemini”. On that occasion, Gemini 1.0 was released and paid access to an Ultra version was introduced. Not much later, the Gemini 1.5 model was presented, which was not yet available to the general public. It is supposed to be able to handle up to 1,000,000 tokens (input + output) per prompt. Given the explanation above, this figure clearly shows how advanced this model is and how complex its “thinking” can be.

According to Google, understanding long contexts, and especially media, within a single prompt is still experimental. Those who can test Gemini 1.5 are still limited to 128,000 tokens per prompt by default (which corresponds to GPT-4 Turbo). Only a small group of testers can already access the 1-million-token model. That is said to equal over 700,000 words or over 30,000 lines of code, or, apart from text input, 11 hours of audio or 1 hour of video.
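Google's word figure lines up with OpenAI's rule of thumb of roughly 3/4 of a word per token, which a quick back-of-the-envelope calculation confirms:

```python
context_tokens = 1_000_000
words = context_tokens * 3 / 4   # 1 token ≈ 3/4 of an English word
print(int(words))  # 750000, consistent with "over 700,000 words"
```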

But why audio and video? Because Gemini 1.5 is not just a chatbot, but a multimodal AI model. In addition to text, it can also handle images, videos and other media. Google offers various examples of this in video form, such as examining the transcript of radio communications from the Apollo 11 mission (the first moon landing). After the corresponding PDF had been evaluated, a drawing was uploaded with the question of which scene from the transcript it depicted. The AI was able to assign it correctly.

Another video shows the evaluation of a 44-minute film within Gemini 1.5. Evaluating the film for the subsequent prompt queries already used 696,417 tokens. It was possible to ask at which timecode a certain scene (described in text) can be found. A drawing could also be uploaded as a scene description and its timecode requested. Here too, the multimodal AI model found the right answer.

Further details and examples can be found in Google's corresponding blog post on Gemini 1.5.

What is a Token ID?

Now you briefly have to set aside everything you just learned about tokens. The number of tokens as a measure of the complexity of media, prompts and outputs plays no direct role here; other numerical values with a different meaning are used instead. I mention this only because I was briefly confused while researching: the token count of a word (according to the values above, around 1.3 tokens per word) has nothing to do with its token ID.

The token ID is, as the name suggests, an identification number. It assigns a specific value to a word, to the letters of an abbreviation or to the individual parts of an inflected word. The model compares these IDs and then returns the most likely combination of token IDs as a response. This is how the digital neural network works: it doesn't actually “think”, but works out the most likely sequence of words and word parts that fits the input and forms the answer from that.

To put it a little more figuratively: token IDs are the AI language into which inputs are converted in order to find an appropriate AI response, which in turn is converted back into human language.

The token ID using ChatGPT as an example

That certainly sounds very theoretical and complicated. Admittedly, I didn't fully understand the matter at first either, due to a similarly worded description. An example that OpenAI gives for how the ChatGPT chatbot works helped me to understand it better. It also shows the criteria by which the token IDs for the same word can change. I've summarized it for you:

The example sentence in English is “My favorite color is red.” The period at the end has the token ID 13. The last word before it (“red”) has the ID 2266. However, if “red” is capitalized (“Red”), it is more unusual in this position and therefore gets the ID 2297. If the sentence is changed to “Red is my favorite color.”, the value for the period remains at 13, but the ID for “Red” at the beginning of the sentence rises to 7738. The “is” is so universal that its value remains at 318 everywhere.
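The mechanism can be illustrated with a toy tokenizer. This is only a sketch: the vocabulary below is hand-made, with the IDs for “.”, “ is”, “ red”, “ Red” and “Red” taken from OpenAI's example and the remaining IDs invented for illustration; real GPT vocabularies contain tens of thousands of entries learned from training data.

```python
# Tiny hand-made vocabulary. IDs for ".", " is", " red", " Red" and "Red"
# follow OpenAI's example; the other IDs are made up for this illustration.
VOCAB = {
    "My": 3666, " favorite": 4004, " color": 1933, " is": 318,
    " my": 616, " red": 2266, " Red": 2297, "Red": 7738, ".": 13,
}

def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match tokenization, roughly how BPE tokenizers work."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i:]!r}")
    return ids

print(tokenize("My favorite color is red.", VOCAB))
print(tokenize("Red is my favorite color.", VOCAB))
```

Running this shows the same effect OpenAI describes: depending on spelling and position, the word “red” surfaces as a different token ID (2266 mid-sentence, 7738 at the start), while “is” keeps the ID 318 in both sentences.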

This clearly shows that individual words are associated with a different context, or a different meaning within the same context, depending on their use and position in the prompt. They are therefore translated into different token IDs, which in turn lead to a different response from the AI. This also explains why rephrasing a request produces different output even though the content is the same. It also means the weighting of individual content can be shifted so that the answer follows it rather than other parts of the text.

Check the token IDs of your own ChatGPT input: Here's how!

Above I linked the OpenAI Tokenizer for calculating the tokens of your request. In addition to counting the characters entered and the tokens used, it also offers an analysis of the token IDs. My German example sentence “Was sind eigentlich KI Tokens?” (“What are AI tokens?”), with its 7 tokens and 30 characters, is broken down into the following individual elements:

[Was], [sind], [eigentlich], [K], [I], [Tokens], [?] – the token IDs for these individual elements are: 27125, 12868, 84980, 735, 40, 59266, 30. The single “I” without a preceding space is not very complex and has the ID 40; the question mark has 30. The word “eigentlich” (“actually”) has the largest ID. The English sentence “What are AI Tokens though?”, with 6 tokens and 26 characters, yields the following values: 3923, 527, 15592, 59266, 3582, 30.

Conclusion on the topics of AI tokens and token ID

The number of possible tokens when using chatbots and multimodal AIs indicates how long or complex the inputs and outputs can be. From long, detailed text prompts to the evaluation of entire films, a lot is already possible, with token limits now specified in the millions, while simple everyday questions to chatbots barely reach the double-digit token range. Long answers, however, can reach into the three-digit range, which should also be taken into account.

At the same time, there is the token ID, which has little to do with the number of letters. The ID, the AI-language equivalent of a word, results rather from how frequently the word, abbreviation or symbol is used, as well as from its position in the respective sentence. The more complex or unusual it is, the higher the ID. Processing it requires access to a larger training set and a larger area of the model network, the AI equivalent of human knowledge, which needs to be larger to answer more complex questions.

Did you like the article, and did the instructions on the blog help you? Then I would be happy if you would support the blog via a Steady membership.

