AI Basics for Business Owners — Lesson 3

LLMs, Tokens, and Context Windows

14 min read

Learning Objectives

  • 1. Explain how large language models generate text.
  • 2. Understand tokens, context windows, and their practical implications.
  • 3. Evaluate cost and quality tradeoffs between different models.

How LLMs generate text

Large language models generate text by predicting the most likely next token, a word or fragment of a word, based on everything that came before it. They repeat that prediction thousands of times per response, one token at a time. The result reads like fluent human writing, but the mechanism is statistical prediction, not comprehension.
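To make the mechanism concrete, here is a toy sketch of the generation loop. The probability table below is invented purely for illustration; a real LLM learns billions of such probabilities from training data, but the loop itself (predict one token, append it, repeat) has the same shape.

    import random

    # Toy "model": a hand-written table mapping the last token to candidate
    # next tokens and their probabilities. Purely illustrative -- a real LLM
    # learns these weights from vast amounts of text.
    NEXT_TOKEN_PROBS = {
        "the": ([" invoice", " customer"], [0.6, 0.4]),
        " invoice": ([" is", " was"], [0.6, 0.4]),
        " is": ([" overdue", " paid"], [0.7, 0.3]),
    }

    def generate(tokens, max_new_tokens=3):
        tokens = list(tokens)
        for _ in range(max_new_tokens):
            if tokens[-1] not in NEXT_TOKEN_PROBS:
                break  # the toy table has no prediction for this token
            candidates, weights = NEXT_TOKEN_PROBS[tokens[-1]]
            # Sample the next token in proportion to its probability --
            # one token per step, which is the core LLM loop.
            tokens.append(random.choices(candidates, weights=weights)[0])
        return "".join(tokens)

    print(generate(["the"]))  # e.g. "the invoice is overdue"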

This distinction matters because it explains both why LLMs are remarkably useful and why they sometimes produce confident nonsense. They are exceptional at generating text that follows the patterns of good writing, useful analysis, and clear explanation. They are unreliable at ensuring factual accuracy because they predict plausible text, not verified truth.

For business users, the practical implication is this: use LLMs for drafting, brainstorming, summarizing, restructuring, and analysis, but do not treat their output as verified fact for important decisions. The more consequential the output, the more human review it requires.

Tokens and pricing

A token is a chunk of text that the model processes as a single unit — roughly three-quarters of a word in English. The word "automation" is two tokens. A typical business email is about 200-400 tokens. AI services charge based on tokens processed: tokens in the prompt (input) plus tokens in the response (output).
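If you want exact counts rather than the three-quarters-of-a-word rule of thumb, tokenizer libraries will count for you. Here is a minimal sketch using OpenAI's open-source tiktoken library (installed with pip install tiktoken); note that different models use different tokenizers, so counts vary slightly from model to model.

    import tiktoken

    # cl100k_base is the tokenizer behind several OpenAI chat models;
    # other models and vendors use different tokenizers.
    enc = tiktoken.get_encoding("cl100k_base")

    email = ("Hi Dana, thanks for reaching out. Your replacement unit "
             "shipped today and should arrive within three business days.")

    tokens = enc.encode(email)
    print(len(tokens), "tokens")        # roughly 3/4 of the word count
    print(enc.decode(tokens) == email)  # True: tokenization is reversible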

Understanding token pricing helps you estimate AI costs for business workflows. If each customer support response uses 1,000 tokens and you handle 500 tickets per day, your daily token usage is 500,000 tokens. At typical pricing, this might cost $5-50 per day depending on the model.
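The arithmetic is simple enough to wrap in a small helper. The per-token prices below are placeholders, not current rates; substitute the numbers from your provider's pricing page.

    # Placeholder prices in dollars per 1,000 tokens -- check your
    # provider's pricing page for real rates.
    PRICE_PER_1K_INPUT = 0.005
    PRICE_PER_1K_OUTPUT = 0.015

    def daily_cost(requests_per_day, input_tokens, output_tokens):
        """Estimate the daily dollar cost of an AI workflow."""
        per_request = (input_tokens / 1000 * PRICE_PER_1K_INPUT
                       + output_tokens / 1000 * PRICE_PER_1K_OUTPUT)
        return requests_per_day * per_request

    # 500 tickets per day, each using 700 input + 300 output tokens:
    print(f"${daily_cost(500, 700, 300):.2f} per day")  # $4.00 at these rates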

Different models have different costs. More capable models cost more per token. For many business tasks, a less expensive model performs well enough. Match the model capability to the task complexity — do not use the most expensive model for simple classification when a cheaper one works equally well.
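One lightweight way to apply this advice is a routing rule in code. The model names below are hypothetical stand-ins for whatever tiers your provider actually offers; the point is the routing logic, not the names.

    # Hypothetical model tiers -- substitute your provider's actual models.
    CHEAP_MODEL = "small-fast-model"
    CAPABLE_MODEL = "large-capable-model"

    # Task types simple enough for the cheaper tier.
    SIMPLE_TASKS = {"classify", "tag", "route", "extract_field"}

    def pick_model(task_type: str) -> str:
        """Send simple tasks to the cheap tier, everything else to the capable one."""
        return CHEAP_MODEL if task_type in SIMPLE_TASKS else CAPABLE_MODEL

    print(pick_model("classify"))        # small-fast-model
    print(pick_model("draft_proposal"))  # large-capable-model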

Context windows and memory

The context window is the maximum amount of text an LLM can consider at once — the combined size of the prompt and the response. A model with a 128,000-token context window can process roughly 200 pages of text in a single conversation. A model with a 4,000-token window is limited to about six pages.

Context window size determines what you can do in a single interaction. Summarizing a long document requires a context window large enough to hold the entire document plus the instructions and response. Analyzing a dataset requires fitting the data within the context window.
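A quick back-of-the-envelope check, sketched below, catches fit problems before you hit an error. The four-thirds multiplier is the same rough one-token-per-three-quarters-of-a-word heuristic from the previous section.

    TOKENS_PER_WORD = 4 / 3  # rough heuristic: one token is ~3/4 of a word

    def fits_in_window(doc_words, instruction_tokens, response_tokens, window):
        """Rough check that a document plus instructions and the expected
        response fit inside a model's context window."""
        needed = round(doc_words * TOKENS_PER_WORD) + instruction_tokens + response_tokens
        return needed <= window, needed

    # A 60,000-word report, 200 tokens of instructions, a 1,000-token summary:
    ok, needed = fits_in_window(60_000, 200, 1_000, window=128_000)
    print(ok, needed)  # True 81200 -- fits comfortably in a 128k window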

LLMs do not have memory between conversations. Each conversation starts fresh. If you had a productive discussion yesterday and return today, the model does not remember it. Some applications build "memory" by storing conversation history and re-including it in each new prompt, but this consumes context window space.
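Here is a sketch of how that re-including works. The reply step is a stand-in rather than a real API call; the point is that the stored history is resent on every turn, so the prompt grows as the conversation does.

    history = []  # the application's stored "memory"

    def build_prompt(history, new_message):
        """Replay the whole stored history before the new message --
        this is how many chat applications simulate memory."""
        lines = [f"{role}: {text}" for role, text in history]
        lines.append(f"user: {new_message}")
        return "\n".join(lines)

    def rough_tokens(text):
        return round(len(text.split()) * 4 / 3)  # 3/4-word heuristic again

    for msg in ["Summarize yesterday's sales.", "Now compare to last month."]:
        prompt = build_prompt(history, msg)
        print(f"prompt is ~{rough_tokens(prompt)} tokens")
        reply = "..."  # in a real app, the model's response would go here
        history.append(("user", msg))
        history.append(("assistant", reply))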

For business applications, context window size affects which tasks are practical. Document analysis, long report generation, and multi-document comparison require large context windows. Simple question answering and short content generation work fine with smaller windows.

Case Study

The token budget surprise

Situation

A customer service team deployed an AI assistant that used GPT-4 to draft responses to support tickets. Each ticket included the customer message, their account history, recent interactions, and the product knowledge base — about 8,000 tokens per ticket. With 200 tickets per day and output averaging 500 tokens, their monthly AI cost was $4,200, far exceeding the $500/month they budgeted.

Analysis

The team had not calculated token usage before deployment. Trimming the context to only the relevant account history and routing routine tickets to a less expensive model cut costs to $800/month while maintaining response quality for most tickets.
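Both levers, a smaller context and a cheaper per-token rate, show up in a quick recalculation. The rates below are placeholders; the lesson does not state what the team actually paid, so the printed figures illustrate the shape of the savings rather than reproducing the exact dollar amounts.

    def monthly_cost(tickets_per_day, in_tokens, out_tokens,
                     in_price_per_1k, out_price_per_1k, days=30):
        per_ticket = (in_tokens / 1000 * in_price_per_1k
                      + out_tokens / 1000 * out_price_per_1k)
        return tickets_per_day * per_ticket * days

    # Placeholder rates: a pricier model with the full 8,000-token context
    # versus a cheaper model with a trimmed 2,000-token context.
    before = monthly_cost(200, 8_000, 500, in_price_per_1k=0.03, out_price_per_1k=0.06)
    after = monthly_cost(200, 2_000, 500, in_price_per_1k=0.005, out_price_per_1k=0.015)
    print(f"before ~${before:,.0f}/month, after ~${after:,.0f}/month")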

Takeaway

Calculate expected token usage before deploying AI at scale. Include context, instructions, and output in the calculation. Use the least expensive model that achieves acceptable quality.

Reflection Questions

  • 1. If you use an AI tool like ChatGPT, have you ever noticed the conversation becoming less coherent in very long exchanges? That is the context window filling up.
  • 2. For an AI project you are considering, estimate the input and output size. What model tier would balance cost and quality?

Key Takeaways

  • LLMs predict plausible text, not verified truth — always review important output.
  • Tokens are the billing unit for AI — estimate usage before deploying at scale.
  • Context windows limit how much information the model can consider at once.
  • Match model capability to task complexity — do not overspend on simple tasks.