Chat History Compression

When a conversation grows long enough to approach the AI model's context limit, the platform automatically summarizes older messages instead of simply discarding them. This means you can have extended conversations without losing important context from earlier in the chat.

Why It Matters

  • Context is preserved. Key decisions, facts, and action items from older messages are captured in a concise summary.
  • Longer conversations. You can keep chatting well beyond what the model's context window would normally allow.
  • It's automatic. Compression happens behind the scenes when needed — there's nothing you need to configure.

What You'll See

When compression occurs during a conversation, a visual indicator appears in the chat:

  • Collapsed view: A brief note that says "Chat history summarized" along with how many messages were compressed.
  • Expanded view: Click to see the full summary, token statistics showing how much space was saved, and an Undo button.

Undoing Compression

To restore the original messages, click the Undo button on the compression indicator (shown in the expanded view). This removes the summary and brings back every original message, which is useful when you need the exact wording of an earlier message.
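
For readers curious about what Undo amounts to, here is a minimal sketch. It assumes the chat keeps two lists: the visible messages (a summary followed by the more recent messages) and the archived originals. The names CompressedChat, visible, archived, and undo are hypothetical and only illustrate the idea; they are not the platform's actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompressedChat:
    # Hypothetical structure: visible[0] is the summary when archived is non-empty.
    visible: List[str]
    archived: List[str] = field(default_factory=list)


def undo(chat: CompressedChat) -> CompressedChat:
    """Drop the summary and put the archived originals back in front."""
    if not chat.archived:
        return chat  # nothing was compressed, so there is nothing to undo
    return CompressedChat(visible=chat.archived + chat.visible[1:], archived=[])


chat = CompressedChat(
    visible=["[summary of 2 messages]", "latest question"],
    archived=["first message", "second message"],
)
restored = undo(chat)
```

After the call, restored.visible holds the two original messages followed by the latest one, and the summary is gone.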

How It Works

  1. As you chat, the platform tracks how many tokens (the units of text that AI models process) your conversation uses.
  2. When the conversation approaches the model's context limit, older messages are selected for compression — starting from the beginning of the chat.
  3. A summary is generated that captures the key information from those messages.
  4. The summary replaces the older messages in the context sent to the AI, while the originals are preserved in the background.
  5. You see a compression indicator in the chat where the summarized messages were.
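
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's implementation: the word-based count_tokens, the 0.8 trigger threshold, the keep_recent setting, and the names Message, Conversation, and compress_if_needed are all hypothetical, and summarize stands in for whatever model produces the summary.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    role: str
    text: str


@dataclass
class Conversation:
    messages: List[Message]                                # context sent to the model
    archived: List[Message] = field(default_factory=list)  # originals kept in the background


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def total_tokens(messages: List[Message]) -> int:
    return sum(count_tokens(m.text) for m in messages)


def compress_if_needed(
    conv: Conversation,
    context_limit: int,
    summarize: Callable[[List[Message]], str],
    threshold: float = 0.8,  # assumed trigger point, not a documented value
    keep_recent: int = 2,    # assumed number of recent messages left untouched
) -> bool:
    budget = int(context_limit * threshold)
    # Steps 1 and 2: check usage against the limit, then pick messages from the start.
    if total_tokens(conv.messages) <= budget:
        return False
    max_cut = max(len(conv.messages) - keep_recent, 0)
    cut = 0
    while cut < max_cut and total_tokens(conv.messages[cut:]) > budget:
        cut += 1
    if cut == 0:
        return False
    older = conv.messages[:cut]
    # Step 3: generate a summary of the selected messages.
    summary = Message(role="summary", text=summarize(older))
    # Step 4: keep the originals aside and send the summary in their place.
    conv.archived.extend(older)
    conv.messages = [summary] + conv.messages[cut:]
    return True


conv = Conversation([
    Message("user", "a b c d"),
    Message("assistant", "e f g h"),
    Message("user", "i j k l"),
    Message("assistant", "m n o p"),
])
did_compress = compress_if_needed(
    conv, context_limit=10, summarize=lambda msgs: f"{len(msgs)} messages summarized"
)
```

In this toy run the two oldest messages are replaced by a single summary entry and moved to conv.archived, which is the data an Undo action would need.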

If anything goes wrong during compression, the platform falls back to simply trimming older messages so your conversation can continue uninterrupted.
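
The fallback can be pictured as a try/except around the summarization step. The sketch below is an assumption about the general shape only: compress_or_trim, trim_oldest, the word-based token count, and the choice to keep just the latest message alongside the summary are all hypothetical simplifications.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def trim_oldest(messages: List[str], budget: int) -> List[str]:
    """Drop messages from the start until the rest fit in the token budget."""
    kept = list(messages)
    while len(kept) > 1 and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)
    return kept


def compress_or_trim(
    messages: List[str],
    budget: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    try:
        # Normal path: summarize everything except the latest message.
        return [summarize(messages[:-1]), messages[-1]]
    except Exception:
        # Fallback path: summarization failed, so trim instead of blocking the chat.
        return trim_oldest(messages, budget)


def failing_summarizer(messages: List[str]) -> str:
    raise RuntimeError("summary model unavailable")


history = ["one two three", "four five", "six"]
fallback_result = compress_or_trim(history, budget=3, summarize=failing_summarizer)
normal_result = compress_or_trim(history, budget=3, summarize=lambda msgs: "summary")
```

Either way the function returns a usable context, which is the point of the fallback: a failed summary costs some older context but never stops the conversation.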

Good to Know

  • Compression is triggered automatically based on your conversation length and the model's context capacity. You don't need to enable or configure it.
  • The original messages are never deleted — they're kept in the background and can be restored with the Undo button.
  • The summary focuses on key decisions, important facts, action items, and context needed to continue the conversation naturally.
  • Compression uses a fast, lightweight AI model to generate summaries so it doesn't slow down your conversation.