Why Claude Code Sends Huge Prompts to Local Models — And How I Reduced Mine by 73%


I recently connected Claude Code to a local model:

qwen3.6:27b-mtp-q4_K_M

The model was running through Ollama / a local Claude-compatible setup.

Then I tested something intentionally stupid:

7+5

The response took around 12 seconds.

That is ridiculous for a three-character arithmetic prompt.

So I captured the actual request Claude Code sent to the model and analysed it. The result was very clear: the problem was not the complexity of the question. The problem was the size of the wrapper Claude Code sends around every request.

Claude Code is not a simple chat client. It is an agent runtime. It sends project context, system instructions, tool definitions, IDE context, git information, memory instructions, skills, agents, and other operational metadata.

That is useful for serious coding work.

It is terrible for trivial prompts.

This article documents the investigation, the numbers, the detected tools, the impact, and the practical command profiles I now use to reduce prompt size and improve performance.

The short version

The actual prompt was:

7+5

Size:

3 characters

The captured Claude Code request was:

130,971 bytes raw file size
116,089 bytes compact JSON
~28,856 estimated tokens

The largest component was tool definitions:

94,028 bytes
~23,393 estimated tokens
81.00% of compact JSON

That means the model was not really answering a tiny prompt. It was processing a full Claude Code runtime envelope.

After restricting tools, I got these sizes:

Mode	Command purpose	Request size
Original full request	Default / heavy Claude Code context	130,971 bytes
Coding profile	Limited coding tools	50,459 bytes
Analysis profile	Read/search/run only	34,821 bytes

That is a reduction of:

Mode	Reduction from original
Coding profile	61.5% smaller
Analysis profile	73.4% smaller

This is not a cosmetic optimisation. This is the difference between feeding a local model a reasonable request and forcing it to chew through a giant runtime contract before answering anything.
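The reduction figures follow directly from the raw request sizes. As a quick check, here is a minimal Python sketch of the arithmetic; the function name is my own and is only there for illustration.

```python
ORIGINAL_BYTES = 130_971  # raw size of the original captured request


def reduction_percent(original: int, reduced: int) -> float:
    """Return how much smaller `reduced` is than `original`, in percent."""
    return (original - reduced) / original * 100


for name, size in [("Coding profile", 50_459), ("Analysis profile", 34_821)]:
    print(f"{name}: {reduction_percent(ORIGINAL_BYTES, size):.1f}% smaller")
```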

Test environment

The captured request came from a Claude Code session running against a local / gateway model.

Relevant details from the request:

Model: qwen3.6:27b-mtp-q4_K_M
Max output tokens requested: 32,000
Streaming: true
Temperature: 1.0
Thinking enabled: true
Platform: Windows 11
Primary shell: PowerShell
Working directory: c:\Projects\GitHub\myproject

The project was a real git repository with a CLAUDE.md file loaded into context.

Claude Code also detected VS Code context and included information such as the opened file:

c:\Projects\GitHub\myproject\.claude\settings.local.json

What Claude Code actually sent

The request was not just this:

{
  "messages": [
    {
      "role": "user",
      "content": "7+5"
    }
  ]
}

It was closer to this:

You are Claude Code.
Here are the system rules.
Here is the project CLAUDE.md.
Here is the current date.
Here is the shell and OS.
Here is the git status.
Here is the opened IDE file.
Here are all available agents.
Here are all available skills.
Here are all available tools and their JSON schemas.
Here is the actual user prompt: 7+5.

That is the core problem.

For real coding tasks, that context can be valuable. For a trivial local model prompt, it is pure overhead.

Size by top-level component

I parsed the request as JSON and measured compact JSON size for each top-level section.

Compact JSON excludes pretty-print whitespace, so it is a closer measure of the payload the model actually has to process than the raw file size.

Section	Bytes	Characters	Estimated tokens	Share of compact request
`tools`	94,028	93,573	~23,393	81.00%
`messages`	14,363	14,191	~3,548	12.37%
`system`	7,299	7,263	~1,816	6.29%
`metadata`	176	176	~44	0.15%
`model`	24	24	~6	0.02%
`thinking`	16	16	~4	0.01%
`max_tokens`	5	5	~1	~0%
`stop_sequences`	4	4	~1	~0%
`stream`	4	4	~1	~0%
`top_p`	4	4	~1	~0%
`top_k`	4	4	~1	~0%
`tool_choice`	4	4	~1	~0%
`temperature`	3	3	~1	~0%

The conclusion is obvious:

The tools section is the monster.

CLAUDE.md was not the main issue. The tool schema payload was.
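For anyone who wants to repeat the measurement, a breakdown like the one above can be sketched in a few lines of Python. This is a minimal sketch, not the exact script behind the table. The sample request is a made-up stand-in, and the token estimate assumes roughly four characters per token, which is consistent with the table but is not real tokenizer output.

```python
import json


def compact(value) -> str:
    """Serialise a value as compact JSON (no pretty-print whitespace)."""
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


def section_sizes(request: dict) -> list:
    """Measure each top-level section of a request, largest first."""
    total_bytes = len(compact(request).encode("utf-8"))
    rows = []
    for key, value in request.items():
        text = compact(value)
        size = len(text.encode("utf-8"))
        rows.append({
            "section": key,
            "bytes": size,
            "chars": len(text),
            "est_tokens": round(len(text) / 4),  # assumption: ~4 chars per token
            "share": size / total_bytes * 100,
        })
    return sorted(rows, key=lambda row: row["bytes"], reverse=True)


# Hypothetical miniature request, only to show the shape of the output.
sample_request = {
    "model": "qwen3.6:27b-mtp-q4_K_M",
    "stream": True,
    "messages": [{"role": "user", "content": "7+5"}],
    "tools": [{
        "name": "ExampleTool",  # hypothetical tool
        "description": "x" * 200,
        "input_schema": {"type": "object"},
    }],
}

rows = section_sizes(sample_request)
for row in rows:
    print(f"{row['section']:<10} {row['bytes']:>6} bytes  {row['share']:5.1f}%")
```

To run it against a real capture, load the file with `json.load` and pass the resulting dictionary to `section_sizes`.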

Messages breakdown

The messages array contained two main messages.

Message	Role	Bytes	Characters	Estimated tokens	Purpose
`messages[0]`	`user`	7,135	6,983	~1,746	Project reminder, local command context, IDE context, actual user prompt
`messages[1]`	`system`	7,225	7,205	~1,801	Available agent types and skills

`messages[0].content[]`

Part	Bytes	Text chars	Estimated tokens	Description
`content[0]`	6,353	6,039	~1,510	`<system-reminder>` with project `CLAUDE.md`
`content[1]`	272	246	~62	Local command caveat
`content[2]`	159	131	~33	`/clear` command metadata
`content[3]`	72	46	~12	Empty local command stdout
`content[4]`	218	188	~47	IDE opened file context
`content[5]`	28	3	~1	Actual prompt: `7+5`

The actual useful input was one token or so.

Everything else was runtime context.

System prompt breakdown

The top-level system array contained Claude Code operating instructions.

System item	Bytes	Text chars	Estimated tokens	Purpose
`system[0]`	119	94	~24	Identity: Claude Code
`system[1]`	7,177	7,016	~1,754	Harness rules, safety rules, memory rules, environment, VS Code formatting

This part is not huge compared to tools, but it still matters.

The system prompt included instructions for:

Claude Code identity
software engineering behaviour
security boundaries
tool permission rules
file reference formatting
memory management
shell/environment details
git status
VS Code extension behaviour

Most of that is useful in an agentic coding environment.

It is still overhead for simple prompts.

Why `/clear` did not fix it

The captured request included /clear command metadata, but the request was still huge.

That is because /clear clears conversation history. It does not remove the Claude Code runtime contract.

Claude Code still needs to send:

system prompt
tool definitions
project memory / CLAUDE.md
environment info
IDE context
skills / agents
current user prompt

So /clear helps with conversation history, but it does not turn Claude Code into a minimal chat client.

Detected tools

The request contained 28 tools.

Here is the full list, sorted by size.

Tool	Size	Estimated tokens	What it does	Keep for Windows coding?
`Workflow`	21,010 bytes	~5,206	Runs structured workflows and orchestration patterns.	No. Biggest offender. Disable unless deliberately using workflows.
`PowerShell`	9,049 bytes	~2,251	Runs PowerShell commands on Windows.	Yes. Essential for Windows coding.
`DesignSync`	8,978 bytes	~2,232	Syncs local component/design files with Claude Design projects.	No for normal coding.
`Monitor`	6,288 bytes	~1,564	Watches commands/processes/logs and notifies when conditions happen.	Usually no. Useful only for long-running monitoring.
`AskUserQuestion`	4,901 bytes	~1,224	Presents structured interactive questions/options to the user.	Optional. Good UX, but not required daily.
`EnterPlanMode`	4,317 bytes	~1,079	Enters implementation planning mode before code changes.	Optional. Use for big refactors/features.
`EnterWorktree`	3,761 bytes	~938	Creates or enters an isolated git worktree.	No unless you explicitly use worktrees.
`CronCreate`	3,720 bytes	~924	Schedules future or recurring prompts.	No. Not needed for coding.
`ScheduleWakeup`	3,718 bytes	~922	Schedules self-paced wakeups or delayed checks.	No. Not needed for normal coding.
`Grep`	3,230 bytes	~806	Searches file contents using ripgrep semantics.	Yes. Essential for code analysis.
`Bash`	3,052 bytes	~758	Runs Git Bash / POSIX shell commands.	Usually no on Windows unless repo uses `.sh`.
`ExitPlanMode`	2,538 bytes	~634	Exits plan mode and asks for approval.	Optional with `EnterPlanMode`.
`ExitWorktree`	2,511 bytes	~626	Leaves/removes a Claude-created worktree.	No unless using worktrees.
`Agent`	2,507 bytes	~625	Launches subagents for delegated tasks.	Optional. Powerful but adds overhead.
`Skill`	2,112 bytes	~528	Invokes Claude Code skills.	Optional. Disable by default for local-model speed.
`NotebookEdit`	1,627 bytes	~406	Edits Jupyter notebooks.	No for .NET/React/PowerShell coding.
`Read`	1,592 bytes	~397	Reads files.	Yes. Essential.
`PushNotification`	1,560 bytes	~388	Sends notifications.	No.
`TaskOutput`	1,552 bytes	~386	Reads output from background tasks.	No by default.
`Edit`	966 bytes	~241	Performs exact string replacement in files.	Yes. Essential for safe edits.
`TodoWrite`	929 bytes	~232	Tracks implementation tasks/todos.	Yes/optional. Small and useful.
`WebSearch`	836 bytes	~208	Searches the web.	Optional. Use only when current docs/research are needed.
`WebFetch`	747 bytes	~186	Fetches web pages.	Optional. Use only when needed.
`Glob`	731 bytes	~183	Finds files by glob pattern.	Yes. Essential.
`Write`	639 bytes	~160	Creates or overwrites files.	Yes for coding.
`TaskStop`	537 bytes	~134	Stops background tasks.	No by default.
`CronDelete`	360 bytes	~90	Deletes scheduled jobs.	No.
`CronList`	231 bytes	~58	Lists scheduled jobs.	No.

The worst offenders by size were:

Workflow
PowerShell
DesignSync
Monitor
AskUserQuestion
EnterPlanMode

PowerShell is large, but I need it on Windows.

Workflow, DesignSync, Monitor, Cron*, ScheduleWakeup, and notification tools are not needed for everyday coding.
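To see how concentrated the cost is, the six worst offenders can be added up against the size of the whole tools section. The sketch below uses only the numbers from the table; the helper name is my own.

```python
TOOLS_SECTION_BYTES = 94_028  # compact size of the whole `tools` array

# Sizes taken from the table above.
worst_offenders = {
    "Workflow": 21_010,
    "PowerShell": 9_049,
    "DesignSync": 8_978,
    "Monitor": 6_288,
    "AskUserQuestion": 4_901,
    "EnterPlanMode": 4_317,
}


def cumulative_shares(sizes: dict, total: int) -> list:
    """Return (name, cumulative fraction of `total`) in the given order."""
    running = 0
    result = []
    for name, size in sizes.items():
        running += size
        result.append((name, running / total))
    return result


shares = cumulative_shares(worst_offenders, TOOLS_SECTION_BYTES)
for name, fraction in shares:
    print(f"{name:<16} cumulative {fraction:6.1%}")
```

Six of the 28 tools account for roughly 58% of the tools section, and PowerShell is the only one of the six I need every day.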

Why tool definitions are so expensive

Tools are not sent as just names.

They are sent with:

name
description
when to use the tool
when not to use the tool
safety warnings
examples
full JSON schema
required fields
optional fields
enums
nested object structures

For example, a small tool like Write is only 639 bytes.

But Workflow is 21,010 bytes because it includes a very long description and schema.

This is necessary for the model to call tools reliably. The model cannot safely call a tool unless it knows the contract.

But if the tool is irrelevant to the task, it is dead weight.
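To make that concrete, here is a small sketch that measures a tool definition field by field. The tool itself is hypothetical and far shorter than a real Claude Code tool. The `name`, `description`, and `input_schema` fields follow the general shape of tool definitions in the Anthropic Messages API, but the contents are invented for illustration.

```python
import json


def compact_bytes(value) -> int:
    """Size of a value as compact UTF-8 JSON."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


# Hypothetical tool definition, not one of the real Claude Code tools.
example_tool = {
    "name": "ExampleFileReader",
    "description": (
        "Reads a file from disk. Use this when you need the exact contents "
        "of a file. Do not use it for directories or binary files."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute path to the file."},
            "encoding": {"type": "string", "enum": ["utf-8", "utf-16"]},
        },
        "required": ["path"],
    },
}

breakdown = {field: compact_bytes(value) for field, value in example_tool.items()}
for field, size in breakdown.items():
    print(f"{field:<13} {size:>4} bytes")
```

Even in this toy case the name is a rounding error. The description and the schema carry the weight, and real tools have much longer versions of both.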

Official Claude Code behaviour that matters

Claude Code has documented options that directly relate to this problem.

`--tools`

Claude Code supports restricting which built-in tools are available:

claude --tools "Bash,Edit,Read"

It also supports disabling all built-in tools with:

claude --tools ""

MCP tools are not affected by --tools, so block them separately if needed:

--disallowedTools "mcp__*"

Source: Claude Code CLI reference

`--bare`

Claude Code has a minimal mode:

claude --bare -p "query"

The official documentation says bare mode skips auto-discovery of hooks, skills, plugins, MCP servers, auto memory, and CLAUDE.md, while still giving Claude access to Bash, file read, and file edit tools.

Source: Claude Code CLI reference

`--disable-slash-commands`

Claude Code can disable skills and commands for a session:

claude --disable-slash-commands

Source: Claude Code CLI reference

`claudeMdExcludes`

Claude Code settings support excluding specific CLAUDE.md files:

{
  "claudeMdExcludes": [
    "**/vendor/**/CLAUDE.md"
  ]
}

Source: Claude Code settings

`disableBundledSkills`

Claude Code settings support disabling bundled skills and workflows:

{
  "disableBundledSkills": true
}

This removes bundled skills and workflows from the model context while leaving built-in slash commands typable.

Source: Claude Code settings

`autoMemoryEnabled`

Claude Code auto memory is on by default. It can be disabled in settings:

{
  "autoMemoryEnabled": false
}

Or with an environment variable:

$env:CLAUDE_CODE_DISABLE_AUTO_MEMORY = "1"

Source: Claude Code memory documentation

`CLAUDE_CODE_ATTRIBUTION_HEADER`

When routing Claude Code through an LLM gateway, this environment variable can help prompt-cache behaviour:

$env:CLAUDE_CODE_ATTRIBUTION_HEADER = "0"

The official docs say setting it to 0 omits the attribution block from the start of the system prompt, and that disabling it improves prompt-cache hit rates when routing through an LLM gateway. Anthropic API caching is unaffected.

Source: Claude Code environment variables

Prompt caching

Claude Code sends the full context on each request. The documentation explains that the model itself does not remember previous requests, so Claude Code resends the system prompt, project context, prior messages/tool results, and the new message.

Prompt caching avoids reprocessing unchanged prefixes when supported.

Source: How Claude Code uses prompt caching

This is very important when using a local model or gateway. If your gateway does not implement compatible prompt caching, the model may process the full prompt every time.
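The idea behind prefix caching can be illustrated with plain strings. This is a conceptual sketch only, using characters as a stand-in for tokens. It does not describe how any particular gateway or inference server implements its cache, and the `request-id` line is an invented example of content that changes per request.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the identical leading part of two prompts."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Stand-in for the large, unchanged system prompt and tool definitions.
envelope = "SYSTEM+TOOLS " * 1000

# Case 1: only the final user message differs, so almost everything is reusable.
first = envelope + "user: 7+5"
second = envelope + "user: 8+6"
reusable = shared_prefix_len(first, second)

# Case 2: something near the start changes on every request,
# so nothing after that point can be reused.
third = "request-id: 1\n" + envelope + "user: 7+5"
fourth = "request-id: 2\n" + envelope + "user: 7+5"
reusable_with_varying_header = shared_prefix_len(third, fourth)

print(reusable, "of", len(first), "characters shared")
print(reusable_with_varying_header, "of", len(third), "characters shared")
```

This is also why a small varying block at the start of the system prompt can matter more than its size suggests.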

Performance impact

The original huge request took around:

12 seconds

For a prompt as small as:

7+5

That delay is not because arithmetic is hard.

The delay comes from:

Serialising and sending a large request.
Tokenising a large prompt.
Processing tens of thousands of prompt tokens.
Attending over thousands of tool-schema tokens during prompt processing.
Possible reasoning/thinking overhead.
Possible lack of prompt caching in the local/gateway path.
Local model speed limits, especially with large context.

With hosted Claude models, prompt caching can reduce the repeated cost of unchanged context.

With local LLMs, Ollama, llama.cpp, or custom gateways, do not assume you get the same benefit. If the gateway does not support Claude-compatible prompt caching, every request may be processed from scratch.
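A rough way to reason about the prompt-processing part of that delay is to divide the prompt token count by the prefill throughput of your hardware. In the sketch below the throughput values are placeholders I picked for illustration, not measurements from my machine, so substitute the figure your own runtime reports.

```python
def estimated_prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Estimate prompt-processing time when nothing is served from a cache."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return prompt_tokens / tokens_per_second


PROMPT_TOKENS = 28_856  # estimated tokens in the original captured request

# Hypothetical prefill throughputs in tokens per second.
for rate in (500, 1_000, 2_500):
    seconds = estimated_prefill_seconds(PROMPT_TOKENS, rate)
    print(f"{rate:>5} tok/s -> {seconds:5.1f} s before the first output token")
```

The estimate ignores thinking tokens, network overhead, and generation time, so treat it as a lower bound on what an uncached request costs.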

Cost impact

If you use a paid API, bigger prompts can increase cost.

Token cost generally scales with how much context the model processes. Claude Code documentation explicitly recommends reducing token usage because larger context means more tokens and higher cost.

Source: Claude Code cost documentation

Even if you run a local model and pay no API fee, you still pay in:

latency
GPU time
CPU time
VRAM pressure
power usage
worse responsiveness
lower throughput
more context pollution

Local is not free. It just changes the type of cost.

The optimisation experiment

I tested two reduced command profiles.

1. Coding profile

claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,AskUserQuestion,EnterPlanMode,ExitPlanMode,Skill,WebFetch,WebSearch" --strict-mcp-config --disallowedTools "mcp__*"

Measured request size:

49.28 KB
50,459 bytes

Reduction from the original raw request:

61.5% smaller

This profile is useful for serious coding, but it still includes optional tools:

AskUserQuestion
EnterPlanMode
ExitPlanMode
Skill
WebFetch
WebSearch

For daily coding, I think this is still heavier than necessary.

2. Analysis profile

claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*"

Measured request size:

34.00 KB
34,821 bytes

Reduction from the original raw request:

73.4% smaller

This is a very good read-only / analysis profile.

It can:

read files
find files
search content
run safe PowerShell commands
run tests/builds if needed

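It has no Edit or Write tool, which is exactly what I want for analysis-only work. PowerShell can still modify files through commands, so this profile is read-only by convention rather than by enforcement.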

Recommended profiles

I now think about Claude Code profiles based on the task.

Do not use one giant tool set for everything. That is lazy and expensive.

Profile 1: Simple question

Do not use Claude Code.

Use the model directly:

ollama run qwen3.6:27b-mtp-q4_K_M "7+5"

Or use your llama.cpp/Open WebUI/direct local chat client.

Claude Code is the wrong tool for basic chat.

Profile 2: Bare one-shot

For a lightweight Claude Code scripted call:

claude --bare -p "7+5"

This skips a lot of automatic Claude Code context.

Profile 3: Analysis mode

For architecture inspection, code search, and explanation:

claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*"

Use this for prompts like:

Find where this API endpoint is implemented.
Explain the project structure.
Analyse this error.
Search for all usages of this class.
Run tests and explain failures.

Profile 4: Daily coding mode

For normal code edits on Windows:

claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite" --strict-mcp-config --disallowedTools "mcp__*"

This is my recommended default.

It includes only what a coding agent really needs:

Tool	Why
`Read`	inspect files
`Write`	create files
`Edit`	modify files
`Glob`	find files
`Grep`	search code
`PowerShell`	run Windows commands, builds, tests
`TodoWrite`	track multi-step work

Profile 5: Heavy coding / planning mode

For bigger refactors or multi-file features:

claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,AskUserQuestion,EnterPlanMode,ExitPlanMode" --strict-mcp-config --disallowedTools "mcp__*"

Use this when the implementation needs planning or there are multiple valid approaches.

Profile 6: Research coding mode

For coding tasks that require online documentation:

claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,WebFetch,WebSearch" --strict-mcp-config --disallowedTools "mcp__*"

Do not use this by default.

Web access is useful, but it increases prompt size and can distract the model from inspecting local code first.

Profile 7: Skills mode

When I explicitly want Claude skills:

claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,Skill" --strict-mcp-config --disallowedTools "mcp__*"

Do not keep Skill in the default profile unless you actually use skills every session.
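To compare the profiles before running them, the per-tool sizes from the detected tools table can be summed per profile. This is only an estimate of the tool-schema share: the numbers come from the original capture, and the real request also carries the system prompt, messages, and JSON overhead. The profile names are my own labels.

```python
# Compact sizes in bytes, taken from the detected tools table.
TOOL_BYTES = {
    "Read": 1_592, "Write": 639, "Edit": 966, "Glob": 731, "Grep": 3_230,
    "PowerShell": 9_049, "TodoWrite": 929, "AskUserQuestion": 4_901,
    "EnterPlanMode": 4_317, "ExitPlanMode": 2_538, "WebFetch": 747,
    "WebSearch": 836, "Skill": 2_112,
}

DAILY = ["Read", "Write", "Edit", "Glob", "Grep", "PowerShell", "TodoWrite"]

PROFILES = {
    "analysis": ["Read", "Glob", "Grep", "PowerShell"],
    "daily coding": DAILY,
    "planning": DAILY + ["AskUserQuestion", "EnterPlanMode", "ExitPlanMode"],
    "research": DAILY + ["WebFetch", "WebSearch"],
    "skills": DAILY + ["Skill"],
}


def tool_payload(names: list) -> int:
    """Sum the measured schema sizes of the given tools."""
    return sum(TOOL_BYTES[name] for name in names)


for profile, names in PROFILES.items():
    print(f"{profile:<13} {tool_payload(names):>6} bytes of tool definitions")
```

Against the 94,028 bytes of the full tool set, even the planning profile is less than a third of the original tool payload.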

Recommended PowerShell aliases

To make this practical, I would create PowerShell functions rather than true aliases, because a PowerShell alias cannot carry arguments.

Open your PowerShell profile:

notepad $PROFILE

Then add:

function claude-analyse {
    claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*" @args
}

function claude-code-light {
    claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite" --strict-mcp-config --disallowedTools "mcp__*" @args
}

function claude-code-plan {
    claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,AskUserQuestion,EnterPlanMode,ExitPlanMode" --strict-mcp-config --disallowedTools "mcp__*" @args
}

function claude-code-research {
    claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,WebFetch,WebSearch" --strict-mcp-config --disallowedTools "mcp__*" @args
}

function claude-bare {
    claude --bare -p @args
}

Then use:

claude-analyse
claude-code-light
claude-code-plan
claude-code-research
claude-bare "7+5"

This is better than remembering long commands.

Recommended local settings

For a local-model/gateway setup, I would also consider a local .claude/settings.local.json:

{
  "disableBundledSkills": true,
  "autoMemoryEnabled": false,
  "includeGitInstructions": false,
  "env": {
    "CLAUDE_CODE_DISABLE_AUTO_MEMORY": "1",
    "CLAUDE_CODE_DISABLE_BUNDLED_SKILLS": "1",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0",
    "MAX_THINKING_TOKENS": "0"
  }
}

Be careful with this.

It can improve performance and reduce noise, but it can also make Claude Code less aware of your project and workflow.

For serious coding, I would not blindly remove all project instructions. I would first reduce tools, because tools were the biggest problem in the captured request.

Do not over-optimise the wrong thing

CLAUDE.md is not the main villain.

In my captured request:

CLAUDE.md / system-reminder area: 6,353 bytes
Tools: 94,028 bytes

So yes, keep CLAUDE.md concise.

But do not waste two hours shaving 500 bytes from CLAUDE.md while sending a 21 KB Workflow tool you never use.

That is bad engineering.

Optimise the biggest cost first.
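Putting numbers on that comparison makes the priority obvious. The sketch below expresses both savings as a share of the 116,089-byte compact request; the function name is my own.

```python
COMPACT_REQUEST_BYTES = 116_089  # compact JSON size of the captured request


def share_of_request(saved_bytes: int, request_bytes: int = COMPACT_REQUEST_BYTES) -> float:
    """Return the saving as a percentage of the whole request."""
    return saved_bytes / request_bytes * 100


claude_md_trim = share_of_request(500)       # shaving 500 bytes from CLAUDE.md
workflow_removal = share_of_request(21_010)  # dropping the Workflow tool

print(f"Trim CLAUDE.md by 500 bytes: {claude_md_trim:.1f}% of the request")
print(f"Remove the Workflow tool:    {workflow_removal:.1f}% of the request")
```

One flag change is worth about 42 times as much as the careful CLAUDE.md edit.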

What should stay enabled for Windows coding

For my Windows + PowerShell environment, the minimum useful coding set is:

Read
Write
Edit
Glob
Grep
PowerShell
TodoWrite

I would not include Bash by default on Windows unless the project genuinely uses shell scripts.

Having both Bash and PowerShell gives the model two different command syntaxes. Local models are more likely to mix them up.

Use PowerShell as the primary shell.

Add Bash only when needed.

What should be disabled by default

For local Qwen / Ollama / llama.cpp coding, I would disable these unless I explicitly need them:

Agent
Bash
CronCreate
CronDelete
CronList
DesignSync
EnterWorktree
ExitWorktree
Monitor
NotebookEdit
PushNotification
ScheduleWakeup
Skill
TaskOutput
TaskStop
WebFetch
WebSearch
Workflow

The biggest default waste is:

Workflow
DesignSync
Monitor
Cron*
ScheduleWakeup
PushNotification
NotebookEdit

These tools are useful in the right context. But sending them to a local model for every coding turn is waste.

A simple mental model

Use this decision tree:

flowchart TD
    A[What am I doing?] --> B{Simple question?}
    B -->|Yes| C[Use Ollama / direct local chat]
    B -->|No| D{Need to edit files?}
    D -->|No| E[Use analysis profile: Read, Glob, Grep, PowerShell]
    D -->|Yes| F{Large change or unclear approach?}
    F -->|No| G[Use daily coding profile]
    F -->|Yes| H[Use planning profile]
    H --> I{Need internet docs?}
    G --> I
    I -->|Yes| J[Add WebFetch and WebSearch]
    I -->|No| K[Do not add web tools]
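The same decision tree can be written as a small function that returns the tool list for a task. This is a sketch of how I think about it, not a feature of Claude Code. The function and parameter names are hypothetical, while the flags in the generated command are the ones used throughout this article.

```python
from typing import List, Optional

ANALYSIS_TOOLS = ["Read", "Glob", "Grep", "PowerShell"]
DAILY_TOOLS = ["Read", "Write", "Edit", "Glob", "Grep", "PowerShell", "TodoWrite"]
PLANNING_EXTRAS = ["AskUserQuestion", "EnterPlanMode", "ExitPlanMode"]
WEB_EXTRAS = ["WebFetch", "WebSearch"]


def choose_tools(simple_question: bool, needs_edits: bool,
                 large_or_unclear: bool, needs_web_docs: bool) -> Optional[List[str]]:
    """Follow the flowchart; None means 'do not use Claude Code at all'."""
    if simple_question:
        return None  # use Ollama or another direct chat client instead
    if not needs_edits:
        return list(ANALYSIS_TOOLS)
    tools = list(DAILY_TOOLS)
    if large_or_unclear:
        tools += PLANNING_EXTRAS
    if needs_web_docs:
        tools += WEB_EXTRAS
    return tools


def build_command(tools: List[str]) -> str:
    """Render the claude command line for a chosen tool list."""
    joined = ",".join(tools)
    return f'claude --tools "{joined}" --strict-mcp-config --disallowedTools "mcp__*"'


print(build_command(choose_tools(False, True, False, False)))
```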

Before and after

Before

claude

or a large default tool set.

Result:

~128 KB request (130,971 bytes)
~12 seconds for "7+5"
Lots of irrelevant tool definitions
Higher local model latency
Higher API cost if using paid models

After

Analysis:

claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*"

Result:

34,821 bytes
73.4% smaller
Enough for code analysis
No Edit or Write tools

Coding:

claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite" --strict-mcp-config --disallowedTools "mcp__*"

Expected result:

Much smaller than full Claude Code
Enough for real coding
No irrelevant design/scheduling/workflow tools

Lessons learned

1. Claude Code is not a chat client

It is an agent runtime.

That runtime has a cost.

2. Tool schemas dominate request size

In my captured request, tool definitions were:

94,028 bytes
81.00% of compact JSON

This is the first thing to optimise.

3. `/clear` is not enough

It clears conversation history. It does not remove tools, system prompt, project memory, skills, or agent runtime instructions.

4. Local models suffer more from prompt bloat

Hosted Claude can benefit from prompt caching.

Local models and custom gateways may not.

5. Use task-specific profiles

One giant Claude Code configuration for every task is inefficient.

Use different tool sets for:

simple chat
analysis
normal coding
planning
research
skills

6. Keep PowerShell for Windows

On Windows, PowerShell should be the primary execution tool.

Do not include Bash unless the project needs it.

7. Do not optimise emotionally

Measure first.

In this case, the data was clear:

Tools were the problem.

Not CLAUDE.md. Not /clear. Not the actual prompt.

Final recommendation

For my daily Windows coding workflow with local Qwen, I would use this as the default:

claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite" --strict-mcp-config --disallowedTools "mcp__*"

For read-only analysis:

claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*"

For simple questions:

ollama run qwen3.6:27b-mtp-q4_K_M "your question"

Using full Claude Code for 7+5 is not clever. It is a waste of context, GPU time, and patience.

Claude Code is excellent when used as a coding agent.

It is the wrong hammer for tiny prompts.
