- The short version
- Test environment
- What Claude Code actually sent
- Size by top-level component
- Messages breakdown
- System prompt breakdown
- Why /clear did not fix it
- Detected tools
- Why tool definitions are so expensive
- Official Claude Code behaviour that matters
- Performance impact
- Cost impact
- The optimisation experiment
- Recommended profiles
- Recommended PowerShell aliases
- Recommended local settings
- Do not over-optimise the wrong thing
- What should stay enabled for Windows coding
- What should be disabled by default
- A simple mental model
- Before and after
- Lessons learned
- Final recommendation
- References
I recently connected Claude Code to a local model:
qwen3.6:27b-mtp-q4_K_M
The model was running through Ollama / a local Claude-compatible setup.
Then I tested something intentionally stupid:
7+5
The response took around 12 seconds.
That is ridiculous for a three-character arithmetic prompt.
So I captured the actual request Claude Code sent to the model and analysed it. The result was very clear: the problem was not the complexity of the question. The problem was the size of the wrapper Claude Code sends around every request.
Claude Code is not a simple chat client. It is an agent runtime. It sends project context, system instructions, tool definitions, IDE context, git information, memory instructions, skills, agents, and other operational metadata.
That is useful for serious coding work.
It is terrible for trivial prompts.
This article documents the investigation, the numbers, the detected tools, the impact, and the practical command profiles I now use to reduce prompt size and improve performance.
The short version
The actual prompt was:
7+5
Size:
3 characters
The captured Claude Code request was:
- 130,971 bytes raw file size
- 116,089 bytes compact JSON
- ~28,856 estimated tokens
The largest component was tool definitions:
- 94,028 bytes
- ~23,393 estimated tokens
- 81.00% of compact JSON
That means the model was not really answering a tiny prompt. It was processing a full Claude Code runtime envelope.
After restricting tools, I got these sizes:
| Mode | Command purpose | Request size |
|---|---|---|
| Original full request | Default / heavy Claude Code context | 130,971 bytes |
| Coding profile | Limited coding tools | 50,459 bytes |
| Analysis profile | Read/search/run only | 34,821 bytes |
That is a reduction of:
| Mode | Reduction from original |
|---|---|
| Coding profile | 61.5% smaller |
| Analysis profile | 73.4% smaller |
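The reduction figures are plain arithmetic, (original - reduced) / original. A quick Python check using the byte counts from the tables above:

```python
# Raw request sizes in bytes, taken from the captured requests above.
ORIGINAL_BYTES = 130_971
PROFILES = {
    "Coding profile": 50_459,
    "Analysis profile": 34_821,
}

def reduction_pct(original: int, reduced: int) -> float:
    """Percentage by which `reduced` is smaller than `original`."""
    return (original - reduced) / original * 100

for name, size in PROFILES.items():
    print(f"{name}: {reduction_pct(ORIGINAL_BYTES, size):.1f}% smaller")
```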
This is not a cosmetic optimisation. This is the difference between feeding a local model a reasonable request and forcing it to chew through a giant runtime contract before answering anything.
Test environment
The captured request came from a Claude Code session running against a local / gateway model.
Relevant details from the request:
- Model: qwen3.6:27b-mtp-q4_K_M
- Max output tokens requested: 32,000
- Streaming: true
- Temperature: 1.0
- Thinking enabled: true
- Platform: Windows 11
- Primary shell: PowerShell
- Working directory: c:\Projects\GitHub\myproject
The project was a real git repository with a CLAUDE.md file loaded into context.
Claude Code also detected VS Code context and included information such as the opened file:
c:\Projects\GitHub\myproject\.claude\settings.local.json
What Claude Code actually sent
The request was not just this:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "7+5"
    }
  ]
}
```
It was closer to this:
```text
You are Claude Code.
Here are the system rules.
Here is the project CLAUDE.md.
Here is the current date.
Here is the shell and OS.
Here is the git status.
Here is the opened IDE file.
Here are all available agents.
Here are all available skills.
Here are all available tools and their JSON schemas.
Here is the actual user prompt: 7+5.
```
That is the core problem.
For real coding tasks, that context can be valuable. For a trivial local model prompt, it is pure overhead.
Size by top-level component
I parsed the request as JSON and measured compact JSON size for each top-level section.
Compact JSON excludes pretty-print whitespace, so it gives a better view of actual semantic payload.
| Section | Bytes | Characters | Estimated tokens | Share of compact request |
|---|---|---|---|---|
| tools | 94,028 | 93,573 | ~23,393 | 81.00% |
| messages | 14,363 | 14,191 | ~3,548 | 12.37% |
| system | 7,299 | 7,263 | ~1,816 | 6.29% |
| metadata | 176 | 176 | ~44 | 0.15% |
| model | 24 | 24 | ~6 | 0.02% |
| thinking | 16 | 16 | ~4 | 0.01% |
| max_tokens | 5 | 5 | ~1 | ~0% |
| stop_sequences | 4 | 4 | ~1 | ~0% |
| stream | 4 | 4 | ~1 | ~0% |
| top_p | 4 | 4 | ~1 | ~0% |
| top_k | 4 | 4 | ~1 | ~0% |
| tool_choice | 4 | 4 | ~1 | ~0% |
| temperature | 3 | 3 | ~1 | ~0% |
The conclusion is obvious:
The tools section is the monster.
CLAUDE.md was not the main issue. The tool schema payload was.
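A minimal Python sketch of this kind of measurement, assuming the captured request body was saved to a JSON file. It is not necessarily the exact script behind the table above: it serialises each top-level value as compact JSON (no whitespace), counts UTF-8 bytes and characters, and applies the same rough ~4 characters per token heuristic. The function names are my own.

```python
import json

def compact_json(value) -> str:
    """Serialise without whitespace; keep non-ASCII characters as-is."""
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)

def compact_size(value) -> int:
    """UTF-8 byte length of the compact JSON form (bytes can exceed characters)."""
    return len(compact_json(value).encode("utf-8"))

def estimate_tokens(value) -> int:
    """Rough heuristic: about 4 characters per token. Not a real tokenizer."""
    return round(len(compact_json(value)) / 4)

def section_report(request: dict) -> list:
    """(key, bytes, chars, est. tokens, % of whole request) per top-level key, largest first."""
    total = compact_size(request)
    rows = []
    for key, value in request.items():
        size = compact_size(value)
        rows.append((key, size, len(compact_json(value)), estimate_tokens(value), 100 * size / total))
    return sorted(rows, key=lambda row: row[1], reverse=True)

def load_request(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def print_report(request: dict) -> None:
    for key, size, chars, tokens, share in section_report(request):
        print(f"{key:<15} {size:>8,} B {chars:>8,} chars ~{tokens:>6,} tok {share:6.2f}%")
```

For example, `print_report(load_request("captured-request.json"))`, where the file name is hypothetical. Because only the per-section values are serialised, the shares add up to slightly less than 100%: the keys and punctuation of the outer object are not attributed to any section.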
Messages breakdown
The messages array contained two main messages.
| Message | Role | Bytes | Characters | Estimated tokens | Purpose |
|---|---|---|---|---|---|
| messages[0] | user | 7,135 | 6,983 | ~1,746 | Project reminder, local command context, IDE context, actual user prompt |
| messages[1] | system | 7,225 | 7,205 | ~1,801 | Available agent types and skills |
messages[0].content[]
| Part | Bytes | Text chars | Estimated tokens | Description |
|---|---|---|---|---|
| content[0] | 6,353 | 6,039 | ~1,510 | `<system-reminder>` with project CLAUDE.md |
| content[1] | 272 | 246 | ~62 | Local command caveat |
| content[2] | 159 | 131 | ~33 | /clear command metadata |
| content[3] | 72 | 46 | ~12 | Empty local command stdout |
| content[4] | 218 | 188 | ~47 | IDE opened file context |
| content[5] | 28 | 3 | ~1 | Actual prompt: 7+5 |
The actual useful input was one token or so.
Everything else was runtime context.
System prompt breakdown
The top-level system array contained Claude Code operating instructions.
| System item | Bytes | Text chars | Estimated tokens | Purpose |
|---|---|---|---|---|
| system[0] | 119 | 94 | ~24 | Identity: Claude Code |
| system[1] | 7,177 | 7,016 | ~1,754 | Harness rules, safety rules, memory rules, environment, VS Code formatting |
This part is not huge compared to tools, but it still matters.
The system prompt included instructions for:
- Claude Code identity
- software engineering behaviour
- security boundaries
- tool permission rules
- file reference formatting
- memory management
- shell/environment details
- git status
- VS Code extension behaviour
Most of that is useful in an agentic coding environment.
It is still overhead for simple prompts.
Why /clear did not fix it
The captured request included /clear command metadata, but the request was still huge.
That is because /clear clears conversation history. It does not remove the Claude Code runtime contract.
Claude Code still needs to send:
- system prompt
- tool definitions
- project memory / CLAUDE.md
- environment info
- IDE context
- skills / agents
- current user prompt
So /clear helps with conversation history, but it does not turn Claude Code into a minimal chat client.
Detected tools
The request contained 28 tools.
Here is the full list, sorted by size.
| Tool | Size | Estimated tokens | What it does | Keep for Windows coding? |
|---|---|---|---|---|
| Workflow | 21,010 bytes | ~5,206 | Runs structured workflows and orchestration patterns. | No. Biggest offender. Disable unless deliberately using workflows. |
| PowerShell | 9,049 bytes | ~2,251 | Runs PowerShell commands on Windows. | Yes. Essential for Windows coding. |
| DesignSync | 8,978 bytes | ~2,232 | Syncs local component/design files with Claude Design projects. | No for normal coding. |
| Monitor | 6,288 bytes | ~1,564 | Watches commands/processes/logs and notifies when conditions happen. | Usually no. Useful only for long-running monitoring. |
| AskUserQuestion | 4,901 bytes | ~1,224 | Presents structured interactive questions/options to the user. | Optional. Good UX, but not required daily. |
| EnterPlanMode | 4,317 bytes | ~1,079 | Enters implementation planning mode before code changes. | Optional. Use for big refactors/features. |
| EnterWorktree | 3,761 bytes | ~938 | Creates or enters an isolated git worktree. | No unless you explicitly use worktrees. |
| CronCreate | 3,720 bytes | ~924 | Schedules future or recurring prompts. | No. Not needed for coding. |
| ScheduleWakeup | 3,718 bytes | ~922 | Schedules self-paced wakeups or delayed checks. | No. Not needed for normal coding. |
| Grep | 3,230 bytes | ~806 | Searches file contents using ripgrep semantics. | Yes. Essential for code analysis. |
| Bash | 3,052 bytes | ~758 | Runs Git Bash / POSIX shell commands. | Usually no on Windows unless repo uses .sh. |
| ExitPlanMode | 2,538 bytes | ~634 | Exits plan mode and asks for approval. | Optional with EnterPlanMode. |
| ExitWorktree | 2,511 bytes | ~626 | Leaves/removes a Claude-created worktree. | No unless using worktrees. |
| Agent | 2,507 bytes | ~625 | Launches subagents for delegated tasks. | Optional. Powerful but adds overhead. |
| Skill | 2,112 bytes | ~528 | Invokes Claude Code skills. | Optional. Disable by default for local-model speed. |
| NotebookEdit | 1,627 bytes | ~406 | Edits Jupyter notebooks. | No for .NET/React/PowerShell coding. |
| Read | 1,592 bytes | ~397 | Reads files. | Yes. Essential. |
| PushNotification | 1,560 bytes | ~388 | Sends notifications. | No. |
| TaskOutput | 1,552 bytes | ~386 | Reads output from background tasks. | No by default. |
| Edit | 966 bytes | ~241 | Performs exact string replacement in files. | Yes. Essential for safe edits. |
| TodoWrite | 929 bytes | ~232 | Tracks implementation tasks/todos. | Yes/optional. Small and useful. |
| WebSearch | 836 bytes | ~208 | Searches the web. | Optional. Use only when current docs/research are needed. |
| WebFetch | 747 bytes | ~186 | Fetches web pages. | Optional. Use only when needed. |
| Glob | 731 bytes | ~183 | Finds files by glob pattern. | Yes. Essential. |
| Write | 639 bytes | ~160 | Creates or overwrites files. | Yes for coding. |
| TaskStop | 537 bytes | ~134 | Stops background tasks. | No by default. |
| CronDelete | 360 bytes | ~90 | Deletes scheduled jobs. | No. |
| CronList | 231 bytes | ~58 | Lists scheduled jobs. | No. |
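The same compact-size idea can be applied per tool to produce a ranking like the table above. A sketch, assuming the request follows the Messages API shape where `tools` is a list of objects with a `name` field; `share_of_tools` is a hypothetical helper for checking how much of the tool payload a candidate "disable" list would remove.

```python
import json

def compact_size(value) -> int:
    """UTF-8 byte length of `value` as compact JSON (no whitespace)."""
    return len(json.dumps(value, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))

def tool_sizes(request: dict) -> list:
    """(tool name, compact bytes) for every tool definition, largest first."""
    sizes = [(tool.get("name", "?"), compact_size(tool)) for tool in request.get("tools", [])]
    return sorted(sizes, key=lambda item: item[1], reverse=True)

def share_of_tools(request: dict, names: set) -> float:
    """Fraction of the summed tool-definition bytes taken by the named tools."""
    sizes = dict(tool_sizes(request))
    total = sum(sizes.values())
    if total == 0:
        return 0.0
    return sum(size for name, size in sizes.items() if name in names) / total
```

Passing the "disabled by default" list from later in this article to `share_of_tools` gives a quick estimate of the tool bytes you would save before trying a new profile.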
The worst offenders by size were:
- Workflow
- PowerShell
- DesignSync
- Monitor
- AskUserQuestion
- EnterPlanMode
PowerShell is large, but I need it on Windows.
Workflow, DesignSync, Monitor, Cron*, ScheduleWakeup, and notification tools are not needed for everyday coding.
Why tool definitions are so expensive
Tools are not sent as just names.
They are sent with:
- name
- description
- when to use the tool
- when not to use the tool
- safety warnings
- examples
- full JSON schema
- required fields
- optional fields
- enums
- nested object structures
For example, a small tool like Write is only 639 bytes.
But Workflow is 21,010 bytes because it includes a very long description and schema.
This is necessary for the model to call tools reliably. The model cannot safely call a tool unless it knows the contract.
But if the tool is irrelevant to the task, it is dead weight.
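To see why names are cheap and definitions are not, here is a sketch comparing the size of a tool's name with the size of a whole definition in the Messages API tool shape (`name`, `description`, `input_schema`). The `Write`-like definition is hypothetical and much shorter than the real Claude Code tool; it only illustrates how descriptions and schemas add up.

```python
import json

def compact_bytes(value) -> int:
    """UTF-8 byte length of `value` as compact JSON."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))

# Hypothetical, heavily trimmed tool definition in the Messages API tool shape.
write_tool = {
    "name": "Write",
    "description": "Writes a file to the local filesystem. Overwrites the file if it exists.",
    "input_schema": {
        "type": "object",
        "properties": {
            "file_path": {"type": "string", "description": "Absolute path of the file to write"},
            "content": {"type": "string", "description": "Full content to write"},
        },
        "required": ["file_path", "content"],
    },
}

name_only = compact_bytes(write_tool["name"])
full_definition = compact_bytes(write_tool)
print(f"name only: {name_only} bytes, full definition: {full_definition} bytes")
```

Even this trimmed definition is far larger than the name. Real tool descriptions add usage rules, warnings, and examples on top, which is how a single tool reaches tens of kilobytes.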
Official Claude Code behaviour that matters
Claude Code has documented options that directly relate to this problem.
--tools
Claude Code supports restricting which built-in tools are available:
```powershell
claude --tools "Bash,Edit,Read"
```
It also supports disabling all built-in tools with:
```powershell
claude --tools ""
```
MCP tools are not affected by --tools, so MCP tools should be blocked separately if needed:
```powershell
--disallowedTools "mcp__*"
```
Source: Claude Code CLI reference
--bare
Claude Code has a minimal mode:
```powershell
claude --bare -p "query"
```
The official documentation says bare mode skips auto-discovery of hooks, skills, plugins, MCP servers, auto memory, and CLAUDE.md, while still giving Claude access to Bash, file read, and file edit tools.
Source: Claude Code CLI reference
--disable-slash-commands
Claude Code can disable skills and commands for a session:
```powershell
claude --disable-slash-commands
```
Source: Claude Code CLI reference
claudeMdExcludes
Claude Code settings support excluding specific CLAUDE.md files:
```json
{
  "claudeMdExcludes": [
    "**/vendor/**/CLAUDE.md"
  ]
}
```
Source: Claude Code settings
disableBundledSkills
Claude Code settings support disabling bundled skills and workflows:
```json
{
  "disableBundledSkills": true
}
```
This removes bundled skills and workflows from the model context while leaving built-in slash commands typable.
Source: Claude Code settings
autoMemoryEnabled
Claude Code auto memory is on by default. It can be disabled in settings:
```json
{
  "autoMemoryEnabled": false
}
```
Or with an environment variable:
```powershell
$env:CLAUDE_CODE_DISABLE_AUTO_MEMORY = "1"
```
Source: Claude Code memory documentation
CLAUDE_CODE_ATTRIBUTION_HEADER
When routing Claude Code through an LLM gateway, this environment variable can help prompt-cache behaviour:
```powershell
$env:CLAUDE_CODE_ATTRIBUTION_HEADER = "0"
```
The official docs say setting it to 0 omits the attribution block from the start of the system prompt, and that disabling it improves prompt-cache hit rates when routing through an LLM gateway. Anthropic API caching is unaffected.
Source: Claude Code environment variables
Prompt caching
Claude Code sends the full context on each request. The documentation explains that the model itself does not remember previous requests, so Claude Code resends the system prompt, project context, prior messages/tool results, and the new message.
Prompt caching avoids reprocessing unchanged prefixes when supported.
Source: How Claude Code uses prompt caching
This is very important when using a local model or gateway. If your gateway does not implement compatible prompt caching, the model may process the full prompt every time.
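Why a stable prefix matters can be shown with a toy model. This sketch treats the prompt as a list of tokens and assumes an idealised cache that reuses the longest prefix shared with the previous request. Real implementations cache in blocks or at explicit breakpoints, so this is a simplification, and the "header" token below is hypothetical: it illustrates how any value that changes at the start of the prompt could hurt cache hits, not what the attribution block actually contains.

```python
def common_prefix_len(previous: list, current: list) -> int:
    """Number of leading tokens the two requests share."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n

def tokens_to_process(previous: list, current: list) -> int:
    """Tokens an idealised prefix cache still has to compute for `current`."""
    return len(current) - common_prefix_len(previous, current)

# Toy "tokens": a large stable prefix (system prompt + tool schemas), then the user turn.
stable_prefix = ["system"] * 1_000 + ["tool-schema"] * 5_000
turn_1 = stable_prefix + ["7", "+", "5"]
turn_2 = stable_prefix + ["8", "+", "1"]
print(tokens_to_process(turn_1, turn_2))  # only the new user turn

# A hypothetical value that changes on every request, placed at the very start.
turn_2_new_header = ["header-2"] + turn_2[1:]
print(tokens_to_process(turn_1, turn_2_new_header))  # the whole prompt again
```

Without any cache at all, every turn behaves like the second case: the full prompt is processed again.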
Performance impact
The original huge request took around:
12 seconds
For a prompt as small as:
7+5
That delay is not because arithmetic is hard.
The delay comes from:
- Serialising and sending a large request.
- Tokenising a large prompt.
- Processing tens of thousands of prompt tokens.
- Loading tool schema context into attention.
- Possible reasoning/thinking overhead.
- Possible lack of prompt caching in the local/gateway path.
- Local model speed limits, especially with large context.
With hosted Claude models, prompt caching can reduce the repeated cost of unchanged context.
With local LLMs, Ollama, llama.cpp, or custom gateways, do not assume you get the same benefit. If the gateway does not support Claude-compatible prompt caching, every request may be processed from scratch.
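A back-of-envelope latency model makes the prompt-size effect concrete: prompt processing (prefill) scales with prompt tokens, and generation (decode) scales with output tokens. The throughput numbers below are hypothetical placeholders, not measurements from this setup, and the model ignores thinking tokens, tokenisation, network overhead, and caching.

```python
# Hypothetical throughputs for a local model; replace with your own measurements.
PREFILL_TOK_PER_S = 2_500.0
DECODE_TOK_PER_S = 25.0

def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     prefill_tok_per_s: float = PREFILL_TOK_PER_S,
                     decode_tok_per_s: float = DECODE_TOK_PER_S) -> float:
    """Rough latency: time to process the prompt plus time to generate the answer."""
    return prompt_tokens / prefill_tok_per_s + output_tokens / decode_tok_per_s

# ~28,856 tokens is the estimate for the captured full request; the bare prompt is a few tokens.
for label, prompt_tokens in [("full Claude Code request", 28_856), ("prompt without wrapper", 10)]:
    print(f"{label}: ~{estimate_seconds(prompt_tokens, output_tokens=20):.1f} s")
```

The point is the shape, not the numbers: with a large wrapper, the prefill term dominates even when the answer is a single number.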
Cost impact
If you use a paid API, bigger prompts can increase cost.
Token cost generally scales with how much context the model processes. Claude Code documentation explicitly recommends reducing token usage because larger context means more tokens and higher cost.
Source: Claude Code cost documentation
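A rough sketch of how the input side of that cost scales with request size and turn count. The price is a placeholder parameter, not a quote of any provider's pricing, and the sketch ignores prompt-caching discounts and output tokens.

```python
def session_input_cost(tokens_per_turn: int, turns: int, usd_per_million: float) -> float:
    """Input-token cost for a session, ignoring prompt-caching discounts and output tokens."""
    return tokens_per_turn * turns / 1_000_000 * usd_per_million

# Placeholder price: check your provider's current pricing.
PRICE_USD_PER_MTOK = 3.00

# ~28,856 tokens per turn is the estimate for the captured full request.
print(f"{session_input_cost(28_856, 100, PRICE_USD_PER_MTOK):.2f} USD for 100 uncached turns")
```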
Even if you run a local model and pay no API fee, you still pay in:
- latency
- GPU time
- CPU time
- VRAM pressure
- power usage
- worse responsiveness
- lower throughput
- more context pollution
Local is not free. It just changes the type of cost.
The optimisation experiment
I tested two reduced command profiles.
1. Coding profile
```powershell
claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,AskUserQuestion,EnterPlanMode,ExitPlanMode,Skill,WebFetch,WebSearch" --strict-mcp-config --disallowedTools "mcp__*"
```
Measured request size: 50,459 bytes (49.28 KB).

Reduction from the original raw request: 61.5% smaller.
This profile is useful for serious coding, but it still includes optional tools:
- AskUserQuestion
- EnterPlanMode
- ExitPlanMode
- Skill
- WebFetch
- WebSearch
For daily coding, I think this is still heavier than necessary.
2. Analysis profile
```powershell
claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*"
```
Measured request size: 34,821 bytes (34.00 KB).

Reduction from the original raw request: 73.4% smaller.
This is a very good read-only / analysis profile.
It can:
- read files
- find files
- search content
- run safe PowerShell commands
- run tests/builds if needed
It cannot edit files, which is exactly what I want for analysis-only work.
Recommended profiles
I now think about Claude Code profiles based on the task.
Do not use one giant tool set for everything. That is lazy and expensive.
Profile 1: Simple question
Do not use Claude Code.
Use the model directly:
```powershell
ollama run qwen3.6:27b-mtp-q4_K_M "7+5"
```
Or use your llama.cpp/Open WebUI/direct local chat client.
Claude Code is the wrong tool for basic chat.
Profile 2: Bare one-shot
For a lightweight Claude Code scripted call:
```powershell
claude --bare -p "7+5"
```
This skips a lot of automatic Claude Code context.
Profile 3: Analysis mode
For architecture inspection, code search, and explanation:
```powershell
claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*"
```
Use this for prompts like:
- Find where this API endpoint is implemented.
- Explain the project structure.
- Analyse this error.
- Search for all usages of this class.
- Run tests and explain failures.
Profile 4: Daily coding mode
For normal code edits on Windows:
```powershell
claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite" --strict-mcp-config --disallowedTools "mcp__*"
```
This is my recommended default.
It includes only what a coding agent really needs:
| Tool | Why |
|---|---|
| Read | inspect files |
| Write | create files |
| Edit | modify files |
| Glob | find files |
| Grep | search code |
| PowerShell | run Windows commands, builds, tests |
| TodoWrite | track multi-step work |
Profile 5: Heavy coding / planning mode
For bigger refactors or multi-file features:
```powershell
claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,AskUserQuestion,EnterPlanMode,ExitPlanMode" --strict-mcp-config --disallowedTools "mcp__*"
```
Use this when the implementation needs planning or there are multiple valid approaches.
Profile 6: Research coding mode
For coding tasks that require online documentation:
```powershell
claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,WebFetch,WebSearch" --strict-mcp-config --disallowedTools "mcp__*"
```
Do not use this by default.
Web access is useful, but it increases prompt size and can distract the model from inspecting local code first.
Profile 7: Skills mode
When I explicitly want Claude skills:
```powershell
claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,Skill" --strict-mcp-config --disallowedTools "mcp__*"
```
Do not keep Skill in the default profile unless you actually use skills every session.
Recommended PowerShell aliases
To make this practical, I would create PowerShell functions.
Add this to your PowerShell profile:
```powershell
notepad $PROFILE
```
Then add:
```powershell
function claude-analyse {
    claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*" @args
}

function claude-code-light {
    claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite" --strict-mcp-config --disallowedTools "mcp__*" @args
}

function claude-code-plan {
    claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,AskUserQuestion,EnterPlanMode,ExitPlanMode" --strict-mcp-config --disallowedTools "mcp__*" @args
}

function claude-code-research {
    claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite,WebFetch,WebSearch" --strict-mcp-config --disallowedTools "mcp__*" @args
}

function claude-bare {
    claude --bare -p @args
}
```
Then use:
```powershell
claude-analyse
claude-code-light
claude-code-plan
claude-code-research
claude-bare "7+5"
```
This is better than remembering long commands.
Recommended local settings
For a local-model/gateway setup, I would also consider a local .claude/settings.local.json:
```json
{
  "disableBundledSkills": true,
  "autoMemoryEnabled": false,
  "includeGitInstructions": false,
  "env": {
    "CLAUDE_CODE_DISABLE_AUTO_MEMORY": "1",
    "CLAUDE_CODE_DISABLE_BUNDLED_SKILLS": "1",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0",
    "MAX_THINKING_TOKENS": "0"
  }
}
```
Be careful with this.
It improves performance and reduces noise, but it can also make Claude Code less aware of your project and workflow.
For serious coding, I would not blindly remove all project instructions. I would first reduce tools, because tools were the biggest problem in the captured request.
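If a project already has a .claude/settings.local.json (for example with permission rules), hand-editing it risks dropping existing keys. A hypothetical Python helper that merges the keys shown above into the file while keeping everything else. It only writes the keys; whether each one takes effect depends on your Claude Code version.

```python
import json
from pathlib import Path

# Keys copied from the settings example above.
LOCAL_MODEL_SETTINGS = {
    "disableBundledSkills": True,
    "autoMemoryEnabled": False,
    "includeGitInstructions": False,
}
LOCAL_MODEL_ENV = {
    "CLAUDE_CODE_DISABLE_AUTO_MEMORY": "1",
    "CLAUDE_CODE_DISABLE_BUNDLED_SKILLS": "1",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0",
    "MAX_THINKING_TOKENS": "0",
}

def merge_local_settings(project_dir: Path) -> dict:
    """Merge the local-model keys into .claude/settings.local.json, keeping existing keys."""
    path = project_dir / ".claude" / "settings.local.json"
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings.update(LOCAL_MODEL_SETTINGS)
    # Merge env vars instead of replacing the whole "env" object.
    settings["env"] = {**settings.get("env", {}), **LOCAL_MODEL_ENV}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```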
Do not over-optimise the wrong thing
CLAUDE.md is not the main villain.
In my captured request:
- CLAUDE.md / system-reminder area: ~6 KB
- Tools: ~94 KB
So yes, keep CLAUDE.md concise.
But do not waste two hours shaving 500 bytes from CLAUDE.md while sending a 21 KB Workflow tool you never use.
That is bad engineering.
Optimise the biggest cost first.
What should stay enabled for Windows coding
For my Windows + PowerShell environment, the minimum useful coding set is:
- Read
- Write
- Edit
- Glob
- Grep
- PowerShell
- TodoWrite
I would not include Bash by default on Windows unless the project genuinely uses shell scripts.
Having both Bash and PowerShell gives the model two different command syntaxes. Local models are more likely to mix them up.
Use PowerShell as the primary shell.
Add Bash only when needed.
What should be disabled by default
For local Qwen / Ollama / llama.cpp coding, I would disable these unless I explicitly need them:
- Agent
- Bash
- CronCreate
- CronDelete
- CronList
- DesignSync
- EnterWorktree
- ExitWorktree
- Monitor
- NotebookEdit
- PushNotification
- ScheduleWakeup
- Skill
- TaskOutput
- TaskStop
- WebFetch
- WebSearch
- Workflow
The biggest default waste is:
- Workflow
- DesignSync
- Monitor
- Cron*
- ScheduleWakeup
- PushNotification
- NotebookEdit

These tools are useful in the right context. But sending them to a local model on every coding turn is wasteful.
A simple mental model
Use this decision tree:
```mermaid
flowchart TD
    A[What am I doing?] --> B{Simple question?}
    B -->|Yes| C[Use Ollama / direct local chat]
    B -->|No| D{Need to edit files?}
    D -->|No| E[Use analysis profile: Read, Glob, Grep, PowerShell]
    D -->|Yes| F{Large change or unclear approach?}
    F -->|No| G[Use daily coding profile]
    F -->|Yes| H[Use planning profile]
    H --> I{Need internet docs?}
    G --> I
    I -->|Yes| J[Add WebFetch and WebSearch]
    I -->|No| K[Do not add web tools]
```
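The same decision tree, written as a small Python function that returns the command line for each branch. The tool lists are copied from the profiles above; the function and parameter names are just an illustration, and the ollama line uses a placeholder question.

```python
BASE = 'claude --tools "{tools}" --strict-mcp-config --disallowedTools "mcp__*"'
ANALYSIS = ["Read", "Glob", "Grep", "PowerShell"]
CODING = ["Read", "Write", "Edit", "Glob", "Grep", "PowerShell", "TodoWrite"]
PLANNING_EXTRA = ["AskUserQuestion", "EnterPlanMode", "ExitPlanMode"]
WEB_EXTRA = ["WebFetch", "WebSearch"]

def choose_command(simple_question: bool, edits_files: bool,
                   large_or_unclear: bool, needs_web_docs: bool) -> str:
    """Walk the flowchart and return the command to run."""
    if simple_question:
        return 'ollama run qwen3.6:27b-mtp-q4_K_M "<your question>"'
    if not edits_files:
        # Analysis branch: read-only tools, no web question in the flowchart.
        return BASE.format(tools=",".join(ANALYSIS))
    # Build new lists so the module-level profiles are never mutated.
    tools = CODING + (PLANNING_EXTRA if large_or_unclear else [])
    if needs_web_docs:
        tools = tools + WEB_EXTRA
    return BASE.format(tools=",".join(tools))
```

For example, `choose_command(False, True, False, False)` returns the daily coding command from Profile 4.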
Before and after
Before
```powershell
claude
```
or a large default tool set.

Result:
- ~130 KB request
- ~12 seconds for "7+5"
- Lots of irrelevant tool definitions
- Higher local model latency
- Higher API cost if using paid models
After
Analysis:
```powershell
claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*"
```
Result:
- 34,821 bytes
- 73.4% smaller
- Enough for code analysis
- No edit capability
Coding:
```powershell
claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite" --strict-mcp-config --disallowedTools "mcp__*"
```
Expected result:
- Much smaller than full Claude Code
- Enough for real coding
- No irrelevant design/scheduling/workflow tools
Lessons learned
1. Claude Code is not a chat client
It is an agent runtime.
That runtime has a cost.
2. Tool schemas dominate request size
In my captured request, tool definitions were:
- 94,028 bytes
- 81.00% of compact JSON
This is the first thing to optimise.
3. /clear is not enough
It clears conversation history. It does not remove tools, system prompt, project memory, skills, or agent runtime instructions.
4. Local models suffer more from prompt bloat
Hosted Claude can benefit from prompt caching.
Local models and custom gateways may not.
5. Use task-specific profiles
One giant Claude Code configuration for every task is inefficient.
Use different tool sets for:
- simple chat
- analysis
- normal coding
- planning
- research
- skills
6. Keep PowerShell for Windows
On Windows, PowerShell should be the primary execution tool.
Do not include Bash unless the project needs it.
7. Do not optimise emotionally
Measure first.
In this case, the data was clear:
Tools were the problem.
Not CLAUDE.md. Not /clear. Not the actual prompt.
Final recommendation
For my daily Windows coding workflow with local Qwen, I would use this as the default:
```powershell
claude --tools "Read,Write,Edit,Glob,Grep,PowerShell,TodoWrite" --strict-mcp-config --disallowedTools "mcp__*"
```
For read-only analysis:
```powershell
claude --tools "Read,Glob,Grep,PowerShell" --strict-mcp-config --disallowedTools "mcp__*"
```
For simple questions:
```powershell
ollama run qwen3.6:27b-mtp-q4_K_M "your question"
```
Using full Claude Code for 7+5 is not clever. It is a waste of context, GPU time, and patience.
Claude Code is excellent when used as a coding agent.
It is the wrong hammer for tiny prompts.