Wednesday, June 17, 2026

Ways to Cut Your Claude Code Token Usage

 

Large language models are incredibly powerful, but they can also become surprisingly expensive when token usage spirals out of control. If you use Claude Code heavily for development, debugging, code reviews, or automation, reducing token consumption can dramatically lower costs while also improving response speed.

The good news: most token waste goes unnoticed, and it is easy to fix once you know where it comes from.

This guide covers practical, real-world strategies to reduce Claude Code token usage without sacrificing output quality.


Why Token Usage Gets Expensive Fast

Every interaction with Claude Code consumes tokens from:

  • Your prompts

  • Conversation history

  • Uploaded files

  • Code context

  • Model responses

In software projects, context grows rapidly. A single debugging session can include:

  • Thousands of lines of source code

  • Multiple iterations of prompts

  • Long error logs

  • Repeated explanations

  • Full-file rewrites

Over time, sessions become both more expensive and slower.

Efficient prompting and workflow design matter more than most developers think.


1. Stop Sending Entire Files

One of the biggest token killers is pasting complete files when only a small section matters.

Bad Approach

“Here’s my entire 2,000-line React component. Fix this button alignment issue.”

Better Approach

Send only:

  • The relevant function

  • The failing component

  • The exact error

  • Minimal supporting context

Example

Instead of:

// entire application file

Use:

<Button className="primary-btn">

Plus:

“The button overflows on mobile devices below 400px width.”

Smaller contexts lead to lower token usage and often better answers.
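
One way to pull out just the relevant section is a small shell helper that prints a line range with line numbers. This is a minimal sketch: `extract_lines` and the file path in the usage comment are hypothetical names chosen for illustration, not part of Claude Code.

```shell
#!/usr/bin/env bash
# Print only lines START..END of FILE, prefixed with their line numbers,
# so the snippet can be shared instead of the whole file.
# extract_lines is a hypothetical helper name, not a Claude Code feature.
extract_lines() {
  local file="$1" start="$2" end="$3"
  # Stop reading once we are past END; print "<line number>: <text>" in range.
  awk -v s="$start" -v e="$end" \
    'NR > e { exit } NR >= s { printf "%d: %s\n", NR, $0 }' "$file"
}

# Usage (hypothetical path): share lines 120-160 of a large component.
# extract_lines src/components/Checkout.jsx 120 160
```

Keeping the line numbers in the output makes it easier to map a suggested fix back to the original file.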


2. Use Targeted Questions

Broad prompts generate broad responses.

High-Token Prompt

“Review this architecture and suggest improvements.”

Lower-Token Prompt

“Identify memory leak risks in this Redis worker implementation.”

Specificity reduces unnecessary analysis and keeps responses focused.


3. Avoid Repeating Context

Many users repeatedly resend the same information.

For example:

  • Project structure

  • Tech stack

  • Requirements

  • Previous explanations

Claude already has the conversation context in the current thread.

Instead of repeating everything:

Use

“Continue using the Express + PostgreSQL setup from earlier.”

Instead of:

“I’m building an Express app with PostgreSQL, JWT auth, Redis caching…”

Every repeated paragraph increases token costs.


4. Summarize Before Continuing

Long conversations become expensive because every new message includes prior context.

A smart technique is to periodically compress the discussion.

Example

Ask Claude:

“Summarize the current implementation decisions in 10 bullet points.”

Then start a fresh conversation using only that summary.

This dramatically reduces context-window bloat.


5. Request Concise Responses

Claude often defaults to highly detailed answers.

That’s useful sometimes — but expensive for routine tasks.

Try Prompts Like

  • “Answer briefly.”

  • “Only show the changed code.”

  • “Return diff only.”

  • “One paragraph maximum.”

  • “No explanation needed.”

Example

Instead of:

“Explain every optimization opportunity.”

Use:

“List the top 3 performance issues only.”

Shorter outputs = fewer output tokens.


6. Use Diffs Instead of Full Rewrites

Developers often ask Claude to rewrite entire files even when only a few lines need changing.

Expensive

“Rewrite this entire file with the fixes.”

Efficient

“Show only the modified sections.”

Or:

“Provide a unified diff.”

Example

- const timeout = 5000;
+ const timeout = 15000;

Diff-based workflows massively reduce token usage.
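
For reference, the unified diff format requested above is the same one that the standard `diff -u` tool produces and that `patch` can apply. The sketch below assumes you have the original file and an edited copy; `make_patch` and the file names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Print a unified diff between an original file and an edited copy.
# make_patch is a hypothetical helper name.
make_patch() {
  local original="$1" edited="$2"
  diff -u "$original" "$edited"
  local status=$?
  # diff exits 0 when the files match and 1 when they differ;
  # only 2 or higher signals a real error.
  [ "$status" -le 1 ]
}

# Usage (hypothetical file names): save the change, then apply it.
# make_patch config.js config.edited.js > timeout.patch
# patch config.js < timeout.patch
```

Applying a small patch locally avoids paying output tokens for the unchanged lines of the file.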


7. Split Large Tasks Into Smaller Sessions

Huge prompts create huge outputs.

Instead of:

“Build an entire authentication system with OAuth, RBAC, audit logging, and multi-tenancy.”

Break tasks into stages:

  1. Authentication schema

  2. JWT implementation

  3. OAuth integration

  4. RBAC middleware

  5. Audit logging

This improves:

  • Token efficiency

  • Output quality

  • Maintainability


8. Trim Stack Traces and Logs

Raw logs are extremely token-heavy.

Most debugging only requires:

  • The relevant error message

  • 10–20 surrounding lines

  • Important environment details

Avoid

Pasting:

  • Entire CI logs

  • Full Docker output

  • Massive stack traces

Instead

Extract:

  • Root exception

  • Relevant call stack

  • Reproduction steps
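
The extraction step can be scripted. The following is a minimal sketch that prints the first matching error line plus a fixed number of surrounding lines; `trim_log`, the default pattern `ERROR`, and the log file name in the usage comment are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Print the first line matching PATTERN in LOGFILE, plus CONTEXT lines
# before and after it. trim_log is a hypothetical helper name.
trim_log() {
  local logfile="$1" pattern="${2:-ERROR}" context="${3:-10}"
  local hit start end
  # Line number of the first match; fail if the pattern never appears.
  hit=$(grep -n -m 1 -e "$pattern" "$logfile" | cut -d: -f1)
  [ -n "$hit" ] || return 1
  # Clamp the start of the window to the first line of the file.
  start=$(( hit > context ? hit - context : 1 ))
  end=$(( hit + context ))
  sed -n "${start},${end}p" "$logfile"
}

# Usage (hypothetical log file): first ERROR with 10 lines on each side.
# trim_log ci-build.log ERROR 10
```

The default of 10 context lines follows the 10–20 line guideline above; widen it only if the cause is not visible in the excerpt.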


9. Use External Documentation References

Instead of repeatedly pasting API docs, summarize them once.

Example

Instead of:

“Here are 400 lines of API documentation…”

Use:

“Assume standard Stripe subscription API behavior.”

Or provide only the endpoint relevant to the problem.


10. Create Reusable Prompt Templates

Repeatedly crafting large prompts wastes tokens.

Build compact templates for common workflows.

Example Template

Task:
Bug fix only.

Constraints:
- Keep existing architecture
- Minimal changes
- Return diff only

Small reusable prompts compound savings over time.
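
A template like the one above can live in a small shell function so that only the task description changes each time. This is a sketch under that assumption; `bugfix_prompt` is a hypothetical helper name, and how you pass the resulting text to Claude is up to your workflow.

```shell
#!/usr/bin/env bash
# Build a compact bug-fix prompt from a fixed template plus a one-line
# description. bugfix_prompt is a hypothetical helper name.
bugfix_prompt() {
  local description="$1"
  # The heredoc holds the template; only $description varies.
  cat <<EOF
Task:
Bug fix only. $description

Constraints:
- Keep existing architecture
- Minimal changes
- Return diff only
EOF
}

# Usage: print the prompt, then paste it into your session.
# bugfix_prompt "The button overflows on mobile devices below 400px width."
```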


11. Be Careful With Auto-Context Tools

Some IDE integrations automatically inject:

  • Entire repositories

  • Open tabs

  • Documentation

  • Dependency trees

This can silently explode token usage.

Review what your tooling actually sends to Claude.

Sometimes “smart context” is anything but smart.
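
To get a feel for how much a set of files would add to the context, a rough estimate is often enough. The sketch below assumes about 4 characters per token, which is only a rule of thumb and not an exact tokenizer count; `estimate_tokens` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rough token estimate for one or more files, assuming about 4 characters
# per token. This ratio is a rule of thumb, not a real tokenizer count.
# estimate_tokens is a hypothetical helper name.
estimate_tokens() {
  local bytes
  # Concatenate the files, count bytes, and strip the padding that some
  # wc implementations add around the number.
  bytes=$(cat -- "$@" | wc -c | tr -d ' ')
  # Integer division, rounded up.
  echo $(( (bytes + 3) / 4 ))
}

# Usage (hypothetical paths): estimate what a few open tabs would cost.
# estimate_tokens src/app.js src/routes.js
```

Comparing this number for a whole directory against the one file you actually need makes the cost of automatic context injection easier to see.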


12. Prefer Iteration Over Perfection

Many developers try to get the perfect answer in one massive prompt.

That usually costs more.

A better workflow:

  1. Get a rough solution

  2. Refine incrementally

  3. Improve specific parts

Smaller iterative prompts are typically more efficient than giant all-in-one requests.


13. Cache Stable Information

If certain information rarely changes, avoid re-sending it constantly.

Examples:

  • Coding standards

  • Architecture rules

  • Database schema summaries

  • Deployment environments

Store them externally and reference concise summaries instead.


14. Watch Output Formatting

Markdown-heavy formatting can increase token usage significantly.

Especially:

  • Giant tables

  • Excessive comments

  • Large JSON payloads

  • Repeated code blocks

Ask for lean formatting when possible.

Example

“Plain text only.”

Or:

“Minimal formatting.”


15. Know When Smaller Models Are Enough

Not every task requires the most advanced reasoning model.

Simple tasks like:

  • Regex fixes

  • Syntax cleanup

  • Basic refactoring

  • Documentation formatting

can often run on cheaper models.

Reserve premium reasoning for:

  • Architecture

  • Complex debugging

  • System design

  • Deep analysis


Final Thoughts

Most Claude Code token waste comes from:

  • Oversharing context

  • Repeating information

  • Requesting unnecessarily large outputs

  • Poor workflow structure

A few small habit changes can cut token usage dramatically while improving speed and clarity.

The most efficient AI workflows are usually:

  • Focused

  • Iterative

  • Minimal

  • Context-aware

Reducing tokens isn’t just about saving money.

It also makes AI-assisted development faster, cleaner, and more maintainable.


Quick Token Reduction Checklist

✅ Send only relevant code
✅ Ask focused questions
✅ Request concise answers
✅ Use diffs instead of rewrites
✅ Trim logs and stack traces
✅ Summarize long conversations
✅ Split giant tasks into smaller steps
✅ Avoid repeating context
✅ Review IDE auto-context behavior
✅ Use reusable prompt templates

Small optimizations add up quickly — especially for heavy daily Claude Code users.

Tuesday, May 12, 2026

Skills vs MCP: Understanding the Difference in Modern AI Agents

 

Artificial Intelligence agents are evolving rapidly. As teams build more capable AI systems, two concepts appear repeatedly in discussions, frameworks, and architectures:

  • Skills

  • MCP (Model Context Protocol)

They are related, but they solve very different problems.

If you are building AI assistants, autonomous agents, internal copilots, or workflow automation systems, understanding the distinction is essential.


The Short Version

Here’s the simplest way to think about it:

Concept | What It Means
Skills  | What the AI can do
MCP     | How the AI connects to tools and data

Or even more simply:

Skills provide intelligence and workflows.

MCP provides connectivity and interoperability.


What Are Skills?

A Skill is a reusable capability or behavior that an AI agent can perform.

Think of Skills as specialized expertise modules.

Examples include:

  • Summarizing documents

  • Writing SQL queries

  • Reviewing code

  • Creating Jira tickets

  • Generating reports

  • Customer support workflows

  • Security incident analysis

A Skill typically includes:

  • Instructions or prompts

  • Logic and workflows

  • Tool usage rules

  • Context handling

  • Decision-making behavior

  • Sometimes executable code

Skills are generally:

  • Task-oriented

  • Domain-specific

  • Reusable

  • Workflow-driven


Example of a Skill

Imagine a Customer Support Skill.

This Skill might:

  1. Read incoming Zendesk tickets

  2. Search the knowledge base

  3. Identify customer sentiment

  4. Draft a reply

  5. Escalate complex issues to humans

The AI assistant invokes this Skill whenever support-related tasks appear.

The Skill defines behavior.


What Is MCP?

MCP (Model Context Protocol) is a standardized protocol that allows AI models and external systems to communicate in a structured way.

Introduced by entity["organization","Anthropic","AI company"], MCP aims to create a common language between AI assistants and tools.

MCP defines:

  • Tool discovery

  • Context exchange

  • Structured tool calls

  • Permissions and capabilities

  • Standard communication schemas

You can think of MCP as infrastructure for AI integrations.


Why MCP Matters

Before MCP, every AI integration was often custom-built.

That created problems:

  • Different APIs everywhere

  • Inconsistent tool definitions

  • Hard-to-maintain integrations

  • Vendor lock-in

  • Duplicate engineering effort

MCP standardizes the connection layer.

Just like:

  • HTTP standardized web communication

  • USB standardized hardware connectivity

  • ODBC standardized database access

MCP standardizes AI-to-tool communication.


The Core Difference

Skills                   | MCP
A capability             | A protocol
Defines behavior         | Defines communication
Focuses on workflows     | Focuses on integrations
Business logic oriented  | Infrastructure oriented
Tells the AI what to do  | Tells the AI how to connect

This distinction is extremely important.

Many people confuse Skills and MCP because both are involved in AI tooling.

But they operate at different layers.


A Real-World Analogy

Imagine building a smart office assistant.

Skills are like applications

Examples:

  • Calendar assistant

  • Meeting summarizer

  • Expense reporting workflow

  • IT helpdesk automation

These define functionality.

MCP is like USB-C or HTTP

It defines how systems connect:

  • Slack integration

  • GitHub integration

  • Database access

  • CRM connectivity

MCP is not the workflow itself.

It is the standardized bridge.


How Skills and MCP Work Together

The most powerful AI systems use both.

A Skill often depends on multiple external tools.

Instead of building custom integrations every time, the Skill accesses those tools through MCP.

Example architecture:

AI Assistant
   ↓
Skill: Research Analyst
   ↓
Uses MCP tools:
   - GitHub MCP server
   - Slack MCP server
   - Database MCP server

In this setup:

  • The Skill handles reasoning and orchestration

  • MCP handles standardized tool access


When Should You Use Skills?

Use Skills when you need:

1. Reusable Workflows

Examples:

  • Invoice processing

  • HR onboarding

  • Compliance review

  • Security operations

2. Domain Expertise

Examples:

  • Legal analysis

  • Medical coding

  • Financial reporting

  • Software architecture reviews

3. Multi-Step Agent Logic

Examples:

  • Gather information

  • Analyze data

  • Generate output

  • Notify stakeholders

4. Business-Specific Behavior

Examples:

  • Company tone guidelines

  • Escalation rules

  • Approval workflows

  • Internal policy enforcement

Skills are ideal for encoding operational intelligence.


When Should You Use MCP?

Use MCP when you need:

1. Standardized Integrations

Examples:

  • Connecting to Slack

  • Connecting to GitHub

  • Accessing databases

  • Integrating CRMs and internal systems

2. Tool Portability

One MCP-compatible tool can work across many AI platforms.

3. Reduced Integration Complexity

Instead of custom connectors everywhere, systems speak the same protocol.

4. Shared Tool Ecosystems

Multiple agents can reuse the same MCP servers and integrations.

MCP is ideal for scalable AI infrastructure.


Typical Modern AI Agent Stack

Most advanced AI systems are moving toward an architecture like this:

User
 ↓
AI Agent
 ↓
Skills Layer
 ↓
MCP Client
 ↓
MCP Servers
 ↓
External Tools & Data

This creates:

  • Modular design

  • Easier maintenance

  • Better interoperability

  • Faster integration development

  • Reusable capabilities


Common Misunderstandings

“MCP replaces Skills”

No.

MCP standardizes connectivity.

You still need Skills for reasoning, workflows, and business behavior.


“Skills are just prompts”

Not necessarily.

Modern Skills can include:

  • Decision logic

  • Tool orchestration

  • State handling

  • Validation rules

  • Multi-agent coordination

  • Custom execution flows

They are often much more sophisticated than simple prompting.


“MCP is only for AI agents”

Primarily yes, but the bigger idea is standardized machine-tool communication.

The ecosystem is still evolving.


Which One Should You Build First?

That depends on your goal.

Build Skills first if:

  • You are solving business workflows

  • You want task automation

  • You need specialized agent behavior

  • You are experimenting with AI use cases

Build MCP integrations first if:

  • You need scalable infrastructure

  • You support multiple agents/tools

  • You want interoperability

  • You are building a platform ecosystem

In practice, mature systems eventually use both.


The Future of AI Systems

The industry is moving toward:

  • Modular AI architectures

  • Shared tool ecosystems

  • Standardized protocols

  • Reusable agent capabilities

In that future:

  • Skills become the intelligence layer

  • MCP becomes the interoperability layer

This separation is likely to become a foundational design pattern for enterprise AI systems.


Final Takeaway

Here’s the easiest way to remember the difference:

Skills            | MCP
Intelligence      | Connectivity
Workflows         | Integrations
Behavior          | Communication
What the AI does  | How the AI reaches tools

Or in one sentence:

Skills tell the AI what to do.

MCP tells the AI how to access tools and data.

Understanding both concepts is essential for designing scalable, maintainable, and powerful AI agents.

How to Add an MCP Server to Claude in VS Code

If you’re using Claude with Visual Studio Code and want to connect external tools like GitHub, databases, Notion, Slack, or your local filesystem, MCP (Model Context Protocol) is the feature you need.

This guide walks through how to add an MCP server to Claude inside VS Code step by step.

What is MCP?
MCP (Model Context Protocol) allows Claude to connect with external tools and services.

With MCP servers, Claude can:

- Read and edit files
- Access GitHub repositories
- Query databases
- Interact with APIs
- Connect to productivity tools
- Automate browser tasks

Prerequisites:

Before starting, make sure you have:
- Node.js installed
- Visual Studio Code installed
- Claude Code CLI access

Step 1: Install Claude Code

Open your terminal and install Claude Code globally:

npm install -g @anthropic-ai/claude-code

After installation, launch Claude once with the following command:

claude

Step 2: Add an MCP Server

To add a Local Filesystem MCP Server, run:

claude mcp add filesystem -- npx @modelcontextprotocol/server-filesystem .

To add an HTTP MCP Server, run:

claude mcp add --transport http myserver https://example.com/mcp

Step 3: Verify MCP Servers

To see all configured MCP servers, run the following:

claude mcp list

Inside Claude Code, you can also type:

/mcp

Step 4: Open VS Code

Navigate to your project:

code .

Inside the VS Code terminal, launch Claude:

claude

Step 5: Create a Shared .mcp.json Configuration
Create a file named .mcp.json in your project root, so the configuration can be shared through version control, with the following:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "@modelcontextprotocol/server-filesystem",
        "."
      ]
    }
  }
}

Popular MCP Servers Developers Use

- Filesystem
- GitHub
- PostgreSQL
- Notion
- Slack
- Puppeteer
- Docker
- Jira


Example: Add GitHub MCP Server

claude mcp add github -- npx @modelcontextprotocol/server-github

Troubleshooting

If MCP servers are not showing, restart Claude:

exit
claude

Then verify again:

claude mcp list

Final Thoughts

MCP transforms Claude from a standalone AI assistant into a fully integrated development companion. Once connected, Claude can work directly with your files, repositories, databases, and developer tools — all from inside VS Code.


Cheers,
Kapil
