Building a production AI-powered MCP server required balancing multiple competing constraints. This technical foundation enabled strong business outcomes: 44% faster feature delivery, 91% component adoption, and an estimated $2.3M in annual productivity savings.
Rather than building a complete system upfront, we followed a deliberate progression from minimal viable solution to production-grade infrastructure. This approach validated the concept early and enabled rapid iteration based on real usage feedback.
Please note: Technical details are generalized to protect intellectual property while demonstrating engineering approach and architectural decisions.
Production-grade infrastructure that scales from prototype to thousands of users—real-world platform engineering experience.
Before building the full production system, we shipped a quick win that was immediately popular: llms.txt for repository-level guidance.
A plain text file at the repo root listing rules, preferred APIs/packages, token usage, and links to canonical docs. Editors like GitHub Copilot read it to ground suggestions and avoid hallucinations.
Purpose: Guide editor AIs to use real design system patterns
Effort: 2-3 hours to implement and deploy
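As a rough illustration of what such a file can contain (the rules, package names, and URLs below are invented for this sketch, not our actual file):

```text
# llms.txt — guidance for AI coding assistants (illustrative example)

# Rules
- Never hardcode colors, spacing, or radii; import tokens from @company/tokens.
- Prefer <Button intent="..."> from @company/ui over raw <button> elements.
- Announce async states with aria-busy and disable controls while pending.

# Preferred packages
@company/tokens  - design tokens (colors, spacing, borderRadius)
@company/ui      - React component library

# Canonical docs
https://design.example.com/tokens
https://design.example.com/components/button
```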
// llms.txt warns against inline hex colors
<button style={{
  backgroundColor: '#ff6b6b',
  padding: '12px 24px',
  borderRadius: '4px'
}}>
  Save
</button>
// Suggested by llms.txt
import { colors, spacing, borderRadius } from '@company/tokens'
<button style={{
  backgroundColor: colors.primary[500],
  padding: `${spacing.md} ${spacing.lg}`,
  borderRadius: borderRadius.sm
}} aria-busy>
  Saving…
</button>
// Tab to auto-complete

The goal was to validate whether an MCP server could effectively ground AI responses in real design system knowledge. The first proof of concept focused on a minimal viable implementation.
VS Code • Figma Dev Mode • CLI
The initial implementation validated the flow: query → retrieve → answer with code samples grounded in real tokens and components.
Single tool: search_design_docs
import { createServer, tool } from 'mcp-framework';
import fs from 'node:fs';
import path from 'node:path';
import { cosineSimilarity, embed } from './pocembed';

// Load precomputed vectors created by the training script
const vectors = JSON.parse(
  fs.readFileSync(path.join(process.cwd(), 'vectors.json'), 'utf8')
) as Array<{
  id: string;
  text: string;
  vector: number[];
  source: string;
}>;

const searchDesignDocs = tool({
  name: 'search_design_docs',
  description: 'Semantic search over design system docs and guidelines',
  inputSchema: {
    type: 'object',
    required: ['query'],
    properties: {
      query: { type: 'string' },
      k: { type: 'number', default: 5 }
    },
  },
  handler: async ({ query, k = 5 }) => {
    const q = await embed(query);
    const scored = vectors
      .map((d) => ({ ...d, score: cosineSimilarity(q, d.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
    return {
      results: scored.map((s) => ({
        source: s.source,
        score: Number(s.score.toFixed(3)),
        text: s.text
      })),
    };
  },
});

const server = createServer({
  name: 'company-design-system-mcp-poc',
  version: '0.0.1',
  tools: [searchDesignDocs],
});

server.start(8080);

Index design system docs into vectors.json
// scripts/poc-train.ts
import fs from 'node:fs';
import path from 'node:path';
import matter from 'gray-matter';
import { embed } from './pocembed';

const docsDir = path.join(process.cwd(), 'docs/design-system');

// Split long docs into overlapping chunks so each embedding keeps local context
function chunk(text: string, size = 800, overlap = 120) {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

async function main() {
  const files = fs.readdirSync(docsDir)
    .filter((f) => f.endsWith('.md') || f.endsWith('.mdx'));
  const out: any[] = [];
  for (const file of files) {
    const full = path.join(docsDir, file);
    const raw = fs.readFileSync(full, 'utf8');
    const { content } = matter(raw); // strip frontmatter
    for (const text of chunk(content)) {
      const vector = await embed(text);
      out.push({
        id: `${file}-${out.length}`,
        source: file,
        text,
        vector
      });
    }
  }
  fs.writeFileSync(
    path.join(process.cwd(), 'vectors.json'),
    JSON.stringify(out)
  );
}

main();

Invoke search_ds and inspect request/response
Grounding search_ds response with component knowledge
Use the Button component in primary intent with an isLoading state.
Announce progress with aria-busy and disable user interaction to prevent duplicate submissions.
<Button intent="primary" aria-busy={loading || undefined} disabled={loading}>
  {loading ? 'Saving…' : 'Save'}
</Button>

The system is built on a modern, scalable architecture centered on the Model Context Protocol (MCP). At its core, a Node.js/TypeScript server exposes intelligent design system tools through MCP, enabling seamless integration with IDEs, CLI tools, and CI/CD pipelines.
The foundation of accurate AI responses is comprehensive data collection. We aggregate design system knowledge from six primary sources: documentation, Storybook examples, Figma specifications, GitHub code patterns, support tickets, and developer conversations. Together these build a complete picture of how the design system is documented, implemented, and used in practice.
Raw documentation is transformed into AI-ready knowledge through a dual-track processing pipeline. OpenAI's text-embedding model converts all content into vectors stored in Pinecone for fast semantic search, while LangChain orchestrates the Retrieval-Augmented Generation (RAG) pipeline that retrieves and synthesizes the most relevant context for each query.
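As a concrete sketch of the retrieval half of that pipeline: once chunks come back scored by vector similarity (from Pinecone in production, or an in-memory index like the POC), the highest-scoring ones are packed into a prompt context under a size budget. The names and budget below are illustrative, not the production code.

```typescript
interface ScoredChunk {
  source: string;
  score: number;
  text: string;
}

// Assemble the highest-scoring chunks into a prompt context,
// stopping before a rough character budget is exceeded.
function buildContext(chunks: ScoredChunk[], budget = 4000): string {
  const parts: string[] = [];
  let used = 0;
  for (const c of [...chunks].sort((a, b) => b.score - a.score)) {
    const block = `[${c.source}] ${c.text}`;
    if (used + block.length > budget) break;
    parts.push(block);
    used += block.length;
  }
  return parts.join('\n---\n');
}

const context = buildContext([
  { source: 'tokens.mdx', score: 0.84, text: 'Spacing tokens: spacing.sm, spacing.md, spacing.lg.' },
  { source: 'button.mdx', score: 0.91, text: 'Use intent="primary" for the main action.' },
]);
```

The synthesis half then hands `context` to the model alongside the user's question, which is what keeps answers grounded in real documentation rather than the model's priors.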
The heart of the system is a high-performance Node.js server built on Fastify that implements the Model Context Protocol specification. It exposes four primary tools (search_components, validate, generate_code, and check_accessibility), using GPT-4 and fine-tuned models for code generation and compliance validation.
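A framework-free sketch of that four-tool surface is below. The production server registers these through MCP on Fastify; the handler bodies here are stubs, and the response shapes are illustrative assumptions.

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Registry of the four primary tools; real handlers call into the
// RAG pipeline and model layer, these stubs only show the dispatch shape.
const tools = new Map<string, ToolHandler>([
  ['search_components', async (args) => ({ query: args.query, matches: [] })],
  ['validate', async () => ({ valid: true, issues: [] })],
  ['generate_code', async () => ({ code: '// generated stub' })],
  ['check_accessibility', async () => ({ violations: [] })],
]);

async function dispatch(name: string, args: Record<string, unknown>) {
  const handler = tools.get(name);
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(args);
}
```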
The MCP server's true power comes from meeting developers where they work. Through a VS Code extension, GitHub Actions, and an analytics dashboard, the system delivers contextually accurate, actionable guidance directly in existing workflows, from inline suggestions during coding to automated PR validation to team-wide adoption insights.
These challenges and solutions reflect the technical journey from proof-of-concept to production deployment.
Initial embeddings struggled with design system terminology and domain knowledge.
AI occasionally generated component APIs that didn't exist in the design system.
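One mitigation for hallucinated APIs can be sketched as follows (names illustrative, not the production implementation): validate the props an AI suggestion uses against a manifest generated from the real component source before surfacing it.

```typescript
// Hypothetical manifest of real component props, generated from source.
const componentManifest: Record<string, Set<string>> = {
  Button: new Set(['intent', 'disabled', 'aria-busy', 'onClick', 'children']),
};

// Flag any prop (or component) the manifest does not recognize.
function findUnknownProps(component: string, props: string[]): string[] {
  const known = componentManifest[component];
  if (!known) return [`unknown component: ${component}`];
  return props.filter((p) => !known.has(p)).map((p) => `unknown prop: ${p}`);
}

// A suggestion using a made-up `variant` prop is caught before it
// reaches the developer.
const issues = findUnknownProps('Button', ['intent', 'variant']);
```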
OpenAI API costs would have exceeded $15K/month at full adoption.
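A common cost control, sketched here with illustrative names rather than the production implementation, is caching answers by normalized query so repeated questions never reach the paid completion API:

```typescript
const answerCache = new Map<string, string>();

// Collapse whitespace and case so trivially different phrasings share a key.
function normalize(query: string): string {
  return query.toLowerCase().replace(/\s+/g, ' ').trim();
}

async function answerWithCache(
  query: string,
  callModel: (q: string) => Promise<string>,
): Promise<{ answer: string; cached: boolean }> {
  const key = normalize(query);
  const hit = answerCache.get(key);
  if (hit !== undefined) return { answer: hit, cached: true };
  const answer = await callModel(query);
  answerCache.set(key, answer);
  return { answer, cached: false };
}
```

The same idea extends to semantic caching (keying on embedding similarity rather than exact text), which trades a cheap embedding call for an expensive completion call.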
A brief overview of the infrastructure and operations setup.
A critical component of production readiness was establishing a robust data pipeline and training infrastructure.
This continuous learning approach ensured the system stayed current as the design system evolved.
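One way such a pipeline can stay current, sketched here under the assumption of hash-based change detection (illustrative, not the production code), is to re-embed a document only when its content hash changes, so scheduled runs touch only modified files:

```typescript
import { createHash } from 'node:crypto';

// Last-known content hash per indexed file.
const indexedHashes = new Map<string, string>();

function contentHash(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// Returns true when the doc needs (re-)embedding, and records the new
// hash so subsequent runs skip it until the content changes again.
function needsReindex(file: string, content: string): boolean {
  const hash = contentHash(content);
  if (indexedHashes.get(file) === hash) return false;
  indexedHashes.set(file, hash);
  return true;
}
```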
Achieving production-grade performance required systematic optimization:
Production operations required comprehensive monitoring. We built a multi-layered observability strategy tracking everything from infrastructure health to business metrics:
The technical implementation directly improved developer experience.
Lessons learned from building and scaling a production AI system: technical insights and engineering practices that shaped our approach.
The key to success was the deliberate progression from simple proof-of-concept to production-grade infrastructure, guided by real usage patterns and user feedback at every step.
See the bigger picture: Explore the detailed impacts and technical implementation