TL;DR: The Content Intelligence System is a content analysis stack combining semantic search (kpy), link analysis (klinks), performance metrics (kperf), and RAG orchestration (krag). LLMs automatically select tools to generate data-driven answers to natural language queries. Both local (Ollama) and cloud (Claude, OpenAI) models are supported.
Large Language Models (LLMs), together with the protocols (MCP, the Model Context Protocol), SDKs (Software Development Kits), and agent architectures we have started using alongside them, are significantly changing how we work.
Content production is no longer confined to content creation alone. SEO (Search Engine Optimization) approaches, user experience optimization, advertising, automation, cross-channel strategies… all of these are intertwined with user behaviors, search intents, and conversion journeys.
Making sense of this complex structure, and identifying and eliminating friction and leakage points in funnels, is becoming easier. Communication blockages between teams and data silos are gradually disappearing. With skills, agents, and MCPs, goal-oriented insights and interpretations can be obtained.
A data silo is an isolated data set held by a department or system and not shared with others, for example the marketing team's GA4 data and the sales team's CRM data. These data sets would be more valuable together, and the missing connections between them also lead to diverging strategies.
So, how does this work in practice?
Content Intelligence System
This system I created for my personal site you’re viewing through this post consists of four core tools:
| Tool | Function | Technology |
|---|---|---|
| kpy | Semantic search | OpenAI/Qwen embedding + LanceDB |
| klinks | Link analysis | PageRank + graph analysis |
| kperf | Performance | GSC + GA4 (Google Analytics 4) + content graph |
| krag | RAG orchestration | Claude API + tool calling |
Each tool analyzes the content ecosystem from a different perspective, and when used together, they reveal powerful insights.
kpy: Semantic Search
This tool enables semantic search by indexing content into a vector database using embedding models. When I search for “eCommerce dataLayer object structures”, all content related to the topic is listed with relevance scores, even if this exact term doesn’t appear in the title.
kpy search_content.py "GA4 e-commerce tracking" --lang tr --limit 5
This approach enables meaning-based search instead of keyword matching. An “Enhanced ecommerce” search also finds Turkish content containing “gelişmiş e-ticaret” (advanced e-commerce).
What is an embedding model? AI models that convert text into numerical vectors that preserve meaning. These vectors make “similar meaning” mathematically measurable. Since 2024, transformer-based and instruction-tuned models have come to the forefront with multilingual (1000+ languages) and multimodal (text-image-audio) capabilities.1
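As a rough illustration of the idea, here is a minimal sketch assuming the OpenAI text-embedding-3-small model from the table below; the helper functions are mine, not kpy's actual code:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def embed(text: str) -> np.ndarray:
    # Convert text into a meaning-preserving vector
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: how close two meanings are, independent of wording
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("Enhanced ecommerce")
doc = embed("gelişmiş e-ticaret için dataLayer kurulumu")
print(cosine(query, doc))  # scores high despite zero keyword overlap
```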
Embedding model selection:
| Model | Advantage | Disadvantage |
|---|---|---|
| OpenAI (text-embedding-3-small) | High quality, easy integration | API cost, data privacy |
| Qwen (Qwen3-Embedding-4B) | Local, free, privacy-first | GPU/RAM requirement, setup |
| Sentence Transformers | Lightweight, fast, multilingual | Lower accuracy |
klinks: Link Analysis
Analyzes links between content to calculate PageRank, detect orphan content, and identify hub pages.
klinks analyze
klinks orphans
klinks top --limit 10
I can see at a glance which content has high site authority and which is isolated.
| Metric | Description |
|---|---|
| PageRank | Content’s site-internal authority |
| In-degree | Number of incoming links |
| Out-degree | Number of outgoing links |
| Orphan | Content receiving no links |
| Hub | Content with many outgoing links |
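As a minimal sketch, these metrics can be derived from an internal-link edge list with networkx; klinks' actual implementation may differ, and the URLs below are made up:

```python
import networkx as nx

# Directed graph of internal links: (source page, target page)
edges = [
    ("/blog/ga4-setup", "/blog/datalayer"),
    ("/blog/datalayer", "/blog/ecommerce-tracking"),
]
G = nx.DiGraph(edges)

pagerank = nx.pagerank(G)                                # site-internal authority
orphans = [n for n, deg in G.in_degree() if deg == 0]    # no incoming links
hubs = sorted(G.out_degree(), key=lambda x: x[1], reverse=True)[:10]  # most outgoing links
```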
kperf: Performance Analysis
Combines GSC (Google Search Console) and GA4 data with the content graph to reveal opportunity points:
- High traffic but low conversion
- High PageRank but low traffic
- High impressions but low CTR (Click-Through Rate)
- Orphan but receiving organic traffic
kperf analyze --gsc data/gsc.csv --ga4 data/ga4.csv
kperf insights
kperf correlations
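Under the hood this is essentially a join on page URL. A hedged sketch with pandas, where the column names are assumptions rather than kperf's actual schema:

```python
import pandas as pd

# Assumed export columns; real GSC/GA4 exports may differ
gsc = pd.read_csv("data/gsc.csv")   # page, impressions, clicks, ctr
ga4 = pd.read_csv("data/ga4.csv")   # page, sessions, conversions

df = gsc.merge(ga4, on="page", how="outer")

# High impressions but low CTR: title/meta rewrite candidates
low_ctr = df[(df["impressions"] > df["impressions"].median()) &
             (df["ctr"] < df["ctr"].median())]

# High traffic but no conversions: funnel leakage candidates
leaky = df[(df["sessions"] > df["sessions"].median()) & (df["conversions"] == 0)]
```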
Visualization: D3.js Graph Viewer
All this data is visualized in an interactive D3.js-based graph viewer:
- Node size: Represents PageRank value
- Node color: Shows language and content type
- Hover: Connected content becomes visible
- Filters: Orphans or top performers can be filtered instantly
cd .claude/knowledge && python -m http.server 8765
# http://localhost:8765/graph-viewer.html
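The viewer consumes a nodes/links JSON. A sketch of how that file could be exported; the field and file names here are illustrative assumptions, not the viewer's actual contract:

```python
import json

# Node size is driven by PageRank, colour by language and content type
nodes = [
    {"id": "/blog/ga4-setup", "pagerank": 0.042, "lang": "tr", "type": "blog"},
    {"id": "/blog/datalayer", "pagerank": 0.021, "lang": "en", "type": "blog"},
]
links = [{"source": "/blog/ga4-setup", "target": "/blog/datalayer"}]

with open(".claude/knowledge/graph.json", "w") as f:
    json.dump({"nodes": nodes, "links": links}, f, indent=2)
```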
Vector Database: LanceDB
Reasons I chose LanceDB:
- Embedded and serverless: Works local-first
- No infrastructure required: No extra setup
- Privacy-first: All data stays on machine
- Fast: Arrow-based, columnar storage
import lancedb

db = lancedb.connect("vectors/")

# "data": a list of dicts, each holding a "vector" (the embedding) plus metadata
table = db.create_table("content", data)

# Semantic search: nearest neighbours of the pre-computed query embedding
results = table.search(query_embedding).limit(5).to_list()
krag: RAG (Retrieval-Augmented Generation) Integration
Why Was It Added?
Although kpy, klinks, and kperf tools are individually powerful, answering complex questions required manually running multiple commands and combining results. For example, for the question “which of my orphan content should I prioritize?”:
- klinks orphans → orphan list
- kperf analyze → traffic data
- Manual comparison → insight
krag was added to automate this process.
What Does It Provide?
| Before | After |
|---|---|
| Run 3+ commands | Single natural language question |
| Manually combine results | Claude automatically analyzes |
| Technical knowledge required | Conversational interface |
| Static outputs | Contextual insights |
How Does It Work?
flowchart TB
Q[Natural language question] --> API[LLM API]
API --> TS[Tool Selection]
TS --> SS[Semantic Search]
TS --> LA[Link Analysis]
TS --> PA[Performance Analytics]
SS --> DR[Data Retrieval]
LA --> DR
PA --> DR
DR --> AN[LLM analysis + insight]
AN --> R[Response]
The system integrates with the selected LLM API (Application Programming Interface) to answer natural language questions:
krag "Why are our orphan contents important?"
krag "Which blog posts have high traffic but low conversion?"
krag "What should my SEO priorities be?"
The LLM automatically decides which tool to use (agentic behavior):
- User sends natural language question
- LLM analyzes question and selects appropriate tools
- Tools run (kpy, klinks, kperf) and return data
- LLM combines the data and responds with contextual insights
Full RAG: Retrieval (multiple data sources) + Augmentation (retrieved data enriches the LLM's context) + Generation (natural language response). Thanks to the agentic architecture, the model decides on its own which tools to use.
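A stripped-down sketch of that loop with the Anthropic Python SDK. The tool definition, model id, and subprocess call are illustrative; krag's real tool schema isn't shown here:

```python
import subprocess
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# One illustrative tool definition; the real system exposes kpy, klinks and kperf
tools = [{
    "name": "klinks_orphans",
    "description": "List content that receives no internal links",
    "input_schema": {"type": "object", "properties": {}},
}]

messages = [{"role": "user", "content": "Why are my orphan contents important?"}]

while True:
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # example model id
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if resp.stop_reason != "tool_use":
        break  # the model produced its final natural-language answer

    # Run each requested tool and feed the output back as a tool_result
    messages.append({"role": "assistant", "content": resp.content})
    results = []
    for block in resp.content:
        if block.type == "tool_use":
            out = subprocess.run(["klinks", "orphans"], capture_output=True, text=True)
            results.append({"type": "tool_result",
                            "tool_use_id": block.id,
                            "content": out.stdout})
    messages.append({"role": "user", "content": results})

print(resp.content[0].text)
```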
LLM Options
| Option | Advantages | Disadvantages |
|---|---|---|
| Claude API | High quality, reliable tool calling | API cost, data privacy |
| OpenAI GPT-4 | Wide ecosystem, good documentation | API cost |
| Ollama (Local) | Free, privacy-first | GPU requirement, lower quality |
| LM Studio | GUI, easy setup | Limited automation |
Recommended local models for tool calling:2
| Model | Size | Notes |
|---|---|---|
| Qwen 3 | 8B-72B | Multilingual, MTEB #1 |
| Llama 3.3 | 70B | Meta’s latest model |
| Mistral | 7B | Lightweight, fast |
For more options:
- Ollama Model Library - view available models with ollama list
- HuggingFace Open LLM Leaderboard - current benchmark results
- Pay attention to “function calling” or “tool use” support when selecting models
Both cloud and local options are supported through a dual-mode structure:
# Cloud mode (default)
krag "Why are my orphan contents important?"
# Local mode (Ollama)
KRAG_LOCAL=true krag "Why are my orphan contents important?"
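A minimal sketch of how this switch could be implemented, assuming the ollama Python client and example model names; this is not krag's actual entry point:

```python
import os

def ask(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    if os.environ.get("KRAG_LOCAL") == "true":
        import ollama  # local mode: free, privacy-first
        resp = ollama.chat(model="qwen3:8b", messages=messages)  # example model tag
        return resp["message"]["content"]
    import anthropic  # cloud mode (default)
    resp = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024, messages=messages)
    return resp.content[0].text
```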
Extending with MCP
Model Context Protocol (MCP) is an open standard for providing custom tools to AI coding assistants. These tools can be integrated into any MCP-supporting assistant:
| Assistant | MCP Support | Notes |
|---|---|---|
| Claude Code | Native | Anthropic’s reference implementation |
| Gemini CLI | Yes | Google’s CLI tool |
| Cursor | Yes | VS Code-based IDE |
| Windsurf | Yes | Codeium’s IDE |
Resources for MCP ecosystem:
- MCP Servers Directory - Ready-made MCP servers
- Awesome MCP Servers - Community-curated list
- MCP Specification - For writing your own MCP server
Example MCP server configuration:
{
"mcpServers": {
"ceaksan-knowledge": {
"command": "python",
"args": ["mcp_server.py"]
}
}
}
The MCP server provides these tools:
- search_content: Semantic content search
- get_voice_profile: Writing style guide
- list_content: Content list
- get_content_stats: Statistics
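A hedged sketch of what mcp_server.py could look like using the official MCP Python SDK's FastMCP helper; the tool bodies are stubs, not the server's actual implementation:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ceaksan-knowledge")

@mcp.tool()
def search_content(query: str, limit: int = 5) -> str:
    """Semantic content search over the vector index."""
    # The real server would run the kpy / LanceDB query here; stubbed for the sketch
    return f"(top {limit} results for: {query})"

@mcp.tool()
def get_content_stats() -> str:
    """Aggregate statistics about the indexed content."""
    return "languages: tr, en | posts: N | orphans: M"  # placeholder values

if __name__ == "__main__":
    mcp.run()  # stdio transport, matching the JSON config above
```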
Practical Usage Scenarios
For SEO Specialist
klinks orphans # Find isolated content
kperf insights # See CTR opportunities
For Content Strategist
kpy search "universal analytics" # Find related content
klinks analyze # Examine hub-spoke structure
For Growth Manager
kperf correlations # PageRank-traffic relationship
kperf top --limit 20 # Top performers
GA4 MCP Integration
Google provides an official MCP server for working with Analytics data.3 With this integration:
- Natural language queries: Questions like “Pages with most traffic in the last 30 days?” can be asked
- Automatic report generation: AI assistant can analyze GA4 data and provide insights
- Combining with Content Intelligence: PageRank + GA4 metrics = powerful content analysis
# Example usage
krag "Which content gets high traffic but has low engagement rate?"
Setup: the official documentation is available at Google Analytics MCP Docs.
Conclusion
LLMs, MCP protocols, and agent architectures are taking content analysis and optimization to a new level. When meaning-based discovery through semantic search, structure-based evaluation through link analysis, and impact-based measurement through performance metrics come together, it becomes possible to see a holistic picture of the content ecosystem.
If I want, this system can run completely privacy-first in a local environment, or it can scale by switching to remote cloud APIs.
Further Reading
- LanceDB Documentation
- OpenAI Embeddings Guide
- Model Context Protocol (MCP)
- Semantic SEO 2025 Strategy Guide
- Qwen3 Embedding Models
Footnotes
1. MTEB (Massive Text Embedding Benchmark) is the standard benchmark used to compare embedding models. For a detailed comparison: MTEB Leaderboard
2. Detailed information about Ollama tool calling: Ollama Tool Calling Docs
3. Google Analytics MCP is the official MCP server providing access to GA4 data through AI assistants: Google Analytics MCP