TL;DR: The Content Intelligence System is a content analysis stack combining semantic search (kpy), link analysis (klinks), performance metrics (kperf), and RAG orchestration (krag). LLMs automatically select tools to generate data-driven answers to natural language queries. Both local (Ollama) and cloud (Claude, OpenAI) models are supported.
Large Language Models (LLMs), together with the protocols (MCP, the Model Context Protocol), SDKs (Software Development Kits), and agent architectures we have started using alongside them, are significantly changing how we work.
Content production is no longer confined to content creation alone. SEO (Search Engine Optimization) approaches, user experience optimization, advertising, automation, cross-channel strategies… all of these are intertwined with user behaviors, search intents, and conversion journeys.
Making sense of this complex structure, and identifying and eliminating friction and leakage points in funnels, is becoming easier. Communication blockages between teams and data silos are gradually disappearing. With skills, agents, and MCPs, goal-oriented insights and interpretations can be obtained.
A data silo is an isolated data set held by a department or system and not shared with others, for example the marketing team's GA4 data and the sales team's CRM data. These data sets would be more valuable together, and the missing connections between them also lead to diverging strategies.
So, how does this work in practice?
Content Intelligence System
This system I created for my personal site you’re viewing through this post consists of four core tools:
| Tool | Function | Technology |
|---|---|---|
| kpy | Semantic search | OpenAI/Qwen embedding + LanceDB |
| klinks | Link analysis | PageRank + graph analysis |
| kperf | Performance | GSC + GA4 (Google Analytics 4) + content graph |
| krag | RAG orchestration | Claude API + tool calling |
Each tool analyzes the content ecosystem from a different perspective, and when used together, they reveal powerful insights.
kpy: Semantic Search
This tool enables semantic search by indexing content into a vector database using embedding models. When I search for “eCommerce dataLayer object structures”, all content related to the topic is listed with relevance scores, even if this exact term doesn’t appear in the title.
kpy search_content.py "GA4 e-commerce tracking" --lang tr --limit 5
This approach enables meaning-based search instead of keyword matching. An “Enhanced ecommerce” search also finds Turkish content containing “gelişmiş e-ticaret” (advanced e-commerce).
What is an embedding model? AI models that convert text into numerical vectors that preserve meaning. These vectors make “similar meaning” mathematically measurable. Since 2024, transformer-based and instruction-tuned models have come to the forefront with multilingual (1000+ languages) and multimodal (text-image-audio) capabilities.1
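As a rough illustration of the idea, here is a minimal sketch assuming the OpenAI text-embedding-3-small model from the table below; the helper functions are mine, not kpy's actual code:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def embed(text: str) -> np.ndarray:
    # Convert text into a meaning-preserving vector
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: how close two meanings are, independent of wording
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("Enhanced ecommerce")
doc = embed("gelişmiş e-ticaret için dataLayer kurulumu")
print(cosine(query, doc))  # scores high despite zero keyword overlap
```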
Embedding model selection:
| Model | Advantage | Disadvantage |
|---|---|---|
| OpenAI (text-embedding-3-small) | High quality, easy integration | API cost, data privacy |
| Qwen (Qwen3-Embedding-4B) | Local, free, privacy-first | GPU/RAM requirement, setup |
| Sentence Transformers | Lightweight, fast, multilingual | Lower accuracy |
klinks: Link Analysis
Analyzes links between content to calculate PageRank, detect orphan content, and identify hub pages.
klinks analyze
klinks orphans
klinks top --limit 10
I can see at a glance which content has high site authority and which is isolated.
| Metric | Description |
|---|---|
| PageRank | Content’s site-internal authority |
| In-degree | Number of incoming links |
| Out-degree | Number of outgoing links |
| Orphan | Content receiving no links |
| Hub | Content with many outgoing links |
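As a minimal sketch, these metrics can be derived from an internal-link edge list with networkx; klinks' actual implementation may differ, and the URLs below are made up:

```python
import networkx as nx

# Directed graph of internal links: (source page, target page)
edges = [
    ("/blog/ga4-setup", "/blog/datalayer"),
    ("/blog/datalayer", "/blog/ecommerce-tracking"),
]
G = nx.DiGraph(edges)

pagerank = nx.pagerank(G)                                # site-internal authority
orphans = [n for n, deg in G.in_degree() if deg == 0]    # no incoming links
hubs = sorted(G.out_degree(), key=lambda x: x[1], reverse=True)[:10]  # most outgoing links
```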
kperf: Performance Analysis
Combines GSC (Google Search Console) and GA4 data with the content graph to reveal opportunity points:
- High traffic but low conversion
- High PageRank but low traffic
- High impressions but low CTR (Click-Through Rate)
- Orphan but receiving organic traffic
kperf analyze --gsc data/gsc.csv --ga4 data/ga4.csv
kperf insights
kperf correlations
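Under the hood this is essentially a join on page URL. A hedged sketch with pandas, where the column names are assumptions rather than kperf's actual schema:

```python
import pandas as pd

# Assumed export columns; real GSC/GA4 exports may differ
gsc = pd.read_csv("data/gsc.csv")   # page, impressions, clicks, ctr
ga4 = pd.read_csv("data/ga4.csv")   # page, sessions, conversions

df = gsc.merge(ga4, on="page", how="outer")

# High impressions but low CTR: title/meta rewrite candidates
low_ctr = df[(df["impressions"] > df["impressions"].median()) &
             (df["ctr"] < df["ctr"].median())]

# High traffic but no conversions: funnel leakage candidates
leaky = df[(df["sessions"] > df["sessions"].median()) & (df["conversions"] == 0)]
```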
Visualization: D3.js Graph Viewer
All this data is visualized in an interactive D3.js-based graph viewer:
- Node size: Represents PageRank value
- Node color: Shows language and content type
- Hover: Connected content becomes visible
- Filters: Orphans or top performers can be filtered instantly
cd .claude/knowledge && python -m http.server 8765
# http://localhost:8765/graph-viewer.html
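The viewer consumes a nodes/links JSON. A sketch of how that file could be exported; the field and file names here are illustrative assumptions, not the viewer's actual contract:

```python
import json

# Node size is driven by PageRank, colour by language and content type
nodes = [
    {"id": "/blog/ga4-setup", "pagerank": 0.042, "lang": "tr", "type": "blog"},
    {"id": "/blog/datalayer", "pagerank": 0.021, "lang": "en", "type": "blog"},
]
links = [{"source": "/blog/ga4-setup", "target": "/blog/datalayer"}]

with open(".claude/knowledge/graph.json", "w") as f:
    json.dump({"nodes": nodes, "links": links}, f, indent=2)
```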
Vector Database: LanceDB
Reasons I chose LanceDB:
- Embedded and serverless: Works local-first
- No infrastructure required: No extra setup
- Privacy-first: All data stays on machine
- Fast: Arrow-based, columnar storage
import lancedb

db = lancedb.connect("vectors/")

# "data": a list of dicts, each holding a "vector" (the embedding) plus metadata
table = db.create_table("content", data)

# Semantic search: nearest neighbours of the pre-computed query embedding
results = table.search(query_embedding).limit(5).to_list()
krag: RAG (Retrieval-Augmented Generation) Integration
Why Was It Added?
Although kpy, klinks, and kperf tools are individually powerful, answering complex questions required manually running multiple commands and combining results. For example, for the question “which of my orphan content should I prioritize?”:
- klinks orphans → orphan list
- kperf analyze → traffic data
- Manual comparison → insight
krag was added to automate this process.
What Does It Provide?
| Before | After |
|---|---|
| Run 3+ commands | Single natural language question |
| Manually combine results | Claude automatically analyzes |
| Technical knowledge required | Conversational interface |
| Static outputs | Contextual insights |
How Does It Work?
flowchart TB
Q[Natural language question] --> API[LLM API]
API --> TS[Tool Selection]
TS --> SS[Semantic Search]
TS --> LA[Link Analysis]
TS --> PA[Performance Analytics]
SS --> DR[Data Retrieval]
LA --> DR
PA --> DR
DR --> AN[LLM analysis + insight]
AN --> R[Response]
The system integrates with the selected LLM API (Application Programming Interface) to answer natural language questions:
krag "Why are our orphan contents important?"
krag "Which blog posts have high traffic but low conversion?"
krag "What should my SEO priorities be?"
The LLM automatically decides which tool to use (agentic behavior):
- User sends natural language question
- LLM analyzes question and selects appropriate tools
- Tools run (kpy, klinks, kperf) and return data
- LLM combines the data and responds with contextual insights
Full RAG: Retrieval (multiple data sources) + Augmentation (retrieved data enriches the LLM's context) + Generation (natural language response). Thanks to the agentic architecture, the model decides on its own which tools to use.
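A stripped-down sketch of that loop with the Anthropic Python SDK. The tool definition, model id, and subprocess call are illustrative; krag's real tool schema isn't shown here:

```python
import subprocess
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# One illustrative tool definition; the real system exposes kpy, klinks and kperf
tools = [{
    "name": "klinks_orphans",
    "description": "List content that receives no internal links",
    "input_schema": {"type": "object", "properties": {}},
}]

messages = [{"role": "user", "content": "Why are my orphan contents important?"}]

while True:
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # example model id
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if resp.stop_reason != "tool_use":
        break  # the model produced its final natural-language answer

    # Run each requested tool and feed the output back as a tool_result
    messages.append({"role": "assistant", "content": resp.content})
    results = []
    for block in resp.content:
        if block.type == "tool_use":
            out = subprocess.run(["klinks", "orphans"], capture_output=True, text=True)
            results.append({"type": "tool_result",
                            "tool_use_id": block.id,
                            "content": out.stdout})
    messages.append({"role": "user", "content": results})

print(resp.content[0].text)
```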
LLM Options
| Option | Advantages | Disadvantages |
|---|---|---|
| Claude API | High quality, reliable tool calling | API cost, data privacy |
| OpenAI GPT-4 | Wide ecosystem, good documentation | API cost |
| Ollama (Local) | Free, privacy-first | GPU requirement, lower quality |
| LM Studio | GUI, easy setup | Limited automation |
Recommended local models for tool calling:2
| Model | Size | Notes |
|---|---|---|
| Qwen 3 | 8B-72B | Multilingual, MTEB #1 |
| Llama 3.3 | 70B | Meta’s latest model |
| Mistral | 7B | Lightweight, fast |
For more options:
- Ollama Model Library - view available models with ollama list
- HuggingFace Open LLM Leaderboard - current benchmark results
- Pay attention to “function calling” or “tool use” support when selecting models
Both cloud and local options are supported through a dual-mode structure:
# Cloud mode (default)
krag "Why are my orphan contents important?"
# Local mode (Ollama)
KRAG_LOCAL=true krag "Why are my orphan contents important?"
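A minimal sketch of how this switch could be implemented, assuming the ollama Python client and example model names; this is not krag's actual entry point:

```python
import os

def ask(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    if os.environ.get("KRAG_LOCAL") == "true":
        import ollama  # local mode: free, privacy-first
        resp = ollama.chat(model="qwen3:8b", messages=messages)  # example model tag
        return resp["message"]["content"]
    import anthropic  # cloud mode (default)
    resp = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024, messages=messages)
    return resp.content[0].text
```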
Extending with MCP
Model Context Protocol (MCP) is an open standard for providing custom tools to AI coding assistants. These tools can be integrated into any MCP-supporting assistant:
| Assistant | MCP Support | Notes |
|---|---|---|
| Claude Code | Native | Anthropic’s reference implementation |
| Gemini CLI | Yes | Google’s CLI tool |
| Cursor | Yes | VS Code-based IDE |
| Windsurf | Yes | Codeium’s IDE |
Resources for MCP ecosystem:
- MCP Servers Directory - Ready-made MCP servers
- Awesome MCP Servers - Community-curated list
- MCP Specification - For writing your own MCP server
Example MCP server configuration:
{
"mcpServers": {
"ceaksan-knowledge": {
"command": "python",
"args": ["mcp_server.py"]
}
}
}
The MCP server provides these tools:
- search_content: Semantic content search
- get_voice_profile: Writing style guide
- list_content: Content list
- get_content_stats: Statistics
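A hedged sketch of what mcp_server.py could look like using the official MCP Python SDK's FastMCP helper; the tool bodies are stubs, not the server's actual implementation:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ceaksan-knowledge")

@mcp.tool()
def search_content(query: str, limit: int = 5) -> str:
    """Semantic content search over the vector index."""
    # The real server would run the kpy / LanceDB query here; stubbed for the sketch
    return f"(top {limit} results for: {query})"

@mcp.tool()
def get_content_stats() -> str:
    """Aggregate statistics about the indexed content."""
    return "languages: tr, en | posts: N | orphans: M"  # placeholder values

if __name__ == "__main__":
    mcp.run()  # stdio transport, matching the JSON config above
```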
Practical Usage Scenarios
For SEO Specialist
klinks orphans # Find isolated content
kperf insights # See CTR opportunities
For Content Strategist
kpy search "universal analytics" # Find related content
klinks analyze # Examine hub-spoke structure
For Growth Manager
kperf correlations # PageRank-traffic relationship
kperf top --limit 20 # Top performers
GA4 MCP Integration
Google provides an official MCP server for working with Analytics data.3 With this integration:
- Natural language queries: Questions like “Pages with most traffic in the last 30 days?” can be asked
- Automatic report generation: AI assistant can analyze GA4 data and provide insights
- Combining with Content Intelligence: PageRank + GA4 metrics = powerful content analysis
# Example usage
krag "Which content gets high traffic but has low engagement rate?"
Setup: the official documentation is available at Google Analytics MCP Docs.
Conclusion
LLMs, MCP protocols, and agent architectures are taking content analysis and optimization to a new level. When meaning-based discovery through semantic search, structure-based evaluation through link analysis, and impact-based measurement through performance metrics come together, it becomes possible to see a holistic picture of the content ecosystem.
If I want, this system can run completely privacy-first in a local environment, or it can scale by switching to remote cloud APIs.
Further Reading
- LanceDB Documentation
- OpenAI Embeddings Guide
- Model Context Protocol (MCP)
- Semantic SEO 2025 Strategy Guide
- Qwen3 Embedding Models
Footnotes
1. MTEB (Massive Text Embedding Benchmark) is the standard benchmark used to compare embedding models. For a detailed comparison: MTEB Leaderboard
2. Detailed information about Ollama tool calling: Ollama Tool Calling Docs
3. Google Analytics MCP is the official MCP server providing access to GA4 data through AI assistants: Google Analytics MCP