TL;DR
| Platform | 2020 | 2026 |
|---|---|---|
| Excel | Manual PivotTable | Copilot Agent Mode (natural language) |
| Google Sheets | Basic pivot | Connected Sheets + BigQuery (200K rows) |
| Python | pandas pivot_table() | Polars (5-10x faster), DuckDB (SQL PIVOT) |
| R | spread/gather | pivot_wider/pivot_longer (tidyr) |
| JavaScript | PivotTable.js | AG Grid, Flexmonster, Arquero |
| AI | None | Semantic summarization, token optimization |
The first version of this article was published in 2020 and covered basic pivot table creation in Excel and Google Sheets. Since then, the pivot table has evolved from a “row and column transposition” tool into a data engine at the center of the data analytics ecosystem. Especially with the proliferation of AI agents, the pivot table is no longer the final deliverable but an intermediate layer that artificial intelligence uses to make sense of data.
Pivot Table Fundamentals
A Pivot Table (Summary Table) is a tool used to analyze, summarize, and report the records in a data table. Its fundamental operation is to designate one or more columns as grouping keys, spread the distinct values of another column across new columns, and apply aggregation functions (sum, average, count) to the remaining values.
Through this process, you can:
- Summarize large data tables with grouping, filtering, and conditional formatting
- Calculate totals and subtotals of numerical data at category and subcategory levels
- Adjust the level of detail with expand and collapse operations
- Make comparisons using different calculation functions for numerical, text, and date data
Key rules when creating pivot tables: columns must have headers, each column should contain a single data type, and data entry consistency must be maintained (e.g., “Email” should not appear as “E-mail” in another row).
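To make the mechanics concrete, here is a minimal sketch in pandas (used purely as a neutral illustration; any of the tools covered below can express the same operation, and the column names are invented for the example):
import pandas as pd

records = pd.DataFrame({
    "Region":  ["North", "North", "South", "South"],
    "Quarter": ["Q1", "Q2", "Q1", "Q2"],
    "Sales":   [100, 150, 80, 120],
})

# Group by Region, spread the distinct Quarter values into columns, aggregate Sales with sum
summary = records.pivot_table(index="Region", columns="Quarter",
                              values="Sales", aggfunc="sum")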
Modern Excel: Power Pivot, DAX and Copilot
Power Pivot and Data Model
In 2020, Excel’s row limit (1,048,576) was one of the biggest constraints when working with pivot tables. Power Pivot’s Data Model eliminates this limit: the xVelocity (VertiPaq) in-memory columnar engine supports approximately 2 billion rows per table [1]. Data is stored in the backend Data Model rather than on the Excel worksheet and is reported through PivotTables.
DAX (Data Analysis Expressions)
DAX, which comes with Power Pivot, takes pivot table calculations beyond standard Excel formulas [2]:
| DAX Function | Description | Use Case |
|---|---|---|
| CALCULATE | Dynamic aggregation by changing filter context | Conditional sales totals |
| SUMX | Row-by-row iteration and summation | SUMX(Sales, Sales[Qty] * Sales[Margin]) |
| RELATED | Fetch data from related tables | Multi-table pivot |
| SAMEPERIODLASTYEAR | Same period last year comparison | YoY analysis |
| ALL / ALLEXCEPT | Remove filters for grand totals | Percentage calculations |
Copilot Agent Mode
Agent Mode, generally available since December 2025, brings natural-language pivot table creation to Excel [3]:
- PivotTables can be automatically created with commands like “Create a quarterly sales pivot by region”
- User control is provided at planning, creation, and validation stages
- OpenAI GPT 5.2 or Anthropic Claude Opus 4.5 models can be used in the backend
- The =COPILOT() formula enables cell-level forecasting, summarization, and aggregation
Excel’s former “Analyze Data” (App Skills) feature is being removed by late February 2026; Agent Mode and Analyst are taking its place.
Google Sheets: Connected Sheets and Gemini
Connected Sheets + BigQuery
Google Connected Sheets brings BigQuery’s scale to the Sheets interface. With the October 2025 update [4]:
- Pivot table row capacity increased from 100,000 to 200,000
- Drill-down feature added by double-clicking pivot cells
- CSV files from Drive can be imported directly to BigQuery and analyzed via Connected Sheets
This enables billions of rows of e-commerce data to be analyzed as pivot tables in the Sheets interface.
Gemini Integration
With the May and October 2025 updates, Gemini in Google Sheets enables [5]:
- Creating pivot tables with natural language (“Create a pivot table showing sum of sales by region”)
- Multi-step execution from a single prompt (data cleaning + formatting + pivot creation)
- Cross-pivot analysis across multiple tables
- Focused analysis on selected ranges
Google Sheets Limits
| Limit | Value |
|---|---|
| Cells | 10 million |
| Columns | 18,278 |
| Tabs | 200 |
| Characters per cell | 50,000 |
Python: pandas, Polars and DuckDB
pandas pivot_table()
pandas 3.0.0 (January 2026) introduces a significant behavioral change to pivot_table(): the observed parameter now defaults to True, so only observed category combinations appear in the result. Copy-on-Write has also become the default and only mode [6].
import pandas as pd
df = pd.DataFrame({
'Date': ['2026-01', '2026-01', '2026-02', '2026-02'],
'Category': ['Electronics', 'Fashion', 'Electronics', 'Fashion'],
'Sales': [1200, 450, 1500, 300],
'Returns': [1, 2, 0, 5]
})
pivot = df.pivot_table(
index='Category',
columns='Date',
values='Sales',
aggfunc='sum'
)
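The observed change only matters when the grouping keys are categorical. A minimal sketch of the new default, continuing the example above (the extra “Home” category is an assumption added for illustration):
# Make Category categorical, including an unused 'Home' category
df['Category'] = df['Category'].astype(pd.CategoricalDtype(['Electronics', 'Fashion', 'Home']))

# pandas 3.0 default (observed=True): 'Home' is omitted because it has no rows
observed_only = df.pivot_table(index='Category', columns='Date',
                               values='Sales', aggfunc='sum')

# Old behavior on request: observed=False keeps all-NaN rows for unused categories
all_categories = df.pivot_table(index='Category', columns='Date',
                                values='Sales', aggfunc='sum', observed=False)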
Polars: The Performance Revolution
Polars 1.38.1 (February 2026) is a DataFrame library written in Rust, positioned as an alternative to pandas. It runs 1.6 to 8.6 times faster for pivot operations depending on data size [7]:
| Data Size | pandas | Polars | Difference |
|---|---|---|---|
| 10K rows | Baseline | ~1.6x faster | Small difference |
| 100M rows | Baseline | ~8.6x faster | Significant difference |
| 1 GB CSV loading | Baseline | ~5x faster, 87% less memory | Critical difference |
import polars as pl
df = pl.DataFrame({
'Category': ['Electronics', 'Fashion', 'Electronics', 'Fashion'],
'Month': ['January', 'January', 'February', 'February'],
'Sales': [1200, 450, 1500, 300]
})
pivot = df.pivot(
on='Month',
index='Category',
values='Sales',
aggregate_function='sum'
)
Polars’ streaming engine, introduced in 2025, provides an additional 3-7x improvement over the in-memory engine. For e-commerce workloads that reach millions of rows, Polars is the clearest best-practice choice of 2026.
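A minimal sketch of the streaming path, assuming a larger-than-memory orders.csv with Category, Month, and Sales columns (the file and column names are assumptions; pivot itself is an eager operation in Polars, so the streaming-friendly step is the group_by/agg):
lazy_summary = (
    pl.scan_csv('orders.csv')                      # lazy scan, nothing is read yet
      .group_by(['Category', 'Month'])
      .agg(pl.col('Sales').sum().alias('Sales'))
)
summary = lazy_summary.collect(engine='streaming')  # older releases: collect(streaming=True)

# The small aggregated result can then be pivoted eagerly, as shown above
wide = summary.pivot(on='Month', index='Category', values='Sales')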
DuckDB: Pivot with SQL
DuckDB 1.4.4 (January 2026) offers first-class PIVOT/UNPIVOT SQL syntax [8]:
-- Simple pivot
PIVOT sales ON month USING sum(amount);
-- SQL standard syntax
SELECT * FROM sales
PIVOT (sum(amount) AS total FOR month IN ('January', 'February', 'March'));
DuckDB can run SQL directly on pandas DataFrames and provides advantages in memory management for large datasets.
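A minimal sketch of that interoperability, assuming a pandas DataFrame named sales_df in the local scope (DuckDB’s replacement scans resolve it by variable name; the data is invented for the example):
import duckdb
import pandas as pd

sales_df = pd.DataFrame({
    'month':  ['January', 'January', 'February'],
    'amount': [1200, 450, 1500],
})

# DuckDB finds 'sales_df' in the surrounding Python scope and pivots it directly
wide = duckdb.sql("PIVOT sales_df ON month USING sum(amount)").df()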
R: tidyverse and data.table
pivot_wider() and pivot_longer()
Introduced in tidyr 1.0.0 (September 2019), pivot_wider() and pivot_longer() replaced the older spread() and gather() functions [9]. The naming is more intuitive: pivot_longer() reshapes to long format, pivot_wider() to wide format.
library(tidyr)
# Convert long format to wide format (pivot)
df %>%
pivot_wider(
names_from = Month,
values_from = Sales,
values_fn = sum
)
data.table Alternative
data.table 1.18.0 (December 2025) offers reshaping operations with dcast() (wide) and melt() (long). Thanks to internal fast radix sorting and binary search, it runs approximately 5 times faster than tidyr [10].
library(data.table)
dcast(dt, Category ~ Month, value.var = "Sales", fun.aggregate = sum)
JavaScript: Browser-Based Pivot
Current State
| Tool | Status | Capacity | License |
|---|---|---|---|
| PivotTable.js | Unmaintained (~7 years since last update) | Small data | MIT |
| WebDataRocks | Active | ~1 MB | Free |
| Flexmonster | Active | ~1 GB, millions of rows | Commercial |
| AG Grid (Enterprise) | Active | Server-side, unlimited | Commercial |
| Arquero | Active | 1M+ rows (in browser) | BSD |
Arquero, developed by the UW Interactive Data Lab, is a columnar JavaScript data transformation library. It can process over 1 million rows in the browser and offers pivot/reshape operations [11].
AG Grid Enterprise offers the most comprehensive pivot features with server-side row models. React, Angular, and Vue integrations are available.
AI Agents and Pivot Tables
With the proliferation of AI agents, the role of pivot tables is changing: no longer just a report for humans to read, but a method for artificial intelligence to break data into digestible chunks.
Creating Pivots with Natural Language
Excel Copilot Agent Mode and Google Gemini can create pivot tables from natural language commands. Claude’s Code Interpreter feature (September 2025) can take uploaded CSV/TSV files and produce pivot analysis, visualizations, and Excel output using Python [12].
AI-Assisted Pivot Recommendation
Microsoft Research has developed a system that uses BERT embeddings to understand attribute significance in pivot tables and recommend meaningful pivot configurations to users [13]. In this approach, the agent evaluates not just the numbers but also the semantic meaning of column names. For example, given the command “Pivot return rates during discount periods,” it infers which column represents discounts and which represents returns.
Token Optimization: Pre-Pivot Strategy
Sending pre-pivoted summaries instead of raw data to AI agents reduces token costs by 60-80% [14]:
import pandas as pd
import json
# Raw data: 10,000 rows of e-commerce data
df = pd.read_csv('sales.csv')
# Pre-pivot summary: For AI agent
pivot = df.pivot_table(
index='Category',
values=['Sales', 'Returns'],
aggfunc={'Sales': 'sum', 'Returns': 'mean'}
).reset_index()
# Add flags: Points for agent to focus on
pivot['Status'] = pivot.apply(
lambda x: 'Critical' if x['Returns'] > 3 else 'Stable', axis=1
)
# Send to agent in JSON format
context = pivot.to_json(orient='records', force_ascii=False)
This approach provides two critical advantages:
- Token savings: 10-20 row pivot summary instead of 10,000 row raw log
- Improved accuracy: Flag columns (Status, Trend) answer the agent’s “where should I look?” question and reduce hallucination risk
Browser-Based Agents
Browser-based AI agents navigating an e-commerce panel (Shopify, WooCommerce) can instantly transform complex on-screen tables into pivot structures and perform anomaly detection: generating alerts like “Sales are increasing while profit is declining.”
E-Commerce Pivot Analysis Practices
Common Pivot Patterns
| Pattern | Row (Index) | Column | Value | Purpose |
|---|---|---|---|---|
| Category performance | Product category | Month/Quarter | Sales total | Seasonal trends |
| Cohort analysis | Acquisition month | Subsequent months | Retention rate | Customer loyalty |
| Channel x Region | Marketing channel | Geographic region | Revenue | Channel effectiveness |
| Product matrix | SKU | Metrics (revenue, returns, margin) | Values | Product health |
| Customer LTV | Customer segment | Period | Lifetime value | Segment comparison |
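To make the cohort-analysis pattern from the table above concrete, here is a minimal pandas sketch (the file name orders.csv and the customer_id/order_date/order_id column names are assumptions):
import pandas as pd

orders = pd.read_csv('orders.csv', parse_dates=['order_date'])
orders['order_month'] = orders['order_date'].dt.to_period('M')
orders['cohort'] = orders.groupby('customer_id')['order_month'].transform('min')
orders['months_since'] = (orders['order_month'] - orders['cohort']).apply(lambda d: d.n)

# Rows: acquisition month, columns: months since acquisition, values: active customers
cohort_counts = orders.pivot_table(index='cohort', columns='months_since',
                                   values='customer_id', aggfunc='nunique')
retention = cohort_counts.div(cohort_counts[0], axis=0).round(2)  # retention rates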
Pivot in the Modern Data Stack
In the modern data stack architecture (dbt, Looker, BigQuery), pivot operations typically occur in the transformation layer [15]:
- Data ingestion: Raw e-commerce data brought to warehouse via Fivetran/Stitch
- Transformation (dbt): dbt_utils.pivot() macro for SQL pivot operations, dynamic column creation without hardcoding
- BI layer: Looker/Tableau consumes dbt-transformed data
- Reverse ETL: Census/Hightouch pushes pivoted results back to CRM and marketing tools
ShopifyQL and Pivot
ShopifyQL, introduced by Shopify in June 2022, significantly simplifies querying e-commerce data compared to standard SQL [16]. Period comparison with COMPARE TO previous_year (an operation requiring self-joins in SQL), row/column totals with WITH TOTALS and GROUP_TOTALS, and percentage-change calculation with WITH PERCENT_CHANGE can all be done in a single line:
FROM sales
SHOW net_sales
GROUP BY product_type
SINCE -12m
COMPARE TO previous_year
WITH TOTALS, PERCENT_CHANGE
VISUALIZE
However, ShopifyQL does not yet support true pivot/crosstab operations (transforming rows into columns). Subquery and CTE support is also absent. The Sidekick AI integration introduced in December 2025 enables generating ShopifyQL queries through natural language, and Sidekick Pulse provides proactive anomaly detection [17].
For complex pivot scenarios, the recommended approach is: extract a GROUP BY summary via ShopifyQL, transfer to BigQuery via Fivetran or Airbyte, and apply true pivot operations there using DuckDB PIVOT or dbt_utils.pivot(). Analyzing BigQuery data directly as a Sheets pivot table via Google Connected Sheets also provides an effective alternative.
Web Scraping and Competitive Intelligence
In e-commerce, analyzing competitor pricing and market data through web scraping combined with pivot tables is a common practice.
Tool Landscape (2025-2026)
| Tool | Type | Use Case | Status |
|---|---|---|---|
| Scrapy 2.12+ | Framework | Large-scale structured crawling | Active |
| Playwright 1.49+ | Browser automation | JavaScript-heavy sites, SPAs | Active |
| ScrapeGraphAI | AI-powered | Semantic extraction, low maintenance | New |
| Crawl4AI | Open source | LLM-ready output, free | New |
Anti-Bot Challenges
As of 2025-2026, anti-bot protections have hardened significantly. Cloudflare Turnstile uses TLS/HTTP2 fingerprinting and JavaScript challenges; DataDome performs ML-based real-time bot detection. Puppeteer-stealth was discontinued in February 2025; active alternatives are Nodriver, SeleniumBase UC Mode, and Camoufox [18].
ScrapeGraphAI’s AI approach requires 70% less maintenance than CSS selector-based scrapers. LLMs can understand page semantics regardless of HTML structure changes.
Scraping to Pivot Pipeline
from playwright.async_api import async_playwright  # scraping step omitted in this excerpt
import duckdb

# Store scraped data in DuckDB (assumes the scraper has populated price_history)
con = duckdb.connect("competitor_prices.duckdb")

# Competitor price comparison: PIVOT creates one column set per competitor
comparison = con.execute("""
PIVOT price_history
ON competitor
USING AVG(price) AS avg_price, MIN(price) AS min_price
GROUP BY product_sku
ORDER BY product_sku
""").df()
From a legal perspective: respecting robots.txt, maintaining minimum 1-2 second delays between requests, and checking ToS are recommended. Pricing data is generally not considered personal data, though KVKK applies when scraping involves personal data.
WooCommerce Data Analytics
WooCommerce REST API v3 enables programmatic extraction of order, product, customer, and coupon data. Unlike Shopify’s limit of 40 API requests per minute, WooCommerce imposes no platform-level cap and also allows direct database access [19]. For order filtering operations, see WooCommerce REST API - Filtering Orders.
API to Pivot
from woocommerce import API
import pandas as pd
wcapi = API(
url="https://yourstore.com",
consumer_key="ck_xxx",
consumer_secret="cs_xxx",
version="wc/v3"
)
# Fetch all orders with pagination
all_orders = []
page = 1
while True:
response = wcapi.get("orders", params={"per_page": 100, "page": page})
orders = response.json()
if not orders:
break
all_orders.extend(orders)
page += 1
df = pd.json_normalize(all_orders, sep='_')
RFM Analysis
# Parse order dates (the 'order_date' column name is an assumption about the export)
df["order_date"] = pd.to_datetime(df["order_date"])
snapshot = df["order_date"].max() + pd.Timedelta(days=1)
rfm = df.groupby("customer_id").agg(
recency=("order_date", lambda x: (snapshot - x.max()).days),
frequency=("order_id", "nunique"),
monetary=("line_total", "sum")
).reset_index()
# Score 1-5
for col in ["recency", "frequency", "monetary"]:
labels = range(5, 0, -1) if col == "recency" else range(1, 6)
rfm[f"{col}_score"] = pd.qcut(
rfm[col], q=5, labels=labels, duplicates="drop"
).astype(int)
# Derive a simple segment label from the combined score (illustrative thresholds)
rfm["rfm_total"] = rfm[["recency_score", "frequency_score", "monetary_score"]].sum(axis=1)
rfm["segment"] = pd.cut(rfm["rfm_total"], bins=[2, 7, 11, 15],
                        labels=["At Risk", "Potential", "Champions"])
# Segment pivot
rfm_pivot = rfm.pivot_table(
values="monetary",
index="segment",
aggfunc=["count", "mean", "sum"]
).round(2)
WooCommerce MCP Integration
WooCommerce 10.3 (2025) introduced the first official MCP (Model Context Protocol) integration. This enables AI agents to directly search, add, and update products.
BigQuery Integration Paths
| Tool | Method | Cost |
|---|---|---|
| Airbyte | Open-source ETL | Free (self-hosted) |
| n8n | Workflow automation | Free (self-hosted) |
| Coupler.io | No-code scheduler | From $49/mo |
| Direct DB | Server-side tracking -> BigQuery | Infrastructure cost only |
BigQuery E-commerce Pivot Operations
BigQuery’s native PIVOT operator enables powerful analysis on GA4 e-commerce data [20]. If you are familiar with the data warehouse concept, BigQuery can be thought of as its cloud-native implementation.
GA4 Data PIVOT
SELECT *
FROM (
SELECT
items.item_category AS category,
event_name,
items.price * items.quantity AS revenue
FROM `project.analytics_XXXXXX.events_*`,
UNNEST(items) AS items
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
AND event_name IN ('purchase', 'add_to_cart', 'view_item')
)
PIVOT (
SUM(revenue) AS total
FOR event_name IN ('purchase', 'add_to_cart', 'view_item')
);
BigQuery ML
BigQuery ML enables model creation directly via SQL for e-commerce predictions:
- ARIMA_PLUS: Daily revenue forecasts, Turkish holiday calendar support (holiday_region = 'TR'), up to 100 million time series simultaneously (see the sketch after this list)
- K-means: Customer segmentation from RFM scores, automatic scaling with standardize_features = TRUE
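A minimal sketch of the ARIMA_PLUS option above, submitted through the BigQuery Python client (the dataset, table, and column names are assumptions):
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
    CREATE OR REPLACE MODEL `shop_analytics.daily_revenue_forecast`
    OPTIONS (
      model_type = 'ARIMA_PLUS',
      time_series_timestamp_col = 'order_date',
      time_series_data_col = 'revenue',
      holiday_region = 'TR'
    ) AS
    SELECT order_date, revenue
    FROM `shop_analytics.daily_revenue`
""").result()  # blocks until the model training job finishes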
Cost Optimization
| Technique | Savings | Implementation |
|---|---|---|
| Partitioning | Up to 90% scan reduction | PARTITION BY DATE(event_timestamp) |
| Clustering | 30-60% additional reduction | CLUSTER BY event_name, user_id |
| Materialized Views | Auto-refreshed pre-aggregations | CREATE MATERIALIZED VIEW |
| Column selection | Variable | Never use SELECT * |
BigQuery on-demand pricing: first 1 TB/month free, then $6.25/TB. A typical e-commerce store (500K orders/year) with monthly query volume of 500 GB stays within the free tier.
BI Engine automatically caches frequently accessed data, providing sub-second query response and 4-10x performance improvement.
Privacy-Focused Analytics with Local LLMs
Sending customer data (names, emails, addresses, purchase history) to cloud LLM APIs constitutes cross-border data transfer under KVKK. Local LLMs solve this by design [21].
Ollama Setup
ollama pull qwen2.5-coder:14b # Code generation + data analysis
ollama pull mistral-nemo:12b # Multilingual, general analysis
ollama pull phi4:14b # Strong reasoning
VRAM Requirements (Q4_K_M Quantization)
| Model | Parameters | VRAM | Best For |
|---|---|---|---|
| Qwen2.5-Coder 7B | 7B | 6-7 GB | Code generation, SQL writing |
| Qwen2.5-Coder 14B | 14B | 10-12 GB | Complex analytics code |
| Mistral Nemo 12B | 12B | 8-10 GB | Multilingual, general analysis |
| Phi-4 14B | 14B | 10-12 GB | Reasoning, math |
| DeepSeek-R1 14B | 14B | 10-12 GB | Chain-of-thought reasoning |
Hardware tiers: 8 GB VRAM (RTX 4060) for 7B models, 12-16 GB (RTX 4070 Ti, M2 Pro/Max) for 12-14B models, 24 GB (RTX 4090, M3 Max) for 32B models.
Pivot Summary with Local LLM
import ollama
import json
pivot_summary = """
Monthly Revenue (Category):
| Category | Jan | Feb | Mar |
|-------------|--------|--------|--------|
| Electronics | 45,200 | 38,100 | 52,300 |
| Clothing | 22,100 | 19,800 | 28,500 |
RFM Segments:
| Segment | Count | Avg Spend | Avg Recency |
|-------------|-------|-----------|-------------|
| Champions | 342 | $1,250 | 8 days |
| At Risk | 891 | $480 | 67 days |
"""
response = ollama.chat(
model="qwen2.5-coder:14b",
messages=[{
"role": "user",
"content": f"Analyze this e-commerce pivot data. "
f"Return JSON only.\n{pivot_summary}"
}],
format="json"
)
KVKK Compliance
KVKK (Turkey’s Personal Data Protection Law), with its 2025 update, raised administrative fines to a range of 68,083 TL to 13.6 million TL (a 43.93% increase). The “Generative AI and Personal Data Protection” guideline published in December 2025 mandates “privacy by design” principles for LLM developers [22].
When using local LLMs: embedding generation, vector search, and LLM inference all occur in the local environment. No data leaves the premises.
Hybrid Architecture
Layer 1: Data Extraction (Local)
WooCommerce API -> Python -> Polars -> DuckDB
Cost: $0 (compute only)
Layer 2: Pivot + Summarization (Local LLM)
Raw data -> pivot tables -> Ollama (Qwen2.5-Coder 14B)
Cost: ~$0 (electricity, ~$0.02/hr GPU)
KVKK compliant: all PII stays on-premises
Layer 3: Strategic Insights (Cloud LLM, optional)
Pre-pivoted, anonymized summaries -> Claude / GPT-4o
Cost: ~$0.02-0.04 per query
This approach delivers 85-95% cost reduction while maintaining KVKK compliance.
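One way to enforce the layer-3 boundary is a simple allow-list check before anything is sent to a cloud model. A minimal sketch (the allow-list and column names are assumptions, not part of the architecture above):
import pandas as pd

ALLOWED_COLUMNS = {"Category", "Month", "Sales", "Returns", "Status"}

def to_cloud_payload(summary: pd.DataFrame) -> str:
    """Serialize an aggregated pivot for a cloud LLM, refusing unexpected columns."""
    unexpected = set(summary.columns) - ALLOWED_COLUMNS
    if unexpected:
        raise ValueError(f"Refusing to send non-allow-listed columns: {unexpected}")
    return summary.to_json(orient="records", force_ascii=False)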
LLM Limitations with Tabular Data
As AI agents increasingly leverage pivot tables, understanding these models’ limitations with tabular data becomes critical.
Structural Mismatch
Tabular data has a two-dimensional, relational structure, while LLMs are trained with a one-dimensional, autoregressive objective. When a table is serialized to text, row and column boundaries blur, the model becomes sensitive to row/column shuffles that should not matter, and context window limits are hit with large datasets [23].
Numerical Precision Failures
Numbers are processed as character sequences during tokenization, losing magnitude and numerical-order information. Research across six major LLMs found factual hallucination rates between 59% and 82%, with numerical recall reliability near zero [24].
Table Understanding Benchmarks
| Benchmark | Year | Scope | Finding |
|---|---|---|---|
| SUC (Microsoft) | WSDM 2024 | 7 basic table tasks | Best accuracy: 65.43% |
| TableBench | AAAI 2025 | 886 instances, 18 subcategories | Even GPT-4 significantly behind humans |
| RealHiTBench | ACL 2025 | 708 hierarchical tables, 3,752 QA | Hierarchical headers challenged all models |
| MMQA | ICLR 2025 | Multi-table, multi-hop reasoning | All models behind human performance |
| MMTU | 2025 | 30,000+ questions, 25 tasks | Reasoning models +10pp advantage |
| TReB | 2025 | 26 sub-tasks, 3 inference modes | ICoT (interleaved chain-of-thought) best |
Table transposition accuracy is approximately 50% (near random chance), while header identification is 94-97% accurate. This means models can understand table content but fail at structural transformation [25].
Serialization Format Comparison
Based on tests across 11 different formats [26]:
| Format | Accuracy | Token Usage |
|---|---|---|
| Markdown KV | 60.7% | 2.7x (baseline) |
| XML | 56.0% | 2.3x |
| YAML | 54.7% | 1.9x |
| Natural language | 49.6% | 3.0x |
| CSV | 44.3% | 1.0x (lowest) |
Markdown KV provides the highest accuracy but consumes 2.7x more tokens than CSV. When token budget is constrained, CSV is recommended; when accuracy is the priority, Markdown KV is preferred.
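If you want to check the trade-off for your own tables, a rough sketch using the tiktoken tokenizer (tiktoken and the to_markdown conversion, which needs the tabulate package, are assumptions beyond the cited tests):
import pandas as pd
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
df = pd.DataFrame({"Category": ["Electronics", "Clothing"],
                   "Jan": [45200, 22100], "Feb": [38100, 19800]})

csv_tokens = len(enc.encode(df.to_csv(index=False)))
markdown_tokens = len(enc.encode(df.to_markdown(index=False)))
print(csv_tokens, markdown_tokens)  # CSV is usually the cheapest serialization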
Solution Strategies
- Code generation approach: Have the LLM write pandas/SQL code rather than analyze data directly. Delegates computation to deterministic tools
- Schema + sample rows: Send column names, data types, and the first 5 rows instead of the entire table (sketched after this list)
- ICoT (Interleaved Chain-of-Thought): Alternate between text reasoning and code execution. Achieved the best results in the TReB benchmark
- Code Interpreter: Claude and GPT’s code execution tools completely eliminate numerical hallucination
- SpreadsheetLLM (Microsoft, EMNLP 2024): A compression approach achieving 96% reduction in token usage and 25.6% improvement in table detection [27]
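A minimal sketch of the schema-plus-sample-rows strategy (the file name sales.csv and the prompt assembly are assumptions):
import io
import pandas as pd

df = pd.read_csv("sales.csv")

schema_buf = io.StringIO()
df.info(buf=schema_buf)                 # column names, dtypes, non-null counts

context = (
    f"Table schema:\n{schema_buf.getvalue()}\n"
    f"Sample rows:\n{df.head(5).to_csv(index=False)}"
)
# 'context' plus the business question goes to the LLM, which writes pandas/SQL code;
# that code then runs against the full table, not just the sample.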
Cost and Token Optimization
Raw Data vs. Pivot Summary Comparison
| Approach | Rows | Tokens (Approx.) | Cost (GPT-4o) |
|---|---|---|---|
| Raw order data (10K) | 10,000 | ~250,000 | ~$0.625 |
| Category+Month pre-grouping | ~200 | ~10,000 | ~$0.025 |
| Summary statistics only | ~20 | ~2,000 | ~$0.005 |
Pre-pivot enables 95-99% token savings. CSV format consumes 40-50% fewer tokens than JSON.
API Pricing (February 2026)
| Model | Input/1M Tokens | Output/1M Tokens | Batch |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 50% off |
| GPT-4o mini | $0.15 | $0.60 | 50% off |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 50% off |
| Claude Haiku 4.5 | $1.00 | $5.00 | 50% off |
| Gemini 2.0 Flash | $0.10 | $0.40 | - |
| Local (Ollama) | ~$0 | ~$0 | - |
Anthropic offers a prompt caching mechanism that reduces input token costs by 90% for repeated queries. OpenAI Batch API delivers results within 24 hours at a 50% discount.
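A minimal sketch of prompt caching with the Anthropic Python SDK, reusing the pivot_summary string from the local-LLM example above (the model id and the exact question are assumptions):
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-haiku-4-5",             # model id is an assumption
    max_tokens=512,
    system=[{
        "type": "text",
        "text": pivot_summary,             # large, repeated context marked cacheable
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Which segment needs attention this week?"}],
)
print(response.content[0].text)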
Decision Matrix
| Scenario | Recommended Approach |
|---|---|
| Data contains PII | Local LLM only (KVKK requirement) |
| Data < 100K rows, simple pivot | pandas/Polars, no LLM needed |
| Data > 1M rows, complex pivot | BigQuery ($6.25/TB) or DuckDB (free) |
| Natural language insights from pivots | Pre-pivot locally, send summary to cloud LLM |
| Batch processing (daily reports) | Batch API (50% off), overnight processing |
Practical Guide: Which Tool for Which Scenario?
| Scenario | Recommended Tool | Why |
|---|---|---|
| Quick exploration, small data | Excel / Google Sheets | Drag-and-drop, visual |
| Natural language pivot | Excel Copilot / Gemini | “Sales pivot by region” |
| Analysis on BigQuery | Google Connected Sheets | Billions of rows, Sheets interface |
| Programmatic analysis (medium data) | pandas pivot_table() | Widespread, well-documented |
| Big data (million+ rows) | Polars | 5-10x faster, low memory |
| SQL-based pivot | DuckDB PIVOT | Native SQL syntax |
| Statistical analysis | R tidyverse / data.table | pivot_wider, ggplot2 |
| Browser pivot | AG Grid / Flexmonster | Large data, enterprise |
| AI agent preparation | pandas/Polars + JSON | Pre-pivot, flag addition |
| Data engineering | dbt_utils.pivot() | Repeatable, version-controlled |
| Competitor price tracking | Scrapy/Playwright + DuckDB | Scraping -> pivot pipeline |
| WooCommerce analytics | wc-api-python + Polars | REST API, unlimited access |
| BigQuery pivot | PIVOT operator + BI Engine | GA4, billions of rows |
| Privacy-focused analytics | Ollama + Qwen2.5-Coder | KVKK compliant, free |
Conclusion
The pivot table maintains its position as one of the fundamental tools of data analytics. However, as of 2026, its role has evolved significantly: it is no longer just a report table for humans to read, but an intermediate layer where AI agents break data into digestible chunks. Pivot tables can now be created through natural language via Excel Copilot and Google Gemini, millions of rows can be pivoted within seconds with Polars, and pre-pivot summaries can reduce AI agent token costs by 60-80%.
In e-commerce analytics, the pivot table is the most direct way to answer critical business questions around cohort behavior, category performance, and customer segmentation. In the modern data stack, pivot operations are defined once in the dbt transformation layer and consumed by all downstream consumers.
However, LLMs have structural limitations when working with tabular data: transposition accuracy around 50% and numerical hallucination rates of 59-82%. Therefore, rather than having LLMs perform calculations, the most effective strategy remains having them generate code, using deterministic tools (pandas, DuckDB, BigQuery) for computation, and positioning LLMs solely for content generation and natural-language insights. Local LLMs, with their KVKK compliance and near-zero marginal cost, offer an increasingly strong alternative for e-commerce data analysis.
Pivot table usage is also covered in detail across different platforms: Looker Studio Pivot Table, Google Analytics Pivot Table, and Facebook Analytics Pivot Table.
Footnotes
- Data Model specification and limits - Microsoft Support ↩
- DAX function reference - Microsoft Learn ↩
- Agent Mode in Excel - Microsoft Tech Community ↩
- Powerful pivot tables in Connected Sheets - Google Workspace Updates ↩
- Gemini in Google Sheets - Google Workspace Updates ↩
- pandas 3.0.0 What’s New ↩
- Polars PDS-H Benchmark ↩
- DuckDB PIVOT Statement ↩
- tidyr Pivoting Vignette ↩
- data.table CRAN ↩
- Arquero - Columnar JavaScript Data ↩
- Claude Code Interpreter ↩
- Pivot Table Recommendation with Semantic Embeddings ↩
- A Guide to Token-Efficient Data Prep for LLM Workloads - The New Stack ↩
- dbt_utils pivot macro ↩
- ShopifyQL - Commerce Data Querying Language ↩
- Shopify Editions Winter ‘26 ↩
- Web Scraping Trends for 2025 and 2026 ↩
- WooCommerce REST API Documentation ↩
- How to Create Pivot Table with GA4 Data in BigQuery ↩
- Run LLMs Locally with Ollama: Privacy-First AI 2025 ↩
- KVKK 2025 Updates: A Compliance Guide for Companies ↩
- Why LLMs Struggle with Your Spreadsheet Data - Tryolabs ↩
- Guide to Hallucinations in Large Language Models - Lakera ↩
- Table Meets LLM: Can Large Language Models Understand Structured Table Data? - WSDM 2024 ↩
- Which Table Format Do LLMs Understand Best? ↩
- SpreadsheetLLM - Microsoft Research, EMNLP 2024 ↩