TL;DR
Fetch WordPress content from R using the REST API and Application Passwords. Make API calls with httr2, wrangle the results with the tidyverse, and list and analyze draft, published, and pending content.
| Component | Tool/Method |
|---|---|
| API Client | httr2::request() |
| Authentication | Application Passwords |
| Data Processing | tidyverse (tibble, dplyr) |
| Endpoint | /wp-json/wp/v2/posts |
| Max Per Page | 100 (paginate for more) |
WordPress sites often accumulate content in many states: multiple authors, AI-assisted writing, and editorial workflows leave behind drafts, scheduled posts, and orphan pages. Tracking them regularly keeps your content strategy healthy.
In this post, I’ll explain how to list and perform basic analysis on your WordPress content using R.
For advanced analysis and an interactive dashboard, check out the R Shiny Content Intelligence Dashboard post.
Requirements
```r
# Package installation (run once)
install.packages(c("httr2", "tidyverse", "jsonlite"))

# Load packages (the examples use the native pipe |>, which needs R >= 4.1)
library(httr2)
library(jsonlite)
library(tidyverse)
```
WordPress REST API Connection
The REST API is enabled by default in WordPress 4.7 and later. For authentication we'll use Application Passwords, which are built into WordPress 5.6 and later.
Creating an Application Password
- WordPress Admin → Users → Profile
- Navigate to “Application Passwords” section
- Enter a name (e.g., “R Script”)
- Click “Add New Application Password”
- Copy the generated password (shown only once)
Store Application Passwords securely. I recommend keeping them as environment variables in your .Renviron file.
Configuration
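```r
# Add to your .Renviron file (e.g. open it with usethis::edit_r_environ()),
# then restart R so the variables are loaded:
# WP_USER=username
# WP_APP_PASSWORD="xxxx xxxx xxxx xxxx xxxx xxxx"
# WP_SITE_URL=https://yoursite.com

# Usage in R
wp_config <- list(
  # Drop any trailing slash so the URL does not end up with "//wp-json"
  base_url = paste0(sub("/+$", "", Sys.getenv("WP_SITE_URL")), "/wp-json/wp/v2/"),
  user = Sys.getenv("WP_USER"),
  app_password = Sys.getenv("WP_APP_PASSWORD")
)
```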
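Sys.getenv() silently returns an empty string when a variable is missing, so a typo in .Renviron tends to surface later as an authentication error or a malformed URL. Here is a minimal fail-fast check you could run before building wp_config; check_wp_env() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: stop early if any required environment variable is unset.
# Sys.getenv() returns "" for unset variables instead of raising an error.
check_wp_env <- function(vars = c("WP_SITE_URL", "WP_USER", "WP_APP_PASSWORD")) {
  values <- Sys.getenv(vars)
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    stop("Missing environment variable(s): ", paste(missing, collapse = ", "),
         ". Add them to .Renviron and restart R.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Calling check_wp_env() at the top of the script turns a vague HTTP failure into an error message that names the missing variable.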
Fetching Content
```r
#' Fetch one page of WordPress posts
#' @param status Post status: publish, draft, pending, private, future
#' @param per_page Items per page (max 100)
#' @param page Page number
get_wp_posts <- function(status = "publish", per_page = 100, page = 1) {
  response <- request(wp_config$base_url) |>
    req_url_path_append("posts") |>
    req_url_query(
      status = status,
      per_page = per_page,
      page = page,
      `_fields` = "id,title,excerpt,date,modified,status,categories,tags"
    ) |>
    req_auth_basic(wp_config$user, wp_config$app_password) |>
    req_perform()

  # Convert the parsed JSON (a list of posts) to a tibble
  content <- resp_body_json(response)
  tibble(
    id = map_int(content, "id"),
    title = map_chr(content, ~.x$title$rendered),
    excerpt = map_chr(content, ~.x$excerpt$rendered),
    date = map_chr(content, "date"),
    modified = map_chr(content, "modified"),
    status = map_chr(content, "status"),
    categories = map(content, "categories"),
    tags = map(content, "tags")
  )
}
```
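The JSON-to-tibble step above is repeated verbatim in get_all_posts(). Below is a sketch of factoring it into one helper, wp_posts_to_tibble() (a hypothetical name), which can be tested without a live site by feeding it a hand-built list shaped like resp_body_json() output. It also swaps the ~.x$title$rendered lambdas for purrr's nested extractor c("title", "rendered") with a .default, so a field missing from the response becomes NA instead of an error.

```r
library(purrr)
library(tibble)

# Hypothetical helper: parsed /wp/v2/posts JSON (a list of posts) -> tibble
wp_posts_to_tibble <- function(content) {
  tibble(
    id         = map_int(content, "id"),
    # c("title", "rendered") plucks the nested field; .default fills gaps with NA
    title      = map_chr(content, c("title", "rendered"), .default = NA_character_),
    excerpt    = map_chr(content, c("excerpt", "rendered"), .default = NA_character_),
    date       = map_chr(content, "date", .default = NA_character_),
    modified   = map_chr(content, "modified", .default = NA_character_),
    status     = map_chr(content, "status", .default = NA_character_),
    # Category and tag IDs stay as list-columns (a post can have several)
    categories = map(content, "categories"),
    tags       = map(content, "tags")
  )
}
```

Inside get_wp_posts() or get_all_posts(), the whole tibble(...) call could then become wp_posts_to_tibble(resp_body_json(response)).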
Listing Drafts
```r
# Fetch drafts
drafts <- get_wp_posts(status = "draft")

# Display titles
drafts |>
  select(id, title, modified) |>
  arrange(desc(modified)) |>
  print(n = 20)
```
Output:
```
# A tibble: 15 × 3
     id title                         modified
  <int> <chr>                         <chr>
1   892 R Shiny Dashboard Development 2025-01-10T14:30:00
2   887 WordPress API Best Practices  2025-01-08T09:15:00
3   845 Semantic SEO Strategies       2024-12-22T16:45:00
...
```
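The title and excerpt columns hold WordPress's rendered HTML: excerpts usually arrive wrapped in HTML paragraph tags, and titles can contain numeric character references for curly apostrophes and dashes. A minimal cleanup sketch, assuming the xml2 package (installed with the tidyverse, though not attached by library(tidyverse)); strip_html() is a hypothetical name.

```r
library(xml2)

# Hypothetical helper: strip tags and decode HTML entities in rendered fields
strip_html <- function(x) {
  vapply(x, function(s) {
    if (is.na(s) || !nzchar(s)) return(s)
    # Wrapping in <body> makes xml2 treat the string as literal HTML, not a path
    doc <- read_html(paste0("<body>", s, "</body>"), encoding = "UTF-8")
    trimws(xml_text(doc))
  }, character(1), USE.NAMES = FALSE)
}
```

For example, drafts |> mutate(title = strip_html(title), excerpt = strip_html(excerpt)) gives plain-text columns that are easier to read, search, or feed into text analysis.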
Fetching All Statuses
```r
#' Fetch all content (with pagination)
get_all_posts <- function(status = "publish") {
  all_posts <- tibble()
  page <- 1
  total_pages <- Inf

  while (page <= total_pages) {
    response <- request(wp_config$base_url) |>
      req_url_path_append("posts") |>
      req_url_query(
        status = status,
        per_page = 100,
        page = page,
        `_fields` = "id,title,excerpt,date,modified,status,categories,tags"
      ) |>
      req_auth_basic(wp_config$user, wp_config$app_password) |>
      req_perform()

    # WordPress reports the total number of pages in a response header
    if (page == 1) {
      total_pages <- as.integer(resp_header(response, "X-WP-TotalPages"))
    }

    content <- resp_body_json(response)
    posts <- tibble(
      id = map_int(content, "id"),
      title = map_chr(content, ~.x$title$rendered),
      excerpt = map_chr(content, ~.x$excerpt$rendered),
      date = map_chr(content, "date"),
      modified = map_chr(content, "modified"),
      status = map_chr(content, "status"),
      categories = map(content, "categories"),
      tags = map(content, "tags")
    )

    all_posts <- bind_rows(all_posts, posts)
    page <- page + 1
    Sys.sleep(0.5)  # pause between requests to go easy on the server
  }

  all_posts
}
```
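The pagination loop above is tied to the posts endpoint, while get_categories() below only ever reads the first 100 categories. A sketch of a generic version, assuming the same wp_config list; wp_request() and wp_fetch_all() are hypothetical helpers, not part of httr2 or WordPress. (Recent httr2 releases also provide req_perform_iterative() for paginated APIs, which is worth a look as an alternative.)

```r
library(httr2)

# Hypothetical helper: authenticated request for any wp/v2 endpoint.
# Extra named arguments become query parameters.
wp_request <- function(endpoint, ..., config = wp_config) {
  request(config$base_url) |>
    req_url_path_append(endpoint) |>
    req_url_query(...) |>
    req_auth_basic(config$user, config$app_password)
}

# Hypothetical helper: follow X-WP-TotalPages and return every item as one list
# (the same shape resp_body_json() returns for a single page)
wp_fetch_all <- function(endpoint, ..., config = wp_config, pause = 0.5) {
  items <- list()
  page <- 1
  total_pages <- 1  # replaced by the header value after the first response
  while (page <= total_pages) {
    resp <- wp_request(endpoint, ..., per_page = 100, page = page, config = config) |>
      req_perform()
    header <- resp_header(resp, "X-WP-TotalPages")
    # Treat a missing header as a single page instead of failing
    total_pages <- if (is.null(header)) 1L else as.integer(header)
    items <- c(items, resp_body_json(resp))
    page <- page + 1
    if (page <= total_pages) Sys.sleep(pause)  # pause only between requests
  }
  items
}
```

With this, wp_fetch_all("posts", status = "draft") returns every draft, and wp_fetch_all("categories") returns all categories rather than just the first 100.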
```r
# Combine all statuses (each result already includes a status column)
all_content <- bind_rows(
  get_all_posts("publish"),
  get_all_posts("draft"),
  get_all_posts("pending")
)

# Summary
all_content |>
  count(status) |>
  arrange(desc(n))
```
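Raw counts are easier to compare across sites or over time when paired with proportions. A small sketch that adds each status's share of the total; status_summary() is a hypothetical name.

```r
library(dplyr)

# Hypothetical helper: posts per status plus each status's share of the total
status_summary <- function(posts) {
  posts |>
    count(status) |>
    mutate(share = n / sum(n)) |>
    arrange(desc(n))
}
```

Running status_summary(all_content) gives one row per status, with share summing to 1.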
Basic Analysis
Category Distribution
```r
# Fetch categories (first 100 only; categories are public, so no auth is needed)
get_categories <- function() {
  response <- request(wp_config$base_url) |>
    req_url_path_append("categories") |>
    req_url_query(per_page = 100) |>
    req_perform()

  content <- resp_body_json(response)
  tibble(
    id = map_int(content, "id"),
    name = map_chr(content, "name"),
    count = map_int(content, "count")
  )
}

categories <- get_categories()

# Top categories by content count
categories |>
  arrange(desc(count)) |>
  head(10)
```
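As far as I know, the count field comes from WordPress's term counts, which only include published posts, so drafts and pending posts don't show up in it. A sketch that counts category usage per status directly from all_content instead, joining IDs to names from the categories tibble; count_by_category() is a hypothetical name.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: number of posts per (category, status) pair
count_by_category <- function(posts, cats) {
  # Expand the list-column of category IDs to one row per (post, category)
  post_cats <- tibble(
    status      = rep(posts$status, lengths(posts$categories)),
    category_id = as.integer(unlist(posts$categories))
  )
  post_cats |>
    count(category_id, status) |>
    left_join(cats |> select(id, name), by = c("category_id" = "id")) |>
    arrange(desc(n))
}
```

Usage: count_by_category(all_content, categories). Piping the result into pivot_wider(names_from = status, values_from = n, values_fill = 0) gives one column per status.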
Content Age Analysis
```r
all_content |>
  mutate(
    date = as.Date(date),
    age_days = as.numeric(Sys.Date() - date),
    age_group = case_when(
      age_days < 30 ~ "Last 30 days",
      age_days < 90 ~ "1-3 months",
      age_days < 365 ~ "3-12 months",
      TRUE ~ "1 year+"
    )
  ) |>
  count(status, age_group) |>
  pivot_wider(names_from = status, values_from = n, values_fill = 0)
```
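Two small wrinkles in the table above: count() sorts the age groups alphabetically ("1 year+" comes first), and results depend on the day you run the script. A sketch that moves the same buckets into a function with an explicit reference date and ordered factor levels; bucket_age() is a hypothetical name.

```r
library(dplyr)

# Hypothetical helper: same buckets as above, but the reference date is an
# argument, so results are reproducible instead of tied to Sys.Date()
bucket_age <- function(date, today = Sys.Date()) {
  age_days <- as.numeric(as.Date(today) - as.Date(date))
  group <- case_when(
    is.na(age_days) ~ NA_character_,
    age_days < 30 ~ "Last 30 days",
    age_days < 90 ~ "1-3 months",
    age_days < 365 ~ "3-12 months",
    TRUE ~ "1 year+"
  )
  # Explicit factor levels keep count() output in age order, not alphabetical
  factor(group, levels = c("Last 30 days", "1-3 months", "3-12 months", "1 year+"))
}
```

Usage: all_content |> mutate(age_group = bucket_age(date)) |> count(status, age_group).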
Finding Stale Drafts
```r
# Drafts not updated in 90+ days (taken from all_content, so drafts
# beyond the first page of 100 are included)
stale_drafts <- all_content |>
  filter(status == "draft") |>
  mutate(
    modified_date = as.Date(modified),
    days_stale = as.numeric(Sys.Date() - modified_date)
  ) |>
  filter(days_stale > 90) |>
  arrange(desc(days_stale)) |>
  select(id, title, days_stale)

cat("Drafts not updated in 90+ days:", nrow(stale_drafts), "\n")
print(stale_drafts)
```
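A stale-draft list is most useful when each row links straight to the editor. A sketch that adds an edit link per post, assuming the standard wp-admin/post.php?post=ID&action=edit URL format; add_edit_links() is a hypothetical name.

```r
library(dplyr)

# Hypothetical helper: add a wp-admin edit link for each post ID
add_edit_links <- function(posts, site_url) {
  site_url <- sub("/+$", "", site_url)  # avoid a double slash before wp-admin
  posts |>
    mutate(edit_url = paste0(site_url, "/wp-admin/post.php?post=", id, "&action=edit"))
}
```

For example, add_edit_links(stale_drafts, Sys.getenv("WP_SITE_URL")) |> readr::write_csv("stale_drafts.csv") produces a review list you can share with your editors.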
Next Steps
With this basic setup, you can manage your WordPress content from R. For more advanced analysis:
- Semantic analysis: Content similarity with embeddings
- Graph visualization: Internal link relationships
- Gap analysis: Missing content detection
- SEO alignment: Target keyword coverage
Check out the R Shiny Content Intelligence Dashboard post that combines these topics in an interactive dashboard.
Legacy Method: XML-RPC (Deprecated)
XML-RPC is not recommended: it is a frequent target of brute-force login attempts and pingback abuse, so many hosting providers disable it by default. Use the REST API instead.
The old RWordPress package used XML-RPC:
```r
# ❌ OLD METHOD - Do not use
# library("RWordPress")
# options(WordPressLogin = c(user = "pass"),
#         WordPressURL = "http://site.com/xmlrpc.php")
# getPosts()
```
Frequently Asked Questions (FAQ)
How do I use the WordPress REST API from R?
Use the httr2 package: authenticate with req_auth_basic() and your Application Password, then send a GET request to the wp-json/wp/v2/posts endpoint.
```r
request(wp_config$base_url) |>
  req_url_path_append("posts") |>
  req_auth_basic(wp_config$user, wp_config$app_password) |>
  req_perform()
```
What is a WordPress Application Password?
An authentication method for the REST API, built into WordPress 5.6 and later. You create one under Admin → Users → Profile → Application Passwords. The password is displayed only once, so store it in your .Renviron file right away.
How do I list WordPress drafts with R?
Pass status = "draft":
```r
drafts <- get_wp_posts(status = "draft")
```
For more than 100 drafts, use get_all_posts("draft"), which handles pagination.
Why is XML-RPC not recommended?
XML-RPC carries security risks, so many hosting providers disable it by default. The REST API is a more secure, faster, and more flexible alternative.
How do I analyze WordPress content age?
Compute each post's age in days with mutate() and bucket it with case_when():
```r
all_content |>
  mutate(
    age_days = as.numeric(Sys.Date() - as.Date(date)),
    age_group = case_when(
      age_days < 30 ~ "Last 30 days",
      age_days < 90 ~ "1-3 months",
      TRUE ~ "3+ months"
    )
  )
```
Summary: Key Takeaways
- httr2 and the tidyverse together provide a modern workflow for the WordPress REST API
- Application Passwords provide secure authentication, which is required to read drafts and pending posts
- The status parameter filters by publish, draft, pending, private, or future
- Pagination (via the X-WP-TotalPages header) lets you fetch more than 100 items
- XML-RPC is deprecated; prefer the REST API for security reasons
*[API]: Application Programming Interface
*[REST]: Representational State Transfer