Listing WordPress Content with R

Fetch and analyze WordPress content via REST API using R. Modern data processing with httr2 and tidyverse.

Ceyhun Enki Aksan, Entrepreneur, Maker

TL;DR

Fetch WordPress content from R using the REST API with Application Passwords. Make API calls with httr2 and manipulate the data with tidyverse. List draft, published, and pending content, then analyze it.

| Component | Tool/Method |
| --- | --- |
| API Client | httr2::request() |
| Authentication | Application Passwords |
| Data Processing | tidyverse (tibble, dplyr) |
| Endpoint | /wp-json/wp/v2/posts |
| Max Per Page | 100 (pagination required) |

WordPress sites often accumulate content in various states due to multiple authors, AI-assisted content creation, and editorial workflows. Drafts, scheduled posts, and orphan pages pile up, so auditing them regularly keeps the content strategy on track.

In this post, I’ll explain how to list and perform basic analysis on your WordPress content using R.

Tip: For advanced analysis and an interactive dashboard, check out the R Shiny Content Intelligence Dashboard post.

Requirements

# Package installation
install.packages(c("httr2", "tidyverse", "jsonlite"))

library(httr2)
library(jsonlite)
library(tidyverse)

WordPress REST API Connection

The REST API has been enabled by default since WordPress 4.7. For authentication we'll use Application Passwords, available since WordPress 5.6.

Creating an Application Password

  1. Go to WordPress Admin → Users → Profile
  2. Navigate to the “Application Passwords” section
  3. Enter a name (e.g., “R Script”)
  4. Click “Add New Application Password”
  5. Copy the generated password (shown only once)

Warning: Store Application Passwords securely. I recommend keeping them as environment variables in your .Renviron file.
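
To see why secure storage matters, it helps to look at what HTTP Basic authentication puts on the wire. The sketch below builds the header by hand, using placeholder credentials (`"user"` and `"pass"` are illustrative values, not real ones) and `jsonlite::base64_enc()`. It only illustrates the scheme; it is not a replacement for `req_auth_basic()`.

```r
library(jsonlite)

# Placeholder credentials for illustration only; never hard-code real ones.
user <- "user"
app_password <- "pass"

# HTTP Basic authentication: "Basic " followed by base64("username:password").
token <- base64_enc(charToRaw(paste0(user, ":", app_password)))
auth_header <- paste("Basic", token)

auth_header
# "Basic dXNlcjpwYXNz"
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password. That is a good reason to call the API over HTTPS only.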

Configuration

# Add to .Renviron file:
# WP_USER=username
# WP_APP_PASSWORD=xxxx xxxx xxxx xxxx xxxx xxxx
# WP_SITE_URL=https://yoursite.com

# Usage in R
wp_config <- list(
  base_url = paste0(Sys.getenv("WP_SITE_URL"), "/wp-json/wp/v2/"),
  user = Sys.getenv("WP_USER"),
  app_password = Sys.getenv("WP_APP_PASSWORD")
)
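
An unset variable makes `Sys.getenv()` return an empty string, which would only surface later as a confusing request error. A minimal sketch of an early check follows. The helper names `check_wp_env()` and `build_base_url()` are hypothetical and not part of the original setup.

```r
# Hypothetical helper: stop early if any required variable is missing.
check_wp_env <- function(vars = c("WP_USER", "WP_APP_PASSWORD", "WP_SITE_URL")) {
  values <- Sys.getenv(vars, unset = "")
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hypothetical helper: drop trailing slashes from the site URL so the
# resulting base URL never contains a double slash.
build_base_url <- function(site_url) {
  paste0(sub("/+$", "", site_url), "/wp-json/wp/v2/")
}

# Usage, once .Renviron is filled in and R has been restarted:
# check_wp_env()
# build_base_url(Sys.getenv("WP_SITE_URL"))
```

Running the check once at the top of a script gives a clear message about which variable is missing.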

Fetching Content

#' Fetch WordPress posts
#' @param status Post status: publish, draft, pending, private, future
#' @param per_page Items per page (max 100)
#' @param page Page number
get_wp_posts <- function(status = "publish", per_page = 100, page = 1) {

  response <- request(wp_config$base_url) |>
    req_url_path_append("posts") |>
    req_url_query(
      status = status,
      per_page = per_page,
      page = page,
      `_fields` = "id,title,excerpt,date,modified,status,categories,tags"
    ) |>
    req_auth_basic(wp_config$user, wp_config$app_password) |>
    req_perform()

  # Convert response to tibble
  content <- resp_body_json(response)

  tibble(
    id = map_int(content, "id"),
    title = map_chr(content, ~.x$title$rendered),
    excerpt = map_chr(content, ~.x$excerpt$rendered),
    date = map_chr(content, "date"),
    modified = map_chr(content, "modified"),
    status = map_chr(content, "status"),
    categories = map(content, "categories"),
    tags = map(content, "tags")
  )
}
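
The `rendered` fields are HTML, so excerpts typically arrive wrapped in tags such as `<p>`. If plain text is easier to work with, a small base R helper can strip the markup. This is a rough sketch: `strip_html()` is a hypothetical name, the regular expression is deliberately simple, and HTML entities (for example `&amp;`) are left untouched.

```r
# Hypothetical helper: remove HTML tags and tidy whitespace.
strip_html <- function(x) {
  x <- gsub("<[^>]+>", "", x)        # drop tags such as <p> or <em>
  x <- gsub("[[:space:]]+", " ", x)  # collapse newlines and repeated spaces
  trimws(x)
}

strip_html("<p>Short <em>summary</em> of the post.</p>\n")
# "Short summary of the post."
```

It can be applied to a fetched tibble with, for example, `mutate(excerpt = strip_html(excerpt))`.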

Listing Drafts

# Fetch drafts
drafts <- get_wp_posts(status = "draft")

# Display titles
drafts |>
  select(id, title, modified) |>
  arrange(desc(modified)) |>
  print(n = 20)

Output:

# A tibble: 15 × 3
      id title                              modified
   <int> <chr>                              <chr>
 1   892 R Shiny Dashboard Development      2025-01-10T14:30:00
 2   887 WordPress API Best Practices       2025-01-08T09:15:00
 3   845 Semantic SEO Strategies            2024-12-22T16:45:00
...

Fetching All Statuses

#' Fetch all content (with pagination)
get_all_posts <- function(status = "publish") {
  all_posts <- tibble()
  page <- 1
  total_pages <- Inf

  while (page <= total_pages) {
    response <- request(wp_config$base_url) |>
      req_url_path_append("posts") |>
      req_url_query(
        status = status,
        per_page = 100,
        page = page,
        `_fields` = "id,title,excerpt,date,modified,status,categories,tags"
      ) |>
      req_auth_basic(wp_config$user, wp_config$app_password) |>
      req_perform()

    if (page == 1) {
      total_pages <- as.integer(resp_header(response, "X-WP-TotalPages"))
    }

    content <- resp_body_json(response)

    posts <- tibble(
      id = map_int(content, "id"),
      title = map_chr(content, ~.x$title$rendered),
      excerpt = map_chr(content, ~.x$excerpt$rendered),
      date = map_chr(content, "date"),
      modified = map_chr(content, "modified"),
      status = map_chr(content, "status"),
      categories = map(content, "categories"),
      tags = map(content, "tags")
    )

    all_posts <- bind_rows(all_posts, posts)
    page <- page + 1
    Sys.sleep(0.5)
  }

  all_posts
}

# Combine all statuses
all_content <- bind_rows(
  get_all_posts("publish") |> mutate(status = "publish"),
  get_all_posts("draft") |> mutate(status = "draft"),
  get_all_posts("pending") |> mutate(status = "pending")
)

# Summary
all_content |>
  count(status) |>
  arrange(desc(n))
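
The pagination logic can be tried without touching a live site. The sketch below separates the loop from the HTTP call. `fetch_all_pages()` and `mock_fetch()` are hypothetical names, and the mock simply slices a fake table of 250 posts into pages of 100.

```r
library(tibble)
library(dplyr)

# Hypothetical generic pager. `fetch_page` is any function that takes a page
# number and returns list(items = <tibble>, total_pages = <integer>).
fetch_all_pages <- function(fetch_page, pause = 0) {
  first <- fetch_page(1)
  pages <- list(first$items)
  if (first$total_pages > 1) {
    for (page in 2:first$total_pages) {
      if (pause > 0) Sys.sleep(pause)  # wait only between requests
      pages[[page]] <- fetch_page(page)$items
    }
  }
  bind_rows(pages)  # combine once instead of growing a tibble in the loop
}

# Mock fetcher standing in for the API: 250 fake posts, 100 per page.
mock_posts <- tibble(id = 1:250, status = "draft")
mock_fetch <- function(page, per_page = 100) {
  rows <- seq_len(nrow(mock_posts))
  idx <- rows[ceiling(rows / per_page) == page]
  list(
    items = mock_posts[idx, ],
    total_pages = as.integer(ceiling(nrow(mock_posts) / per_page))
  )
}

result <- fetch_all_pages(mock_fetch)
nrow(result)
# 250
```

In a real script, `fetch_page` would wrap the httr2 request and read `total_pages` from the `X-WP-TotalPages` header.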

Basic Analysis

Category Distribution

# Fetch categories (unauthenticated request, so public data only; one request returns at most 100)
get_categories <- function() {
  response <- request(wp_config$base_url) |>
    req_url_path_append("categories") |>
    req_url_query(per_page = 100) |>
    req_perform()

  content <- resp_body_json(response)

  tibble(
    id = map_int(content, "id"),
    name = map_chr(content, "name"),
    count = map_int(content, "count")
  )
}

categories <- get_categories()

# Top categories by content count
categories |>
  arrange(desc(count)) |>
  head(10)
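
The `count` field from the categories endpoint does not break down by post status. To see which categories the drafts fall into, the `categories` list column on the posts can be unnested and joined to the category names. The sketch below uses small mock tibbles in place of API results. `mock_posts` and `category_lookup` are made-up data shaped like the tibbles built earlier.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Mock data standing in for all_content and the categories tibble.
mock_posts <- tibble(
  id = c(1L, 2L, 3L),
  status = c("publish", "draft", "draft"),
  categories = list(list(10L, 20L), list(10L), list())  # parsed JSON arrays
)
category_lookup <- tibble(id = c(10L, 20L), name = c("R", "WordPress"))

category_by_status <- mock_posts |>
  # Flatten each parsed JSON array into a plain integer vector
  mutate(category_id = map(categories, ~ as.integer(unlist(.x)))) |>
  select(post_id = id, status, category_id) |>
  unnest(category_id) |>  # one row per post/category pair
  left_join(category_lookup, by = c("category_id" = "id")) |>
  count(status, name)

category_by_status
```

Posts with no category IDs drop out at the `unnest()` step. Pass `keep_empty = TRUE` to keep them as rows with a missing category.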

Content Age Analysis

all_content |>
  mutate(
    date = as.Date(date),
    age_days = as.numeric(Sys.Date() - date),
    age_group = case_when(
      age_days < 30 ~ "Last 30 days",
      age_days < 90 ~ "1-3 months",
      age_days < 365 ~ "3-12 months",
      TRUE ~ "1 year+"
    )
  ) |>
  count(status, age_group) |>
  pivot_wider(names_from = status, values_from = n, values_fill = 0)
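
Because the result depends on `Sys.Date()`, it changes from day to day. For reproducible reports or tests, the bucketing can be moved into a function that accepts a reference date. The sketch below uses `cut()` instead of `case_when()`. The name `bucket_age()` is hypothetical, and the labels are the same as above.

```r
# Hypothetical helper: assign age buckets relative to a fixed reference date.
bucket_age <- function(dates, today = Sys.Date()) {
  age_days <- as.numeric(today - as.Date(dates))
  cut(
    age_days,
    breaks = c(-Inf, 30, 90, 365, Inf),
    labels = c("Last 30 days", "1-3 months", "3-12 months", "1 year+"),
    right = FALSE  # intervals are [lower, upper), matching the `<` comparisons
  )
}

bucket_age(
  c("2025-01-01", "2024-11-01", "2024-06-01", "2023-01-01"),
  today = as.Date("2025-01-15")
)
```

`cut()` returns a factor, so the groups keep their chronological order in tables and plots instead of being sorted alphabetically.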

Finding Stale Drafts

# Drafts older than 90 days
stale_drafts <- drafts |>
  mutate(
    modified_date = as.Date(modified),
    days_stale = as.numeric(Sys.Date() - modified_date)
  ) |>
  filter(days_stale > 90) |>
  arrange(desc(days_stale)) |>
  select(id, title, days_stale)

cat("Drafts not updated in 90+ days:", nrow(stale_drafts), "\n")
print(stale_drafts)

Next Steps

With this basic setup, you can list and audit your WordPress content from R. For more advanced analysis:

  • Semantic analysis: Content similarity with embeddings
  • Graph visualization: Internal link relationships
  • Gap analysis: Missing content detection
  • SEO alignment: Target keyword coverage

Check out the R Shiny Content Intelligence Dashboard post that combines these topics in an interactive dashboard.

Legacy Method: XML-RPC (Deprecated)

Warning: XML-RPC is not recommended due to security risks. Many hosting providers disable it by default. Use the REST API instead.

The old RWordPress package used XML-RPC:

# ❌ OLD METHOD - Do not use
# library("RWordPress")
# options(WordPressLogin = c(user = "pass"),
#         WordPressURL = "http://site.com/xmlrpc.php")
# getPosts()

Frequently Asked Questions (FAQ)

How to use WordPress REST API from R?

Use the httr2 package: req_auth_basic() handles Application Passwords authentication, and a GET request to the wp-json/wp/v2/posts endpoint returns the posts.

request(base_url) |>
  req_url_path_append("posts") |>
  req_auth_basic(user, app_password) |>
  req_perform()

What is WordPress Application Password?

A secure authentication method for the REST API, introduced in WordPress 5.6. Create one under Admin → Users → Profile → Application Passwords. The password is displayed only once, so store it in your .Renviron file.

How to list WordPress drafts with R?

Pass the status = "draft" parameter:

drafts <- get_wp_posts(status = "draft")

For more than 100 items, use get_all_posts(), which handles pagination.

Why is XML-RPC not recommended?

Many hosting providers disable XML-RPC by default due to security risks. The REST API is a more secure, faster, and more flexible alternative.

How to perform WordPress content age analysis?

Compute the age in days with mutate(), then bucket it with case_when():

all_content |>
  mutate(
    age_days = as.numeric(Sys.Date() - as.Date(date)),
    age_group = case_when(
      age_days < 30 ~ "Last 30 days",
      age_days < 90 ~ "1-3 months",
      TRUE ~ "3+ months"
    )
  )

Summary: Key Takeaways

  1. httr2 and tidyverse together provide a modern way to work with the WordPress API from R
  2. Application Passwords (WordPress 5.6+) authenticate REST API requests without using your main account password
  3. The status parameter filters posts by draft, publish, pending, or future
  4. Pagination, driven by the X-WP-TotalPages header, allows fetching more than 100 items
  5. XML-RPC is a legacy method; prefer the REST API for security reasons
