Listing WordPress Content with R

Fetch and analyze WordPress content via REST API using R. Modern data processing with httr2 and tidyverse.

Ceyhun Enki Aksan Entrepreneur, Maker

Jan 25, 2025

TL;DR

Fetch WordPress content from R using the REST API and Application Passwords. Make API calls with httr2 and process the results with the tidyverse. List draft, published, and pending posts, then analyze them.

Component	Tool/Method
API Client	`httr2::request()`
Authentication	Application Passwords
Data Processing	`tidyverse` (tibble, dplyr)
Endpoint	`/wp-json/wp/v2/posts`
Max Per Page	100 (pagination required)

WordPress sites often accumulate content in various states due to multiple authors, AI-assisted content creation, and editorial workflows. Drafts, scheduled posts, orphan pages… Regular tracking is critical for content strategy health.

In this post, I’ll explain how to list and perform basic analysis on your WordPress content using R.

tip

For advanced analysis and an interactive dashboard, check out the R Shiny Content Intelligence Dashboard post.

Requirements

# Package installation
install.packages(c("httr2", "tidyverse", "jsonlite"))

library(httr2)
library(jsonlite)
library(tidyverse)

WordPress REST API Connection

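The REST API is enabled by default in WordPress 4.7+. We’ll use Application Passwords for authentication (WordPress 5.6+). By default, WordPress only offers Application Passwords on sites served over HTTPS or on local development environments.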

Creating an Application Password

1. Go to WordPress Admin → Users → Profile
2. Navigate to the “Application Passwords” section
3. Enter a name (e.g., “R Script”)
4. Click “Add New Application Password”
5. Copy the generated password (it is shown only once)

warning

Store Application Passwords securely. I recommend keeping them as environment variables in your .Renviron file.

Configuration

# Add to .Renviron file:
# WP_USER=username
# WP_APP_PASSWORD=xxxx xxxx xxxx xxxx xxxx xxxx
# WP_SITE_URL=https://yoursite.com

# Usage in R
wp_config <- list(
  base_url = paste0(Sys.getenv("WP_SITE_URL"), "/wp-json/wp/v2/"),
  user = Sys.getenv("WP_USER"),
  app_password = Sys.getenv("WP_APP_PASSWORD")
)
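Sys.getenv() returns an empty string for an unset variable, so a typo in .Renviron tends to surface only later as an authentication error. The sketch below checks the settings up front. It assumes a list shaped like wp_config; check_wp_config() is a hypothetical helper of my own, not part of httr2.

```r
# Hypothetical helper: fail early when a setting is missing or malformed.
check_wp_config <- function(cfg) {
  required <- c("base_url", "user", "app_password")
  values <- vapply(
    required,
    function(key) if (is.null(cfg[[key]])) "" else as.character(cfg[[key]]),
    character(1)
  )
  absent <- required[!nzchar(values)]
  if (length(absent) > 0) {
    stop("Missing WordPress settings: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  # An unset WP_SITE_URL leaves base_url as "/wp-json/wp/v2/", which is caught here.
  if (!grepl("^https?://", cfg$base_url)) {
    stop("base_url must start with http:// or https://", call. = FALSE)
  }
  if (grepl("^http://", cfg$base_url)) {
    warning("base_url is not HTTPS; Basic Auth credentials would be sent unencrypted",
            call. = FALSE)
  }
  invisible(cfg)
}

# Example values only; in practice pass the wp_config list built from .Renviron.
example_config <- list(
  base_url = "https://yoursite.com/wp-json/wp/v2/",
  user = "username",
  app_password = "xxxx xxxx xxxx xxxx xxxx xxxx"
)
check_wp_config(example_config)
```

Calling check_wp_config(wp_config) once at the top of a script is usually enough.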

Fetching Content

#' Fetch WordPress posts
#' @param status Post status: publish, draft, pending, private, future
#' @param per_page Items per page (max 100)
#' @param page Page number
get_wp_posts <- function(status = "publish", per_page = 100, page = 1) {

  response <- request(wp_config$base_url) |>
    req_url_path_append("posts") |>
    req_url_query(
      status = status,
      per_page = per_page,
      page = page,
      `_fields` = "id,title,excerpt,date,modified,status,categories,tags"
    ) |>
    req_auth_basic(wp_config$user, wp_config$app_password) |>
    req_perform()

  # Convert response to tibble
  content <- resp_body_json(response)

  tibble(
    id = map_int(content, "id"),
    title = map_chr(content, ~.x$title$rendered),
    excerpt = map_chr(content, ~.x$excerpt$rendered),
    date = map_chr(content, "date"),
    modified = map_chr(content, "modified"),
    status = map_chr(content, "status"),
    categories = map(content, "categories"),
    tags = map(content, "tags")
  )
}
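The rendered fields come back as HTML: excerpts are wrapped in paragraph tags, and titles can contain entities such as &amp;amp; or &amp;#8217;. Here is a rough base R cleanup sketch. clean_rendered() is a hypothetical helper and only decodes a handful of common entities; a proper HTML parser would be more robust.

```r
# Hypothetical helper: strip tags and decode a few common entities.
# This is a simplification, not a full HTML decoder.
clean_rendered <- function(x) {
  x <- gsub("<[^>]+>", "", x)                  # drop HTML tags
  x <- gsub("&#8217;", "'", x, fixed = TRUE)   # curly apostrophe, simplified to '
  x <- gsub("&hellip;|&#8230;", "...", x)      # ellipsis
  x <- gsub("&amp;", "&", x, fixed = TRUE)     # decode &amp; last to avoid double decoding
  trimws(gsub("\\s+", " ", x))                 # collapse whitespace and newlines
}

# Made-up excerpt in the shape WordPress typically returns
clean_rendered("<p>Semantic SEO &amp; content&#8217;s future [&hellip;]</p>\n")
```

It can be applied to the tibble afterwards, for example with mutate(title = clean_rendered(title)).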

Listing Drafts

# Fetch drafts
drafts <- get_wp_posts(status = "draft")

# Display titles
drafts |>
  select(id, title, modified) |>
  arrange(desc(modified)) |>
  print(n = 20)

Output:

# A tibble: 15 × 3
      id title                              modified
   <int> <chr>                              <chr>
 1   892 R Shiny Dashboard Development      2025-01-10T14:30:00
 2   887 WordPress API Best Practices       2025-01-08T09:15:00
 3   845 Semantic SEO Strategies            2024-12-22T16:45:00
...
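Sorting the modified column as text works because the timestamps are ISO 8601, but date arithmetic needs a real date-time type. A minimal sketch using base R follows. parse_wp_datetime() is a hypothetical helper, and the tz default is an assumption: WordPress reports date and modified in the site's timezone, while the separate date_gmt and modified_gmt fields are in UTC.

```r
# Hypothetical helper: parse WordPress timestamps such as "2025-01-10T14:30:00".
# tz should match the site's timezone; "UTC" is only a placeholder default.
parse_wp_datetime <- function(x, tz = "UTC") {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = tz)
}

modified <- c("2025-01-10T14:30:00", "2025-01-08T09:15:00", "2024-12-22T16:45:00")
parsed <- parse_wp_datetime(modified)

# Newest first, same order as arrange(desc(modified))
modified[order(parsed, decreasing = TRUE)]

# Hours between the two most recent edits
as.numeric(difftime(parsed[1], parsed[2], units = "hours"))
```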

Fetching All Statuses

#' Fetch all content (with pagination)
get_all_posts <- function(status = "publish") {
  all_posts <- tibble()
  page <- 1
  total_pages <- Inf

  while (page <= total_pages) {
    response <- request(wp_config$base_url) |>
      req_url_path_append("posts") |>
      req_url_query(
        status = status,
        per_page = 100,
        page = page,
        `_fields` = "id,title,excerpt,date,modified,status,categories,tags"
      ) |>
      req_auth_basic(wp_config$user, wp_config$app_password) |>
      req_perform()

    if (page == 1) {
      total_pages <- as.integer(resp_header(response, "X-WP-TotalPages"))
    }

    content <- resp_body_json(response)

    posts <- tibble(
      id = map_int(content, "id"),
      title = map_chr(content, ~.x$title$rendered),
      excerpt = map_chr(content, ~.x$excerpt$rendered),
      date = map_chr(content, "date"),
      modified = map_chr(content, "modified"),
      status = map_chr(content, "status"),
      categories = map(content, "categories"),
      tags = map(content, "tags")
    )

    all_posts <- bind_rows(all_posts, posts)
    page <- page + 1
    Sys.sleep(0.5)
  }

  all_posts
}

# Combine all statuses
all_content <- bind_rows(
  get_all_posts("publish") |> mutate(status = "publish"),
  get_all_posts("draft") |> mutate(status = "draft"),
  get_all_posts("pending") |> mutate(status = "pending")
)

# Summary
all_content |>
  count(status) |>
  arrange(desc(n))
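The pagination loop can be separated from the HTTP call, which makes it easy to try without a live site. The sketch below is one way to do that in base R. collect_pages() and fake_fetch() are hypothetical names, and the fake data stands in for API responses.

```r
# Hypothetical helper: call fetch_page(page) until all pages are collected.
# fetch_page must return list(items = <data frame>, total_pages = <number>).
collect_pages <- function(fetch_page, pause = 0) {
  pages <- list()
  page <- 1
  total_pages <- 1
  while (page <= total_pages) {
    result <- fetch_page(page)
    # Only trust the page count if the first response actually provided one
    if (page == 1 && length(result$total_pages) == 1 && !is.na(result$total_pages)) {
      total_pages <- result$total_pages
    }
    pages[[page]] <- result$items
    page <- page + 1
    # Pause between requests, but not after the last one
    if (page <= total_pages && pause > 0) Sys.sleep(pause)
  }
  do.call(rbind, pages)
}

# Fake data source standing in for the API: 250 drafts, 100 per page
fake_posts <- data.frame(id = 1:250, status = "draft")
fake_fetch <- function(page, per_page = 100) {
  idx <- seq((page - 1) * per_page + 1, min(page * per_page, nrow(fake_posts)))
  list(
    items = fake_posts[idx, , drop = FALSE],
    total_pages = ceiling(nrow(fake_posts) / per_page)
  )
}

all_fake <- collect_pages(fake_fetch)
nrow(all_fake)
```

With a real site, fetch_page would wrap the httr2 request and read X-WP-TotalPages from the response. Collecting pages in a list and binding once also avoids growing a tibble inside the loop.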

Basic Analysis

Category Distribution

# Fetch categories
get_categories <- function() {
  response <- request(wp_config$base_url) |>
    req_url_path_append("categories") |>
    req_url_query(per_page = 100) |>
    req_perform()

  content <- resp_body_json(response)

  tibble(
    id = map_int(content, "id"),
    name = map_chr(content, "name"),
    count = map_int(content, "count")
  )
}

categories <- get_categories()

# Top categories by content count
categories |>
  arrange(desc(count)) |>
  head(10)
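The categories column in the posts tibble holds category IDs, while get_categories() returns the names. To count posts per category name for a given status, the two can be joined. The following is a sketch on made-up data shaped like the API output; the tibbles and IDs here are illustrative only.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up posts: categories is a list column of IDs, as returned by the API
posts <- tibble(
  id = c(101L, 102L, 103L),
  title = c("Post A", "Post B", "Post C"),
  categories = list(list(3L, 7L), list(3L), list())
)

# Made-up category lookup, same columns as get_categories() minus count
categories <- tibble(id = c(3L, 7L), name = c("R", "WordPress"))

posts_by_category <- posts |>
  # Flatten each post's ID list to a plain integer vector
  mutate(category_id = lapply(categories, function(x) as.integer(unlist(x)))) |>
  select(id, title, category_id) |>
  # One row per post/category pair; posts without categories are dropped
  unnest_longer(category_id) |>
  left_join(categories, by = c("category_id" = "id"))

posts_by_category |>
  count(name, sort = TRUE)
```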

Content Age Analysis

all_content |>
  mutate(
    date = as.Date(date),
    age_days = as.numeric(Sys.Date() - date),
    age_group = case_when(
      age_days < 30 ~ "Last 30 days",
      age_days < 90 ~ "1-3 months",
      age_days < 365 ~ "3-12 months",
      TRUE ~ "1 year+"
    )
  ) |>
  count(status, age_group) |>
  pivot_wider(names_from = status, values_from = n, values_fill = 0)
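One side effect of case_when() is that age_group is a character column, so the rows sort alphabetically ("1 year+" comes first). As an alternative sketch, base R's cut() with the same thresholds returns an ordered factor that sorts from newest to oldest. age_bucket() is a hypothetical helper, and the dates below are made up.

```r
# Hypothetical helper: same thresholds as the case_when() version,
# returned as an ordered factor so groups sort from newest to oldest.
age_bucket <- function(age_days) {
  cut(
    age_days,
    breaks = c(-Inf, 30, 90, 365, Inf),
    labels = c("Last 30 days", "1-3 months", "3-12 months", "1 year+"),
    right = FALSE,          # intervals are [lower, upper), matching age_days < 30 etc.
    ordered_result = TRUE
  )
}

# Made-up publish dates measured against a fixed reference date
reference_date <- as.Date("2025-01-25")
published <- as.Date(c("2025-01-20", "2024-12-01", "2024-06-15", "2023-03-01"))
age_days <- as.numeric(reference_date - published)

table(age_bucket(age_days))
```

Inside the pipeline this would replace the case_when() call: mutate(age_group = age_bucket(age_days)).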

Finding Stale Drafts

# Drafts not modified in the last 90 days
stale_drafts <- drafts |>
  mutate(
    modified_date = as.Date(modified),
    days_stale = as.numeric(Sys.Date() - modified_date)
  ) |>
  filter(days_stale > 90) |>
  arrange(desc(days_stale)) |>
  select(id, title, days_stale)

cat("Drafts not updated in 90+ days:", nrow(stale_drafts), "\n")
print(stale_drafts)
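A list of stale drafts is easier to act on when each row links to the editor. The sketch below builds edit links from post IDs. It assumes the standard WordPress admin path (/wp-admin/post.php), which a site may have customized; wp_edit_url() and the sample rows are hypothetical.

```r
# Hypothetical helper: build the admin edit link for a post ID.
# Assumes the default /wp-admin/ location.
wp_edit_url <- function(site_url, post_id) {
  sprintf(
    "%s/wp-admin/post.php?post=%d&action=edit",
    sub("/+$", "", site_url),   # avoid a double slash if the URL ends with "/"
    as.integer(post_id)
  )
}

# Made-up rows in the shape of stale_drafts
stale_example <- data.frame(
  id = c(845L, 790L),
  title = c("Semantic SEO Strategies", "Old Draft"),
  days_stale = c(120, 300)
)
stale_example$edit_url <- wp_edit_url("https://yoursite.com", stale_example$id)

# Write a review list to a temporary CSV file
out_file <- tempfile(fileext = ".csv")
write.csv(stale_example, out_file, row.names = FALSE)
```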

Next Steps

With this basic setup, you can list and analyze your WordPress content from R. Ideas for more advanced analysis:

Semantic analysis: Content similarity with embeddings
Graph visualization: Internal link relationships
Gap analysis: Missing content detection
SEO alignment: Target keyword coverage

Check out the R Shiny Content Intelligence Dashboard post that combines these topics in an interactive dashboard.

Legacy Method: XML-RPC (Deprecated)

warning

XML-RPC is not recommended due to security risks. Many hosting providers disable it by default. Use the REST API instead.

The old RWordPress package used XML-RPC:

# ❌ OLD METHOD - Do not use
# library("RWordPress")
# options(WordPressLogin = c(user = "pass"),
#         WordPressURL = "http://site.com/xmlrpc.php")
# getPosts()

Frequently Asked Questions (FAQ)

How to use WordPress REST API from R?

Use the httr2 package with req_auth_basic() for Application Passwords authentication, and send a GET request to the wp-json/wp/v2/posts endpoint.

request(wp_config$base_url) |>
  req_url_path_append("posts") |>
  req_auth_basic(wp_config$user, wp_config$app_password) |>
  req_perform()

What is WordPress Application Password?

A secure authentication method for the REST API, introduced in WordPress 5.6. Create one under Admin → Users → Profile → Application Passwords. The password is displayed only once, so store it in your .Renviron file.

How to list WordPress drafts with R?

Pass the status = "draft" argument:

drafts <- get_wp_posts(status = "draft")

For more than 100 items, use get_all_posts(), which handles pagination.

Why is XML-RPC not recommended?

Many hosting providers disable XML-RPC by default because of its security risks. The REST API is a more secure, faster, and more flexible alternative.

How to perform WordPress content age analysis?

Calculate date difference using mutate() and case_when() in R:

all_content |>
  mutate(
    age_days = as.numeric(Sys.Date() - as.Date(date)),
    age_group = case_when(
      age_days < 30 ~ "Last 30 days",
      age_days < 90 ~ "1-3 months",
      TRUE ~ "3+ months"
    )
  )

Summary: Key Takeaways

httr2 plus the tidyverse gives you a modern WordPress REST API workflow in R
Application Passwords let a script authenticate without your main login password; reading drafts and pending posts requires authentication
The status parameter filters by draft/publish/pending/future
Responses are capped at 100 items per page; use the X-WP-TotalPages header to page through the rest
XML-RPC is deprecated - prefer the REST API for security reasons
