Extract Reddit Posts and Discussions via API

Subreddit posts, comment threads, user history, and cross-subreddit search. Structured Reddit data extraction with JSON responses, cursor pagination, and no Reddit app registration.

Five dedicated Reddit endpoints
Full comment thread extraction with nesting
User post and comment history
Cross-subreddit search

Reddit Has the Best Unfiltered Opinion Data on the Internet

Reddit is where people say what they actually think. Product reviews, technology comparisons, hiring experiences, brand complaints — the discussions on Reddit are more honest than any survey or focus group. Google increasingly surfaces Reddit threads in search results because the content is genuinely useful.

But accessing Reddit data programmatically has gotten harder. Reddit's official API went paid in 2023, killing most third-party apps. The free tier (100 requests per minute for personal use) requires OAuth and a registered app, and comes with strict usage policies that ban commercial data collection.

Pushshift, the academic archive that most researchers relied on, was restricted to moderation tools. The Reddit data pipeline that powered research, product development, and market analysis went dark for most teams.

Five Endpoints for Reddit Data

Subreddit Posts

POST /api/reddit/posts

Extract posts from any subreddit. Returns titles, body text, scores, comment counts, URLs, author info, and timestamps. Supports sort options (hot, new, top, rising).

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| subreddit | string | Yes | Subreddit name (without r/) |
| sort | string | No | Sort order: hot, new, top, rising |
| count | integer | No | Number of posts to return |
| cursor | string | No | Pagination cursor from previous response |

Response

{
  "posts": [
    {
      "id": "1abc234",
      "title": "Just migrated our data pipeline to event-driven architecture. AMA.",
      "body": "After 6 months of work, we moved from batch processing to...",
      "author": "data_engineer_jane",
      "score": 847,
      "upvote_ratio": 0.94,
      "num_comments": 234,
      "url": "https://reddit.com/r/dataengineering/comments/1abc234/",
      "created_utc": "2026-03-08T15:30:00Z",
      "subreddit": "dataengineering",
      "is_self": true,
      "link_url": null
    }
  ],
  "has_more": true,
  "cursor": "eyJhZnRlciI6InQzXzFhYmMyMzQifQ=="
}
Cost: 1 credit per request
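
The has_more and cursor fields in the response above can drive a pagination loop. The following is a minimal sketch, assuming the cursor from one response is passed back unchanged in the next request and that has_more is false on the last page. The helper names post_json and iter_posts are hypothetical and not part of any Anysite SDK; the sketch uses only the Python standard library so it runs without extra dependencies.

```python
import json
import urllib.request

BASE = "https://api.anysite.io"


def post_json(path, payload, api_key):
    """POST a JSON payload to the API and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"access-token": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_posts(subreddit, api_key, sort="new", count=25, max_pages=10, fetch=post_json):
    """Yield posts page by page until has_more is false or max_pages is reached."""
    cursor = None
    for _ in range(max_pages):
        payload = {"subreddit": subreddit, "sort": sort, "count": count}
        if cursor:
            # Assumption: the cursor is sent back exactly as it was received.
            payload["cursor"] = cursor
        page = fetch("/api/reddit/posts", payload, api_key)
        yield from page.get("posts", [])
        cursor = page.get("cursor")
        if not page.get("has_more") or not cursor:
            break
```

Each page is one request and therefore one credit, so max_pages doubles as a spending cap. The fetch argument exists only so the loop can be exercised with a stub instead of a live call.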

Post Comments

POST /api/reddit/posts/comments

Extract the full comment thread for any post. Returns nested comment trees with author, score, timestamps, and reply chains.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| post_url | string | Yes* | Reddit post URL |
| post_id | string | Yes* | Reddit post ID (alternative to post_url) |
| sort | string | No | Comment sort: best, top, new, controversial |

* Provide either post_url or post_id.

Response

{
  "comments": [
    {
      "id": "c5d6e7f",
      "author": "devops_mike",
      "body": "We did the same migration last year. Key lesson: start with...",
      "score": 234,
      "created_utc": "2026-03-08T16:15:00Z",
      "replies": [
        {
          "id": "g8h9i0j",
          "author": "data_engineer_jane",
          "body": "Totally agree. We hit the same wall with...",
          "score": 89,
          "replies": []
        }
      ]
    }
  ]
}
Cost: 1 credit per request
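
Because replies share the shape of their parent comment, a short recursive walk is enough to turn the nested tree into a flat, depth-annotated sequence. The sketch below runs against the sample response shown above; flatten_comments is a hypothetical helper name, not an API feature.

```python
def flatten_comments(comments, depth=0):
    """Yield (depth, comment) pairs in thread order, parents before replies."""
    for comment in comments:
        yield depth, comment
        # Replies have the same structure, so recurse one level deeper.
        yield from flatten_comments(comment.get("replies") or [], depth + 1)


# Trimmed copy of the sample response above.
response = {
    "comments": [
        {
            "id": "c5d6e7f",
            "author": "devops_mike",
            "score": 234,
            "replies": [
                {"id": "g8h9i0j", "author": "data_engineer_jane", "score": 89, "replies": []}
            ],
        }
    ]
}

for depth, c in flatten_comments(response["comments"]):
    print(f"{'  ' * depth}{c['author']} ({c['score']} pts)")
# devops_mike (234 pts)
#   data_engineer_jane (89 pts)
```

Keeping the depth alongside each comment preserves the thread structure even after the tree is flattened into rows for storage or analysis.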

Search Posts

POST /api/reddit/search/posts

Search across all of Reddit by keyword. Returns matching posts from any subreddit with engagement metrics.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Search keywords |
| subreddit | string | No | Limit to specific subreddit |
| sort | string | No | relevance, top, new, comments |
| count | integer | No | Results per page |
| cursor | string | No | Pagination cursor |
Cost: 1 credit per request
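
Since only query is required, a request body can leave out any option that is not set. The sketch below shows one way to build such a body and reject sort values other than the four listed in the table; build_search_payload is a hypothetical helper, and how the API itself reacts to an unknown sort value is not documented here.

```python
SEARCH_SORTS = {"relevance", "top", "new", "comments"}


def build_search_payload(query, subreddit=None, sort=None, count=None, cursor=None):
    """Build the JSON body for /api/reddit/search/posts, omitting unset options."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if sort is not None and sort not in SEARCH_SORTS:
        raise ValueError(f"sort must be one of {sorted(SEARCH_SORTS)}")
    payload = {"query": query.strip()}
    optional = {"subreddit": subreddit, "sort": sort, "count": count, "cursor": cursor}
    # Only include options the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


print(build_search_payload("kubernetes alternatives", sort="top", count=50))
```

Validating locally catches a mistyped sort value before a credit is spent on the request.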

User Posts

POST /api/reddit/user/posts

Extract all posts from a specific user. Useful for understanding a user's expertise, interests, and posting patterns across all subreddits.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| username | string | Yes | Reddit username (without u/) |
| sort | string | No | Sort order: hot, new, top |
| count | integer | No | Number of posts to return |
| cursor | string | No | Pagination cursor |
Cost: 1 credit per request

User Comments

POST /api/reddit/user/comments

Extract a user's comment history across all subreddits. Reveals their areas of expertise and engagement patterns.

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| username | string | Yes | Reddit username (without u/) |
| sort | string | No | Sort order: hot, new, top |
| count | integer | No | Number of comments to return |
| cursor | string | No | Pagination cursor |
Cost: 1 credit per request
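
One way to read expertise and engagement patterns out of a user's history is to count activity per subreddit. The response shape for the user endpoints is not shown on this page, so the sketch below assumes each returned item carries a subreddit field, as posts do in the subreddit posts response. The helper name and the sample items are hypothetical.

```python
from collections import Counter


def activity_by_subreddit(items):
    """Count how many posts or comments fall in each subreddit."""
    # Assumption: each item has a "subreddit" key; items without one are skipped.
    return Counter(item["subreddit"] for item in items if item.get("subreddit"))


# Hypothetical items standing in for a user's posts or comments.
sample_items = [
    {"id": "p1", "subreddit": "dataengineering"},
    {"id": "p2", "subreddit": "devops"},
    {"id": "p3", "subreddit": "dataengineering"},
    {"id": "p4"},  # no subreddit field, so it is ignored
]

for name, total in activity_by_subreddit(sample_items).most_common():
    print(f"r/{name}: {total}")
```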

Code Examples

Production-ready examples for extracting Reddit data.

Product Sentiment Analysis

Python — Search + comment extraction
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.anysite.io"
headers = {"access-token": API_KEY}

# Search for product discussions
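resp = requests.post(f"{BASE}/api/reddit/search/posts",
    headers=headers,
    json={
        "query": "TechCorp review",
        "sort": "top",
        "count": 50
    },
    timeout=30,  # avoid hanging forever on a stalled connection
)
resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
results = resp.json()

# Pull comments from the first ten discussions
for post in results["posts"][:10]:
    resp = requests.post(f"{BASE}/api/reddit/posts/comments",
        headers=headers,
        json={"post_url": post["url"]},
        timeout=30,
    )
    resp.raise_for_status()
    comments = resp.json()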

    print(f"\n[{post['score']} pts] {post['title']}")
    print(f"  {post['num_comments']} comments in r/{post['subreddit']}")
    for comment in comments["comments"][:3]:
        print(f"  > {comment['body'][:100]}... ({comment['score']} pts)")

Competitive Intelligence

Python — Monitor competitor mentions across subreddits
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.anysite.io"
headers = {"access-token": API_KEY}

# Monitor competitor mentions across key subreddits
subreddits = ["dataengineering", "devops", "machinelearning", "SaaS"]
competitor = "CompetitorName"

all_mentions = []
for sub in subreddits:
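    resp = requests.post(f"{BASE}/api/reddit/search/posts",
        headers=headers,
        json={
            "query": competitor,
            "subreddit": sub,
            "sort": "new",
            "count": 25
        },
        timeout=30,  # avoid hanging forever on a stalled connection
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    all_mentions.extend(resp.json()["posts"])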

print(f"Found {len(all_mentions)} mentions of {competitor}")
for post in sorted(all_mentions, key=lambda p: p["score"], reverse=True)[:10]:
    print(f"  r/{post['subreddit']} [{post['score']}] {post['title'][:80]}")

Subreddit Posts

# Get top posts from a subreddit
curl -X POST "https://api.anysite.io/api/reddit/posts" \
  -H "access-token: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"subreddit": "dataengineering", "sort": "top", "count": 25}'

Search Across Reddit

# Search all subreddits by keyword
curl -X POST "https://api.anysite.io/api/reddit/search/posts" \
  -H "access-token: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "best data pipeline tool 2026", "count": 50}'

Post Comments

# Get full comment thread for a post
curl -X POST "https://api.anysite.io/api/reddit/posts/comments" \
  -H "access-token: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"post_url": "https://reddit.com/r/dataengineering/comments/abc123/"}'

CLI Commands

Subreddit posts
anysite api /api/reddit/posts subreddit=dataengineering sort=top count=25
Search with subreddit filter
anysite api /api/reddit/search/posts \
  query="kubernetes alternatives" subreddit=devops sort=top count=50
Post comments
anysite api /api/reddit/posts/comments \
  post_url="https://reddit.com/r/dataengineering/comments/abc123/"
User history
anysite api /api/reddit/user/posts username=data_engineer_jane
anysite api /api/reddit/user/comments username=data_engineer_jane

Pipeline YAML — Market Research

pipeline.yaml
name: reddit-market-research
sources:
  product_discussions:
    endpoint: /api/reddit/search/posts
    input:
      query: "data pipeline tool recommendation"
      count: 50

  detailed_threads:
    endpoint: /api/reddit/posts/comments
    depends_on: product_discussions
    input:
      post_url: ${product_discussions.url}
    on_error: skip

  competitor_mentions:
    endpoint: /api/reddit/search/posts
    input:
      query: ${file:competitor_names.txt}
      count: 25
    parallel: 3

storage:
  format: parquet
  path: ./data/reddit-research

Use Cases

Product Research and Feature Validation

Problem

Before building a feature, you want to know if real users are asking for it. Before launching a product, you want to know what frustrations exist with current solutions. Surveys are biased — people give you answers they think you want to hear. Reddit gives you what they actually say.

Solution

Search for discussions about your problem space. Pull comment threads from "What tool do you use for X?" and "Frustrated with Y" posts. Analyze the recurring pain points, feature requests, and tool recommendations.

Result

Unfiltered market intelligence. Know exactly what problems users describe in their own words, which solutions they've tried, and what they wish existed.
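
As a rough first pass at spotting recurring pain points, you can count how many comments mention each phrase from a list you care about. This is a deliberately simple sketch using case-insensitive substring matching; the helper name, phrases, and sample comments are made up for illustration, and real analysis would likely need better text handling.

```python
from collections import Counter


def phrase_counts(texts, phrases):
    """Count how many texts mention each phrase (case-insensitive, once per text)."""
    counts = Counter({phrase: 0 for phrase in phrases})
    for text in texts:
        lowered = text.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                counts[phrase] += 1
    return counts


# Illustrative comment bodies, not real Reddit data.
comment_bodies = [
    "The pricing is fine but the docs are terrible.",
    "Pricing went up twice this year, we are looking at alternatives.",
    "Setup took a week because the docs skip every hard step.",
]

print(phrase_counts(comment_bodies, ["pricing", "docs", "support"]).most_common())
```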

Brand and Reputation Monitoring

Problem

Reddit threads about your product rank in Google search results. A negative experience post with 500 upvotes becomes the first thing potential customers see. Without monitoring, you discover these threads weeks later.

Solution

Search for your brand name, product names, and key executive names across Reddit. Pull comment threads to understand sentiment. Respond to legitimate complaints. Track mention volume and sentiment over time.

Result

Real-time brand intelligence from Reddit. Catch negative threads early, understand what drives positive mentions, and track your reputation as it evolves.
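
Tracking mention volume over time can start with a per-day count built from the created_utc timestamps in search results. The sketch below assumes created_utc is an ISO 8601 UTC string, as in the sample responses on this page; mentions_per_day is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime


def mentions_per_day(posts):
    """Count posts per calendar day (UTC) based on created_utc."""
    days = Counter()
    for post in posts:
        # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
        created = datetime.fromisoformat(post["created_utc"].replace("Z", "+00:00"))
        days[created.date().isoformat()] += 1
    return dict(sorted(days.items()))


# Illustrative search results containing only the field used here.
mentions = [
    {"created_utc": "2026-03-08T15:30:00Z"},
    {"created_utc": "2026-03-08T16:15:00Z"},
    {"created_utc": "2026-03-09T01:00:00Z"},
]

print(mentions_per_day(mentions))
# {'2026-03-08': 2, '2026-03-09': 1}
```

A sudden jump in the daily count is a cheap signal that a thread is gaining traction and worth reading in full.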

Content Ideation and SEO Research

Problem

Finding topics that your target audience actively discusses and cares about is the foundation of content strategy. Reddit discussions show you the exact questions people ask, the language they use, and the problems they describe.

Solution

Monitor relevant subreddits for trending topics, recurring questions, and popular comparisons. Use post scores and comment counts as a proxy for audience interest. Extract the exact phrasing people use as seed keywords for SEO research.

Result

Content calendars built from real audience demand, not keyword tool estimates. The questions people ask on Reddit today are the blog posts and landing pages you should write tomorrow.
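
Using score and comment count as an interest proxy can be as simple as sorting posts by a combined number. In the sketch below, the weighting of comments relative to score is an arbitrary assumption to tune for your audience, and rank_by_engagement is a hypothetical helper.

```python
def rank_by_engagement(posts, comment_weight=2.0, top_n=10):
    """Sort posts by score plus weighted comment count, highest first."""
    def engagement(post):
        return post.get("score", 0) + comment_weight * post.get("num_comments", 0)

    return sorted(posts, key=engagement, reverse=True)[:top_n]


# Illustrative posts using the field names from the posts response.
candidates = [
    {"title": "Which orchestrator do you use?", "score": 100, "num_comments": 10},
    {"title": "Batch vs streaming in 2026", "score": 50, "num_comments": 60},
]

for post in rank_by_engagement(candidates):
    print(post["title"])
# Batch vs streaming in 2026
# Which orchestrator do you use?
```

Weighting comments above upvotes favors topics people want to argue about, which often make better content prompts than posts that are merely liked.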

Academic and Social Research

Problem

Reddit is one of the largest sources of public discourse data. Researchers studying online communities, public opinion, technology adoption, or social dynamics need structured access to posts and comments at scale.

Solution

Extract subreddit posts and comment threads programmatically. Build datasets spanning specific time periods, topics, or communities. Store in Parquet or databases for analysis with standard research tools.

Result

Structured datasets from Reddit for academic analysis. Reproducible data collection pipelines that can be re-run to track changes over time.
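
For the database option, the sketch below stores posts in SQLite using only the Python standard library (writing Parquet would need a third-party library). The table layout is an assumption based on the fields in the posts response, and save_posts is a hypothetical helper.

```python
import sqlite3


def save_posts(conn, posts):
    """Insert posts into a SQLite table, replacing rows with the same id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               id TEXT PRIMARY KEY,
               subreddit TEXT,
               title TEXT,
               author TEXT,
               score INTEGER,
               num_comments INTEGER,
               created_utc TEXT
           )"""
    )
    rows = [
        (p["id"], p.get("subreddit"), p.get("title"), p.get("author"),
         p.get("score"), p.get("num_comments"), p.get("created_utc"))
        for p in posts
    ]
    # INSERT OR REPLACE keeps re-runs idempotent: the latest copy of a post wins.
    conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # use a file path for a persistent dataset
save_posts(conn, [{"id": "1abc234", "subreddit": "dataengineering", "score": 847}])
print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])
```

Keying on the post id means a pipeline can be re-run over the same period without duplicating rows, which helps keep collection reproducible.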

How Anysite Compares

| Feature | Anysite | Reddit API (Free) | Reddit API (Paid) | Apify | Pushshift |
|---|---|---|---|---|---|
| Subreddit posts | 1 credit per request | 100 req/min | Higher limits | Actor-based | Restricted |
| Comments | 1 credit per thread | 100 req/min | Higher limits | Actor-based | Restricted |
| Search | Cross-platform | Reddit only | Reddit only | Actor-based | Historical only |
| User history | Posts + comments | Available | Available | Actor-based | Restricted |
| Authentication | API key | OAuth 2.0 | OAuth 2.0 + contract | API key | Moderator access |
| Commercial use | Allowed | Restricted | Contract required | Allowed | Not allowed |
| Pipeline support | YAML pipelines | None | None | Actor scheduling | None |
| Other platforms | 9+ sources | Reddit only | Reddit only | 1,800+ actors | Reddit only |

Reddit's free API tier (100 req/min) requires OAuth and prohibits commercial data collection. Their paid enterprise tier requires a sales contract. Anysite provides the same data access with a simple API key and credits — no OAuth, no usage restrictions, no contract negotiations.

Pushshift, the academic archive that researchers relied on, is now restricted to Reddit moderation tools. The historical data that powered most Reddit research is no longer publicly accessible.

Pricing

All endpoints cost 1 credit per request.

Endpoint Costs

| Endpoint | Cost |
|---|---|
| Subreddit posts | 1 credit |
| Post comments | 1 credit |
| Search posts | 1 credit |
| User posts | 1 credit |
| User comments | 1 credit |

Cost Examples

| Use Case | Monthly Volume | Credits | Recommended Plan |
|---|---|---|---|
| Monitor 5 subreddits (daily) | ~150 requests | ~150 | Starter ($49/mo) |
| Brand monitoring (daily search + threads) | ~300 requests | ~300 | Starter ($49/mo) |
| Market research (50 threads + comments) | ~100 requests | ~100 | Starter ($49/mo) |
| Full pipeline (search + comments + user history) | ~1,000 requests | ~1,000 | Starter ($49/mo) |

Reddit data extraction is among the most cost-efficient use cases. Comprehensive monitoring and research stay well within the Starter plan's 15,000 monthly credits.
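
The arithmetic behind these estimates is simple because every request costs one credit. The sketch below reproduces the first row of the table, assuming a 30-day month and one request per subreddit per day; monthly_credits is a hypothetical helper.

```python
STARTER_MONTHLY_CREDITS = 15_000  # Starter plan allowance quoted above


def monthly_credits(requests_per_day, days=30):
    """Credits used per month when every request costs 1 credit."""
    return requests_per_day * days


# Monitoring 5 subreddits with one request each per day.
monitoring = monthly_credits(5)
share = monitoring / STARTER_MONTHLY_CREDITS
print(f"{monitoring} credits, {share:.1%} of the Starter allowance")
# 150 credits, 1.0% of the Starter allowance
```

Paginating deeper or polling more than once a day multiplies the request count, so it is worth rerunning the estimate with your own schedule.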

Frequently Asked Questions

Do I need a Reddit account to use the API?
No. You authenticate with your Anysite API key only. No Reddit account, no OAuth, no app registration required.
Can I access NSFW subreddits?
The API accesses publicly available content. NSFW subreddits that are public are accessible. Quarantined or restricted subreddits may have limited availability.
How far back can I search?
The search endpoint returns results based on Reddit's search index. For historical data beyond what search returns, you can paginate through subreddit posts using the cursor.
Can I get nested comment replies?
Yes. The comments endpoint returns the full comment tree including reply chains. Nested replies are structured hierarchically in the response.
How do I analyze sentiment from Reddit discussions?
The API returns raw post and comment text. For sentiment analysis, use the Anysite CLI's built-in LLM analysis: anysite llm classify can categorize posts by sentiment, topic, or custom categories.
Is commercial use of Reddit data allowed?
Anysite provides API infrastructure for accessing publicly available data. Review Reddit's Terms of Service and applicable laws for your specific use case. Anysite does not restrict commercial use of data extracted through its API.

Start Extracting Reddit Data

7-day free trial with 1,000 credits. Posts, comments, search, and user history. No Reddit account required.