CLI — Production Data Pipelines From Your Terminal
The open-source CLI that turns Anysite's web-to-API engine into production data pipelines. Describe what you need — your AI agent builds and runs the pipeline using CLI commands. Local storage. Zero token waste. From idea to structured dataset in minutes.
$ pip install anysite-cli
Why Traditional Web Scraping Approaches Fall Short
Every existing method for web data extraction has the same problem: it wasn't built for production data pipelines.
Traditional approaches — browser automation, workflow tools, custom scripts, API aggregators — each solve part of the problem. Anysite CLI solves the whole thing: declarative pipelines that handle extraction, transformation, storage, analysis, and scheduling from a single YAML file. Your AI agent builds the pipeline from a description using CLI tools. No selectors to break, no tokens to burn, no infrastructure to manage.
Declarative YAML Pipelines
Chain multiple data sources with dependencies. Define filters, output format, storage destination, and schedule — in one file. Six pre-built templates for common patterns. Incremental collection with cursor tracking.
Agent-Ready Protocol
Your AI agent discovers endpoints (anysite schema search), builds the YAML, estimates costs (--dry-run), and executes autonomously. Structured JSON responses, _hints metadata, exit codes 0-5.
anysite schema search "linkedin company employees"
anysite dataset collect pipeline.yaml --dry-run
anysite dataset collect pipeline.yaml
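A sketch of how an agent or script can chain those commands, assuming only the conventional meaning of exit code 0 as success (the specific meanings of codes 1-5 aren't spelled out here):

# estimate cost first; collect only if the dry run exits cleanly (exit code 0)
if anysite dataset collect pipeline.yaml --dry-run; then
    anysite dataset collect pipeline.yaml
fi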
Full Data Stack Built In
Batch processing with parallel execution and error strategies. Database loading into SQLite, PostgreSQL, ClickHouse. LLM enrichment: classify, summarize, enrich, deduplicate. SQL queries via DuckDB. Cron scheduling with webhooks.
Describe It or Define It. Collect. Store. Query.
Two paths to the same result: let your AI agent build the pipeline from natural language, or write the YAML yourself for full control.
Define Pipeline
YAML config or natural language via your AI agent
Preview & Collect
Dry-run to estimate, then execute
Store Locally
Parquet, DuckDB, PostgreSQL, SQLite
Query & Analyze
SQL queries + LLM classification
name: prospect-pipeline
sources:
  target_companies:
    endpoint: /api/linkedin/search/companies
    input:
      industry: "SaaS"
      employee_count: "51-200"
    parallel: 3
  decision_makers:
    endpoint: /api/linkedin/company/employees
    depends_on: target_companies
    input:
      company: ${target_companies.urn}
      keywords: "VP Sales, Director Sales"
      count: 5
    on_error: skip
  recent_posts:
    endpoint: /api/linkedin/user/posts
    depends_on: decision_makers
    input:
      urn: ${decision_makers.internal_id.value}
      count: 5
storage:
  format: parquet
  path: ./data/prospects
# Preview costs before running
anysite dataset collect pipeline.yaml --dry-run

# Execute the full pipeline
anysite dataset collect pipeline.yaml

# Run incremental updates
anysite dataset collect pipeline.yaml --incremental

# Query results with SQL
anysite dataset query pipeline.yaml \
  --sql "SELECT * FROM decision_makers WHERE title LIKE '%CTO%'"

# Classify posts with LLM
anysite llm classify pipeline.yaml --source recent_posts \
  --categories "product_update,hiring,thought_leadership"
Any Website Is an Endpoint. Major Platforms Are Ready Out of the Box.
The Anysite engine turns any web page into structured data via AI parsing. Major platforms come with dedicated, optimized endpoints.
| Platform | Coverage |
|---|---|
| LinkedIn | Profiles, companies, posts, jobs, search, email finder |
| Instagram | Profiles, posts, reels, comments, search |
| Twitter/X | Profiles, tweets, search, followers |
| Reddit | Posts, comments, subreddits, search, user history |
| YouTube | Videos, channels, subtitles, comments, search |
| DuckDuckGo | Web search results |
| SEC EDGAR | Company filings (10-K, 10-Q, 8-K) |
| Y Combinator | Companies, founders, batches |
| Any URL | AI-powered structured extraction from any webpage |
Built for Real Workflows
From lead gen to research, the CLI handles production-grade data collection.
| Use Case | What Happens |
|---|---|
| Sales Intelligence | YAML chains company search → employee lookup → activity. Runs on cron. Outputs to PostgreSQL. |
| Competitive Intelligence | Multi-source collection across LinkedIn, Twitter, Reddit, web. dataset diff detects changes between runs. |
| Research at Scale | Batch 10K+ records with parallel execution. Incremental resume after interruption. DuckDB SQL for analysis. |
| Brand Monitoring | Scheduled pipeline collects mentions across platforms. LLM sentiment classification. Webhook on completion. |
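As a concrete sketch of the sales-intelligence row, using the commands shown elsewhere on this page plus a standard crontab entry (the CLI's built-in scheduler and the PostgreSQL loading step are omitted here because their exact syntax isn't documented on this page):

# crontab: collect incrementally every Monday at 06:00
0 6 * * 1 anysite dataset collect /path/to/prospect-pipeline.yaml --incremental

# then inspect newly collected decision makers with SQL
anysite dataset query pipeline.yaml \
  --sql "SELECT * FROM decision_makers WHERE title LIKE '%VP%'"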
Your Data Stays Local. Your Tokens Stay Unburned.
Unlike workflow tools that pass every record through LLM context, Anysite CLI processes data locally. Only config enters the context window.
| Approach | 1,000 Records | 10,000 Records | 100,000 Records |
|---|---|---|---|
| Workflow Tool (context-based) | ~500K tokens | ~5M tokens | ~50M tokens |
| Anysite CLI | ~1K tokens | ~1K tokens | ~1K tokens |
| Efficiency gain | 500x | 5,000x | 50,000x |
Get Running in 5 Minutes
1. Install the CLI
pip install anysite-cli
2. Configure your API key
anysite config set api_key YOUR_API_KEY
3. Update the schema
anysite schema update
4. Make your first request
anysite api /api/linkedin/user user=satyanadella
5. Create your first pipeline
anysite dataset init my-first-pipeline
anysite dataset collect my-first-pipeline/dataset.yaml --dry-run
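The scaffolded dataset.yaml is the file you edit before collecting. A minimal sketch of what such a file can contain, reusing the fields from the prospect example above and the quick-start endpoint (the actual scaffold written by init may differ):

name: my-first-pipeline
sources:
  profile:
    endpoint: /api/linkedin/user
    input:
      user: satyanadella
storage:
  format: parquet
  path: ./data/my-first-pipeline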
Resources
No YAML required: The agent-ready CLI means your AI assistant can build pipelines without you writing YAML. Describe the data you need in plain English — your agent discovers endpoints, builds the YAML, and runs the pipeline. Works with Claude Code, Cursor, and any MCP-compatible agent.
Simple Credit-Based Pricing
7-day free trial on Starter. Scale as you grow.
PAYG top-ups at $2.90/1K credits (min $20, 12-month rollover). Active subscription required.
The entire web is your database. The agent is your data engineer.
Open source. MIT license. Start with pip install anysite-cli.
$ pip install anysite-cli