CLI — Production Data Pipelines From Your Terminal

The open-source CLI that turns Anysite's web-to-API engine into production data pipelines. Describe what you need — your AI agent builds and runs the pipeline using CLI commands. Local storage. Zero token waste. From idea to structured dataset in minutes.

$ pip install anysite-cli
Agent Protocol · Local-first storage · YAML pipelines · Any website, any platform

Why Traditional Web Scraping Alternatives Fall Short

Every existing method for web data extraction has the same problem: it wasn't built for production data pipelines.

Traditional approaches — browser automation, workflow tools, custom scripts, API aggregators — each solve part of the problem. Anysite CLI solves the whole thing: declarative pipelines that handle extraction, transformation, storage, analysis, and scheduling from a single YAML file. Your AI agent builds the pipeline from a description using CLI tools. No selectors to break, no tokens to burn, no infrastructure to manage.

Declarative YAML Pipelines

Chain multiple data sources with dependencies. Define filters, output format, storage destination, and schedule — in one file. Six pre-built templates for common patterns. Incremental collection with cursor tracking.

Full Data Stack Built In

Batch processing with parallel execution and error strategies. Database loading into SQLite, PostgreSQL, ClickHouse. LLM enrichment: classify, summarize, enrich, deduplicate. SQL queries via DuckDB. Cron scheduling with webhooks.
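These pieces compose in the same pipeline file as the sources. A hedged sketch of how that might look: the storage keys match the documented pipeline example further down, but the schedule block's field names (cron, webhook) are assumptions for illustration, not confirmed syntax; see docs.anysite.io/cli/overview for the real shape.

```yaml
# Illustrative only: `schedule` field names are assumed, not documented syntax.
storage:
  format: parquet            # database loading also targets SQLite, PostgreSQL, ClickHouse
  path: ./data/mentions

schedule:                    # assumed shape for the cron + webhook features
  cron: "0 6 * * *"          # run daily at 06:00
  webhook: https://example.com/hooks/pipeline-done
```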

Full capability reference at docs.anysite.io/cli/overview

Describe It or Define It. Collect. Store. Query.

Two paths to the same result: let your AI agent build the pipeline from natural language, or write the YAML yourself for full control.

1. Define Pipeline: YAML config or natural language via your AI agent
2. Preview & Collect: Dry-run to estimate, then execute
3. Store Locally: Parquet, DuckDB, PostgreSQL, SQLite
4. Query & Analyze: SQL queries + LLM classification

name: prospect-pipeline
sources:
  target_companies:
    endpoint: /api/linkedin/search/companies
    input:
      industry: "SaaS"
      employee_count: "51-200"
    parallel: 3

  decision_makers:
    endpoint: /api/linkedin/company/employees
    depends_on: target_companies
    input:
      company: ${target_companies.urn}
      keywords: "VP Sales, Director Sales"
      count: 5
    on_error: skip

  recent_posts:
    endpoint: /api/linkedin/user/posts
    depends_on: decision_makers
    input:
      urn: ${decision_makers.internal_id.value}
      count: 5

storage:
  format: parquet
  path: ./data/prospects

# Preview costs before running
anysite dataset collect pipeline.yaml --dry-run

# Execute the full pipeline
anysite dataset collect pipeline.yaml

# Run incremental updates
anysite dataset collect pipeline.yaml --incremental

# Query results with SQL
anysite dataset query pipeline.yaml \
  --sql "SELECT * FROM decision_makers WHERE title LIKE '%CTO%'"

# Classify posts with LLM
anysite llm classify pipeline.yaml --source recent_posts \
  --categories "product_update,hiring,thought_leadership"

Any Website Is an Endpoint. Major Platforms Are Ready Out of the Box.

The Anysite engine turns any web page into structured data via AI parsing. Major platforms come with dedicated, optimized endpoints.

Platform Coverage
LinkedIn: Profiles, companies, posts, jobs, search, email finder
Instagram: Profiles, posts, reels, comments, search
Twitter/X: Profiles, tweets, search, followers
Reddit: Posts, comments, subreddits, search, user history
YouTube: Videos, channels, subtitles, comments, search
DuckDuckGo: Web search results
SEC EDGAR: Company filings (10-K, 10-Q, 8-K)
Y Combinator: Companies, founders, batches
Any URL: AI-powered structured extraction from any webpage

Built for Real Workflows

From lead gen to research, the CLI handles production-grade data collection.

Sales Intelligence: YAML chains company search → employee lookup → activity. Runs on cron. Outputs to PostgreSQL.
Competitive Intelligence: Multi-source collection across LinkedIn, Twitter, Reddit, and the web. dataset diff detects changes between runs.
Research at Scale: Batch 10K+ records with parallel execution. Incremental resume after interruption. DuckDB SQL for analysis.
Brand Monitoring: Scheduled pipeline collects mentions across platforms. LLM sentiment classification. Webhook on completion.

Your Data Stays Local. Your Tokens Stay Unburned.

Unlike workflow tools that pass every record through LLM context, Anysite CLI processes data locally. Only config enters the context window.

Approach                      | 1,000 Records | 10,000 Records | 100,000 Records
Workflow Tool (context-based) | ~500K tokens  | ~5M tokens     | ~50M tokens
Anysite CLI                   | ~1K tokens    | ~1K tokens     | ~1K tokens
Efficiency gain               | 500x          | 5,000x         | 50,000x
Context window: [pipeline.yaml config] ← only this enters context
Local execution: collect → store → query ← all outside context
LLM analysis: [classify/summarize] ← only when requested
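The efficiency figures above are simple arithmetic: a context-based tool spends roughly 500 tokens per record (inferred from the table, not measured), while the CLI's context cost stays flat at ~1K tokens because only the config enters the window. A quick sketch of that math:

```python
# Token costs implied by the comparison table above.
TOKENS_PER_RECORD = 500   # inferred: ~500K tokens / 1,000 records
CONFIG_TOKENS = 1_000     # flat cost: only pipeline.yaml enters context

for records in (1_000, 10_000, 100_000):
    context_based = records * TOKENS_PER_RECORD
    gain = context_based // CONFIG_TOKENS
    print(f"{records:>7,} records: {context_based:>10,} context tokens vs {CONFIG_TOKENS:,} ({gain:,}x)")
```

The gain scales linearly with record count, which is why the gap widens from 500x to 50,000x.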

Get Running in 5 Minutes

1. Install the CLI

pip install anysite-cli

2. Configure your API key

anysite config set api_key YOUR_API_KEY

3. Update the schema

anysite schema update

4. Make your first request

anysite api /api/linkedin/user user=satyanadella

5. Create your first pipeline

anysite dataset init my-first-pipeline
anysite dataset collect my-first-pipeline/dataset.yaml --dry-run
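The generated dataset.yaml will resemble a trimmed version of the prospect pipeline shown earlier. A minimal hedged sketch, reusing only constructs from that example (name, sources, endpoint, input, storage) and the /api/linkedin/user endpoint from step 4; the exact scaffold the init command emits may differ:

```yaml
name: my-first-pipeline
sources:
  people:
    endpoint: /api/linkedin/user
    input:
      user: satyanadella

storage:
  format: parquet
  path: ./data/my-first-pipeline
```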

Resources

No YAML required: The agent-ready CLI means your AI assistant can build pipelines without you writing YAML. Describe the data you need in plain English — your agent discovers endpoints, builds the YAML, and runs the pipeline. Works with Claude Code, Cursor, and any MCP-compatible agent.

Simple Credit-Based Pricing

7-day free trial on Starter. Scale as you grow.

MCP Unlimited: $30/mo · unlimited MCP requests · 6 req/min
Starter: $49/mo · 15,000 credits/mo · $3.27/1K · 60 req/min (free trial)
Scale: $300/mo · 190,000 credits/mo · $1.58/1K · 150 req/min
Pro: $549/mo · 425,000 credits/mo · $1.29/1K · 200 req/min
Enterprise: $1,199+/mo · 1.2M+ credits/mo · $0.99/1K · 200 req/min

PAYG top-ups at $2.90/1K credits (min $20, 12-month rollover). Active subscription required.

The entire web is your database. The agent is your data engineer.

Open source. MIT license. Start with pip install anysite-cli
