Firecrawl

The Firecrawl app recipe gives you a fully self-hosted Firecrawl v2 API running inside a Rigbox workspace. It comes pre-configured with everything you need to scrape and crawl websites at scale.

What’s Included

The recipe provisions a complete Firecrawl stack:

Component	Purpose
Firecrawl v2 API	The main scraping and crawling engine
PostgreSQL	Stores crawl jobs, results, and metadata
Redis	Caching and rate limiting
RabbitMQ	Job queue for distributed crawl tasks
Playwright + Chromium	Renders JavaScript-heavy pages for accurate scraping

Firecrawl runs comfortably on 2 vCPU, 4 GB RAM, and 8 GB disk.

Deploying Firecrawl

Firecrawl is an official app recipe (@rigbox/firecrawl@builtin). Spawn a workspace sized for the stack, then install the recipe into it. The install provisions the Firecrawl API plus its PostgreSQL, Redis, and RabbitMQ backing services and starts everything in one call. The commands below assume rig is already authenticated via rig login:

rig workspace spawn --name scraper --template dev --ram 4096 --vcpu 2 --disk 8192
rig recipe app install -r @rigbox/firecrawl@builtin -w scraper

The install blocks until Firecrawl is live and ready to accept requests, then prints the app’s name and URL. The Firecrawl API is served at https://<APP_NAME>.rigbox.dev.

The Firecrawl API key is generated automatically during installation. You can find it in the Usage tab of your app in the dashboard.
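Every request needs the app's base URL and the API key. A minimal sketch of a helper that assembles both, reusing the SUBDOMAIN and FIRECRAWL_API_KEY variable names from the curl examples on this page. It assumes you export both variables yourself, and that SUBDOMAIN holds the <APP_NAME> part of the app URL; the function name firecrawl_config is hypothetical.

```python
import os


def firecrawl_config(env=os.environ):
    """Build the base URL and auth headers for a Rigbox-hosted Firecrawl app.

    SUBDOMAIN and FIRECRAWL_API_KEY are not set by the recipe; export them
    yourself from the install output and the dashboard's Usage tab.
    """
    subdomain = env.get("SUBDOMAIN", "").strip()
    api_key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not subdomain or not api_key:
        raise RuntimeError("Set SUBDOMAIN and FIRECRAWL_API_KEY first")
    base_url = f"https://{subdomain}.rigbox.dev"
    headers = {"Authorization": f"Bearer {api_key}"}
    return base_url, headers
```

Failing early on a missing variable avoids sending unauthenticated requests that would otherwise surface as a less obvious HTTP error.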

Scraping a Page

Use the v2 /scrape endpoint to extract content from a single URL. The waitFor and timeout parameters help with pages that load content dynamically using JavaScript.

curl -X POST https://$SUBDOMAIN.rigbox.dev/v2/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "waitFor": 3000,
    "timeout": 30000
  }'

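import os

import requests

# Read the same values the curl example takes from the shell.
subdomain = os.environ["SUBDOMAIN"]
firecrawl_api_key = os.environ["FIRECRAWL_API_KEY"]

response = requests.post(
    f"https://{subdomain}.rigbox.dev/v2/scrape",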
    headers={"Authorization": f"Bearer {firecrawl_api_key}"},
    json={
        "url": "https://example.com",
        "waitFor": 3000,
        "timeout": 30000,
    },
)
response.raise_for_status()  # stop on HTTP errors before parsing the body
result = response.json()
print(result["data"]["markdown"])

Set waitFor to a higher value (in milliseconds) for pages that rely heavily on JavaScript to render content. This tells Playwright to wait before capturing the page.
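When tuning waitFor, it helps to keep it well below timeout so the page has time left to load and be captured. The following is a small sketch of a payload builder that enforces this. The function name and the rule that waitFor must be shorter than timeout are assumptions made for illustration, not documented Firecrawl behavior.

```python
def build_scrape_payload(url, wait_for_ms=0, timeout_ms=30000):
    """Return a JSON body for /scrape, rejecting obviously bad timings.

    The waitFor < timeout check is an assumed sanity rule, not something
    the Firecrawl API is known to enforce.
    """
    if wait_for_ms < 0 or timeout_ms <= 0:
        raise ValueError("waitFor must be >= 0 and timeout must be > 0")
    if wait_for_ms >= timeout_ms:
        raise ValueError("waitFor must be shorter than timeout")
    payload = {"url": url, "timeout": timeout_ms}
    if wait_for_ms:
        # Only send waitFor when a delay was actually requested.
        payload["waitFor"] = wait_for_ms
    return payload
```

The returned dictionary can be passed as the json argument of requests.post in the example above.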

Crawling a Website

To scrape multiple pages from the same site, use the /crawl endpoint. You provide a starting URL and Firecrawl follows links automatically.

curl -X POST https://$SUBDOMAIN.rigbox.dev/v2/crawl \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "scrapeOptions": {
      "waitFor": 2000,
      "timeout": 30000
    }
  }'

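import os

import requests

subdomain = os.environ["SUBDOMAIN"]
firecrawl_api_key = os.environ["FIRECRAWL_API_KEY"]

response = requests.post(
    f"https://{subdomain}.rigbox.dev/v2/crawl",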
    headers={"Authorization": f"Bearer {firecrawl_api_key}"},
    json={
        "url": "https://example.com",
        "limit": 50,
        "scrapeOptions": {
            "waitFor": 2000,
            "timeout": 30000,
        },
    },
)
response.raise_for_status()  # stop on HTTP errors before parsing the body
crawl = response.json()
print(f"Crawl job started: {crawl['id']}")

Crawling is asynchronous. The response includes a job ID that you can poll to check progress and retrieve results.
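Polling the job can be sketched as a loop that checks the status until it finishes. The helper below takes any zero-argument callable that returns the decoded status response, so it does not depend on a particular HTTP client. The function name is hypothetical, and the "completed" and "failed" status values are assumptions about the response shape; confirm them against your Firecrawl version.

```python
import time


def wait_for_crawl(fetch_status, interval_s=5.0, max_polls=120, sleep=time.sleep):
    """Poll a crawl job until it finishes and return the final status body.

    fetch_status: zero-argument callable returning the status JSON as a dict.
    The "status" values checked here are assumed, not confirmed by this page.
    """
    for _ in range(max_polls):
        body = fetch_status()
        status = body.get("status")
        if status == "completed":
            return body
        if status == "failed":
            raise RuntimeError(f"Crawl failed: {body}")
        sleep(interval_s)  # back off before asking again
    raise TimeoutError(f"Crawl still running after {max_polls} polls")
```

With requests, fetch_status could be a lambda that issues a GET for the job ID and calls .json() on the result, assuming the status endpoint is /crawl/<id> under the same version prefix as the request that started the job.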

Configuring a Proxy

Some websites use aggressive anti-bot protection that blocks requests from cloud IP addresses. If you run into this, you can route Firecrawl’s requests through a proxy server. Set the Proxy Server parameter in your app settings. This tells Playwright and the scraping engine to send all outgoing requests through the proxy.

Using a proxy adds latency to every request. Only enable it for sites that actively block direct connections.

AI Features

Firecrawl supports LLM-powered extraction, which lets you pull structured data from pages using natural language prompts. To use these features, you need to configure an LLM provider:

Managed Credits: Enable this option to use Rigbox-managed LLM credits. No additional API keys required.
Bring Your Own Key (BYOK): Provide your own OpenAI, Anthropic, or other LLM API key in the app settings.

Once a provider is configured, pass the extract parameter to Firecrawl’s /scrape endpoint along with your natural language prompt.

Next Steps

Workspace Services - monitor the PostgreSQL, Redis, and RabbitMQ services backing your Firecrawl instance
App Visibility - control who can access your Firecrawl API
Custom Domains - put your Firecrawl instance behind your own domain
