The Firecrawl template gives you a fully self-hosted Firecrawl v2 API running inside a Rigbox workspace. It comes pre-configured with everything you need to scrape and crawl websites at scale.

What’s Included

The template provisions a complete Firecrawl stack:
| Component | Purpose |
| --- | --- |
| Firecrawl v2 API | The main scraping and crawling engine |
| PostgreSQL | Stores crawl jobs, results, and metadata |
| Redis | Caching and rate limiting |
| RabbitMQ | Job queue for distributed crawl tasks |
| Playwright + Chromium | Renders JavaScript-heavy pages for accurate scraping |
The workspace runs on 2 vCPU, 4 GB RAM, and 8 GB disk.

Deploying Firecrawl

You can deploy Firecrawl with a single click from the template gallery, or by sending a request to the quick-deploy API:
curl -X POST https://api.rigbox.dev/api/v1/quick-deploy \
  -H "Authorization: Bearer $RIGBOX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "template_id": "firecrawl"
  }'
Once the workspace is running, your Firecrawl API is live and ready to accept requests.
The Firecrawl API key is generated automatically during deployment. You can find it in the Usage tab of your app in the dashboard.

Scraping a Page

Use the /scrape endpoint to extract content from a single URL. The waitFor and timeout parameters (both in milliseconds) help with pages that load content dynamically using JavaScript.
curl -X POST https://$SUBDOMAIN.rigbox.dev/v1/scrape \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "waitFor": 3000,
    "timeout": 30000
  }'
Set waitFor to a higher value (in milliseconds) for pages that rely heavily on JavaScript to render content. This tells Playwright to wait before capturing the page.
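As a rough sketch of how you might vary waitFor per page, the helper below builds the request body from arguments and sends it to the same endpoint as the curl example above. The function names scrape_payload and scrape are hypothetical, and the sketch assumes SUBDOMAIN and FIRECRAWL_API_KEY are already exported and that the URL contains no characters that need JSON escaping.

```shell
# Sketch only: SUBDOMAIN and FIRECRAWL_API_KEY are assumed to be exported.

# Build the JSON body for a scrape request (hypothetical helper).
# Arguments: URL, waitFor in milliseconds, timeout in milliseconds.
# Note: the URL is inserted as-is, so it must not contain double quotes.
scrape_payload() {
  printf '{"url": "%s", "waitFor": %d, "timeout": %d}' "$1" "$2" "$3"
}

# Send the request to the scrape endpoint used in the example above.
scrape() {
  curl -sS -X POST "https://$SUBDOMAIN.rigbox.dev/v1/scrape" \
    -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(scrape_payload "$1" "$2" "$3")"
}
```

For a JavaScript-heavy page you could then call, for example, scrape "https://example.com" 8000 30000 to wait 8 seconds before capture.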

Crawling a Website

To scrape multiple pages from the same site, use the /crawl endpoint. You provide a starting URL and Firecrawl follows links automatically.
curl -X POST https://$SUBDOMAIN.rigbox.dev/v1/crawl \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 50,
    "scrapeOptions": {
      "waitFor": 2000,
      "timeout": 30000
    }
  }'
Crawling is asynchronous. The response includes a job ID that you can poll to check progress and retrieve results.
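Polling could look roughly like the sketch below. It rests on assumptions not stated on this page: that the job status is available via GET at /v1/crawl/<job ID>, and that the response carries a "status" string with values such as scraping, completed, failed, or cancelled, as in upstream Firecrawl. The function names are hypothetical, and SUBDOMAIN and FIRECRAWL_API_KEY are assumed to be exported.

```shell
# Sketch only: the status path and the status values are assumptions
# based on upstream Firecrawl, not confirmed by this page.

# Fetch the raw status JSON for a crawl job.
fetch_crawl_status() {
  curl -sS "https://$SUBDOMAIN.rigbox.dev/v1/crawl/$1" \
    -H "Authorization: Bearer $FIRECRAWL_API_KEY"
}

# Print the first "status" string value found in JSON on stdin.
# grep and sed keep the sketch dependency-free; a JSON parser such as jq
# would be more robust against status-like strings inside scraped content.
json_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# Poll until the job finishes or the attempts run out.
# Arguments: job ID, max attempts (default 60), delay in seconds (default 5).
# Prints the final state; returns 0 on completed, 1 on failed or cancelled,
# and 2 if the job is still running after the last attempt.
wait_for_crawl() {
  wfc_job=$1
  wfc_attempts=${2:-60}
  wfc_delay=${3:-5}
  wfc_i=0
  while [ "$wfc_i" -lt "$wfc_attempts" ]; do
    wfc_status=$(fetch_crawl_status "$wfc_job" | json_status)
    case $wfc_status in
      completed) echo "completed"; return 0 ;;
      failed|cancelled) echo "$wfc_status"; return 1 ;;
    esac
    sleep "$wfc_delay"
    wfc_i=$((wfc_i + 1))
  done
  echo "timeout"
  return 2
}
```

With the job ID from the crawl response in hand, wait_for_crawl "$JOB_ID" blocks until the crawl settles, after which the same status URL can be fetched once more to read the results.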

Configuring a Proxy

Some websites use aggressive anti-bot protection that blocks requests from cloud IP addresses. If you run into this, you can route Firecrawl’s requests through a proxy server. Set the Proxy Server parameter in your app settings. This tells Playwright and the scraping engine to send all outgoing requests through the proxy.
Using a proxy adds latency to every request. Only enable it for sites that actively block direct connections.

AI Features

Firecrawl supports LLM-powered extraction, which lets you pull structured data from pages using natural language prompts. To use these features, you need to configure an LLM provider:
  • Managed Credits: Enable this option to use Rigbox-managed LLM credits. No additional API keys required.
  • Bring Your Own Key (BYOK): Provide your own OpenAI, Anthropic, or other LLM API key in the app settings.
Once a provider is configured, pass the extract parameter to the /scrape endpoint along with your natural language prompt.
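A request with the extract parameter might look like the sketch below. The body shape is an assumption modeled on upstream Firecrawl (an "extract" entry in formats plus an extract object with a prompt key), so check it against your instance before relying on it. The function names are hypothetical, SUBDOMAIN and FIRECRAWL_API_KEY are assumed to be exported, and neither argument may contain double quotes.

```shell
# Sketch only: the "formats" entry and the "prompt" key are assumptions
# based on upstream Firecrawl, not confirmed by this page.

# Build the JSON body for an extraction request (hypothetical helper).
# Arguments: URL, natural language prompt.
extract_payload() {
  printf '{"url": "%s", "formats": ["extract"], "extract": {"prompt": "%s"}}' "$1" "$2"
}

# Send the extraction request to the scrape endpoint.
scrape_extract() {
  curl -sS -X POST "https://$SUBDOMAIN.rigbox.dev/v1/scrape" \
    -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(extract_payload "$1" "$2")"
}
```

For example, scrape_extract "https://example.com" "List the product names and prices" would send the prompt alongside the URL in a single request.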

Next Steps

  • Workspace Services - monitor the PostgreSQL, Redis, and RabbitMQ services backing your Firecrawl instance
  • App Visibility - control who can access your Firecrawl API
  • Custom Domains - put your Firecrawl instance behind your own domain