Skip to content
fb

Datasets

Expand a root into URLs, crawl them into records, and store them in a local SQLite database you can query with SQL.

For bulk work, fb composes into a small pipeline: seed turns a root into a stream of URLs, crawl fetches each into a full record, and db queries the result. Each stage is a separate command so you can inspect or reshape the stream in between.

Seed: a root into URLs

seed expands a page, profile, group, or search into a stream of post URLs, one per line:

fb seed page nasa --limit 100
fb seed search "climate" --type page --limit 50

Because it is just lines of URLs, you can filter, sort, or split the stream with ordinary tools before crawling.

Crawl: URLs into records

crawl reads URLs from stdin (or --from <file>), classifies each, fetches the matching record, and emits it. Point it at a database with --db to store as it goes:

fb seed page nasa --limit 100 | fb crawl --db nasa.db

Pull the comment thread and reactions for each post as you crawl:

fb seed page nasa --limit 100 | fb crawl --db nasa.db --comments --reactions

A URL that fails is logged to stderr and skipped, so one bad item does not stop the run.

The SQLite store

The store is an ordinary SQLite file with one table per record type: pages, profiles, groups, posts, comments, reactions, photos, videos, events. Records upsert by id, so re-crawling refreshes rather than duplicates. Each row keeps the full record as JSON alongside its key columns.

Query it with fb db query, which renders results through the usual formatter:

fb db --db nasa.db query "select owner_name, count(*) n from posts group by 1 order by n desc"
fb db --db nasa.db query "select text from comments order by reactions_count desc limit 10" -o jsonl

Because it is plain SQLite, any other tool works too:

sqlite3 nasa.db "select count(*) from posts"

A full example

Collect a Page's last 200 posts with their comments into a database, then ask which posts drew the most discussion:

fb seed page nasa --limit 200 | fb crawl --db nasa.db --comments
fb db --db nasa.db query \
  "select permalink, comments_count from posts order by comments_count desc limit 10"