Archiving
Save a Page's whole feed as a browsable, incremental tree of Markdown files: one file per post with its comments, indexed by month.
fb archive turns a Page into a folder of Markdown you can read, grep, or commit to
git. Each post becomes its own file, comments are embedded inline, and a
generated README.md indexes everything by month. The archive is incremental:
re-running only fetches posts that are not already on disk, so you can keep a
Page mirrored over time with the same command.
A first archive
fb archive aivietnam.edu.vn --comments
By default the tree is written under ~/data/<page>. Point --out somewhere
else to choose the root:
fb archive nasa --out ~/archives -n 100 --comments
The layout is browsable on disk and on any git host:
~/data/aivietnam.edu.vn/
  README.md                                 index, grouped by year and month
  index.json                                state used for incremental runs
  2025/
    11/
      2025-11-03_khoa-hoc-ai-mien-phi.md    one file per post, with its comments
      2025-11-01_thong-bao-tuyen-sinh.md
    10/
      ...
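To make the month-based index concrete, here is a minimal Python sketch of how post paths in this layout could be grouped and rendered. The exact format of the generated README.md is not documented here, so the heading and link style below are assumptions, as are the function names `group_by_month` and `render_index` and the third sample file name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_month(paths):
    """Group post paths shaped like YYYY/MM/<file>.md by (year, month)."""
    groups = defaultdict(list)
    for raw in paths:
        year, month, name = PurePosixPath(raw).parts[-3:]
        groups[(year, month)].append(name)
    return groups


def render_index(paths):
    """Render a Markdown index, newest month first (assumed format)."""
    groups = group_by_month(paths)
    lines = []
    for year, month in sorted(groups, reverse=True):
        lines.append(f"## {year}-{month}")
        # File names start with an ISO date, so a reverse sort is newest first.
        for name in sorted(groups[(year, month)], reverse=True):
            lines.append(f"- [{name}]({year}/{month}/{name})")
        lines.append("")
    return "\n".join(lines)


posts = [
    "2025/11/2025-11-03_khoa-hoc-ai-mien-phi.md",
    "2025/11/2025-11-01_thong-bao-tuyen-sinh.md",
    "2025/10/2025-10-20_example-post.md",  # hypothetical file name
]
print(render_index(posts))
```

Because the index is derived purely from file paths, it can be regenerated on every run without re-reading the posts themselves.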
Each post file carries its title, date, and engagement counts, a link back to Facebook, the post text, images, external links, and its comment thread (as far as the public surface exposes it; see What gets archived). Post slugs are transliterated to ASCII, so Vietnamese (and other accented) titles produce clean file names.
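The slug transliteration can be illustrated with a short Python sketch. This is not necessarily how fb builds its slugs; it is one common approach (Unicode decomposition, then dropping combining marks), and the names `slugify` and `post_path` are made up for the example. The titles are guesses at what the sample file names above were derived from.

```python
import re
import unicodedata
from datetime import date

# NFKD does not decompose the Vietnamese d-with-stroke, so map it explicitly.
_SPECIAL = str.maketrans({"đ": "d", "Đ": "D"})


def slugify(title):
    """Transliterate a title to a lowercase, hyphen-separated ASCII slug."""
    text = unicodedata.normalize("NFKD", title.translate(_SPECIAL))
    # Drop the combining marks left behind by decomposition (tones, accents).
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


def post_path(published, title):
    """Build YYYY/MM/YYYY-MM-DD_slug.md, matching the layout shown above."""
    return f"{published:%Y/%m}/{published.isoformat()}_{slugify(title)}.md"


print(post_path(date(2025, 11, 3), "Khóa học AI miễn phí"))
# 2025/11/2025-11-03_khoa-hoc-ai-mien-phi.md
```

Characters with no ASCII equivalent are simply dropped in this sketch; a real implementation might substitute a fallback such as the post ID.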
Comments and replies
--comments is on by default and embeds each post's comment thread under a
## Comments heading. Add --replies to walk the reply threads too; replies are
indented under the comment they answer:
fb archive aivietnam.edu.vn --replies
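As an illustration of replies being indented under the comment they answer, the Python sketch below renders a small thread as a nested Markdown list. The `Comment` type, the bullet style, and the sample authors are assumptions for the example; the exact Markdown fb emits under the ## Comments heading may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    replies: list = field(default_factory=list)


def render_comments(comments, include_replies=False, depth=0):
    """Return Markdown list lines; replies are nested one level deeper."""
    lines = []
    indent = "  " * depth
    for comment in comments:
        lines.append(f"{indent}- **{comment.author}**: {comment.text}")
        if include_replies:
            lines.extend(render_comments(comment.replies, True, depth + 1))
    return lines


def render_section(comments, include_replies=False):
    """Render the whole section under a '## Comments' heading."""
    body = render_comments(comments, include_replies)
    return "\n".join(["## Comments", ""] + body)


thread = [Comment("An", "Great course!", [Comment("Page", "Thanks!")])]
print(render_section(thread, include_replies=True))
```

With `include_replies=False` only the top-level comment is rendered, which mirrors the difference between the default behaviour and --replies.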
Turn comments off for a faster, text-only archive:
fb archive aivietnam.edu.vn --comments=false
Incremental runs
The archive remembers what it has already saved in index.json. On the next
run, any post whose Markdown file is still on disk is skipped, and only new posts
are fetched and written:
# first run: pulls the recent feed
fb archive aivietnam.edu.vn
# a week later: only the new posts are fetched, README is regenerated
fb archive aivietnam.edu.vn
The index file and each post are written as the crawl proceeds, so an interrupted run keeps everything it has already saved: re-running skips those posts and continues with the rest.
Pass --force to re-fetch and overwrite posts that are already archived, for
example to refresh engagement counts or pull newly added comments:
fb archive aivietnam.edu.vn --force
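The incremental and --force behaviour described in this section can be sketched in Python as follows. The schema of index.json is not documented here, so the mapping of post ID to relative path is an assumption, as are the helper names. In the sketch, the `feed` tuples stand in for posts that the real tool would fetch only when they are not skipped.

```python
import json
import os
import tempfile
from pathlib import Path


def load_index(root):
    """Read index.json, or start empty on a first run."""
    path = Path(root) / "index.json"
    return json.loads(path.read_text()) if path.exists() else {}


def save_index(root, index):
    """Write to a temp file, then rename, so the index is never half-written."""
    fd, tmp = tempfile.mkstemp(dir=root, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(index, fh, indent=2)
    os.replace(tmp, Path(root) / "index.json")


def archive(root, feed, force=False):
    """feed yields (post_id, relative_path, markdown). Returns the IDs written."""
    root = Path(root)
    index = load_index(root)
    written = []
    for post_id, rel, markdown in feed:
        on_disk = post_id in index and (root / index[post_id]).exists()
        if on_disk and not force:
            continue  # already archived and its file is still present
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(markdown, encoding="utf-8")
        index[post_id] = rel
        save_index(root, index)  # persist after every post, not at the end
        written.append(post_id)
    return written
```

Saving the index after each post is what lets an interrupted run resume cleanly: anything recorded is already on disk, and anything missing is simply fetched again.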
Bounding the crawl
The same global flags that bound a feed apply here:
fb archive nasa -n 50 # at most 50 posts
fb archive nasa --since 2025-01-01 # stop once posts get older than this
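How the two bounds interact can be shown with a small Python sketch. It assumes the feed arrives newest first (which the "stop once posts get older" wording suggests) and that a post dated exactly on the --since day is still included; both are assumptions, and `bounded` is a made-up name.

```python
from datetime import date
from itertools import islice, takewhile


def bounded(feed, limit=None, since=None):
    """feed yields (published_date, post_id) tuples, newest first."""
    if since is not None:
        # Stop at the first post older than the cutoff; later ones are older still.
        feed = takewhile(lambda post: post[0] >= since, feed)
    # islice with limit=None passes everything through.
    return list(islice(feed, limit))


feed = [
    (date(2025, 3, 1), "c"),
    (date(2025, 1, 1), "b"),
    (date(2024, 12, 31), "a"),
]
print(bounded(feed, since=date(2025, 1, 1)))  # keeps "c" and "b"
print(bounded(feed, limit=1))                 # keeps "c" only
```

When both bounds are given, the crawl ends at whichever is reached first.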
What gets archived
fb archive reads the public crawler surface, so it works on any public Page
with no login:
fb archive aivietnam.edu.vn --comments
The same surface sets the ceiling on depth. An archive captures the most recent posts a Page exposes rather than its entire history, and each post's preview comments rather than the full thread. Re-running picks up new posts incrementally and skips the ones already on disk. See [how fb reads Facebook]({{< relref "authentication.md" >}}) for the details.