Curl this Site

This site is made up of about thirty pages containing around 8000 words. We can fetch the whole site in export format, a hash of pages, and examine it with jq.

We'll use Unix command-line tools to perform experiments: curl fetches pages, jq parses JSON, wc counts words, sort sorts lines, and uniq counts duplicates.

SITE=about.fed.wiki

We fetch the story from each page and the text from each item of the story.

curl $SITE/system/export.json | \
  jq '.[].story[].text' | \
  wc -w

7974
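The same export can be used to check the page count. Since the export is a hash of pages, jq's length should give the number of pages, and keys should list their names. This is a minimal sketch: the function names count_pages and list_slugs are our own, and we assume the hash is keyed by page slug.

```shell
# Hypothetical helpers (names are ours, not part of any wiki tooling).
# Both read the export JSON, a hash of pages, on standard input.

# Number of pages: the number of keys in the top-level hash.
count_pages() {
  jq 'length'
}

# One page name per line (assumes the hash is keyed by page slug).
list_slugs() {
  jq -r 'keys[]'
}

# Usage, with SITE set as above:
#   curl -s "$SITE/system/export.json" | count_pages
#   curl -s "$SITE/system/export.json" | list_slugs
```

The -s flag only silences curl's progress meter, so that nothing but the result is printed.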

The edit history of each page is stored as actions in the journal. There we can see whether a page has been forked from another site. We use the Unix idiom sort | uniq | sort to tally their locations.
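The tally can be sketched as follows. This is a reconstruction under assumptions, not necessarily the exact command used: we assume a fork records its origin in a site field on the journal action, and that actions without one print as null. The function name tally_sites is our own.

```shell
# Hypothetical helper (name is ours). Reads the export JSON on standard input.
# Assumption: a journal action that came from another site carries a "site"
# field; every other action yields null.
tally_sites() {
  # Emit one site (or null) per journal action, skipping pages with no journal,
  # then group identical lines, count them, and order by count.
  jq '.[] | .journal[]? | .site' | sort | uniq -c | sort -n
}

# Usage, with SITE set as above:
#   curl -s "$SITE/system/export.json" | tally_sites
```

The first sort brings identical lines together so that uniq -c can count them, and the final sort -n orders the counts from least to most.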

   1 "garden.asia.wiki.org"
   1 "hello.ward.bay.wiki.org"
   1 "house.asia.wiki.org"
   1 "plugins.fed.wiki.org"
   1 "splash.fed.wiki.org"
   1 "ward.asia.wiki.org"
   2 "ward.bay.wiki.org"
   3 "forage.ward.fed.wiki.org"
   3 "glossary.asia.wiki.org"
   4 "ward.fed.wiki.org"
1123 null

We see that pages here have come from ten other sites, four of which have been copied from more than once. This suggests that a third to a half of the thirty pages here started elsewhere.

See Curl a Page to look deeper into our json.