Rehydrating my Twitter archive

Introduction

Twitter was a social media network that operated from March 2006 to July 2023 (when it was replaced by "X", an almost entirely different social media network). From the late 2010s through late 2022, Twitter was a very useful social media network, often with the most up-to-date information available, especially for current events and IT-related tasks. (Then Elon Musk offered to buy Twitter at an unrealistically high value, on a whim; was forced to go through with his unconditional offer; and proceeded to fire most of the Twitter staff and randomly turn off a lot of the hardware. Twitter was never the same after that.)

I actively used Twitter read-only for several years via the web interface, and eventually created a Twitter account in January 2019, just in time for Linux.Conf.Au 2019. I then actively used my Twitter account until late 2022, when it became clear that Twitter was no longer a good place.

Along with many, many others, I downloaded my Twitter data archive repeatedly in late 2022, with the final download in December 2022. Fortunately I also knew to use Twitter Archive Parser at the time, to fill out details that Twitter's own archive truncated or omitted (eg, expanding t.co links to the full version, fetching full resolution images, etc). So I had a moderately complete archive of my Twitter account. But an archive sitting in a zip file was not particularly accessible.

This week I decided that if Twitter was going to make my account mostly inaccessible without logging in (and, according to more recent rumours, inaccessible without a paid "X" account), I should do something myself to make my own posts useful to me again. So I created https://twitter.ewenmcneill.nz/, to put my Twitter account archive online again.

Rehydrating a Twitter Archive Parser archive

To bring the archive back online, in month-by-month pages, with the ability to link to individual Tweets -- but still view some context -- I chose to use the Twitter Archive Parser generated per-month Markdown files, and convert those back into HTML with some usability tweaks.

Assuming you too have a Twitter archive that has been filled out by Twitter Archive Parser (ideally with Twitter Archive Parser run in 2022; I am not sure if it will even still work), the steps to put it online were something like:

mkdir /tmp/twitter
cd /tmp/twitter
unzip twitter-data-ewenmcneill-2022-12-18-with-md-and-full-images.zip
rm -r data                    # Remove the JSON files from Twitter
rm -r "Your archive.html"     # Remove top level "start here" page

# Download a copy of the CSS used in the Twitter archive
wget https://unpkg.com/@picocss/pico@latest/css/pico.min.css

# Install markdown_py
brew install python-markdown  # macOS Homebrew

# Translate each Markdown file into a HTML file
for FILE in *.md; do
  # Quote the filename so names containing spaces survive word splitting
  YEAR_MONTH=$(echo "$FILE" | cut -f 1-2 -d -)
  ./twittermd2html "${FILE}" >"${YEAR_MONTH}.html"
done

# Generate an index.html file
for FILE in 2*.html; do
   echo "        <li><a href=\"$FILE\">$FILE</a></li>"
done | sed 's/\.html</</' >index.html   # Strip ".html" from the link text only

vi index.html             # Add the rest of the HTML

and then put that online on a webserver somewhere, generate a TLS certificate for it, etc.
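The index step above leaves the surrounding HTML to be added by hand. As a rough sketch of automating that, the hypothetical make_index function below writes a whole page in one go. The page title, the list layout, and the container class are my assumptions rather than what the real index.html contains.

```shell
#!/bin/sh
# Hypothetical helper: print a complete index page listing every month page.
# The title, <ul> layout and "container" class are assumptions for illustration.
make_index() {
  printf '%s\n' '<!DOCTYPE html>' '<html lang="en">' '<head>' \
    '  <meta charset="utf-8">' \
    '  <title>Twitter archive</title>' \
    '  <link rel="stylesheet" href="pico.min.css">' \
    '</head>' '<body>' '  <main class="container">' '    <ul>'
  for FILE in 2*.html; do
    [ -e "$FILE" ] || continue      # skip the literal pattern if nothing matched
    # Link to the file, but show the name without its ".html" suffix
    printf '      <li><a href="%s">%s</a></li>\n' "$FILE" "${FILE%.html}"
  done
  printf '%s\n' '    </ul>' '  </main>' '</body>' '</html>'
}
```

Run from the directory holding the generated pages, it would be used as: make_index >index.html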

The twittermd2html script is a hacky shell script that I wrote which:

Parses the filename to figure out the month and year
Special cases hashtags starting a line in the Markdown, by injecting and removing a &zwsp; (otherwise Markdown will turn them into headings)
Runs markdown_py over each file
Injects a HTML header and footer into each HTML file, which references the local pico.min.css file and includes some local style overrides (mostly reducing excess white space)
Transforms the HTML to move the "Tweet / datetime" above the Tweet text rather than below it (it seems more readable that way)
Transforms the HTML to add an anchor tag with the Tweet ID number, and a link (on (#)) to that anchor tag
Rewrites paragraphs which enclose images to have a CSS class on them, so they can be targeted to have reduced surrounding white space
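To make the first two steps more concrete, here is a minimal sketch of how they could look. It is not the actual twittermd2html script: the function names are hypothetical, the example filename is made up, and it assumes the &zwsp; marker passes through markdown_py unchanged so that it can be stripped again afterwards.

```shell
#!/bin/sh
# Hypothetical helpers, not the real twittermd2html.

# Print "YYYY-MM" for a filename whose first two "-" separated fields are
# the year and month (the example filename used with it is an assumption).
year_month() {
  printf '%s\n' "$1" | cut -f 1-2 -d -
}

# Before Markdown: prefix lines starting with "#" directly followed by a
# letter, digit or underscore (a hashtag), so they are not read as headings.
# Lines such as "# Heading" (hash then space) are left alone.
protect_hashtags() {
  sed 's/^#\([[:alnum:]_]\)/\&zwsp;#\1/'
}

# After Markdown: remove the marker again, wherever it ended up in the HTML.
unprotect_hashtags() {
  sed 's/&zwsp;#/#/g'
}
```

The pieces would then be chained along the lines of: protect_hashtags <"$FILE" | markdown_py | unprotect_hashtags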

The result is not perfect by any means, but is fairly readable, and gives me easy access to an anchor link for every rehydrated Tweet.
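As an illustration of the anchor link idea, the sketch below builds an anchor and a matching (#) link from a Tweet's status URL. The tweet_anchor name and the exact HTML it emits are assumptions, as is the idea that the Tweet ID is taken from the /status/ part of the original Tweet link; the real script may well do this differently.

```shell
#!/bin/sh
# Hypothetical helper: given a Tweet status URL, print an anchor named after
# the Tweet ID, followed by a "(#)" link pointing at that anchor.
tweet_anchor() {
  # Pull the run of digits after "/status/"; prints nothing if there is none
  id=$(printf '%s\n' "$1" | sed -n 's#.*/status/\([0-9][0-9]*\).*#\1#p')
  [ -n "$id" ] || return 1          # not a status URL, emit nothing
  printf '<a id="%s"></a><a href="#%s">(#)</a>\n' "$id" "$id"
}
```

With that in place, a link such as index page URL plus 2019-01.html#ID would jump straight to one Tweet while leaving the rest of the month visible for context.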

Among other issues that I have noted, it appears that retweeted Tweets (a) show with the stored RT @USER prefix in them, and (b) seem to be truncated (possibly to the original very short Tweet length), which makes longer retweets less useful than they otherwise might be. (Some shorter retweeted Tweets came through perfectly though. And the date on each rehydrated Tweet links to the original Tweet, in case that still works and/or can be found on, eg, the Internet Archive Wayback Machine.)

There are also a bunch of things (like Twitter usernames and embedded links) that ideally would be turned back into links, and some formatting updates I would probably do for presentation purposes. But none of them are sufficiently urgent to spend more time on them now.

But the result -- at https://twitter.ewenmcneill.nz/ -- is much more useful to me than a zip file sitting on a hard drive. Especially since I used to mainly use Twitter as an alternative to a bookmarking service :-)