the r/SneerClub archive at awful.systems is welcoming contributors. it’s a statically generated site (built from this set of archived posts in JSON format) that uses a unique, high-performance Nix-based static site generation system. the current site desperately needs a new stylesheet (especially on mobile), but one area where I really need advice or contributions is the dataset.

currently, the SneerClub archive only pulls in data from the bdfr set, which I generated using Bulk Downloader for Reddit right before Reddit killed its API, but I’d love to merge the SneerClub_comments.jsonl and SneerClub_submissions.jsonl files into the data we’re using to generate the site, since those have older data from ArchiveTeam. unfortunately, that data set is in a completely different format from the BDFR data. any advice on tools or techniques for merging those two data sets into one (or offers to contribute a merge script) would be greatly appreciated.
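
one possible shape for a merge script, as a rough sketch only: normalize both sources into a single record layout keyed by Reddit ID, then let the BDFR copy win whenever both sources contain the same post. the field names (`id`, `title`, `author`, `created_utc`, `selftext`) and the helper names (`read_jsonl`, `normalize`, `merge_by_id`) are assumptions for illustration, not the confirmed schema of either data set, so they’d need checking against the real files.

```python
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-blank line of a JSONL stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def normalize(record: dict, source: str) -> dict:
    """Map a raw record onto one shared layout.

    The keys read here are assumed, not verified against either data set.
    created_utc is coerced because it may arrive as an int, float, or string.
    """
    return {
        "id": record["id"],
        "title": record.get("title", ""),
        "author": record.get("author"),
        "created_utc": int(float(record.get("created_utc", 0))),
        "selftext": record.get("selftext", ""),
        "source": source,  # remember where each record came from
    }


def merge_by_id(primary: Iterable[dict], secondary: Iterable[dict]) -> dict:
    """Merge normalized records by ID; primary wins on conflicts."""
    merged = {r["id"]: r for r in secondary}
    merged.update({r["id"]: r for r in primary})
    return merged
```

to run it against the real files, something like `read_jsonl(open("SneerClub_submissions.jsonl"))` would feed the older records in as the secondary source, with the already-loaded BDFR records as the primary. comments would need the same treatment, plus a step to reattach them to their parent submissions.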

  • @selfOPMA
    27 months ago

    also, the static site generator that makes the SneerClub archive work is a neat project in itself. does it sound like it’d be handy to split off into its own thing?