Clean URLs (And More) With This Bash Script

For reasons, I needed a script that could take a URL, follow any redirects, and, if the page no longer existed, check whether it had been archived by the Internet Archive and return that link instead.

So I put together muna. It’s really a powered-up version of the unredirector function from agaetr, another script of mine. It can be used standalone or sourced as the bash function "unredirector".
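Roughly, the two modes look like this. Note that the filename and calling convention below are my sketch, not the documented interface, so check the repo instructions for the real details:

```bash
# Standalone: pass a URL and read the cleaned result from STDOUT.
# (muna.sh is an assumed filename -- see the repo for the actual one.)
clean=$(./muna.sh "https://bit.ly/example")

# As a function: source the script, set the URL, then call unredirector.
source ./muna.sh
url="https://bit.ly/example"
unredirector
printf '%s\n' "$url"
```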

If there’s a redirect, whether from a link shortener or, say, an upgrade to HTTPS, muna will follow it and set the variable "$url" (or print to STDOUT) the appropriate URL. If there is any other error (including the page being gone or the server having disappeared entirely), it will check whether the page is saved at the Internet Archive and return the latest capture instead. If it cannot find a copy anywhere, it sets "$url" to a NULL string, returns nothing, and exits with exit code 99.
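To make that flow concrete, here is a minimal sketch of the same logic using curl and the Wayback Machine’s availability API. This is not muna’s actual code; it assumes curl and jq are installed, and the function name is just for illustration:

```bash
#!/usr/bin/env bash
# Minimal sketch of the behavior described above -- not muna itself.

unredirector_sketch() {
    local url="$1"
    local http_code final snapshot

    # Follow any redirects; capture the final status code and effective URL.
    read -r http_code final < <(curl -sIL -o /dev/null \
        -w '%{http_code} %{url_effective}\n' "$url")

    # Success: print the post-redirect URL.
    if [[ "$http_code" == 2* ]]; then
        echo "$final"
        return 0
    fi

    # Any other result: ask the Wayback Machine for the latest capture.
    # (A real script would URL-encode $url before putting it in the query.)
    snapshot=$(curl -s "https://archive.org/wayback/available?url=${url}" \
        | jq -r '.archived_snapshots.closest.url // empty')

    if [[ -n "$snapshot" ]]; then
        echo "$snapshot"
        return 0
    fi

    # No live page and no archived copy: return nothing and signal with 99.
    return 99
}
```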

There is also feeds-in.sh.

While that script is included here as an example, it is a fully functional DEATH ST… script. It’s appropriate to put in a cronjob to preprocess sources of URLs for ArchiveBox, or to use as the base of a script that meets your needs.
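For the cron route, something along these lines would run it nightly. The paths are placeholders, and how you then hand the collected URLs to ArchiveBox depends on your own setup:

```bash
# Hypothetical crontab entry (paths are placeholders): run feeds-in.sh
# every night at 3:00 and keep a log of what it did.
0 3 * * * /home/youruser/bin/feeds-in.sh >> /home/youruser/logs/feeds-in.log 2>&1
```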

One important and super useful note for someone who already has a big list of URLs from some other program: all you have to do is drop that text file, one URL per line, into RAWDIR (which you’ll configure here in a second), and that list will be pulled seamlessly into the workflow.
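For example, assuming RAWDIR already points at the directory you configured:

```bash
# Drop an existing list (one URL per line) into RAWDIR; it will be picked
# up on the next run. The filename here is just a placeholder.
cp my_big_list_of_urls.txt "$RAWDIR/"
```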

You can find full instructions for both and the scripts on GitHub, GitLab, or my personal git repository.