
For reasons, I needed a script that could take a URL, follow any redirects, and, if the page no longer existed, see whether it had been archived by the Internet Archive and return that link instead.
So I put together muna. It's really a powered-up version of the unredirector function from my agaetr script. It can be used standalone or as a bash function, "unredirector".
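For illustration, usage might look something like this; the exact paths and invocation here are assumptions on my part, so check the repository's README for the real syntax:

```bash
# Standalone: prints the resolved (or archived) URL to STDOUT.
./muna "https://example.com/some/old/link"

# Sourced: exposes the unredirector function, which rewrites "$url".
source ./muna
url="https://example.com/some/old/link"
unredirector
echo "$url"
```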
If there's a redirect, whether from a shortener or, say, a bounce to HTTPS, muna will follow it and put the appropriate URL in the variable "$url" (or return it on STDOUT). If there is any other error (including the page being gone or the server having disappeared entirely), it checks whether the page is saved at the Internet Archive and returns the latest capture instead. If it cannot find a copy anywhere, it sets "$url" to a NULL string, returns nothing, and exits with exit code 99.
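To make that behavior concrete, here is a minimal sketch of the same idea in plain bash, using curl and the Wayback Machine's availability API. This is not muna's actual code, and it assumes jq is installed for the JSON parsing:

```bash
#!/usr/bin/env bash
# Resolve a URL, falling back to the Internet Archive when the live page fails.
resolve_url() {
    local input="$1" code final

    # Follow any redirects; capture the final HTTP status and URL.
    read -r code final < <(curl -sL -o /dev/null \
        -w '%{http_code} %{url_effective}' "$input")

    if [[ "$code" == 2* ]]; then
        url="$final"                # Live page: keep the post-redirect URL.
        printf '%s\n' "$url"
        return 0
    fi

    # Any other result: ask the Wayback Machine for its most recent capture.
    url="$(curl -sG 'https://archive.org/wayback/available' \
        --data-urlencode "url=${input}" \
        | jq -r '.archived_snapshots.closest.url // empty')"

    if [[ -z "$url" ]]; then
        return 99                   # No copy anywhere: NULL url, code 99.
    fi
    printf '%s\n' "$url"
}
```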
There is also feeds-in.sh.
While that script is included here as an example, it is a fully functional DEATH ST… script. It's appropriate to put in a cronjob to preprocess sources of URLs for ArchiveBox, or to use as the base of a script that meets your own needs.
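As a rough illustration of that kind of preprocessing (not feeds-in.sh itself), a cronjob could resolve each collected URL and hand the survivors to ArchiveBox, whose add command accepts URLs on STDIN. The script and file names below are hypothetical:

```bash
#!/usr/bin/env bash
# preprocess-urls.sh (hypothetical name): resolve URLs, then archive them.
# A crontab entry to run it nightly might look like:
#   15 3 * * * /home/me/bin/preprocess-urls.sh

source ./muna                   # Assumed path; adjust to your install.
while read -r url; do
    unredirector                # Rewrites "$url", or empties it on failure.
    [[ -n "$url" ]] && printf '%s\n' "$url"
done < pending-urls.txt | archivebox add
```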
One important and super useful note for anyone who already has a big list of URLs from some other program: all you have to do is put that text file, one URL per line, in RAWDIR (which you'll configure in a second) and the list will be pulled seamlessly into the workflow.
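For example, assuming RAWDIR already points at the configured directory and exported.txt is a dump from another program:

```bash
# One URL per line, nothing else:
#   https://example.com/one
#   https://example.com/two

# Drop the file into RAWDIR; the next run picks it up automatically.
cp exported.txt "$RAWDIR/"
```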
You can find full instructions for both, along with the scripts themselves, on GitHub, GitLab, or my personal git repository.