⚠️ Warning: This post is over a year old, the information may be out of date.

↩️ Reply to:
https://petersmith.org/webmentions/likes/2022/like-202220221213-124654/

📅 2022-12-18 | ⏰ 2 minutes

Previously, I always send my URL to the Wayback Machine to archive it. But now, I have a better way to do it. I use the Gitlab CI/CD service to send my URL to the Wayback Machine. It is a free service and it is very easy to use. I just need to add a few lines of code to my Hugo build script.

Here is an example:

First, I need to add some python script called archivenow during build stage.

pages:
  stage: deploy
  script:
    - hugo --verbose --minify --enableGitInfo
    - git clone [email protected]:oduwsdl/archivenow.git
    - cd archivenow
    - pip install -r requirements.txt
    - pip install ./
    - cd ..
    - ./submit10url2backway.sh
    - echo 'Build is complete and feed submited to web archieved..!!!'

Then, I need to add “submit10url2backway” script to send my URL to the Wayback Machine. Here is the script:

#!/usr/bin/env bash
# Copyright Robbi Nespu <[email protected]> , 2022
# License: MIT

# This script is used to archive my website to the Internet Archive.

# Fetch RSS and read the first 10 URLs send send it to the Wayback Machine
urls1=$(curl -s https://robbinespu.gitlab.io/index.xml | grep "<link>" | head -n 10 | awk -F"<guid>" '{print $2} ' | awk -F"</guid>" '{print $1}')
for i in $urls1
    do  archivenow --ia --is --mg "$i"
done

# Same like previous but for the IndieWeb RSS
urls2=$(curl -s https://robbinespu.gitlab.io/indieweb/index.xml | grep "<link>" | head -n 10 | awk -F"<guid>" '{print $2} ' | awk -F"</guid>" '{print $1}')
for i in $urls2
    do  archivenow --ia --is --mg "$i"
done

# Archive the main website
archivenow --ia --is --mg "https://robbinespu.gitlab.io/"
echo "Done!"

But I have turn off / comment out the script since a year ago because I don’t want to archive my website too often (because I added a webhook to execute CI/CD when received payload from webmention ).

Plus this tricks only work for existing old post, the new post not being archived yet because pipeline still running and it the “artifacts” still un-publish.

I only want to archive it when I have a new post. So, I will archive it manually 😊

Posted by: Robbi Nespu

Edit

#!/usr/bin/env bash # Copyright Robbi Nespu <[email protected]> , 2022 # License: MIT # This script is used to archive my website to the Internet Archive. # Modified for my site - 2022-12-19 by Peter Smith # Fetch RSS and read the first 10 URLs send send it to the Wayback Machine urls1=$(curl -s https://petersmith.org/index.xml | grep "<link>" | head -n 10 | awk -F"<guid>" '{print $2} ' | awk -F"</guid>" '{print $1}') for i in $urls1 do archivenow --ia "$i" done # Same like previous but for the IndieWeb RSS urls2=$(curl -s https://robbinespu.gitlab.io/indieweb/index.xml | grep "<link>" | head -n 10 | awk -F"<guid>" '{print $2} ' | awk -F"</guid>" '{print $1}') for i in $urls2 do archivenow --ia "$i" done # Archive the main website archivenow --ia "https://petersmith.org/" echo "Done!"

↩️ Reply to:https://petersmith.org/webmentions/likes/2022/like-202220221213-124654/

↩️ Reply to:
https://petersmith.org/webmentions/likes/2022/like-202220221213-124654/