⚠️ Warning: This post is over a year old; the information may be out of date.

↩️ Reply to:
https://petersmith.org/webmentions/likes/2022/like-202220221213-124654/

⏰ 2 minutes

Previously, I always sent my URLs to the Wayback Machine by hand. But now I have a better way: I use the GitLab CI/CD service to submit my URLs to the Wayback Machine. It is a free service and very easy to use; I just need to add a few lines to my Hugo build script.

Here is an example:

First, I need to install a Python tool called archivenow during the build stage.

pages:
  stage: deploy
  script:
    # Build the site, then archive the freshly published URLs
    - hugo --verbose --minify --enableGitInfo
    # Install the archivenow CLI from source
    - git clone https://github.com/oduwsdl/archivenow.git
    - cd archivenow
    - pip install -r requirements.txt
    - pip install ./
    - cd ..
    - ./submit10url2backway.sh
    - echo 'Build is complete and feed submitted to the web archive!'

Then, I need to add the “submit10url2backway.sh” script that sends my URLs to the Wayback Machine. Here is the script:

#!/usr/bin/env bash
# Copyright Robbi Nespu, 2022
# License: MIT

# This script archives my website to the Internet Archive.

# Fetch the RSS feed, take the first 10 <guid> entries and send each one
# to the Wayback Machine (--ia), archive.today (--is) and Megalodon.jp (--mg)
urls1=$(curl -s https://robbinespu.gitlab.io/index.xml | grep "<guid>" | head -n 10 | awk -F"<guid>" '{print $2}' | awk -F"</guid>" '{print $1}')
for i in $urls1; do
    archivenow --ia --is --mg "$i"
done

# Same as above, but for the IndieWeb RSS feed
urls2=$(curl -s https://robbinespu.gitlab.io/indieweb/index.xml | grep "<guid>" | head -n 10 | awk -F"<guid>" '{print $2}' | awk -F"</guid>" '{print $1}')
for i in $urls2; do
    archivenow --ia --is --mg "$i"
done

# Archive the main website
archivenow --ia --is --mg "https://robbinespu.gitlab.io/"
echo "Done!"

But I turned off / commented out that script about a year ago, because I don't want to archive my website too often (I added a webhook that runs the CI/CD pipeline whenever a webmention payload is received).
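
For anyone curious, the webhook side is essentially a call to GitLab's pipeline trigger API. Here is a minimal sketch; the project ID and trigger token are placeholders, not my real configuration:

#!/usr/bin/env bash
# Minimal sketch: trigger the GitLab CI/CD pipeline when a webmention
# payload arrives. PROJECT_ID and TRIGGER_TOKEN are placeholders.
PROJECT_ID="12345678"
TRIGGER_TOKEN="glptt-0000000000000000000000000000000000000000"
curl --request POST \
     --form "token=${TRIGGER_TOKEN}" \
     --form "ref=main" \
     "https://gitlab.com/api/v4/projects/${PROJECT_ID}/trigger/pipeline"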

Plus, this trick only works for existing posts; a brand-new post is not archived, because the pipeline is still running when the script fires and its “artifacts” are not published yet.

I only want to archive a page when I have a new post, so I archive it manually 😊
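
With archivenow installed locally, archiving a single new post is just one command (the post URL below is only a placeholder, not a real post):

# Archive one new post by hand; the URL is an example
archivenow --ia --is --mg "https://robbinespu.gitlab.io/posts/example-new-post/"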

Posted by: Robbi Nespu


Have some thoughts, discussion or feedback on this post?

💬 Send me an email

What is webmention? How to send interactions!

Below you can find all of the webmentions for this page. That means you can also mention this URL on any website that supports WebMention. Have you written a response to this post? Let me know the URL:

Do you use a website that doesn't have WebMention capabilities? You can just use Comment Parade!


Peter Smith

Robbi, that's a very helpful post. That is so wonderfully systematic of you. By the way, the link to archivenow just loops back to your page. But you gave me enough clues to find it on GitHub. Thanks.

What I really like is your use of the webhooks for webmentions. I need to figure out how to do that.

My 'build' process is rather manual and crude. Basically, once I make a post (or similar), I

  • run a script that does a git commit to GitHub (roughly the sketch after this list)
  • wait for Netlify (or its app) to notice the changes and build my site
  • job more or less done.
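
Roughly, that commit script boils down to something like this (the path and commit message are illustrative, not my actual script):

  #!/usr/bin/env bash
  # Illustrative sketch only: commit everything and push, then
  # Netlify notices the push and rebuilds the site.
  cd ~/sites/petersmith.org || exit 1   # path is a placeholder
  git add -A
  git commit -m "New post $(date +%Y-%m-%d)"
  git push origin main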

To process webmentions, I run my getMentions.sh script (see here for details), and then run my other script to build the site. I need to do some thinking as to where in all of that I do the automatic thing with the webmention webhooks. That's a project for another day.

Meanwhile, I've taken your “submit10url2backway” script and adapted it for my site.


  #!/usr/bin/env bash
  # Copyright Robbi Nespu, 2022
  # License: MIT

  # This script archives my website to the Internet Archive.
  # Modified for my site - 2022-12-19 by Peter Smith

  # Fetch the RSS feed, take the first 10 <guid> entries and send each one
  # to the Wayback Machine
  urls1=$(curl -s https://petersmith.org/index.xml | grep "<guid>" | head -n 10 | awk -F"<guid>" '{print $2}' | awk -F"</guid>" '{print $1}')
  for i in $urls1; do
      archivenow --ia "$i"
  done

  # Same as above, but for the IndieWeb RSS feed
  urls2=$(curl -s https://petersmith.org/indieweb/index.xml | grep "<guid>" | head -n 10 | awk -F"<guid>" '{print $2}' | awk -F"</guid>" '{print $1}')
  for i in $urls2; do
      archivenow --ia "$i"
  done

  # Archive the main website
  archivenow --ia "https://petersmith.org/"
  echo "Done!"