JOURNAL.ROBBI.MY

Reply to: https://petersmith.org/webmentions/likes/2022/like-202220221213-124654/


Previously, I always sent my URLs to the Wayback Machine to archive them. Now I have a better way to do it: I use the GitLab CI/CD service to submit my URLs to the Wayback Machine. It is a free service and very easy to use. I just need to add a few lines to my Hugo build script.

Here is an example:

First, I need to install a Python tool called archivenow as part of the build.

yaml
pages:
  stage: deploy
  script:
    - hugo --verbose --minify --enableGitInfo
    - git clone git@github.com:oduwsdl/archivenow.git
    - cd archivenow
    - pip install -r requirements.txt
    - pip install ./
    - cd ..
    - ./submit10url2backway.sh
    - echo 'Build is complete and feed submitted to the web archives!'

Then, I need to add the “submit10url2backway” script, which sends my URLs to the Wayback Machine. Here is the script:

bash
#!/usr/bin/env bash
# Copyright Robbi Nespu <[email protected]> , 2022
# License: MIT

# This script is used to archive my website to the Internet Archive.

# Fetch the RSS feed and send the first 10 post URLs (the <guid> values) to the Wayback Machine.
# grep -o prints one <guid> element per line, even if the feed is minified; sed strips the tags.
urls1=$(curl -s https://robbinespu.gitlab.io/index.xml | grep -o '<guid[^>]*>[^<]*</guid>' | head -n 10 | sed 's/<[^>]*>//g')
for i in $urls1
    do  archivenow --ia --is --mg "$i"
done

# Same as above, but for the IndieWeb RSS feed
urls2=$(curl -s https://robbinespu.gitlab.io/indieweb/index.xml | grep -o '<guid[^>]*>[^<]*</guid>' | head -n 10 | sed 's/<[^>]*>//g')
for i in $urls2
    do  archivenow --ia --is --mg "$i"
done

# Archive the main website
archivenow --ia --is --mg "https://robbinespu.gitlab.io/"
echo "Done!"
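
Since both feeds go through the same extraction step, it could be pulled into one small function. This is only a sketch: `feed_urls` is a hypothetical helper name (not part of archivenow), and it assumes each post URL sits in a `<guid>` element, as in Hugo's default RSS output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read RSS XML on stdin and print the text of the
# first N <guid> elements, one URL per line (N defaults to 10).
feed_urls() {
    local limit="${1:-10}"
    # grep -o prints each matching element on its own line, so this also
    # works when the whole feed is minified onto a single line.
    grep -o '<guid[^>]*>[^<]*</guid>' | sed 's/<[^>]*>//g' | head -n "$limit"
}

# Possible usage (commented out so that loading this file does not touch the network):
# curl -s https://robbinespu.gitlab.io/index.xml | feed_urls 10 | while read -r url; do
#     archivenow --ia --is --mg "$url"
# done
```

Reading with `while read -r` handles one URL per line, which avoids relying on word splitting of an unquoted variable.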

But I turned off (commented out) the script about a year ago, because I don’t want to archive my website too often: I added a webhook that runs CI/CD whenever it receives a payload from a webmention.

Plus, this trick only works for existing older posts. A new post is not archived yet, because the pipeline is still running and its “artifacts” are still unpublished.
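
One possible way around the timing problem is to wait until a new post actually answers with HTTP 200 before submitting it. The sketch below is an assumption, not something from my pipeline: `wait_until_live` is a hypothetical helper, and it would have to run somewhere that keeps going after the site is published (a later job, or by hand).

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it returns HTTP 200.
# Arguments: URL, number of tries (default 10), seconds between tries (default 30).
# Returns 0 as soon as the page is live, 1 if it never was.
wait_until_live() {
    local url="$1" tries="${2:-10}" delay="${3:-30}" n=1 status
    while [ "$n" -le "$tries" ]; do
        # -w '%{http_code}' prints only the status code; the body is discarded.
        status=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
        if [ "$status" = "200" ]; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 1
}

# Possible usage (the post URL here is only a placeholder):
# wait_until_live "https://robbinespu.gitlab.io/some-new-post/" && archivenow --ia "https://robbinespu.gitlab.io/some-new-post/"
```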

I only want to archive when I have a new post, so I will archive it manually 😊
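
If I ever automate this again, one idea is to remember which URLs were already submitted and skip them. This is just a sketch under assumptions: `filter_unseen` and the `archived.txt` file are hypothetical names, and in CI that file would need to persist between pipeline runs (for example via a cache), which I have not set up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read URLs on stdin (one per line) and print only those
# that are not already listed in the given "seen" file.
# A missing or empty seen file means every URL is printed.
filter_unseen() {
    local seen_file="$1"
    awk -v seen="$seen_file" '
        BEGIN { while ((getline line < seen) > 0) archived[line] = 1 }
        !($0 in archived)
    '
}

# Possible usage with the URL list from the script above (commented out on purpose):
# printf '%s\n' $urls1 | filter_unseen archived.txt | while read -r url; do
#     archivenow --ia "$url"
#     printf '%s\n' "$url" >> archived.txt   # record every submitted URL
# done
```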


Replies (1)

Peter Smith

Robbi, that's a very helpful post. That is so wonderfully systematic of you. By the way, the link to archivenow just loops back to your page. But you gave me enough clues to find it on GitHub. Thanks.

What I really like is your use of the webhooks for webmentions. I need to figure out how to do that.

My 'build' process is rather manual and crude. Basically, once I make a post (or similar), I

  • run a script that does a git commit to Github
  • wait for Netlify (or its app) to notice the changes and build my site
  • job more or less done.

To process webmentions, I run my getMentions.sh script (see here for details), and then run my other script to build the site. I need to do some thinking as to where in all of that I do the automatic thing with the webmention webhooks. That's a project for another day.

Meanwhile, I've taken your “submit10url2backway” script and adapted it for my site.


  #!/usr/bin/env bash
  # Copyright Robbi Nespu <[email protected]> , 2022
  # License: MIT

  # This script is used to archive my website to the Internet Archive.
  # Modified for my site - 2022-12-19 by Peter Smith

  # Fetch the RSS feed and send the first 10 post URLs (the <guid> values) to the Wayback Machine.
  # grep -o prints one <guid> element per line, even if the feed is minified; sed strips the tags.
  urls1=$(curl -s https://petersmith.org/index.xml | grep -o '<guid[^>]*>[^<]*</guid>' | head -n 10 | sed 's/<[^>]*>//g')
  for i in $urls1
      do  archivenow --ia "$i"
  done

  # Same as above, but for the IndieWeb RSS feed
  urls2=$(curl -s https://robbinespu.gitlab.io/indieweb/index.xml | grep -o '<guid[^>]*>[^<]*</guid>' | head -n 10 | sed 's/<[^>]*>//g')
  for i in $urls2
      do  archivenow --ia "$i"
  done

  # Archive the main website
  archivenow --ia "https://petersmith.org/"
  echo "Done!"
