⚠️ Warning: This post is over a year old; the information may be out of date.
Nenez9595 (Part 3)
⏰ 4 minutes
Assalamualaikum! In the previous article (part 2), I said I would use a personal VPS to do the cloning work… Phew, it turned out to take quite a while, even with a remote server doing the fetch from nenez9595.blogspot.com:
$ time httrack -q -%i -iC2 nenez9595.blogspot.com -O "/home/robbi/httrack" -n -%P -N0 -s2 -p7 -D -a -K0 -c10 -%k -A25000 -%c10 -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* -%s -%u
Mirror launched on Sun, 04 Jul 2021 10:31:42 by HTTrack Website Copier/3.49-2+libhtsjava.so.2 [XR&CO'2014]
mirroring nenez9595.blogspot.com +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js -ad.doubleclick.net/* with the wizard help..
* https://79d206c1-a-62cb3a1a-s-sites.googlegroups.com/site/123funjokes4all/creationjdate.js?attachauth=ANoY7cpRQ9lcY_QaSXG51nMX9B6Rh_yEWa4uCVOfi1W9oEmCvOMBxPW60ISSTXsw7lQTaG0oph901yfgGh6K21rTHkbku0Kxa5qhD9xP1kTaaL7Cmq18Op6QboJBPIL0H9d97548/9515: nenez9595.blogspot.com/search/label/Emak dan abah - u will find them there if you want else .. go n find their faces elsewhere%2F got u%3F?updated-max=2012-04-17T02:59:00-07:00&max-results=20&start=20&by-date=false (65752 byPANIC! : Too many URLs : >99999 [3031]d-max=2016-07-10T14:51:00-07:00&max-results=3&reverse-paginate=true&start=102&by-date=false (77312 bytes) - OK
Done.
Thanks for using HTTrack!
real 1839m37.861s
user 45m21.864s
sys 5m44.017s
Then I mv'd the result into a git folder and tried to upload it, but this time I hit GitHub's file size limit:
$ git push
Enumerating objects: 105557, done.
Counting objects: 100% (105557/105557), done.
Compressing objects: 100% (97826/97826), done.
Writing objects: 100% (105556/105556), 1.79 GiB | 8.88 MiB/s, done.
Total 105556 (delta 96309), reused 2 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (96309/96309), done.
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: 8098efb9e77359e435cccb71f3f68514e9b63c36b06203a32f540e0907c6835e
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File hts-cache/new.txt is 281.40 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: File hts-cache/new.zip is 1721.38 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/RobbiNespu/nenez9595.blogspot.com.git
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/RobbiNespu/nenez9595.blogspot.com.git'
Hmm.. LFS could probably solve this, but the more I thought about it, the more limitations I could see coming once I deploy to GitHub Pages, so I'd be better off migrating straight to Bitbucket or GitLab.
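Both rejected files live in hts-cache/, HTTrack's own cache directory. A check like the one below, run before git push, could flag anything over the limit up front. This is only a sketch: the function name find_large_files and its byte-limit parameter are made up for illustration, and the default of 104857600 bytes assumes the 100.00 MB in the error message means 100 MiB.

```shell
# Print every regular file under a directory that is larger than a byte limit,
# ignoring the .git directory itself.
# Hypothetical helper; the default limit (100 MiB) is an assumption.
find_large_files() {
  dir="${1:-.}"
  limit_bytes="${2:-104857600}"
  # -size +Nc matches files strictly larger than N bytes
  find "$dir" -path "$dir/.git" -prune -o -type f -size +"${limit_bytes}c" -print
}

# Usage: find_large_files . 
```

Anything it prints would need to go into .gitignore, into LFS, or out of the repository before the push can succeed.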
So I created a repository workspace on GitLab and committed everything there. It all got stored nicely in the remote source repository, with no file size issue.
Then I wrote a .yml file for the CI/CD process so that these static HTML files would be publicly available through GitLab Pages. Boom! Another issue:
Running with gitlab-runner 14.0.1 (c1edb478)
on docker-auto-scale 72989761
feature flags: FF_SKIP_DOCKER_MACHINE_PROVISION_ON_CREATION_FAILURE:true
Preparing the "docker+machine" executor 00:15
Using Docker executor with image alpine:latest ...
Pulling docker image alpine:latest ...
Using docker image sha256:d4ff818577bc193b309b355b02ebc9220427090057b54a59e73b79bdfe139b83 for alpine:latest with digest alpine@sha256:234cb88d3020898631af0ccbbcca9a66ae7306ecd30c9720690858c1b007d2a0 ...
Preparing environment 00:01
Running on runner-72989761-project-27918780-concurrent-0 via runner-72989761-srm-1625504483-7949ba0d...
Getting source from Git repository
$ eval "$CI_PRE_CLONE_SCRIPT"
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/nenez9595/nenez9595.gitlab.io/.git/
Created fresh repository.
Checking out 26301a63 as master...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:01
Using docker image sha256:d4ff818577bc193b309b355b02ebc9220427090057b54a59e73b79bdfe139b83 for alpine:latest with digest alpine@sha256:234cb88d3020898631af0ccbbcca9a66ae7306ecd30c9720690858c1b007d2a0 ...
$ echo 'Nothing to do...'
Nothing to do...
Uploading artifacts for successful job
Uploading artifacts...
public: found 105564 matching files and directories
ERROR: Uploading artifacts as "archive" to coordinator... too large archive id=1400570193 responseStatus=413 Request Entity Too Large status=413 token=8RyUGyNS
FATAL: too large
Cleaning up file based variables 00:01
ERROR: Job failed: exit code 1
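The log above (alpine:latest image, a script that only echoes 'Nothing to do...', and public as the artifact) is consistent with a minimal GitLab Pages job. The actual .yml isn't shown in this post, so the snippet below is an assumption about what such a file could look like, written out by a hypothetical helper named write_pages_ci.

```shell
# Write a minimal GitLab Pages job definition to the given path.
# The YAML content is a guess reconstructed from the job log, not the real file.
write_pages_ci() {
  target="${1:-.gitlab-ci.yml}"
  cat > "$target" <<'EOF'
image: alpine:latest

pages:
  stage: deploy
  script:
    - echo 'Nothing to do...'
  artifacts:
    paths:
      - public
  only:
    - master
EOF
}

# Usage: write_pages_ci .gitlab-ci.yml
```

The job has to be named pages and has to publish a public directory as its artifact, which is why the whole mirror ends up in one artifact upload.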
It failed at the very last stage, the artifact upload. So I checked the size of the folder to see how big it really was (I never checked back when GitHub rejected the push) and which files were so large:
$ du -sh public/
6.8G public/
$ find . -printf '%s %p\n'| sort -nr | head -10 | grep -v ".git"
3731456 ./public/nenez9595.blogspot.com
1037750 ./public/4.bp.blogspot.com/-eSv8FTdg_sE/WHczxmty8vI/AAAAAAAABEc/XpiSmC2Bw3AAqVIJBrplkASnPGepF8uWACLcB/s1600/Screenshot_2017-01-12-15-41-58.png
710617 ./public/1.bp.blogspot.com/-otXaWMUt6HA/UVacLqRsuaI/AAAAAAAAAJg/SQlf2OXknzY/s1600/Photo 0321.jpg
677596 ./public/4.bp.blogspot.com/-HdD4RIYIGIo/UV6DwKGTXKI/AAAAAAAAAMI/yQKh3iHdTJI/s1600/02042011155.jpg
636023 ./public/4.bp.blogspot.com/-zzD7B3BKwyk/UV6D6hwUhfI/AAAAAAAAAMY/J1QSfwnCiAg/s1600/02042011154.jpg
636023 ./public/4.bp.blogspot.com/-E-2E-Jd7KTE/UV6DzqG5taI/AAAAAAAAAMQ/Jh72UHEVGCQ/s1600/02042011154.jpg
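Note that the find command above also lists directories (the first entry is one) and only filters out .git after head has already cut the list to ten lines. A variant that restricts itself to regular files and skips .git from the start might look like this. The helper names largest_files and largest_dirs are made up for illustration, and GNU find is assumed for -printf.

```shell
# List the N largest regular files under a directory, skipping .git.
# Assumes GNU find (needed for -printf).
largest_files() {
  dir="${1:-.}"
  count="${2:-10}"
  find "$dir" -name .git -prune -o -type f -printf '%s %p\n' \
    | sort -nr | head -n "$count"
}

# Show which top-level entries take the most space (sizes in KiB).
largest_dirs() {
  dir="${1:-.}"
  count="${2:-10}"
  du -sk "$dir"/* | sort -nr | head -n "$count"
}

# Usage: largest_files public 10; largest_dirs public 10
```

When no single file stands out, the per-directory view is usually the quicker way to see where the bulk comes from.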
The assets really are massive in total, but when I sorted to see which files were the big ones.. it turned out none of them is large on its own. What I did notice is a huge number of search<random chars here>.html files. I looked inside and there's nothing important in them, so it's fine to delete all this junk:
$ ls -la | grep -v 'search*.html' | wc
88467 796196 5829860
$ find . -name 'search*.html' -type f -delete
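One caveat on the counting step: grep treats 'search*.html' as a regular expression (h* means zero or more h), not a shell glob, and ls -la only looks at the current directory. A more careful way to do the same thing is to count with the exact find expression first, then delete with it. The helpers count_junk and delete_junk below are hypothetical names for a sketch of that approach; -delete assumes GNU or BSD find.

```shell
# Count files matching a glob under a directory (dry run, deletes nothing).
count_junk() {
  find "${1:-.}" -type f -name "${2:-search*.html}" | wc -l
}

# Delete the same set of files; pass identical arguments to count_junk first.
delete_junk() {
  find "${1:-.}" -type f -name "${2:-search*.html}" -delete
}

# Usage: count_junk public; delete_junk public
```

Since both functions share the same expression, the count printed beforehand is exactly the number of files the delete will remove.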
I committed the changes and waited for the CI/CD pipeline to run. Alhamdulillah, it worked.
The mirror copy can be accessed at nenez9595.gitlab.io/nenez9595.blogspot.com/index.html, but the pages don't render properly and some of the JavaScript and CSS isn't working.. hmmm 🥲
Hish, this won't do. Am I really going to stop here? So I plan to improve the scraper script I mentioned in part 2. The UI needs to change too; I'll make it look cleaner and more professional. I could also list the papers, books, and anything else suitable that the deceased published or released as a pre-print.
I'll pick this up again when I have free time. Right now I'm a bit busy with the Park N Shop Hongkong project, and from maybe the end of this year (2021) into next year (2022) I won't be in Malaysia because I have to fly over there. With the pandemic going on, who knows how that will work out. Eh, now I'm rambling about myself.. haha.. ok, bye! Stay tuned for part 4 🤤
Posted by: Hugo