If curl has a default capath (on Debian 12: capath=/etc/ssl/certs),
it will add those certificates and report success for any https url
with a valid certificate, defeating the intended use of the cacert
option in the Makefile that validates sites and certificates.
To avoid that, adding the option "--capath /dev/null" overrides the
default value, if any.
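A minimal sketch of such a pinned download, with a hypothetical
certificate file and url (the Makefile's actual invocation may differ):

# trust only the pinned certificate; pass an empty capath so the
# build-time default (the system store) cannot satisfy verification
curl --cacert certs/example.pem --capath /dev/null https://example.com/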
Currently AI bots are crawling websites all around the world. For a
website hosting git content this adds a lot of extra load and traffic:
The site has lots of sections, repositories have lots of files,
branches, tags, commit ids, etc...
Multiply that and you have a nearly unlimited number of unique urls.
The bots try to get each and every one of these.
To speed up the learning process on their side a swarm of hundreds,
thousands or more ip addresses is active at the same time, ultimately
DDOS'ing the websites and making them inaccessible. 😳🤬
Well, there is one single file all of these AI bots are not interested
in: robots.txt 🤬🤬
On top of that, some use random user agent strings, making filtering
impossible. 🤬🤬🤬
For a short-term solution I deploy the repository content as static
files, hopefully keeping it accessible at least. We will see.
This reverts commit 8231c3e833.
Turned out this workaround was not sufficient, see the follow-up in
commit 191cc1b952 for details.
But possibly the second one does it on its own? Reverting this for
a test run.
Turns out the workaround in $WaitForFile (commit
8231c3e833) is not sufficient. It helps
sometimes, but not always. Possibly depends on CPU speed and bandwidth
of the internet connection... Who knows!? 🤪
But! Reading the file returns more data than the known file size.
That's suspicious and indicates exactly this issue. So add a delay, and
keep reading until the sizes are equal.
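The idea as a sketch, with a placeholder file name rather than the
actual code in $WaitForFile:

# re-read the file until the content length matches the reported size;
# the file itself is assumed to exist already
:local FileName "example.txt";
:local Content [ /file/get $FileName contents ];
:while ([ :len $Content ] != [ /file/get $FileName size ]) do={
  :delay 100ms;
  :set Content [ /file/get $FileName contents ];
}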
This used to require comma-separated key=value pairs. An
example for `netwatch-notify` is:
/tool/netwatch/add comment="notify, name=example.com" host=93.184.215.14;
Now JSON is supported as well, so you could use:
/tool/netwatch/add comment="{\"notify\":true,\"name\":\"example.com\"}" host=93.184.215.14;
It looks clumsier here, but may be of help in more complex setups...
Well, turns out that waiting for the existence of a file is not
sufficient. Chances are that a file is available only partially, so
wait until the size no longer changes... Let's hope that works as
expected. 🤞
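Something along these lines, again just a sketch with a placeholder
file name, not the commit's exact code:

# the file already exists; poll its size until it stays stable
# across one delay interval
:local FileName "example.txt";
:local Size -1;
:while ($Size != [ /file/get $FileName size ]) do={
  :set Size [ /file/get $FileName size ];
  :delay 100ms;
}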