Jonkman Microblog
Conversation

Notices

  1. Hubert Chathi (hubert@social.uhoreg.ca)'s status on Tuesday, 16-Sep-2025 00:38:54 EDT

    I've caught some AI crawlers aggressively crawling some of my sites, disregarding the robots.txt. Some of the sites are of little or no interest to any real person. So I've deployed iocaine (iocaine.madhouse-project.org/) on them, in an "always spew nonsense" mode, rather than the suggested "generate nonsense if it looks like a bot" mode. But I am not unfair. I've included a robots.txt there so that any bot that respects it will be spared from ingesting it.
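As a rough illustration of what "respecting robots.txt" means for a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, and the `example.org` URL are assumptions made for the example; the post does not show the actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that serves only generated nonsense:
# every compliant crawler is told to stay away from everything.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved bot would run a check like this before every request.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.org/some/page"))
```

A crawler that performs this check never requests the generated pages, which is the sense in which compliant bots are spared.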

    Unfortunately, I'm abandoning my self-hosted git repositories, but I didn't have much on there of interest any more. Most of what was there was old, and was also in my GitLab account.

    In conversation about 2 days ago from social.uhoreg.ca
    1. Hubert Chathi (hubert@social.uhoreg.ca)'s status on Tuesday, 16-Sep-2025 18:07:01 EDT

      As fun as it is to taunt the bot with random garbage, the idiots trying to crawl my git repo have pulled 2 GB in less than 4 hours. That should fit within the monthly limit for my VPS, but it is still a stupid amount of data. So for now, I've pointed the hostname for my git repo at 127.0.0.1 instead. With any luck, the bot will try to DoS itself. But at the very least it will stop bothering me for a while. I might switch it back to iocaine at some point. I've left iocaine on the other sites.
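To put that figure in perspective, a back-of-the-envelope projection can be sketched as follows. It assumes decimal gigabytes, a constant crawl rate, and a 30-day month, none of which the post states. Since the 2 GB was pulled in less than 4 hours, the results are lower bounds.

```python
def projected_transfer_gb(observed_gb: float, observed_hours: float,
                          projection_hours: float) -> float:
    """Scale an observed transfer linearly to a longer period."""
    return observed_gb / observed_hours * projection_hours


def average_mbit_per_s(observed_gb: float, observed_hours: float) -> float:
    """Average throughput in Mbit/s, assuming 1 GB = 10**9 bytes."""
    bits = observed_gb * 1e9 * 8
    return bits / (observed_hours * 3600) / 1e6


if __name__ == "__main__":
    daily_gb = projected_transfer_gb(2, 4, 24)          # one day
    monthly_gb = projected_transfer_gb(2, 4, 24 * 30)   # assumed 30-day month
    print(f"{daily_gb:.0f} GB/day, {monthly_gb:.0f} GB/month, "
          f"{average_mbit_per_s(2, 4):.2f} Mbit/s on average")
```

Under those assumptions the crawl works out to about 12 GB per day, or roughly 360 GB over a month if it were left running.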

      By the way, judging from the access logs, it hadn't even started crawling the garbage links that iocaine generated. It was only crawling links that it already had from before.

      In other news, you'd think that Google would understand about respecting robots.txt. Googlebot seems to do so, but GoogleOther does not.
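One way to check this kind of thing from the access logs is to count, per user agent, the requests for paths that robots.txt disallows. The sketch below assumes the server writes the common "combined" log format and that robots.txt uses only a wildcard `User-agent: *` group; the robots.txt rules, the sample log lines, and the bot names are all made up for illustration.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Combined log format:
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def count_violations(log_lines, robots_txt):
    """Count requests per user agent for paths that robots_txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent, path = match["agent"], match["path"]
        if path == "/robots.txt":
            continue  # fetching robots.txt itself is always fine
        if not parser.can_fetch(agent, path):
            violations[agent] += 1
    return violations


# Hypothetical rules and log excerpt, for illustration only.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /repo/\n"
SAMPLE_LOG = [
    '203.0.113.7 - - [16/Sep/2025:14:02:11 -0400] "GET /repo/commit/abc HTTP/1.1" 200 5120 "-" "ExampleCrawler/1.0"',
    '203.0.113.7 - - [16/Sep/2025:14:02:12 -0400] "GET /repo/tree/main HTTP/1.1" 200 2048 "-" "ExampleCrawler/1.0"',
    '198.51.100.4 - - [16/Sep/2025:14:03:00 -0400] "GET /index.html HTTP/1.1" 200 1024 "-" "PoliteBot/2.0"',
]

if __name__ == "__main__":
    for agent, hits in count_violations(SAMPLE_LOG, SAMPLE_ROBOTS).most_common():
        print(f"{hits:6d}  {agent}")
```

The standard-library parser matches per-bot `User-agent` groups only crudely against full user-agent strings, so rules aimed at specific crawlers would need a more careful matcher than this sketch provides.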

      In conversation about a day ago from social.uhoreg.ca

Jonkman Microblog is a social network, courtesy of SOBAC Microcomputer Services. It runs on GNU social, version 1.2.0-beta5, available under the GNU Affero General Public License.

All Jonkman Microblog content and data are available under the Creative Commons Attribution 3.0 license.
