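GPT-4, ChatGPT, and other powerful models can answer instantly and intelligently, while AI crawlers scrape at will and threaten the value of original content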

小藍

2024-8-31

Industry News


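Let's face it: AI crawlers are like invisible thieves, quietly slipping into our websites and carrying things off. They work around the clock without ever tiring, all to gather training data for the big players' AI models. Are you happy with that? I'd guess not! Today let's talk about how you, as a site owner, can protect your original content and keep it from becoming a meal for AI.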

The threat of AI crawlers

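First, we need to recognize that AI crawlers are a real headache. They can devalue the content we worked hard to create and even cut into our income. Think about it: if people can get whatever they want to know straight from an AI, who will still bother to visit your site? Isn't that trampling on our hard work and the spirit of original creation?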

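What's more, it isn't always clear what these AI crawlers are up to. Some companies openly identify their crawlers, while others stay silent and collect our content on the quiet, like thieves. It's like a furtive hand reaching out of the dark, and it's alarming!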

Protective measure one: robots.txt

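So what do we do? Let's deal with these "friends" who show up uninvited! One of the most common methods is to put robots.txt to work. This small file tells crawlers what they may fetch and what is off limits. Set the rules up properly and the crawlers that respect them will stay outside the door.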
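
To see how a rule set like this is read by a well-behaved crawler, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The sample rules, the `example.com` URLs, and the helper name `is_allowed` are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules (an assumption for this sketch): block GPTBot entirely,
# and keep every other crawler out of /wp-admin/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/post/1"),
        ("Googlebot", "https://example.com/post/1"),
        ("Googlebot", "https://example.com/wp-admin/"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

This only shows what a compliant crawler would conclude. As far as I know, the standard-library parser matches rules by plain path prefix, so wildcard patterns such as `/*.png$` may not be evaluated the way the big search engines evaluate them.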

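But robots.txt alone is far from enough. Some crawlers simply ignore your rules and take your content anyway. So we need something tougher, such as Cloudflare's automated WAF rules, to give our defenses real teeth!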
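
Since robots.txt is only a request, actual blocking has to happen where requests are served. The sketch below is not Cloudflare's implementation. It is a rough application-level illustration of the same idea, refusing requests whose User-Agent contains a known crawler token. The token list and the function names `is_blocked_user_agent` and `status_for_request` are assumptions made for this example.

```python
from typing import Optional, Sequence

# Hypothetical block list for this sketch; adjust it to your own policy.
BLOCKED_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot")


def is_blocked_user_agent(
    user_agent: Optional[str],
    tokens: Sequence[str] = BLOCKED_AGENT_TOKENS,
) -> bool:
    """Return True if the User-Agent header contains any blocked token."""
    if not user_agent:
        # A missing header is not treated as a crawler in this sketch.
        return False
    lowered = user_agent.lower()
    # Case-insensitive substring match against each token.
    return any(token.lower() in lowered for token in tokens)


def status_for_request(user_agent: Optional[str]) -> int:
    """Pick an HTTP status code: 403 for blocked crawlers, 200 otherwise."""
    return 403 if is_blocked_user_agent(user_agent) else 200


if __name__ == "__main__":
    print(status_for_request("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(status_for_request("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"))
```

A check like this only catches crawlers that announce themselves honestly. A crawler that fakes a browser User-Agent passes straight through, which is one reason to lean on a managed service for the harder cases.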


Cloudflare's automated WAF rules

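Turn on Cloudflare's automated WAF rules and your site's security gets a solid boost! With these rules in place, malicious crawlers have a much harder time slipping through. It's that simple: like putting a wall around your site so ill-intentioned crawlers can't just walk in.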

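Another great thing about Cloudflare's automated WAF rules is that they are kept up to date, ready to respond as crawler behavior changes. So we don't have to nervously watch our sites around the clock, worrying that some new crawler will suddenly turn up.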

AI crawlers: present and future

User-agent: Baiduspider
Allow: / 
User-agent: Mediapartners-Google
Allow: /
User-agent: Google-Display-Ads-Bot
Allow: /
User-agent: Googlebot
Allow: /
User-agent: Googlebot-Mobile
Allow: /
User-agent: Googlebot-Image
Allow: /
User-agent: Adsbot-Google
Allow: /
User-agent: Sogou
Allow: /
User-agent: DotBot
Disallow: /
User-agent: DataForSeoBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Feedly
Disallow: /
User-agent: YandexBot
Disallow: /
User-agent: ias-ir
Disallow: /
User-agent: adsbot
Disallow: /
User-agent: barkrowler
Disallow: /
User-agent: Mail.RU_Bot
Disallow: /
User-agent: SEOkicks
Disallow: /
User-agent: ias-va
Disallow: /
User-agent: proximic
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: grapeshot
Disallow: /
User-agent: BLEXBot
Disallow: /
# Block AI crawlers
User-agent: Bytespider
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: ImagesiftBot
Disallow: /
User-agent: GoogleOther
Disallow: /
User-agent: Applebot
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: peer39 crawler
Disallow: /
User-agent: FriendlyCrawler
Disallow: /
User-agent: magpie-crawler
Disallow: /
User-agent: omgili
Disallow: /
User-agent: Meltwater
Disallow: /
User-agent: AwarioSmartBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: img2dataset
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: PipiBot
Disallow: /
User-agent: Seekr
Disallow: /
User-agent: scoop.it
Disallow: /
User-agent: AwarioRssBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: *
Allow: /robots.txt
Allow: /ads.txt
Allow: /*.ico$
Allow: /*.webp$
Allow: /*.png$
Allow: /*.jpg$
Allow: /*.jpeg$
Allow: /*.gif$
Allow: /*.bmp$
Allow: /wp-admin/admin-ajax.php
Allow: /timthumb/
Allow: /wp-content/uploads/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /cdn-cgi/
Disallow: /*?replytocom=*
Disallow: /?s=*
Disallow: /redirect*
Sitemap: https://www.imydl.com/wp-sitemap.xml
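
A long, hand-maintained robots.txt is easy to get wrong: crawlers generally just skip a misspelled directive, and the same bot can end up listed twice. The following is a minimal sketch of a checker for those two problems. The directive whitelist, the sample text, and the function name `lint_robots` are assumptions made for this example.

```python
from collections import Counter
from typing import List

# Directives this sketch treats as valid (an assumption, not an official list).
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

SAMPLE = """\
User-agent: CCBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Dissalow: /wp-admin/
"""


def lint_robots(text: str) -> List[str]:
    """Report unknown directives and user agents that are listed more than once."""
    problems: List[str] = []
    agents: Counter = Counter()
    for number, raw in enumerate(text.splitlines(), start=1):
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':'")
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        if field not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive '{field}'")
        elif field == "user-agent":
            agents[value.strip().lower()] += 1
    for agent, count in agents.items():
        if count > 1:
            problems.append(f"user-agent '{agent}' appears {count} times")
    return problems


if __name__ == "__main__":
    for problem in lint_robots(SAMPLE):
        print(problem)
```

Running a check like this before uploading the file is cheap, and it catches the kind of slip that is hard to spot by eye in a list of more than a hundred lines.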

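Even though we already have some countermeasures, the AI crawler problem isn't going away. As technology advances, crawlers will get smarter and harder to guard against. So we need to stay alert and keep upgrading our defenses.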
