problem with SPIDERS...
magnatique
12-02-2000, 09:04 PM
FUCK... this pisses me off...
yesterday, I was doing well on my site, around 800-850/hour...
was working on some trades, then BAM, 300 clicks to my trades, all from no-cookie hits...
so traffic went down to 550 (since all my trades were marked as PAID, thanks to the unproductive clicks from the spider)
now, today, I worked my ass off on the traffic... went from 600 to 1250/hour... some kick ass 250/hour trades...
then BAM, this fucker hits me again, sending 400 to my trades....
was forcing 100/hour to one trade so it would get back on its feet, so the spider's clicks mostly went to that one, making that trade go from 85 to 33 an hour...
now down to 900...
I hate this fuckin' thing..
is there any way I can put something on my index so that the spiders won't actually follow the links on the page?!?!?
any ideas, people?
magnatique
12-03-2000, 11:48 AM
thanks e-van!
so far, so good :-)
Dan S
12-03-2000, 12:20 PM
Those spiders are hitting my TGP regularly.
Any ideas to stop it?
Dan
laursen
12-03-2000, 12:59 PM
Either block the IP (by using .htaccess files) or use a robots.txt file?
Further information:
http://www.apache.org/docs-1.2/misc/howto.html#stoprob
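A minimal sketch of the .htaccess route, assuming the spider keeps coming from one address you can read out of your access log. The address below is a placeholder, not a real offender, and the directives are the classic Order/Allow/Deny ones, so the host must allow them in .htaccess files.

```apache
# Let everyone in except the listed address.
# 192.0.2.10 is a hypothetical IP; substitute the one from your own logs.
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
```

With "Order Allow,Deny" a request that matches both an Allow and a Deny line is refused, so only that one address is shut out. A robots.txt file only helps against robots that choose to honor it.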
magnatique
12-03-2000, 01:44 PM
http://www.chami.com/tips/internet/010198I.html
but not sure if it works..
so far so good it seems.
Dan S
12-04-2000, 01:53 AM
Hmm, not sure.....
I do not want to block SEs from spidering my sites.
I think the problem is more likely some kind of "offline reader".
So somebody feeds the URL of my TGP to his software, and the software follows every link and collects all the JPG files it can find. So it's just wasted bandwidth, and the user is just watching the pics offline.
Most of the time it comes at the same time of day.
That is actually what I want to prevent from happening.
Dan
SID MAN
12-04-2000, 02:55 AM
Dan S. - If it is some kind of off-line reader, this might help out.
I hope I won't get into any kind of copyright trouble for this, but here it goes ;-)
===========================================
PRO-LEVEL HELLA BANDWIDTH THIEF BLOCKING by Joel
Joel wrote me with some great info on Apache .htaccess with a terrific list of known
offline web browsers that chew up bandwidth like there is no tomorrow. Joel writes:
I've been reading your newsletter as long as I can remember, and
you've even posted a few of my emails in the past, now I'm asking for some
help from you and/or the readers to put together an airtight .htaccess
file using mod_rewrite to block people downloading images or hotlinking
without actually going to the site. As we all know, someone who uses an
offline browser like WebZIP or Internet Ninja is only out for the images,
they'll never see our advertisements, they'll never join our paysites,
they simply rape our sites of everything we have to offer, and move on.
These programs are nothing new, but they're becoming a big concern for me.
After analyzing my logs for my free sites for the past ~1 month, I've seen
over 1.4 million requests made by these offline browsers alone. What I'm
attempting to do is to grab the user_agent tags from each of these offline
browsers and essentially bounce them away from my images using .htaccess
with mod_rewrite.
The second thing I'm trying to protect myself against, which is a
growing concern for me, is all of these chat boards that hotlink your
images without even a shred of concern that they're eating up your
bandwidth without even providing you with a link. (But hey, I don't like
paying for someone else to get a free ride). I'm going to attach my
current .htaccess file, with hopes that the Sexswappers will append any
sites they have discovered in their logs so I may come up with a complete
listing to protect us from "Joyriders".
For anyone who isn't familiar with mod_rewrite and Apache
webservers, you may want to check out their documentation so you'll
understand it a little better before you dip into this:
Module Rewrite URL Rewriting Engine http://www.apache.org/docs/mod/mod_rewrite.html
Apache 1.3+ URL Rewriting Guide (Basic and complex examples) http://www.apache.org/docs/misc/rewriteguide.html
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gets.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^IBrowse.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Slurp.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut.*
RewriteRule \.[Jj][Pp][Gg]$ /leeches.html [L]
RewriteCond %{HTTP_REFERER} ^http://.*adfilter.com.*$ [OR]
RewriteCond %{HTTP_REFERER} ^http://207.198.147.*$ [OR]
RewriteCond %{HTTP_REFERER} ^http://www.neocities.* [OR]
RewriteCond %{HTTP_REFERER} ^http://chat.passagen.s.* [OR]
RewriteCond %{HTTP_REFERER} ^http://volpi.sti.com.* [OR]
RewriteCond %{HTTP_REFERER} ^http://batepapo0.uol.com.* [OR]
RewriteCond %{HTTP_REFERER} ^http://batepapo1.uol.com.* [OR]
RewriteCond %{HTTP_REFERER} ^http://batepapo2.uol.com.* [OR]
RewriteCond %{HTTP_REFERER} ^http://batepapo3.uol.com.* [OR]
RewriteCond %{HTTP_REFERER} ^http://batepapo4.uol.com.* [OR]
RewriteCond %{HTTP_REFERER} ^http://batepapo6.uol.com.* [OR]
RewriteCond %{HTTP_REFERER} ^http://batepapo7.uol.com.* [OR]
RewriteCond %{HTTP_REFERER} ^http://209.2.137.* [OR]
RewriteCond %{HTTP_REFERER} ^http://207.126.121.*$ [OR]
RewriteCond %{HTTP_REFERER} ^http://.*bianca.com/.*$ [OR]
RewriteCond %{HTTP_REFERER} ^http://.*angelfire.com/.*$ [OR]
RewriteCond %{HTTP_REFERER} ^http://.*hotmail.*$ [OR]
RewriteCond %{HTTP_REFERER} ^http://.*yahoo.com.*$ [OR]
RewriteCond %{HTTP_REFERER} ^http://.*geocities.com.*$ [OR]
RewriteCond %{HTTP_REFERER} ^http://www.partyhouse.*$
RewriteRule \.[Jj][Pp][Gg]$ /leeches.html [L]
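The user-agent test in the first rule set above can also be written more compactly. This is only a sketch under a few assumptions, not Joel's file: it groups three agent names taken from the list into one alternation, matches case-insensitively with [NC], and answers with a 403 via [F] instead of serving /leeches.html.

```apache
RewriteEngine On
# A few agents from the list above, combined into one case-insensitive pattern.
RewriteCond %{HTTP_USER_AGENT} ^(WebZIP|Teleport|Go!Zilla) [NC]
# "-" means "no substitution"; [F] makes Apache answer 403 Forbidden.
# The pattern covers both .jpg and .jpeg requests.
RewriteRule \.jpe?g$ - [NC,F]
```

Sending a 403 costs almost no bandwidth, whereas a rewrite to an HTML page still serves that page for every blocked image request.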
.HTACCESS TO ALLOW/DENY from certain domains (for AVS)
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://www.<CURRENT_DOMAIN> [NC]
RewriteCond %{HTTP_REFERER} !^http://<CURRENT_DOMAIN> [NC]
RewriteCond %{HTTP_REFERER} !^http://<IP OF CURRENT_DOMAIN> [NC]
RewriteCond %{HTTP_REFERER} !^http://www.<GOOD_REFERER> [NC]
RewriteCond %{HTTP_REFERER} !^http://<GOOD_REFERER> [NC]
RewriteCond %{HTTP_REFERER} !^http://<IP OF GOOD_REFERER> [NC]
RewriteRule .* http://www.<CURRENT_DOMAIN>/ [R,L]
NOTES:
<CURRENT_DOMAIN>:
This is the domain name of the site you wish to protect. (xxxwebhosting.com)
<IP OF CURRENT_DOMAIN>:
This is the IP address of the site you wish to protect. (63.168.246.202)
<GOOD_REFERER>:
This is the domain name of a site that you wish to allow to refer people to
this directory (reliablehosting.com).
<IP OF GOOD_REFERER>:
This is the IP address of the site that you wish to allow to refer people to
this directory (198.172.12.96).
- Save this file as .htaccess in the directory you wish to protect.
- If you save it as /images/.htaccess , it will protect everything under images.
Meaning, there is no reason to save another one in /images/images2/.
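For illustration, here is roughly what the template looks like with the example values from the notes filled in. Two things are assumptions rather than part of the original: the dots are escaped so they only match literal dots, and the rule line uses ".*" with an explicit [R,L].

```apache
RewriteEngine on
# Requests whose referer is the protected site itself are let through.
RewriteCond %{HTTP_REFERER} !^http://www\.xxxwebhosting\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://xxxwebhosting\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://63\.168\.246\.202 [NC]
# Requests referred by the one allowed outside site are let through too.
RewriteCond %{HTTP_REFERER} !^http://www\.reliablehosting\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://reliablehosting\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://198\.172\.12\.96 [NC]
# Everything else is redirected to the front page.
RewriteRule .* http://www.xxxwebhosting.com/ [R,L]
```

Note that a request with no referer at all fails every one of these patterns, so it is redirected as well.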
Dan S
12-04-2000, 11:29 AM
Yup!!!!
That will do a fucking great job :D
I had that code up only for Go!Zilla.
Thank you man!!