[Az-Geocaching] Suggestion for statistics

Team Cache-Quest listserv@azgeocaching.com
Fri, 5 Sep 2003 12:25:13 -0700


Most "crawlers" will obey a site's ROBOTS.TXT file.

GEOCACHING.COM's ROBOTS.TXT disallows almost all crawling.  Here is a
copy:

User-agent: *
# Disallow all unnecessary content from the search
Disallow: /iis/*
Disallow: /login/*
Disallow: /admin/*
Disallow: /map/*
Disallow: /email/*
Disallow: /my/*
Disallow: /seek/nearest.asp*
Disallow: /seek/nearest_cache.asp*
Disallow: /seek/waypoint.asp*
Disallow: /bait.asp
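
For reference, here is a rough sketch of how an obedient crawler checks rules
like those, using Python's standard-library robots.txt parser.  The rules are
pasted inline so nothing is actually fetched from geocaching.com, and the
trailing "*" wildcards are dropped because the original robots.txt convention
(and Python's parser) match by plain path prefix, which makes them redundant.

```python
# Sketch: how a well-behaved crawler would test URLs against the
# geocaching.com rules quoted above, using Python's standard library.
# The rules are supplied inline (no network access); trailing "*"
# wildcards from the original file are omitted since the parser
# matches Disallow entries as simple path prefixes.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /iis/
Disallow: /login/
Disallow: /admin/
Disallow: /map/
Disallow: /email/
Disallow: /my/
Disallow: /seek/nearest.asp
Disallow: /seek/nearest_cache.asp
Disallow: /seek/waypoint.asp
Disallow: /bait.asp
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite crawler checks each URL before requesting it.
print(parser.can_fetch("*", "http://www.geocaching.com/seek/nearest.asp"))
# False -- disallowed for all user agents
print(parser.can_fetch("*", "http://www.geocaching.com/about/"))
# True -- no rule covers this path
```

With rules this broad, almost everything useful on the site is off limits to a
crawler that plays by them.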

Because of what Snaptek needs for azgeocaching.com, they have to ignore the
robots.txt file.  Unfortunately, most web sites are smart enough nowadays to
recognize unauthorized crawling and will automatically block it.  Some
consider it a form of hacking because of the load it puts on the server.

You can't really blame geocaching.com.  They probably have many folks trying
to crawl the site, and because of international interest there really isn't a
good time to do it.  What they really need is a decent interface that lets
folks get bulk information.  The Query Generator was a step in the right
direction, but it falls short.  I'd like to simply get a list of all the
caches I've found in XML format to do with what I want, but the Query
Generator can't even do that.

Jerry (Cache-Quest)

----- Original Message ----- 
From: "Regan L Smith" <buggers@mindspring.com>
To: <listserv@azgeocaching.com>
Sent: Friday, September 05, 2003 11:38 AM
Subject: Re: [Az-Geocaching] Suggestion for statistics


> There have to be other "crawlers" out there.  How are they dealing with it,
> and is there anything that we, the beneficiaries of your fantastic work,
> can do?
> ----- Original Message ----- 
> From: "Regan L Smith" <buggers@mindspring.com>
> To: <listserv@azgeocaching.com>
> Sent: Friday, September 05, 2003 10:42 AM
> Subject: Re: [Az-Geocaching] Suggestion for statistics
>
>
> > Do they consider AZGeocaching a threat?  I don't understand; you guys make
> > their stuff much easier to use and understand.