Euler
Joined: 02 Sep 2004
Posts: 109
PHP spider detection
Maybe this belongs in the programming section; I'm not sure. I assume that telling spiders from humans matters to SEO-aware webmasters.
What's the fastest way to check $_SERVER['HTTP_USER_AGENT'] to detect humans? More generally, what is the current best practice for managing this?
Thu Feb 24, 2005 7:11 pm
Euler
Naw.
I simply want to log whether each visitor is a human or a spider at the same time as I log the browser type.
And I have just finished a solution.
Here is an excerpt from my own log file (not the Apache log); note the lines beginning with "Human" or "Robot":
quote: [pre]
/==B=e=g=i=n====E=x=e=c=u=t=i=o=n====ThuFeb24_15:23:25CST2005===\
Robot (Yahoo! Slurp): 68.142.249.107 GET /
8 queries built in 0.0711851119995 seconds.
Decl: 0.00260710716248 seconds.
DB : 0.0118429660797 seconds.
Auth: 0.0150530338287 seconds.
Acls: 0.0275280475616 seconds.
Cont: 0.0114028453827 seconds.
Fini: 0.00275111198425 seconds.
\===E=n=d===T=r=a=n=s=a=c=t=i=o=n===ThuFeb24_15:23:26CST2005===/
/==B=e=g=i=n====E=x=e=c=u=t=i=o=n====ThuFeb24_15:23:260CST2005===\
Human (FIREFOX ver.1.0): 192.168.XXX.XXX GET do=sect&op=see&sct=2
8 queries built in 0.0713069438934 seconds.
Decl: 0.0024950504303 seconds.
DB : 0.0117490291595 seconds.
Auth: 0.0151648521423 seconds.
Acls: 0.0275340080261 seconds.
Cont: 0.0115809440613 seconds.
Fini: 0.00278306007385 seconds.
\===E=n=d===T=r=a=n=s=a=c=t=i=o=n===ThuFeb24_15:23:26CST2005===/
[/pre]
I found many solutions on the net, but they all seemed optimized for programmer convenience, and I needed one optimized for speed. Here is my current detection code:
quote: [pre]
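// Browser tokens, matched case-insensitively. A later match overwrites an
// earlier one, so the "parent" tokens come before their more specific ones.
$browser = array(
    "MSIE",     // parent
    "OPERA",
    "MOZILLA",  // parent
    "NETSCAPE",
    "FIREFOX",
    "SAFARI"
);
$ua    = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$ua_uc = strtoupper($ua);  // uppercase once instead of twice per token
$tail  = $_SERVER['REMOTE_ADDR'] ." ". $_SERVER['REQUEST_METHOD'] ." ". $_SERVER['QUERY_STRING'] ."\n";
// Default: no recognised browser token means "Robot".
$whowhat = "Robot (". $ua ."): ". $tail;
foreach ($browser as $parent) {
    $s = strpos($ua_uc, $parent);
    // The original test, if (strpos(...)), treated a match at offset 0 as
    // "no match". That rule is kept explicitly here: nearly every UA starts
    // with "Mozilla", so MOZILLA only counts later in the string, and
    // crawlers sending a bare "Mozilla/5.0 (compatible; ...)" stay "Robot".
    if ($s !== false && $s > 0) {
        // Up to 5 characters after the token, keeping digits, dots and commas.
        $version = substr($ua, $s + strlen($parent), 5);
        $version = preg_replace('/[^0-9,.]/', '', $version);
        $whowhat = "Human (". $parent ." ver.". $version ."): ". $tail;
    }
}
fputs($log, $whowhat);  // $log: log file handle opened earlier in the script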
[/pre]
I adapted it from code posted on php.net by a gentleman named Steve. So thanks, Steve, wherever you are.
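The detection loop reads $_SERVER directly, which makes it awkward to try out. Here is a minimal sketch, assuming you want to check its behaviour offline: the same loop (including the "match at offset 0 does not count" rule) wrapped in a hypothetical function classify_ua() that takes a User-Agent string and returns only the label part of the log line. The sample User-Agent strings are illustrative, not taken from my log.

```php
<?php
// Hypothetical helper: the detection loop wrapped in a pure function so it
// can be exercised without a web request. Returns "Human (TOKEN ver.X)" or
// "Robot (UA)", without the IP / method / query-string tail.
function classify_ua($ua)
{
    $browser = array("MSIE", "OPERA", "MOZILLA", "NETSCAPE", "FIREFOX", "SAFARI");
    $ua_uc   = strtoupper($ua);
    $label   = "Robot (" . $ua . ")";

    foreach ($browser as $parent) {
        $s = strpos($ua_uc, $parent);
        // Same rule as the logging code: a match at offset 0 is ignored.
        if ($s !== false && $s > 0) {
            $version = substr($ua, $s + strlen($parent), 5);
            $version = preg_replace('/[^0-9,.]/', '', $version);
            $label   = "Human (" . $parent . " ver." . $version . ")";
        }
    }
    return $label;
}

// Illustrative User-Agent strings.
echo classify_ua("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0"), "\n"; // Human (FIREFOX ver.1.0)
echo classify_ua("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"), "\n"; // Human (MSIE ver.6.0)
echo classify_ua("Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"), "\n"; // logged as Robot
```

Because a match at offset 0 is ignored, a plain Mozilla-suite UA such as "Mozilla/5.0 (...; rv:1.7.5) Gecko/20041107", or an Opera UA that begins with "Opera/", is logged as Robot too; if that matters, those cases need their own check.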
As a result, whenever I want to see the human visitors I can run "grep Human logfile", and for bots, of course, "grep Robot logfile". Extremely convenient, efficient and stable.
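If you want the counts from inside PHP rather than from grep, here is a rough sketch, assuming the log line format shown in the excerpt ("Human (TOKEN ver.X): IP METHOD QUERY" or "Robot (UA): IP METHOD QUERY"). The function name tally_visitors() is hypothetical. Unlike a plain grep, it only counts lines that begin with the label, so a query string or UA that happens to contain "Human" or "Robot" is not counted twice.

```php
<?php
// Hypothetical helper: tally log lines by their leading "Human (" / "Robot ("
// label, and tally human lines by browser token.
function tally_visitors(array $lines)
{
    $counts = array('Human' => 0, 'Robot' => 0, 'browsers' => array());
    foreach ($lines as $line) {
        if (strncmp($line, 'Human (', 7) === 0) {
            $counts['Human']++;
            // Browser token: the uppercase word right after "Human (".
            if (preg_match('/^Human \(([A-Z]+) /', $line, $m)) {
                $b = $m[1];
                $counts['browsers'][$b] = isset($counts['browsers'][$b]) ? $counts['browsers'][$b] + 1 : 1;
            }
        } elseif (strncmp($line, 'Robot (', 7) === 0) {
            $counts['Robot']++;
        }
        // Timing and delimiter lines match neither label and are skipped.
    }
    return $counts;
}

// Lines copied from the log excerpt above.
$sample = array(
    "Robot (Yahoo! Slurp): 68.142.249.107 GET /",
    "8 queries built in 0.0711851119995 seconds.",
    "Human (FIREFOX ver.1.0): 192.168.XXX.XXX GET do=sect&op=see&sct=2",
);
$counts = tally_visitors($sample);
echo "Humans: ", $counts['Human'], ", Robots: ", $counts['Robot'], "\n"; // Humans: 1, Robots: 1
```

For a real log, file('logfile', FILE_IGNORE_NEW_LINES) returns the lines as an array that can be passed straight in.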
Thu Feb 24, 2005 9:33 pm