Analyzing Apache Access Logs Yourself with PHP


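For day-to-day work I needed to pull specific information out of the access log: which requested files could not be found, whether visitors arrived from Google, Yahoo, or somewhere else, and which requests came from search engine spiders. The principle is simple: open the file, filter out the unwanted records, split each record into fields, and list the results you need. Almost all of it is handled by a single PHP function, preg_match(). The source code is below; study it for yourself!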
<html>
<head>
<title>
Simple tools for website logs
</title>
</head>
<body>
<form name="my_form" method="post">
  Select your type :<br>
 <select name="type">
  <option value="">Pages not found (404)</option>
  <option value="yahoo">Access from Yahoo</option>
  <option value="google">Access from Google</option>
  <option value="msn">Access from MSN</option>
  <option value="robot">Access by robots</option>
 </select>
 &nbsp;
 <input type="submit" name="submit" value="get the result">
</form>
<table border=1>
  <tr bgcolor="#FFCCFF">
    <td><font color="#000000">ClientIP</font></td>
    <td><font color="#000000">AccessTime</font></td>
    <td><font color="#000000">TargetPage</font></td>
    <td><font color="#000000">Code</font></td>
    <td><font color="#000000">FromURL</font></td>
    <td><font color="#000000">Client ENV</font></td>
  </tr>
<?php
// Read the selected filter from the submitted form instead of relying on register_globals.
$type = isset($_POST['type']) ? strtolower(trim($_POST['type'])) : '';

$doc_path = $_SERVER["DOCUMENT_ROOT"];
if (substr($doc_path, -1) != "/") {
    $doc_path = $doc_path . "/";
}

// Splits one log line into fields. Groups used for the table:
// 1 = client IP, 4 = access time, 7 = target page, 9 = status code,
// 13 = referring URL, 15 = client user agent.
$log_pattern = '/([0-9.]+)?([ -]+)?(\[)?([0-9a-zA-Z+: \/]+)?(\])?( "GET \/)?([a-z0-9A-Z.\/\?&=%_\-:+]+)?( HTTP\/1\.[012]" )?([0-9.]+)?( )?([0-9.\-]+)?( ")?([a-z0-9A-Z.\/\?&=%_\-:+]+)?(" ")?(.*)/i';

// Decide whether a log line belongs in the result for the selected type.
function line_wanted($type, $line)
{
    if ($type == 'yahoo') {
        // Visits that mention Yahoo, excluding the Yahoo crawler (Slurp).
        return preg_match('/yahoo/i', $line) && !preg_match('/slurp/i', $line);
    }
    if ($type == 'robot') {
        // Search engine spiders, ignoring their robots.txt requests.
        return !preg_match('/robots\.txt/i', $line)
            && preg_match('/slurp|msnbot|googlebot|psbot/i', $line);
    }
    if ($type != '') {
        // Visits that mention the chosen engine, excluding its crawler (e.g. googlebot).
        // preg_quote() keeps form input from being interpreted as regex syntax.
        $quoted = preg_quote($type, '/');
        return preg_match("/$quoted/i", $line) && !preg_match("/{$quoted}bot/i", $line);
    }
    // Default: requests that returned 404, ignoring robots.txt.
    return preg_match('/ 404 /', $line) && !preg_match('/robots\.txt/', $line);
}

$log_file = $doc_path . 'logs/access_log';
$lines = is_readable($log_file) ? file($log_file) : false;
if ($lines === false) {
    $lines = array(); // log missing or unreadable: show an empty table
}

foreach ($lines as $line) {
    if (!line_wanted($type, $line)) {
        continue;
    }
    preg_match($log_pattern, $line, $matches);
    // The last group keeps the closing quote of the user agent, so trim it.
    $cells = array($matches[1], $matches[4], $matches[7], $matches[9],
                   $matches[13], rtrim($matches[15], '"'));
    echo "<tr>";
    foreach ($cells as $cell) {
        // Escape log data before output: referrer and user agent come from the client.
        echo "<td>" . htmlspecialchars($cell) . "</td>";
    }
    echo "</tr>\n";
}
?>
</table>
</body>
</html>
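
The "split each record into fields" step can also be tried on its own, outside the web page. The following is only a minimal sketch, assuming the log uses Apache's "combined" format. The function name parse_log_line() and the sample line are made up for illustration and are not part of the script above. It uses named groups rather than the numbered groups of the original pattern.

```php
<?php
// Hypothetical helper (not from the original script): split one line of an
// Apache "combined" format log into named fields, or return null if the
// line does not fit that format.
function parse_log_line($line)
{
    $pattern = '/^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
             . '"(?P<method>[A-Z]+) (?P<page>\S+)(?: HTTP\/[0-9.]+)?" '
             . '(?P<code>\d{3}) (?P<bytes>\d+|-)'
             . '(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return array(
        'ip'      => $m['ip'],
        'time'    => $m['time'],
        'method'  => $m['method'],
        'page'    => $m['page'],
        'code'    => $m['code'],
        // Referrer and user agent are absent in the shorter "common" format.
        'referer' => isset($m['referer']) ? $m['referer'] : '',
        'agent'   => isset($m['agent']) ? $m['agent'] : '',
    );
}

// Made-up sample line, for illustration only.
$sample = '192.0.2.1 - - [10/Oct/2005:13:55:36 +0800] '
        . '"GET /missing.html HTTP/1.1" 404 209 '
        . '"http://www.google.com/search?q=test" "Mozilla/4.0 (compatible; MSIE 6.0)"';

$fields = parse_log_line($sample);
if ($fields !== null) {
    echo $fields['code'], ' ', $fields['page'], ' <- ', $fields['referer'], "\n";
}
```

Named fields make the filters easier to write precisely: a 404 check can compare `$fields['code']` directly instead of searching the whole line for " 404 ", and a search engine check can look only at `$fields['referer']`.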

Article URL: http://com.8s8s.com/it/it26545.htm