
Analyzing your IIS weblogs.

Perl scripts are a fast and easy way to analyze any type of text file. It's easy to read any number of files, split the lines, and collect all the interesting information. The following are bits and pieces of a Perl script that I'm working on (meaning: it's beta!). It makes a few assumptions about the layout of the IIS log file. The fields in my log files are: date, time, client-ip, method, URI-stem, status, sc-bytes, cs-bytes, time-taken, cs(user-agent), cs(Cookie), cs(referer). It's important to know which field is where, as each line is split up and the data collected based on its location in the line. If you have more (or fewer) items on each line, you'll have to adjust for that in the script. 

If you don't feel like typing, you can download the whole script.

Note that Windows doesn't come with Perl; you'll have to download it from ActiveState.com and install the version applicable to your system. You'll also need to register to download from ActiveState...

First, we need to set up some variables and define the log directory. Note the double-backslashes. This is necessary, as a single backslash is interpreted as an escape character.

  $logdir="\\winnt\\system32\\logfiles\\w3svc1";
  $TotalBytes = 0;
  %DailyBytes = ();
  %HourlyBytes = ();
  $DayCounter=0;
  $ViewCounter=0;
  $HitCounter=0;
  $OKHitCounter=0;
  $StartTime = localtime;

Next, we'll collect all the file names in the log directory

  opendir(LOGS, $logdir) || die ("unable to open log directory");
  @logfiles = grep !/^\.\.?$/, readdir(LOGS);
  closedir(LOGS);

Now, that was quick! We now have a list of each and every file in the directory. Next, we loop through the files and start collecting information. 
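As a quick sanity check, you can try the same grep pattern against a hand-made list (the file names here are made up) to confirm it only strips out the "." and ".." entries that readdir() returns:

```perl
# Made-up directory listing; readdir() would return something similar.
@raw = ('.', '..', 'ex020301.log', 'ex020302.log');
# Same filter as above: drop "." and "..", keep everything else.
@logfiles = grep !/^\.\.?$/, @raw;
print "@logfiles\n";  # ex020301.log ex020302.log
```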

  foreach $file (sort @logfiles) {
    # we know the file exists, so, let's open and read it!
    $filename = $logdir . "/" . $file;
    $DayCounter++;
    if (open(LOG, $filename)) {
      # File is open, start reading it!
      $line = <LOG>;
      while ($line) {
        # only grab the lines with dates and times.
        if ($line =~ m/^....-..-.. ..:..:../){
          # if it's a line, it's a hit!
          $HitCounter++;
          # Treat the line as a space-delimited line, and split it up!
          @entry = split (/ /, $line);
          # Extract the time into separate values
          @Time = split (/:/, $entry[1]);

Now we have all the info from one line of the log file split into a list called "@entry". $entry[0] is the date, $entry[1] is the time, and so forth (see the logfile format above). Now we're ready to collect information from the log file, which is done based on the status field. With all the Code Red and Nimda probes still going on, let's deal with 404 error messages first.
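To see the indexing in action, here's a minimal sketch with a made-up log line in the field order listed above:

```perl
# Field order: date time client-ip method URI-stem status sc-bytes
# cs-bytes time-taken cs(user-agent) cs(Cookie) cs(referer)
$sample = "2002-03-01 14:05:22 10.0.0.5 GET /index.html 200 5120 310 47 Mozilla/4.0 - -";
@entry = split (/ /, $sample);
@Time = split (/:/, $entry[1]);
print "date:   $entry[0]\n";  # 2002-03-01
print "status: $entry[5]\n";  # 200
print "hour:   $Time[0]\n";   # 14
```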

          if ($entry[5] eq "404") {
            $e404Page{$entry[4]}++;
            $e404Host{$entry[2]}++;
          }
        }
        # read the next line of the log file
        $line = <LOG>;
      }
      close(LOG);
    }
  }

That's it! We've now read every line of every log file, and we've collected info on what pages are generating 404 errors, and how many times, and also who are attempting to load these pages and how many times. To get a simple output of this, add the following to the bottom of what we already have.
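The report code below leans on Perl's ability to sort hash keys by their values; here's the idiom on its own, with made-up counts:

```perl
%e404Page = ("/default.ida" => 120, "/scripts/root.exe" => 45, "/missing.html" => 3);
# Comparing $b before $a gives descending order: most frequent page first.
foreach $ThisOne (sort {$e404Page{$b} <=> $e404Page{$a}} keys %e404Page) {
  printf (" %-70s %9d\n", $ThisOne, $e404Page{$ThisOne});
}
# /default.ida prints first, /missing.html last
```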

printf ("\n Error reports\n");

if (keys %e404Page) {
  printf ("\n Top 50 pages not found\n");
  printf (" Page                                          Times\n");
  $count = 0;
  foreach $ThisOne (sort {$e404Page{$b} <=> $e404Page{$a}} 
keys %e404Page) {
    if ($count < 50) { 
      $count++;
      printf (" %-70s %9d\n", $ThisOne, $e404Page{$ThisOne});
    }
  }
} 

if (keys %e404Host) {
  printf ("\n Top 50 hosts generating 404 errors\n");
  printf (" Host                                          Times\n");
  $count = 0;
  foreach $ThisOne (sort {$e404Host{$b} <=> $e404Host{$a}} 
keys %e404Host) {
    if ($count < 50) {
      $count++;
      printf (" %-70s %9d\n", $ThisOne, $e404Host{$ThisOne});
    }
  }
}

You may have to add a few spaces in the headers; I had to trim them to make them fit nicely on this page... Note that the output might look a little odd, as there may be escape characters in the URI stems ... after all, many of these are attempts at generating buffer overflows in IIS.
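If the stray escape characters in the output bother you, one option (my own addition, not part of the script above) is to replace non-printable bytes with dots before printing:

```perl
$uri = "/scripts/..\x90\x90/cmd.exe";   # made-up probe containing raw bytes
($clean = $uri) =~ s/[^\x20-\x7e]/./g;  # replace anything non-printable with a dot
print "$clean\n";  # /scripts/..../cmd.exe
```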

Oh, joy; now how do you find out how many hits you have, and what pages are being checked out? Well, we'll need to add another two sections for that: one to collect all the info for status message 200, and one to generate the output. Let's back-track to the 404 data collection, and go from there.

          if ($entry[5] eq "404") {
            $e404Page{$entry[4]}++;
            $e404Host{$entry[2]}++;
          }
          elsif ($entry[5] eq "200") {
            # Regular OK message, collect all sorts of info!
            # count up total number of bytes for this run
            $TotalBytes += $entry[6];
            $DailyBytes{$entry[0]} += $entry[6];
            $HourlyBytes{$Time[0]} += $entry[6];
            # Count the page
            $PageCount{$entry[4]}++;
            # Increase HitCounter, if it's a regular page
            if ($entry[4] =~ m/html|HTML|htm|HTM/) {
              $ViewCounter++;
            }
            # Tally up the successful hit as well
            $OKHitCounter++;
            #Collect the FQDN or IP address of the visitor, and count it
            $Visitor{$entry[2]}++;
            # Grab the domain name (ie. aol.com, attbi.com)
            @FQDN = split (/\./, $entry[2]);
            $length = @FQDN;
            $temp = $FQDN[$length-2] . "." . $FQDN[$length-1];
            $Domain{$temp}++;
            # Next, collect the referer info
            $temp = $entry[11];
            chomp($temp);
            $Referer{$temp}++;
          }
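The domain extraction above simply keeps the last two labels of the host name; a quick sketch with a made-up visitor:

```perl
$host = "proxy.dialup.aol.com";  # made-up visitor FQDN
@FQDN = split (/\./, $host);
$length = @FQDN;                 # array in scalar context: number of labels
$temp = $FQDN[$length-2] . "." . $FQDN[$length-1];
print "$temp\n";  # aol.com
```

Note that a plain IP address ends up as its last two octets (e.g. "0.5"), which is why the domain report later only prints entries containing letters.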

With this addition, we can generate a report that contains a lot more info. Most of the items in the code below should be self-explanatory. 

$EndTime = localtime;
print "\n Complete Report\n";
print " ===============\n";
print " Report started : $StartTime\n";
print " Report ended : $EndTime\n";
printf (" Hits : %12d\n", $HitCounter);
printf (" Hits (successful) : %12d\n", $OKHitCounter);
printf (" Hits daily average : %12d\n", $HitCounter/$DayCounter);
printf (" Page views total : %12d\n", $ViewCounter);
printf (" Page views daily average : %12d\n", $ViewCounter/$DayCounter);
printf (" Total bytes : %12d\n", $TotalBytes);
printf (" Bytes daily average : %12d\n", $TotalBytes/$DayCounter);
if (keys %DailyBytes) {
  printf ("\n  Bytes transferred by Date.\n");
  printf ("==========================================\n");
  printf ("    Date                             Bytes\n");
  foreach $ThisOne (sort(keys %DailyBytes))   {
    printf ("  %-30s %9d\n", $ThisOne, $DailyBytes{$ThisOne});
  }
  printf ("==========================================\n");
}

if (keys %HourlyBytes) {
  printf ("\n  Bytes transferred by Hour.\n");
  printf ("==========================================\n");
  printf (" Hour                                Bytes\n");
  foreach $ThisOne (sort(keys %HourlyBytes))   {
    printf ("  %-30s %9d\n", $ThisOne, $HourlyBytes{$ThisOne});
  }
  printf ("==========================================\n");
}
if (keys %PageCount) {
  printf ("\nMost popular pages\n");
  printf ("========================================================\n");
  printf (" Page                                               Hits\n");
  $count = 0;
  foreach $ThisOne (sort {$PageCount{$b} <=> $PageCount{$a} } 
keys %PageCount) {
    if ($count < 50)     {
      if ($ThisOne =~ m/html$/)       {
        $count++;
        printf ("  %-60s %9d\n", $ThisOne, $PageCount{$ThisOne});
      }
    }
  }
  printf ("========================================================\n");
}
if (keys %Visitor) {
  printf ("\n  Top 50 Visiting hosts\n");
  printf ("========================================================\n");
  printf (" Host                                              Times\n");
  $count = 0;
  foreach $ThisOne (sort {$Visitor{$b} <=> $Visitor{$a} } 
keys %Visitor) {
    if ($count < 50) { 
      $count++;
      printf ("  %-60s %9d\n", $ThisOne, $Visitor{$ThisOne});
    }
  }
  printf ("=========================================================\n");
}

if (keys %Domain) {
  printf ("\n  Top 50 Visiting Domains\n");
  printf ("==========================================\n");
  printf (" Domain                              Times\n");
  $count = 0;
  foreach $ThisOne (sort {$Domain{$b} <=> $Domain{$a} } keys %Domain) {
    if ($count < 50) {
      if ($ThisOne =~ m/[a-zA-Z]/) {
        $count++;
        printf ("  %-30s %9d\n", $ThisOne, $Domain{$ThisOne});
      }
    }
  }
  printf ("==========================================\n");
}

printf ("\n Page Statistics\n");

if (keys %PageCount) {
  printf ("\nHits by Page\n");
  printf ("=======================================================\n");
  printf (" Page                                              Hits\n");
  $count = 0;
  foreach $ThisOne (sort keys %PageCount) {
    if ($count < 50)     {
      if ($ThisOne =~ m/html$/) {
        $count++;
        printf ("  %-60s %9d\n", $ThisOne, $PageCount{$ThisOne});
      }
    }
  }
  printf ("=======================================================\n");
}

if (keys %Referer) {
  printf ("\n  Referer information\n");
  $count = 0;
  foreach $ThisOne (sort {$Referer{$b} <=> $Referer{$a} } 
keys %Referer) {
    if ($count < 50) {
      $count++;
      printf ("  %-70s %9d \n", $ThisOne, $Referer{$ThisOne});
    }
    }
  }
}

Again, you may have to add to the length of some of the header lines in the report, as I've trimmed them to make them fit on this page.

If you don't feel like typing, you can download the whole script.

Although this covers only a very few status messages in the log file, it's a start ... have fun.
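If you'd like to see which other status codes show up before writing more report sections, one approach (my own sketch, not part of the script above) is a catch-all tally keyed on the status field:

```perl
# Made-up log lines in the same field order as the rest of the article.
@lines = (
  "2002-03-01 14:05:22 10.0.0.5 GET /a.html 200 512 310 47 UA - -",
  "2002-03-01 14:05:23 10.0.0.6 GET /b.html 404 0 310 12 UA - -",
  "2002-03-01 14:05:24 10.0.0.5 GET /a.html 200 512 310 44 UA - -",
);
%StatusCount = ();
foreach $line (@lines) {
  # only grab the lines with dates and times, as before
  next unless $line =~ m/^....-..-.. ..:..:../;
  @entry = split (/ /, $line);
  $StatusCount{$entry[5]}++;
}
foreach $status (sort keys %StatusCount) {
  printf (" %s : %d\n", $status, $StatusCount{$status});
}
# prints " 200 : 2" and " 404 : 1"
```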

© 1999-2005 Lars M. Hansen