I am responsible for our internet content filter (St Bernard Iprism),
and have written several Perl scripts to extract info from the raw log
that is sent to a Windows syslog server. I run these on my Windows
desktop every morning, and manually review what I've exported.

I have most of what I need, but am having problems with tracking up to
1500 users every day. I am able to pull what I need for any single user,
but what I would also like to do is extract total bandwidth and time
consumption for all users, one user per line.

One script I use exports a file every day that looks something like
this:

10.10.0.155,bsmith
10.20.1.23,gwbush
10.10.1.20,aneuman

(IP address, user ID.)

and it also appends to a permanent CSV file with Date, Number of users,
Total number of bytes, just one line per day.

What I'd like to add to the first export is:

10.10.0.155,bsmith,12345,03:45
10.20.1.23,gwbush,34567890,07:55
10.10.1.20,aneuman,875433,23:59

adding the total number of bytes downloaded and the total time spent
surfing.
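
A minimal sketch of how one such line could be built, assuming the surfing time is held as a count of seconds and printed as HH:MM. The name `format_user_line` is a hypothetical helper for illustration, not something in my existing script.

```perl
use strict;
use warnings;

# Hypothetical helper: build one "ip,user,bytes,HH:MM" export line.
# Assumes $seconds is the user's total surfing time in whole seconds.
sub format_user_line {
    my ($ip, $user, $bytes, $seconds) = @_;
    my $hours   = int($seconds / 3600);
    my $minutes = int(($seconds % 3600) / 60);
    return sprintf("%s,%s,%d,%02d:%02d", $ip, $user, $bytes, $hours, $minutes);
}

# 3 hours 45 minutes of surfing for the first sample user
print format_user_line('10.10.0.155', 'bsmith', 12345, 3 * 3600 + 45 * 60), "\n";
```

With those arguments the line printed is 10.10.0.155,bsmith,12345,03:45, which matches the first row of the layout above.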

I have looked at quite a few Squid log analysis programs, but every one
I found either requires a web server, costs money, or won't run on a
Windows workstation or server.

Have any suggestions? I tried one thing that did a separate pass for
every user, and gave up on that as the universe would end before the
script did.
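
For contrast with the pass-per-user attempt, here is a rough sketch of accumulating every user's totals in a single pass with a hash. It rests on assumptions that are mine rather than anything documented by Iprism: fields are TAB separated, field 04 is the epoch second of the request, field 06 is the client IP, field 08 is the byte count, field 11 is the user name, and "time spent surfing" is estimated as the number of distinct minutes in which the user made at least one request. The name `tally_users` is hypothetical.

```perl
use strict;
use warnings;

# One pass over the log lines; totals are kept per "ip,user" key.
sub tally_users {
    my (@lines) = @_;
    my %stats;    # "ip,user" => { bytes => N, minutes => { minute => 1 } }
    for my $line (@lines) {
        chomp $line;
        my @f = split /\t/, $line;
        next if @f < 12;          # blank or truncated line
        next if $f[11] eq '-';    # unauthenticated request
        my $key = "$f[6],$f[11]";
        $stats{$key}{bytes} += $f[8];
        $stats{$key}{minutes}{ int($f[4] / 60) } = 1;   # mark minute as active
    }
    my @out;
    for my $key (sort keys %stats) {
        my $mins = scalar keys %{ $stats{$key}{minutes} };
        push @out, sprintf("%s,%d,%02d:%02d", $key, $stats{$key}{bytes},
                           int($mins / 60), $mins % 60);
    }
    return @out;
}

# Made-up sample rows: [epoch, client IP, bytes, user]; fields 00-03 are filler.
my @sample = map {
    join "\t", ('x') x 4, $_->[0], 225, $_->[1], 'TCP_MISS/200', $_->[2],
               'GET', 'http://www.tscn.com/', $_->[3]
} (
    [ 1059177668, '10.10.0.155', 4909, 'bsmith' ],
    [ 1059177670, '10.10.0.155', 100,  'bsmith' ],
    [ 1059177800, '10.10.0.155', 1000, 'bsmith' ],
    [ 1059177800, '10.20.1.23',  500,  '-'      ],
);
print "$_\n" for tally_users(@sample);
```

The sample prints 10.10.0.155,bsmith,6009,00:02: three requests falling in two distinct minutes, with the unauthenticated row skipped. Memory grows with the number of users and active minutes, not with the number of log lines, so 1500 users should be manageable, though I have not measured it against a full day's log.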

Here is my Total Users Stats script, with the syslog line format added
in:

#Log file is TAB delimited. Below are the 15 fields (numbered 00 to 14)
#that comprise each log line, with sample data.

#00 2003-07-26 00:12:40 (Kiwi Date & Time that the log entry was
#   received from the Iprism)
#01 Local4.Info (useless info from the Syslog service that receives data
#   from the Iprism)
#02 10.10.2.20 (the sending device, in this case the Iprism)
#03 Jul 26 00:01:08 rtlogger[1486]: http (date and time of the request,
#   plus the type of proxy access, 'http')
#04 (log time - number of seconds since 1-1-1970 00:00)
#05 225 (time in milliseconds that it took to perform the requested
#   action)
#06 10.20.2.155 (client computer address doing the requesting)
#07 TCP_MISS/200 (status of the request)
#08 4909 (number of bytes requested)
#09 GET (the requested command, sometimes is POST, PROPFIND, OPTIONS,
#   LOCK, and a couple other things)
#10 http://www.tscn.com/ (the requested data)
#11 - ('-' if not authenticated, user name if authenticated)
#12 DIRECT/123.234.123.1 (where the request was served from)
#13 text/html ('MIME' encoding of the returned data)
#14 finance (Iprism category of the web address)

#Raw log filenames look like this: 2005-06-17-IPRISM.txt

use warnings ;
use strict ;
my ($ipaddr, $filnam, $sec, $min, $hour, $starttime, $logdate, $fields,
$endtime);
my ($totaltime, $numtotlusers, $numberoflines, $numberofbytes);
my ($item, $items, $filenum, $infile, $filename, $line, %seen, @list,
@fields);
my (@totlusrtmp, @totalusers, @uniq);
$sec = $min = $starttime = $logdate = $fields = $endtime = $totaltime =
$items = $filenum = $numtotlusers = 0;
$numberoflines = $numberofbytes = 0;
$infile = $filename = $line = "" ;
chdir('d:/download') or die "\nDownload directory missing!\n";
@list = glob '*IPRISM.txt' ;
$items = (scalar @list);
if ($items == 0) {
die "\nNo log File!\n";
}
if ($items == 1) {
$infile = $list[0];
}
if ($items > 1) {
$filenum = 0;
print "Which file number do you want to open?\n";
print "Just press ENTER for the first listed file.\n\n";
foreach $_ (@list) {
print $filenum.". ".$_."\n" ;
$filenum++;
}
chomp($filename = <STDIN>);
if ($filename eq "") {
$infile = $list[0];
}
else {$infile = $list[$filename]};
}
if ($filename ne "") {
if ($filename >= $items) {
die "\nWhoops, that file doesn't exist!\n";
}
}
open (INFILE, "<$infile") or die "\nWhoops, that file doesn't exist!\n";
$starttime = time;
$line = <INFILE>;
($logdate, undef) = split (/\s+/, $line) ;
print "\nTotal number of Internet Users Log File \n";
print "Logdate: $logdate\n";
close (INFILE);
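open (INFILE, "<$infile") or die "\nCannot reopen $infile: $!\n";
open (OUTFILE, ">d:/download/TotlUsers$logdate.LOG") or die "\nCannot write daily log: $!\n";
open (PERMFILE, ">>d:/docs/Iprism/TotalUsers/TotlUser.CSV") or die "\nCannot append to CSV: $!\n";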
while (<INFILE>) {

chomp;
@fields = split /\t/;
# Skip blank lines, might see two a month
next if !$fields[5];
$numberoflines++;
$numberofbytes = $numberofbytes + $fields[8];
#skip unauthenticated users who were permitted access (field 11 is exactly '-')
next if $fields[11] eq '-';
push (@totlusrtmp, $fields[6].",".$fields[11]);
next;
}

%seen = ();
foreach $item (@totlusrtmp) {
push (@totalusers,$item) unless $seen{$item}++;
}

# Number of unique IP/user pairs
$numtotlusers = scalar @totalusers;

print $numtotlusers." Total Users"."\n";
print OUTFILE $numtotlusers." Total Users"."\n";
print $numberoflines." Total number of lines"."\n";
print OUTFILE $numberoflines." Total number of lines"."\n";
print $numberofbytes." Total number of bytes"."\n";
print OUTFILE $numberofbytes." Total number of bytes"."\n";

#Permfile only gets the day's date, user total, byte total
print PERMFILE $logdate.",".$numtotlusers.",".$numberofbytes;

# Output IP addresses and user IDs
foreach $item (@totalusers) {

print OUTFILE $item."\n";

}
print PERMFILE "\n";

close (INFILE);
close (OUTFILE);
close (PERMFILE);
$endtime=time;
$totaltime = $endtime-$starttime;
# gmtime avoids any time zone offset when splitting elapsed seconds
($sec, $min, $hour) = gmtime($totaltime);
print "Run time = $min:";
printf ("%02d",$sec);
print "\n";
---------------------------------------------
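
The numbered field list in the script's header comments could also be turned into named fields, which would make indexes like $fields[8] and $fields[11] easier to read. This is only a sketch: the field names are labels I made up from the descriptions, and `parse_line` is a hypothetical helper.

```perl
use strict;
use warnings;

# My own labels for fields 00-14; they are not Iprism's names.
my @FIELD_NAMES = qw(
    received facility sender request_stamp epoch elapsed_ms client_ip
    status bytes method url user served_from mime category
);

# Split one TAB-delimited log line into a hash reference keyed by field name.
# Returns nothing for blank or truncated lines.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my @values = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    return if @values < @FIELD_NAMES;
    my %rec;
    @rec{@FIELD_NAMES} = @values;          # hash slice assignment
    return \%rec;
}

# The epoch value here is made up; the other values come from the sample data.
my $rec = parse_line(join "\t",
    '2003-07-26 00:12:40', 'Local4.Info', '10.10.2.20',
    'Jul 26 00:01:08 rtlogger[1486]: http', 1059177668, 225,
    '10.20.2.155', 'TCP_MISS/200', 4909, 'GET', 'http://www.tscn.com/',
    '-', 'DIRECT/123.234.123.1', 'text/html', 'finance');
print "$rec->{client_ip} fetched $rec->{bytes} bytes\n" if $rec;
```

Splitting on TAB rather than on whitespace matters here, because fields 00 and 03 contain spaces.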

Thanks.
--
Al M-c-C-a-n-n
m a c 3 5 8 (at)-n e w s g u y-(dot) c o m