At one of our sites we needed to parse some names and numbers out of a log file. The strings also contained comment text, and the number and the comment could appear in either order. This kind of regular expression parsing can be tricky.
Let’s take a look at the two possible formats of the strings:
FORMAT #1:
[00:02:18][jvastine]7.52..I agree with Rowdy, but they do have potential!
FORMAT #2:
[00:02:18][@Forser]My vote is 5
Can you see a pattern for extracting the name and numbers from the string? Let’s whip up the code to parse these two types of strings as a test:
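<?php
$text = array(
    '[00:02:18][jvastine]7.52..I agree with Rowdy, but they do have potential!',
    '[00:02:18][@Forser]My vote is 5'
);
foreach ($text as $string) {
    print "<hr>testing: " . $string . '<br />';
    // Group 1 is the name; the number lands in group 3 (text first) or group 4 (number first).
    if (!preg_match("/\[([\@a-z]+)\]([a-z ]+([0-9\.]+)|([0-9\.]+))/i", $string, $matched)) {
        continue; // skip lines that match neither format
    }
    print "<b>$matched[1]</b><br />";
    if ($matched[3] != '') {
        // FORMAT #2: comment text first, number in group 3
        print sprintf("%.2f", $matched[3]);
    } else {
        // FORMAT #1: number first, so group 2 holds just the number
        print sprintf("%.2f", $matched[2]);
    }
} // end foreach loop
?>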
Note the use of sprintf() to format the number, because it might be a decimal. We could trim the decimal part if it’s all zeroes, but in some cases we may want to keep it. Also, as one of my smart tech friends pointed out, a string that contains multiple numbers could screw this up.
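As a rough illustration of the trimming idea, here is a minimal sketch. The helper name format_rating() is made up for this example, and it assumes two decimal places is the precision we want:

```php
<?php
// Hypothetical helper (not part of the original script): format a rating
// with two decimals, dropping the fraction when it is all zeroes.
function format_rating(string $raw): string
{
    // The (float) cast keeps only the leading numeric part,
    // so a capture like "7.52.." becomes 7.52.
    $value = (float) $raw;
    $formatted = sprintf('%.2f', $value);

    // Trim a trailing ".00" only; a value such as "7.50" is left untouched.
    return preg_replace('/\.00$/', '', $formatted);
}

print format_rating('7.52..') . "\n";
print format_rating('5') . "\n";
```

Whether to trim at all is a display choice; if the ratings are later averaged or sorted, it is probably safer to keep the float and only format at print time.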
FORMAT #3:
[00:02:18][jvastine]16 hot dogs…I vote is 7
Now how would we parse out the vote of 7 and ignore the 16? This one is a bit easier: since a valid rating is only 1-10 in the above example, we could capture every number (in $matched[4], $matched[5], etc.) and keep the one that falls in range. But what if there was a format with more than one number in the valid range?
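One way to sketch that range check is to collect every number with preg_match_all() instead of adding more capture groups. The function name find_vote() is hypothetical, and the sketch assumes every line starts with the [time][name] prefix shown in the samples:

```php
<?php
// Hypothetical helper: return the first number in the comment that is a
// valid rating (1 to 10), or null when none qualifies.
function find_vote(string $line): ?float
{
    // Strip the leading "[time][name]" prefix so the timestamp digits are ignored.
    $comment = preg_replace('/^\[[^\]]*\]\[[^\]]*\]/', '', $line);

    // Collect every integer or decimal that appears in the comment text.
    preg_match_all('/\d+(?:\.\d+)?/', $comment, $found);

    foreach ($found[0] as $candidate) {
        $value = (float) $candidate;
        if ($value >= 1 && $value <= 10) {
            return $value; // first number inside the valid rating range
        }
    }
    return null;
}

var_dump(find_vote('[00:02:18][jvastine]16 hot dogs…I vote is 7'));
```

This skips the 16 because it is out of range, but it simply returns the first in-range number, so it cannot tell two plausible ratings apart.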
FORMAT #4:
[00:02:18][jvastine]6 hot dogs…I vote is 7
How do we determine which number is the vote in this case? This is a trick question, and not an easy one. Thoughts?
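One possible heuristic, offered only as a starting point: trust a number when it follows the word "vote", and otherwise treat the line as ambiguous. The function name find_vote_by_keyword() and the 20-character window are assumptions made up for this sketch, and it will miss votes phrased any other way:

```php
<?php
// Hypothetical heuristic: only accept a number that follows the word "vote".
function find_vote_by_keyword(string $line): ?float
{
    // "vote", then up to 20 non-digit characters, then the number itself.
    if (preg_match('/\bvote\b\D{0,20}(\d+(?:\.\d+)?)/i', $line, $m)) {
        return (float) $m[1];
    }
    return null; // no keyword found, so the line stays ambiguous
}

var_dump(find_vote_by_keyword('[00:02:18][jvastine]6 hot dogs…I vote is 7'));
```

A line like FORMAT #1, which never says "vote", returns null here, so in practice this would need to be combined with a fallback such as the range check.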