PerlMonks

log file sorting

by numberninja (Initiate)

on Aug 04, 2008 at 18:45 EEST (#702093=perlquestion)
numberninja has asked for the wisdom of the Perl Monks concerning the following question:
       

Hi, I have more log file sorting questions =D. As an example, a line in my logs might look like:

B #/# #,E #/#,A #,A #

I need to be able to go through the entire text file and sort the data alphabetically, to:

A #,A #,B #/# #,E #/#

Last week I got help with that, but it resulted in the two A's being combined, losing a bunch of data =( Now, on top of that, I need to edit the file further, to:

A1 A2 B1 B2 B3 E1 E2   (labels)
#  #  #  #  #  #  #    (data 1)
#  #  #  #  #  #  #    (data 2)

Can anyone provide some insight? The main things that stump me are alphabetizing without deleting data, and counting how many numbers are associated with each data label. Finally, sometimes an entire series of data may be missing, so there would be no B at all, yet I need the program to fill in blanks, or at least 0's, to keep the formatting standard. Thanks for any help you guys can offer =D

I'm also confused as to how to get my paragraphs to have good formatting XD, so I used the code tags.

Re: log file sorting [id://702098]
by moritz (Abbot) on Aug 04, 2008 at 18:55 EEST
       
    What have you tried so far? Show some effort on your own, and we'll try to help you.

    Also, please read Writeup Formatting Tips: use <code>...</code> tags only for code, and paragraph tags <p>...</p> for text.

      Ah, thanks for telling me about the paragraph tags, I somehow missed them when reading the formatting tips.

      print "File Location?";
      my $data_file = <>;
      open(RAWDATA, $data_file);
      my @list;
      while (<RAWDATA>) {
          chomp;
          my (%hash, @rest);
          ($hash{first}, $hash{date}, @rest) = split(",", $_);
          for my $r (@rest) {
              my ($k, $v) = split(' ', $r, 2);
              $hash{$k} = $v;    # a repeated label overwrites the earlier value
          }
          push(@list, \%hash);
      }
      my %seen;
      for (@list) { for (keys %$_) { $seen{$_}++ } }
      delete $seen{first};
      delete $seen{date};
      my @allkeys = ('first', 'date', sort keys %seen);
      my @keys = (sort keys %seen);
      open(SEMIDATA, ">temp.slice");
      for my $h (@list) {
          print(SEMIDATA join(',', $h->{first}, $h->{date}, map( $_.' '.$h->{$_}, @keys ) ), "\n")
              or warn "print failed: $!";
      }
      close(RAWDATA);
      close(SEMIDATA);
      open(EDITDATA, "temp.slice");
      my @array_of_data = <EDITDATA>;
      close(EDITDATA);
      foreach my $line (@array_of_data) {
          # all replacements go here
          $line =~ s!X!!g;
          $line =~ s!ART ,!ART / ,!g;
          $line =~ s!ECG ,!ECG /,!g;
          $line =~ s!NBP ,!NBP / ,!g;
          $line =~ s!PA ,!PA / ,!g;
          $line =~ s!RESP ,!RESP /,!g;
          $line =~ s!SAO2 ,!SAO2 /,!g;
          $line =~ s!ST ,!ST //,!g;
          $line =~ s!TEMP \n!TEMP /\n!g;
      }
      # Open the file for writing.
      open REGDATA, ">temp2.slice";
      foreach my $line (@array_of_data) {
          # Print each line in turn to the new filehandle REGDATA
          print REGDATA "$line";
      }
      close REGDATA;

      This is the relevant part of the program I have so far, which sorts the data labels alphabetically. As I was unsure how to count how many distinct numbers follow a label, I tried to fill in nonexistent data with substitutions, which was merely a temporary fix. I haven't even begun trying to organize the data into nice Excel-esque columns, as that would first require standardizing its appearance.
        ah, thanks for notifying me about the paragraph tags, i somehow missed them reading about them

        No problem. Just go to your original question and fix the markup.

        As for your programming problem, I think you're making it harder than it needs to be.

        For example there's no need to store your data to disk twice, and read it again. Here's what I'd do, in non-tested perl code, with some blanks left for you to figure out:

        # store all data here:
        my %data;
        while (<INPUT>) {
            chomp;
            my @items = sort split m/,/;
            my %seen;
            # number the occurrences of data points, and put them into a hash
            for (@items) {
                my ($key, $val) = split m/ /, $_, 2;
                my $index = ++$seen{$key};
                push @{$data{"$key$index"}}, $val;
            }
        }
        # now all data should be in the hash %data.
        use Data::Dumper;
        print Dumper \%data;

        # now print it:
        my @keys = sort keys %data;
        while (keys %data) {
            for (@keys) {
                if (exists $data{$_}) {
                    # print it out here
                    # then remove it
                    shift @{$data{$_}};
                    delete $data{$_} unless @{$data{$_}};
                } else {
                    # print a placeholder here
                }
            }
        }

        The idea is to keep a list of all data values for each label, in your case ['#', '#'] for A1, ...

        The choice of a clever data structure (ie one that fits the way you want to access it in your code) makes it much easier.
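
One way the blanks in the template above might be filled in, as a rough sketch rather than moritz's own code. It swaps in one hash per input line (instead of one array per label), so that a record with no B series still lines up with the others. The sample records, the `table_rows` name, and the `0` placeholder are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: one hash per log line, keyed by numbered label.
# The second record has no B series at all.
my @records = (
    { A1 => 37, A2 => 38, B1 => 120, B2 => 80, E1 => 60 },
    { A1 => 36, A2 => 39, E1 => 62 },
);

# Build the table: a header row of every label seen in any record, then
# one row per record, with $placeholder wherever a record lacks a label.
sub table_rows {
    my ($placeholder, @recs) = @_;
    my %seen;
    $seen{$_}++ for map { keys %$_ } @recs;
    my @labels = sort keys %seen;
    my @rows   = ( [@labels] );
    for my $rec (@recs) {
        push @rows,
            [ map { exists $rec->{$_} ? $rec->{$_} : $placeholder } @labels ];
    }
    return @rows;
}

# Tab-separated output: header first, then one line per record.
print join("\t", @$_), "\n" for table_rows(0, @records);
```

The plain `sort` compares labels as strings, so with ten or more values under one label `A10` would sort before `A2`; a custom sort block would be needed in that case.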


          just to make sure i understand your programming template:

          The first part does something similar to my program and splits the data on commas, then leaves a place where I can count the occurrences of numbers following a label? I considered using the count function and looking for instances of \D\d{0,3}, but I don't see how to limit the count to only the area between the label and the comma. However, this doesn't really shed any light on how I can store the multiple instances of "A" in one line as separate values.
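
A minimal sketch of one possible answer to both points, under some assumptions: each comma-separated field is a label followed by numbers separated by slashes or spaces, and there are no leading name or date fields. Splitting on commas first confines the count to a single field, and a per-line counter gives each number its own sub-label, so two A fields never collide. The helper name `label_values` and the sample numbers (standing in for the `#` placeholders) are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one log line into [sub-label, value] pairs.
sub label_values {
    my ($line) = @_;
    chomp $line;
    my (%seen, @pairs);
    for my $field (sort split /,/, $line) {
        # first word is the label, everything after it is the numbers
        my ($label, $rest) = split ' ', $field, 2;
        next unless defined $label;
        $rest = '' unless defined $rest;
        # each number gets a running index per label: A1, A2, B1, ...
        for my $value (split m{[/\s]+}, $rest) {
            next unless length $value;
            push @pairs, [ $label . ++$seen{$label}, $value ];
        }
    }
    return @pairs;
}

my @pairs = label_values("B 120/80 95,E 60/1,A 37,A 38\n");
print join(' ', map { "$_->[0]=$_->[1]" } @pairs), "\n";
# prints: A1=37 A2=38 B1=120 B2=80 B3=95 E1=60 E2=1
```

The number of values under a label is then just the final value of `$seen{$label}` for that line.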

Re: log file sorting [id://702102]
by apl (Vicar) on Aug 04, 2008 at 19:17 EEST
       
    Consider reading the Hash tutorials. Using a hash and sorted keys should resolve most of your problems.
Re: log file sorting [id://702119]
by jethro (Hermit) on Aug 04, 2008 at 19:59 EEST
       
    To keep the formatting in text output you might use a sub like this (untested):

    sub rj {
        # right-justifies each argument in a column whose width is the first parameter
        my $len = shift;
        my $str = '';
        while (@_) {
            $str .= substr(' ' x $len . shift(@_), -$len);
        }
        return $str;
    }

    print rj(12, @d);    # prints data in columns of length 12

    In a similar way it is possible to have a subroutine generate left-justified or centered text.
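
As a sketch of what such variants could look like (the names `lj` and `cj` are invented here, and like `rj` they take the column width as the first argument). Unlike the `substr` approach, these do not truncate values longer than the column.

```perl
use strict;
use warnings;

# Hypothetical left-justify helper: pad each value on the right.
sub lj {
    my $len = shift;
    return join '', map { sprintf '%-*s', $len, $_ } @_;
}

# Hypothetical centering helper: split the padding between both sides,
# with any odd leftover space going on the right.
sub cj {
    my $len = shift;
    return join '', map {
        my $pad = $len - length($_);
        $pad = 0 if $pad < 0;
        my $left = int($pad / 2);
        (' ' x $left) . $_ . (' ' x ($pad - $left));
    } @_;
}

# Brackets make the padding visible in the output.
print '[', lj(6, 'A1', 'B1'), "]\n";
print '[', cj(6, 'A1', 'B1'), "]\n";
```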

    Remember to divide a difficult problem into smaller steps. Solving these smaller steps is always easier than looking at the whole problem.

    This is what I did because I have a heck of a problem understanding what you want to put where. I neither understand your sorting order ( #,B #/# #,E is alphabetically sorted??) nor where the 1 in A1 comes from.

      Well, my actual data labels aren't A, B, C, D, E, etc. They're vital signs, so they might start with those letters; I meant alphabetically sorted in that B comes before E. The 1 in A1 comes from the fact that there are multiple data points collected under the label "A", so each of those needs a sub-label of A-1, A-2, etc.