Dear Monks
I was contemplating the 256 commandments from Damian’s "Perl Best Practices" when I encountered:
List Generation: Use map instead of for when generating new lists from old.
Being more used to other programming languages, I often take the non-Perl approach to get the job done: I would typically use a for and probably not even consider map. So this was an eye-opener for me. I decided to do a little test to see how big the difference between the two really is.
(FYI: Perl v5.8.8 built for MSWin32-x86-multi-thread running on a Dell INSPIRON 9400)
I used the example mentioned by Damian, together with the Benchmark module, to test:
use strict;
use warnings;
use Benchmark qw(:all);

my @results;
my $count = -5;

# Populate the list with 1 million numbers
for (my $i = 0; $i < 1000_000; $i++) {
    push @results, $i;
}

cmpthese( $count, {
    for => "test_for;",
    map => "test_map;",
} );

timethese( $count, {
    for => "test_for;",
    map => "test_map;",
} );

sub test_for {
    my @sqrt_results;
    for my $result (@results) {
        push @sqrt_results, sqrt($result);
    }
}

sub test_map {
    my @sqrt_results = map { sqrt $_ } @results;
}
First the comparison:
$count=-1  (warning: too few iterations for a reliable count)
       Rate   for   map
for  2.67/s    --  -10%
map  2.98/s   12%    --

$count=-5
       Rate   map   for
map  3.05/s    --  -16%
for  3.61/s   18%    --

$count=-10
       Rate   for   map
for  2.73/s    --   -8%
map  2.95/s    8%    --
Hmmm, this “gain” from using map over for is not really impressive, is it?!
Next some timing:
$count=-1
Benchmark: running for, map for at least 1 CPU seconds...
       for:  1 wallclock secs ( 1.16 usr +  0.00 sys =  1.16 CPU) @  3.46/s (n=4)
       map:  2 wallclock secs ( 1.19 usr +  0.00 sys =  1.19 CPU) @  3.37/s (n=4)

$count=-5
Benchmark: running for, map for at least 5 CPU seconds...
       for:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @  3.64/s (n=19)
       map:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @  3.45/s (n=18)

$count=-10
Benchmark: running for, map for at least 10 CPU seconds...
       for: 10 wallclock secs (10.11 usr +  0.00 sys = 10.11 CPU) @  3.46/s (n=35)
       map: 10 wallclock secs (10.03 usr +  0.00 sys = 10.03 CPU) @  3.29/s (n=33)
Am I missing something? Is the example given by Damian a poor example? Should I really favor map over for when I want to generate a new list from another list?
Thanks in advance
Update
Besides the obvious advantages (less code, easier to understand), it is stated that map is normally considerably faster.
The bigger point, though, is that by using map, you're telling me more about your intent with the code. map says "I'm doing something to each element, something that's probably easily described, and accumulating the result." On the other hand, for says "I'm doing something with each element and it could be anything."
That's pretty much the distinction I'd make as well, although I usually phrase it another way: for is for generic iteration, map is specifically a transformation.
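A tiny illustration of that distinction (using a hypothetical @numbers, purely for contrast):

# map is a transformation: the resulting list is the whole point
my @squares = map { $_ ** 2 } @numbers;

# for is generic iteration: the body can do anything, including side effects
for my $n (@numbers) {
    print "processing $n\n";
}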
A surefire way to annoy me and lose points when we get sample code is people who for whatever reason use map in void context rather than a proper for loop (it doesn't say "LOOK I R IDIOMATIC CODERZ", it says "ITERATION: UR DOIN IT WRONG" (and yes, I do have a lolcat based scale for applicants :)).
I have done some research on best practices regarding map. At one point I wanted to ask why we don't just use map instead of foreach, but I found a few nodes on that subject where, as you just did, map in a null context was brought up, and I am not sure I understand what that means. Could you describe what map in a null context is?
Using map but not capturing the returned values:
map { something( $_ ) } @somelist;
In recent versions that's been optimized (basically the return values are silently discarded rather than a temporary list being built and then thrown away later), so it's not as blecherous performance-wise.
However, it really buys you nothing to use it instead of a for loop here, because you've now muddled the conceptual waters (was it using the returned values at one point and then changed? did they plan on possibly using them at some point?) and made the code harder to understand (rather than the important thing, what's being iterated over, being up front, you've got to read past the details of what's being done for each item to find out). It's along the lines of using passive or active voice in a sentence ("The cow jumped over the moon." vs. "The moon was jumped over by the cow"); using the wrong one can shift what the reader takes as the emphasis to the wrong part.
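For contrast, the same iteration written as a plain for loop (reusing the hypothetical something() and @somelist from above) puts the list being iterated over right up front:

for my $item (@somelist) {
    something( $item );
}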
I think I get it: why use a gun to kill a cockroach when you have a perfectly good shoe? Since you don't get the benefits of mapping the data, and are just wasting time and space, you're better off using for.
So, if I am understanding you correctly,
Using for has the following advantages:
my @result = map { f($_) ? g($_) : () } @list;

# or:

my @result;
for (@list) {
    push(@result, g($_)) if (f($_));
}
my @result = map { g($_) } grep { f($_) } @list;
But that's just me . . .
for $var (@list)
A related discussion (with more links and examples) is Map: The Basics in the Tutorials section.
I have not done much with Perl profiling, but when doing a performance comparison, don't you need to return the same results? It looks like test_for returns the number of elements in the post-push array, whereas test_map returns the new array, at least if I am reading push correctly.
Update: Note that I am not saying that the results will change much. In fact, here are mine for 60 seconds; test_for2 returns the array at the end of the test function.
       s/iter  for2   map   for
for2     2.70    --   -1%   -1%
map      2.68    1%    --   -0%
for      2.67    1%    0%    --

Benchmark: running for, for2, map for at least 60 CPU seconds...
       for: 61 wallclock secs (61.37 usr +  0.02 sys = 61.39 CPU) @  0.37/s (n=23)
      for2: 61 wallclock secs (61.37 usr +  0.03 sys = 61.40 CPU) @  0.37/s (n=23)
       map: 62 wallclock secs (61.48 usr +  0.03 sys = 61.51 CPU) @  0.37/s (n=23)
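test_for2 itself is not shown here; a minimal sketch of what it presumably looks like is simply test_for with an explicit return of the new list:

sub test_for2 {
    my @sqrt_results;
    for my $result (@results) {
        push @sqrt_results, sqrt($result);
    }
    return @sqrt_results;   # hand back the generated list, as test_map does
}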
Update 2: Would some kind monk be willing to comment on whether it is sufficient to just call the raw function (as in the OP), or whether you would also need to have the function return into a context of some sort? In other words, should there be another layer of function call here to force list context to make this a valid comparison?
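For example, would something along these lines (just a sketch, untested) be needed to force list context on each call?

cmpthese( $count, {
    for => 'my @r = test_for()',
    map => 'my @r = test_map()',
} );

For that to be a fair comparison the subs would presumably also have to return the new list, as test_for2 above does.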
--MidLifeXis
The speed increase comes from minimizing resource-intensive tasks such as sorting. Instead of running code every time you iterate over a loop, you perform a map on the data once, storing the output in an array, and then use the information from that array. The classic example is the Schwartzian transform.
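For the record, a bare-bones sketch of that idiom (expensive_key() and @data are placeholders, not code from this thread):

my @sorted =
    map  { $_->[0] }                      # 3. unwrap the original values
    sort { $a->[1] <=> $b->[1] }          # 2. sort on the precomputed keys
    map  { [ $_, expensive_key($_) ] }    # 1. compute each key exactly once
    @data;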
Best practices rarely have anything to do with performance. They are usually designed to avoid pitfalls or to increase readability and maintainability, often at the cost of performance.
I think your assumption that Damian recommended map for performance reasons is flawed, or did he say as much?