We never know anything until we test it pt. 2

by debianjoe


In the previous post, I looked at the difference in run speed between a case-based and an if/elif/else-based bash script. Some of the comments (thanks to kooru and CorkyAgain for the input) led me to believe that we had only scratched the surface of testing.

Any good science starts with a control group. For this example I set up the for loop as was done in both of the previous tests, but removed all of the comparison logic, simply set the variable to a constant, and kept the echo as before. After a few runs, it came in around 0.319 seconds on average on the test system. This number will henceforth be referred to as my “base value” for the scripts. We also need a reasonable way to see how much of the script's run time the comparison accounts for. For this, I’m thinking that we can make up an “efficiency factor” and call it something like EF, because I don’t think that there is a standard to maintain. I am determining the made-up EF as (“mean run-time” – “base value”) / “base value” * 100. That should give us a pretty decent idea of how much time the comparison itself adds, expressed as a percentage of the base value.
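As a rough sketch of the above, the EF arithmetic and a control loop could look something like this. The function names `ef` and `control` and the iteration count of 10000 are my own placeholders for illustration, not the exact script that was timed.

```bash
#!/usr/bin/env bash

# ef MEAN BASE: overhead of MEAN over BASE, as a percentage of BASE.
# awk does the floating-point math, since bash arithmetic is integer-only.
ef() {
    awk -v mean="$1" -v base="$2" \
        'BEGIN { printf "%.2f\n", (mean - base) / base * 100 }'
}

# control N: the control group. Same loop and echo as the tests,
# but the variable is set to a constant and nothing is compared.
control() {
    local n=$1 i var
    for ((i = 0; i < n; i++)); do
        var="constant"
        echo "$var"
    done
}

# Time the control loop (iteration count is an assumption).
time control 10000 > /dev/null

# EF for a mean run-time of 0.425 against the 0.319 base value.
ef 0.425 0.319   # prints 33.23
```

Running the timed line a few times and averaging the "real" values by hand gives the base value to feed into `ef`.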

Using this system, the EFs for yesterday’s test come out to 10.66 for the case statement and 83.39 for if/elif/else. These are much more meaningful numbers than the raw run times, since (in theory) they target only the cost of the evaluations and leave out the loop and echo overhead.

kooru suggested that switching from the normal “[” or “test” to bash's own “[[” would show a significant increase in evaluation speed. It does. Over 5 runs, the mean run-time came in at close to 0.425 seconds for the double-bracketed compares. This means that by changing only this particular little detail, we dropped the EF of if/elif/else from 83.39 to 33.23. That’s honestly a huge improvement, cutting the comparison overhead by a factor of about 2.5 for what is almost identical logic.
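For anyone who wants to try this at home, here is a minimal sketch of the two styles side by side. The values being compared, the function names, and the iteration count are all made up for illustration, and wrapping the compares in functions adds some call overhead of its own, so expect different absolute numbers than mine.

```bash
#!/usr/bin/env bash

# Four-way compare using the single-bracket "[" (same as "test").
classify_single() {
    if [ "$1" = "a" ]; then
        echo "first"
    elif [ "$1" = "b" ]; then
        echo "second"
    elif [ "$1" = "c" ]; then
        echo "third"
    else
        echo "other"
    fi
}

# The same logic using the bash "[[" keyword.
classify_double() {
    if [[ $1 == "a" ]]; then
        echo "first"
    elif [[ $1 == "b" ]]; then
        echo "second"
    elif [[ $1 == "c" ]]; then
        echo "third"
    else
        echo "other"
    fi
}

# bench FUNC N: call FUNC N times with a value that matches nothing,
# so every comparison in the chain is evaluated.
bench() {
    local fn=$1 n=$2 i
    for ((i = 0; i < n; i++)); do
        "$fn" "d"
    done
}

time bench classify_single 10000 > /dev/null
time bench classify_double 10000 > /dev/null
```

Note that “[[” is bash-specific, so a script that uses it needs a bash shebang rather than plain sh.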

I’m about to have to eat my words about how I’d expected case to behave, compared with if/elif/else, when the matches are closer to the top of the list. In both situations, it did make a difference. I set up both a double-bracketed if/elif/else and a case to find the match in the first comparison every time. This is what CorkyAgain had suggested for optimizing scripts where you know to expect certain situations more often than others. I did use the double-bracketed compare to give our “optimized” script the benefit of our previous findings. In the case where there are 4 comparisons, but the match is made at the first one, if/elif/else’s EF drops to 25.39. In the exact same situation with case-switch, the EF goes down to 10.34 after figuring a mean. So, this is where I hang my head in shame and admit that the gain for case-switch when the match is found closer to the top is almost insignificant, while for if/elif/else it makes a much more noticeable difference.
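To make “stacked priority” concrete, here is a small sketch where the value expected most often is tested first in both styles. The names `with_case` and `with_if` and the values “common”, “rare1” and “rare2” are hypothetical, as is the iteration count.

```bash
#!/usr/bin/env bash

# case-switch with the most likely value listed first.
with_case() {
    case $1 in
        common) echo "hit" ;;
        rare1)  echo "rare one" ;;
        rare2)  echo "rare two" ;;
        *)      echo "other" ;;
    esac
}

# Double-bracketed if/elif/else with the same ordering.
with_if() {
    if [[ $1 == "common" ]]; then
        echo "hit"
    elif [[ $1 == "rare1" ]]; then
        echo "rare one"
    elif [[ $1 == "rare2" ]]; then
        echo "rare two"
    else
        echo "other"
    fi
}

# run_many FUNC VALUE N: call FUNC with VALUE, N times.
run_many() {
    local fn=$1 value=$2 n=$3 i
    for ((i = 0; i < n; i++)); do
        "$fn" "$value"
    done
}

# Match on the first comparison every time (the stacked case)...
time run_many with_if "common" 10000 > /dev/null
time run_many with_case "common" 10000 > /dev/null

# ...versus falling all the way through to the default.
time run_many with_if "nomatch" 10000 > /dev/null
time run_many with_case "nomatch" 10000 > /dev/null
```

Comparing the first pair of timings against the second pair shows how much each style gains from having the match at the top.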

In summary, using the EF that we totally made up for this test (smaller is better, for those of you who are just catching on):

case-switch with unknown priority: EF=10.66
case-switch with stacked priority: EF=10.33
if/elif/else with unknown priority and "[": EF=83.39
if/elif/else with unknown priority and "[[": EF=33.23
if/elif/else with stacked priority and "[[": EF=25.39

I believe that we can start making generalizations about optimizing scripts now. In every situation tested here, case-switch outperforms spaghetti-styled if/elif/else. If you’re writing scripts where only a few comparisons are being done, the difference is probably negligible. Assuming that it’s a bash script, double-bracketing should be used wherever possible over single-bracketing. In all situations, if you know that certain comparisons are more probable than others, those choices should be moved to the top of the stack as long as it doesn’t interfere with the functionality of the script itself.
