I was hopeful that V5.4-001 with the gtm_lvscale feature might solve a long-standing issue we have with the performance of large local arrays. We have observed that performance suffers dramatically when a local array has a large number of subscripts. We have also observed that, when sequential integer subscripts are used, performance can be improved by up to two orders of magnitude by using negative subscripts instead, thus inserting into the array in descending order.
The problem is exhibited by the code below:
[code]test3 ; Test Local Variable stuff
;
w "test3> "_$zv,!
d test(320000,100) ; Very Slow
d test(-320000,100) ; Very Fast
;
q ; >>> test3
test(n,ln) ; Local Array Insertion Test
;
n (n,ln) ; NEW everything except the parameters
;
s n=$g(n,100) ; Negative values cause the use of subscripts -1..-|n|
s ln=$g(ln,1000)
;
s t0=$h,t0=t0*3600*24+$p(t0,",",2) ; $H "days,seconds" -> absolute seconds
s s=$j("",ln),cn1=0
f i=1:1:n,-1:-1:n d  ; only one range runs: 1..n if n>0, or -1..n if n<0
. s lvn(i,"lvn")=s,cn1=cn1+1
i cn1'=$tr(n,"-") w 0/0 ; assert |n| nodes were set ($tr strips the sign); 0/0 forces an error
;
s t1=$h,t1=t1*3600*24+$p(t1,",",2)
w $j($fn(cn1,","),12)_" records in"_$j(t1-t0,4)_"s. "_$s(n<0:"Descending",1:" Ascending")_" subscripts.",!
;
q ; >>> test
[/code]
Note the dramatic difference between the two modes:
$ export gtm_lvscale=1; mumps test3.m; time mumps -run test3
test3> GT.M V5.4-001 Linux x86
320,000 records in 64s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 1m4.704s
user 1m4.472s
sys 0m0.140s
Note: Reversing the order of the calls to test() has no effect. The n=320000 call still takes over a minute.
Using gtm_lvscale DOES help the ascending-subscript version A LOT; however, it never approaches the performance of the descending-subscript version, and both start to perform very poorly above gtm_lvscale=7:
$ for i in 1 2 3 6 7 8 9; do echo "lvscale=$i"; export gtm_lvscale=$i; mumps test3.m; time mumps -run test3; done
lvscale=1
test3> GT.M V5.4-001 Linux x86
320,000 records in 60s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 1m0.313s
user 1m0.028s
sys 0m0.140s
lvscale=2
test3> GT.M V5.4-001 Linux x86
320,000 records in 33s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 0m33.528s
user 0m33.250s
sys 0m0.168s
lvscale=3
test3> GT.M V5.4-001 Linux x86
320,000 records in 22s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 0m21.386s
user 0m21.173s
sys 0m0.172s
lvscale=6
test3> GT.M V5.4-001 Linux x86
320,000 records in 12s. Ascending subscripts.
320,000 records in 0s. Descending subscripts.
real 0m12.500s
user 0m12.181s
sys 0m0.284s
lvscale=7
test3> GT.M V5.4-001 Linux x86
320,000 records in 13s. Ascending subscripts.
320,000 records in 20s. Descending subscripts.
real 0m39.521s
user 0m12.205s
sys 0m0.936s
lvscale=8
test3> GT.M V5.4-001 Linux x86
320,000 records in 12s. Ascending subscripts.
320,000 records in 33s. Descending subscripts.
real 1m7.991s
user 0m9.945s
sys 0m1.232s
lvscale=9
test3> GT.M V5.4-001 Linux x86
320,000 records in 13s. Ascending subscripts.
320,000 records in 79s. Descending subscripts.
real 2m7.021s
user 0m9.509s
sys 0m1.612s
$
The key seems to be adding nodes with decreasing (rather than negative) subscript values; "f i=320000:-1:1..." would be quite fast as well.
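For comparison, the same speed-up should be available with positive subscripts simply by inserting in decreasing order, e.g. (an untested sketch along the lines of the test routine above):
[code] s s=$j("",100)
 f i=320000:-1:1 s lvn(i,"lvn")=s ; decreasing positive subscripts: fast
[/code]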
This certainly seems like an unreasonable asymmetry in Local Variable performance.
Best Regards,
-bob
Forgot to mention: the asymmetry is inverted once lvscale > 7. Probably garbage-collection related? Is there a counter for that yet?
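To illustrate the kind of degradation I suspect (purely a guess on my part; I don't know GT.M's actual lv data structure): inserting keys in sorted order into a naive, unbalanced binary search tree degenerates to O(n^2) comparisons, while random-order insertion stays near O(n log n). A small Python sketch of that general effect:

```python
import random

def bst_insert_comparisons(keys):
    """Insert keys into a naive (unbalanced) binary search tree,
    returning the total number of key comparisons performed."""
    root = None
    comparisons = 0
    for k in keys:
        if root is None:
            root = [k, None, None]  # node = [key, left, right]
            continue
        node = root
        while True:
            comparisons += 1
            child = 1 if k < node[0] else 2
            if node[child] is None:
                node[child] = [k, None, None]
                break
            node = node[child]
    return comparisons

n = 2000
ordered = bst_insert_comparisons(range(1, n + 1))  # sorted input: tree degenerates to a list
keys = list(range(1, n + 1))
random.seed(0)
random.shuffle(keys)
shuffled = bst_insert_comparisons(keys)  # random input: tree stays roughly balanced
print(ordered, shuffled)  # ordered is n*(n-1)/2 = 1999000; shuffled is far smaller
```

This is only the textbook phenomenon, not a claim about GT.M internals; whatever structure lv trees use evidently handles one insertion direction much better than the other.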
Is this still an issue? There were major improvements to local variable performance in V5.4-002.
You are correct; the test case seems to be much better in V6 at least. Similar improvements in the application have not been reported (probably because of existing work-arounds), but we will investigate that separately if necessary.
Thank you very much for following up on this.
-bob