``````{
>I need to find the minimum and maximum values in large arrays (~20,000
>points) of type word.  Looking for a faster way using TP7 (assembler?)
>to do it than:
>
>min := data^;
>max := data^;
>
>for i := 2 to num_points do
>begin
> data_value := data^[i];
> if data_value < min then min := data_value;
> if data_value > max then max := data_value;
>end;
>
Lets try some asm here:

From: terjem@hda.hydro.com (Terje Mathisen)
}

Procedure FindMinMax(var data; num_points : word; var min, max : word);
Assembler;
asm
push ds
lds si,[data]
mov cx,[num_points]
sub cx,1
jc @done   {Empty array! }
mov dx,[si] {Min value}
lea si,[si+2] {Point at second table entry}
mov bx,dx   {Max value}
jz @store  {Single-entry array!}

@loop:
mov ax,[si]
cmp ax,dx
jb @new_min
cmp ax,bx
ja @new_max
dec cx
jnz @loop
jmp @store

@new_min:
mov dx,ax   {Save new min value}
dec cx
jnz @loop
jmp @store

@new_max:
mov bx,ax   {Save new max value}
dec cx
jnz @loop

@store:       {Return the values found!}
lds si,[min]
mov [si],dx
lds si,[max]
mov [si],bx
@done:
pop ds
end;

{
This was written from scratch, so no testing whatsoever!  It should be quite
well optimized for both 486 and Pentium-class machines, running in about 10
cycles/word on a 486, and just 4 cycles/word on a Pentium, since the
8 inner-loop instructions will pair perfectly.  With 20,000 points in your
array, this should correspond to 6ms on a 486-33, and less than a milli-
second on a Pentium-90.

The most important feature to note is that new extremal values will be quite
rare, averaging just O(log(n)) for a random n-element array.  That's why I
jump out of the loop to handle these cases, making the normal case much
faster.  With a worst-case, already sorted array, we will find a new max
value on each iteration, which will increase the running time to 12 cycles
on the 486, while the Pentium will stay constant at 4 cycles.

If the array is pre-sorted in reverse (declining) order, the 486 is back to
10 cycles, while the P5 is actually faster, at just 3 cycles/word.

I think the only way to improve on this code is by unrolling it, which will
save up to 4 cycles/word for the 486, and just a single P5 cycle.

PS. Make sure that your array is naturally aligned (16-bit word), if not it
will run a lot slower, esp. on a P5.
}

``````