In the first part of this article I'll explain a lot of different ways to check
for older processors by exploiting bugs, undocumented features, etc. I'll also
show how to write an invalid-opcode exception handler, calculate the size of
the prefetch queue and some other things. Finally, in the last part Chris shows
how to determine the processor clockrate with the RDTSC instruction.
Chris didn't have much free time at the moment and so couldn't contribute more,
therefore I had to put this article together pretty much myself, and I hope the
quality didn't go down very much -- since Chris' texts are definitely better
than mine.
AAD (ASCII Adjust before Division) Instruction
This instruction allows us to distinguish between at least NEC's V-series and
Intel processors. AAD, usually in preparation for a division using DIV or IDIV,
works like this:
AL = AH * 10 + AL
AH = 0
This converts the unpacked two-digit BCD number in AX into binary, and its
normal opcode is "0d5h, 0ah". The difference is that while Intel's chips allow
one to replace the multiplicand with any number (thereby building your own
AAD instruction for various number systems), NEC's always multiply by 10,
whatever the second byte says. So by replacing the second byte with a different
number, we can check if the operand is actually used, and if not, assume it's
a NEC.
mov ax, 0f0fh
db 0d5h, 10h ; opcode for AAD 16
cmp al, 0ffh ; check if multiplicand was 10 or not
jz isIntel
jnz isNEC
This should be used as another way (in addition to the one presented in the
first article on this subject) to distinguish the NEC V20/V30 series from the
Intel 8086/88.
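To make the arithmetic behind this test concrete, here is a small Python model
of what AAD computes for a given second opcode byte. It is only a sketch, not
code from the original sources: the function name aad is made up for this
illustration, and it assumes the behaviour described above
(AL = AH * base + AL, truncated to 8 bits, AH = 0).

```python
def aad(ax, base=10):
    """Model AAD with an arbitrary multiplicand; returns the new AX."""
    ah = (ax >> 8) & 0xFF
    al = ax & 0xFF
    al = (ah * base + al) & 0xFF   # the result is truncated to 8 bits
    return al                      # AH is cleared, so the new AX equals AL

# AX = 0F0Fh, as loaded by the test above
intel_al = aad(0x0F0F, 16)   # operand byte honoured (Intel)
nec_al = aad(0x0F0F, 10)     # operand byte ignored, 10 always used (NEC)
print(hex(intel_al), hex(nec_al))
```

With AX = 0F0Fh the two results differ, which is exactly what the
"cmp al, 0ffh" relies on.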
PUSHA Instruction
Here is another good way to differentiate NECs from Intel's 8086/88. Since the
V20 and V30 execute all the 80186 instructions, and since PUSHA executes on
the 8086/88 as "JMP $+2", one can, for example, follow it with an STC and then
see if the carry flag was really set.
clc ; ensure that CF is clear
pusha ; executed on 8086/88 as JMP $+2
stc
jc isNEC_or_186plus
jnc is808x
<whatever code here>
.
.
isNEC_or_186plus:
popa ; clean up
Of course the carry flag must not already be set before performing this test.
POP CS Trick
I'll just show one last way of accomplishing the same. The trick is that on an
8086/88 (non-CMOS versions, at least) the opcode "0fh" performs a POP CS, on
an 80186/188 it is an invalid opcode that generates an INT6 exception, while
NECs and the 286+ use that encoding as a prefix byte to indicate new
instructions. So, to tell NEC's V20/V30 (also V40/V50, I think) and the 8086/88
apart, we use the byte string "0fh, 14h, 0c3h", which the CPU will execute as
follows:
8086/88             V20/V30
-------             -------
pop cs              set1 bl, cl
adc al, 0C3h
It is then easy to write a piece of code that will distinguish between them:
xor al, al ; BTW: clears CF
push cs
db 0fh, 14h, 0c3h ; instruction(s) -- see above
cmp al, 0c3h ; check if ADC was executed
je is808x
jne isNEC_V20plus
<whatever code here>
.
.
isNEC_V20plus:
pop ax ; clean up (no POP CS available)
Note that, again, the carry flag must be cleared before execution of this test.
Also, just a reminder that this is to be used when you know that the processor
is not a 186 or above but an older one.
Word Write
On the 8086/88 (+ V20/V30), when a word write is performed at offset 0ffffh in
a segment, one byte will be written at that offset and the other at offset 0,
while an 80186 family processor will write one byte at offset 0ffffh, and the
other, one byte beyond the end of the segment (offset 10000h). So all we have
to do is test if it wraps around or not:
mov ax, ds:[0ffffh] ; save original bytes
mov word ptr ds:[0ffffh], 0aaaah
cmp byte ptr ds:[0], 0aah ; did 2nd byte wrap around?
mov ds:[0ffffh], ax ; restore original bytes
je is808x
jne is8018x
Again, note that this should only be used for the specified processors.
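The wraparound check can be modelled on a host machine with a plain byte
array. The Python sketch below is only an illustration of the two behaviours
described above; the names word_write and looks_like_808x are invented here,
and the "segment" is simply a buffer that starts zeroed.

```python
def word_write(mem, seg_base, offset, value, wraps):
    """Store a 16-bit little-endian value at seg_base:offset.

    wraps=True  models the 8086/88 and V20/V30 (second byte goes to offset 0).
    wraps=False models the 80186 family (second byte goes to offset 10000h).
    """
    lo, hi = value & 0xFF, (value >> 8) & 0xFF
    second = (offset + 1) & 0xFFFF if wraps else offset + 1
    mem[seg_base + offset] = lo
    mem[seg_base + second] = hi

def looks_like_808x(wraps):
    mem = bytearray(0x10001)          # one segment plus one byte beyond it
    word_write(mem, 0, 0xFFFF, 0xAAAA, wraps)
    return mem[0] == 0xAA             # same check as "cmp byte ptr ds:[0], 0aah"

print(looks_like_808x(True), looks_like_808x(False))
```

Note that the model starts from a zeroed buffer. On a real machine the byte at
offset 0 could already hold 0aah, so it may be worth making sure it holds
something else before running the test.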
Multi-Prefix Instructions
The standard 8086/88 processors have a bug such that they lose multiple
prefixes if an interrupt occurs, while CMOS versions do not, since this bug was
fixed in the 80C86/C88 processors (NEC V20/V30 processors do not have this
bug either, so the following code also applies to them). If we execute a string
operation with both a repeat prefix and a segment override for long enough to
be interrupted, then on an 8086/88 the REP prefix will be lost when the
instruction is interrupted, since on return only the last prefix is retained.
If instead we are on a low-power CMOS version, the instruction will run to
completion.
mov cx, 0ffffh
sti
rep lods byte ptr es:[si] ; sure to be interrupted
cli
jcxz notstandard_808x ; check if REP was completed
<if here, then it's just a standard 8086/88>
.
.
Just in case you want to use a piece of code like this without having to worry
about that bug, here's how to get it to work correctly every time (with
interrupts enabled -- this time with MOVS):
do_REP: rep movs byte ptr es:[di], es:[si] ; may be interrupted!
jcxz carry_on ; if not, carry on,
loop do_REP ; else, complete REP
carry_on:
Invalid-Opcode Exception Handler (INT6)
From the 80186 upwards, all processors allow one to implement an
invalid-opcode exception handler, which gives us a great way of telling the
families of CPUs apart. All one does is hook the INT6 interrupt vector with
our own handler and see whether some specific instructions trigger an INT6 or
not. Our handler traps those exceptions and toggles a little flag that shows
us the processor doesn't support that instruction.
In the code below I hooked the INT6 vector by changing the IVT (Interrupt
Vector Table) directly, but one can also use DOS services for that. We then
test which processor we're running on, and after that restore things back to
what they were before (except registers; place some push/pop code yourself
according to your needs -- by the way, Robert Collins is a god!). Anyway, the
code is pretty much self-explanatory:
; Hook INT6 -- set up our own handler
push 0 ; point to IVT (0000:0000) - (1
pop es ; byte saved thanks to Chris!)
cli
lds ax, es:[6*4] ; get original handler vector
mov word ptr es:[6*4], offset INT6_handler ; then, replace it with
mov es:[6*4+2], cs ; our own handler
sti
; Test if processor is at least a 80186 -- Executes "SHL DX, 10"?
mov cx, 1 ; set up invalid-opcode flag
shl dx, 0ah
jcxz unknown_CPU
; Test if processor is at least a 80286 -- Executes "SMSW DX"?
smsw dx
jcxz is80186
; Test if processor is at least a 80386 -- Executes "MOV EDX, EDX"?
mov edx, edx
jcxz is80286
; Test if processor is at least a 80486 -- Executes "XADD DL, DL"?
xadd dl, dl
jcxz is80386
<if here, then it's a 80486 or higher processor>
.
.
; Restore original INT6 handler address -- for all processor types!
cli
mov es:[6*4], ax ; restore original INT6 offset
mov es:[6*4+2], ds ; restore original INT6 segment
sti
<whatever code here>
.
.
; Our own INT6 handler
INT6_handler:
xor cx, cx ; toggle invalid-opcode flag
push bp
mov bp, sp
add word ptr ss:[bp+2], 3 ; adjust the return address to
; after the invalid opcode (3
; bytes for all)
pop bp
iret
Note that: 1) this code should only be used if you know the processor is at
least an 80186, 2) if you change the contents of AX, ES or DS before restoring
the original INT6 handler, don't forget to save them first and restore them
afterwards!, 3) of course the code in the INT6_handler should only be executed
by means of an INT6!
Maybe a very small extra explanation is required regarding the INT6_handler. We
need to adjust the return address because, when an invalid-opcode exception is
raised, the saved CS and IP (EIP) that are pushed onto the stack point to the
instruction that generated the exception, instead of the next one (as usually
happens for other interrupts).
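Two bits of arithmetic in this section are easy to get wrong: where the INT6
vector lives in the IVT, and why the handler can always add 3 to the return
address. The Python sketch below just works them out. The helper name
ivt_entry and the PROBES table are made up for this illustration, and the
byte sequences are one common encoding of each probe instruction in 16-bit
code, which I'm assuming matches what your assembler emits.

```python
def ivt_entry(vector):
    """Return the addresses (within segment 0000h) of a vector's two fields."""
    base = vector * 4          # each real-mode vector is a 4-byte far pointer
    return base, base + 2      # offset word first, then segment word

# One common encoding of each probe instruction (an assumption, check yours)
PROBES = {
    "shl dx, 0ah":  bytes([0xC1, 0xE2, 0x0A]),
    "smsw dx":      bytes([0x0F, 0x01, 0xE2]),
    "mov edx, edx": bytes([0x66, 0x89, 0xD2]),
    "xadd dl, dl":  bytes([0x0F, 0xC0, 0xD2]),
}

off_addr, seg_addr = ivt_entry(6)
print("INT6 offset word at 0000:%04X, segment word at 0000:%04X"
      % (off_addr, seg_addr))
for text, code in PROBES.items():
    print("%-13s %d bytes" % (text, len(code)))
```

If your assembler picks a different encoding for one of the probes, the
constant added to the return address has to change accordingly.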
Instruction Prefetch Queue
16-bit bus processors (i.e. 8086s, 80186s, V30s) have a 6-byte prefetch queue
and refill it once at least two bytes in the queue are empty, while their
8-bit bus versions (i.e. 8088s, 80188s, V20s) only have a 4-byte prefetch queue
and initiate a prefetch cycle when at least one byte in it is empty.
So, knowing this about their Bus Interface Unit design, it isn't difficult to
write some code to distinguish between the two categories. We'll make a routine
that uses self-modifying code to change the opcode at the fifth byte and then
see if it was executed or not.
xor cx, cx
cli ; keep an interrupt from flushing the queue
lea di, patch
mov al, 90h ; load NOP opcode
stosb ; patch fifth byte to a NOP
nop
nop
nop
nop
patch: inc cx ; did the INC execute?
sti
jcxz is8bit
<if here, then it's a 16-bit processor>
I believe there is enough time for the prefetch queue to fill, though I have no
chance to confirm it!
Just in case you want to be on the safe side, here's a routine that will most
certainly work:
xor dx, dx
cli ; keep an interrupt from flushing the queue
lea di, patch+2
mov al, 90h ; load NOP opcode
mov cx, 3
std
rep stosb ; patch 3 bytes downwards, "patch" itself last
nop
nop
nop
nop
patch: inc dx ; did the INC execute?
nop
nop
sti
cld ; restore direction flag (STD above left it set)
test dx, dx
jz is8bit
<if here, then it's a 16-bit processor>
Again, I must stress that this code should only be used for the specified
processors, since it will without a doubt fail on others.
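The reasoning behind both routines boils down to one comparison, which the
following Python sketch spells out. It is a deliberately simplified model and
not a simulation of the real Bus Interface Unit: it assumes the queue is
completely full when the patching write happens and that it holds the bytes
immediately following the store instruction. The function name is invented
for this illustration.

```python
def stale_byte_executes(queue_size, bytes_between=4):
    """True if the byte at 'patch' was already prefetched when it was patched.

    bytes_between is the number of code bytes (the four NOPs) that sit
    between the store instruction and the patched byte.
    """
    return bytes_between < queue_size

for name, size in (("8086/80186/V30", 6), ("8088/80188/V20", 4)):
    kind = "16-bit bus" if stale_byte_executes(size) else "8-bit bus"
    print(name, "->", kind)
```

With a 6-byte queue the old INC is already sitting in the queue and executes
anyway, while with a 4-byte queue the patched NOP is fetched from memory.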
Do It The Optimized Way!
Here is our size-optimized way of determining the processor type. It's an
algorithm that uses Intel's guidelines and tests between pre-80286, 80286,
80386, 80486 without CPUID and 80486+ with CPUID support.
Chris is using a similar routine in his CPU identification utility.
; Detection of pre-80286/80286/386+ processors
mov ax, 7202h ; set bits 12-14 and clear bit 15
push ax
popf
pushf
pop ax
test ah, 0f0h
js ispre286 ; bit 15 of FLAGS is set on pre-286
jz is80286 ; bits 12..15 of FLAGS are clear on 286
; processor in real mode (no V86 mode
; on 286)
; <if here, then it's a 80386 or higher processor>
; Detection of 80386/80486(w/out CPUID)/80486+(CPUID compliant)
pushfd
pop eax
mov edx, eax
xor eax, 00240000h ; flip bits 18 (AC) and 21 (ID)
push eax
popfd
pushfd
pop eax
xor eax, edx ; check if both bits didn't toggle
jz is80386
shr eax, 19 ; check if only bit 18 toggled
jz is80486_without_CPUID
<if here, then it's a 80486 with CPUID or higher processor>
And so, we got the whole code down to a measly 46 bytes!
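The two decision trees are easier to follow when written out over plain
integers. The Python sketch below mirrors the branches of the code above,
given the values that would be read back from FLAGS/EFLAGS. The function names
are made up for this illustration, and the sample read-back values used in the
example are just ones consistent with the comments in the listing.

```python
def classify_flags(readback):
    """readback: 16-bit FLAGS value read after loading 7202h."""
    ah = (readback >> 8) & 0xFF
    if ah & 0x80:                 # bit 15 stuck at 1 -> "js ispre286"
        return "pre-286"
    if (ah & 0xF0) == 0:          # bits 12..15 all clear -> "jz is80286"
        return "80286"
    return "80386+"

def classify_eflags(before, after):
    """before: original EFLAGS; after: EFLAGS read back after flipping
    bits 18 (AC) and 21 (ID)."""
    diff = (before ^ after) & 0xFFFFFFFF
    if diff == 0:                 # neither bit toggled
        return "80386"
    if (diff >> 19) == 0:         # only bit 18 toggled
        return "80486 without CPUID"
    return "80486+ with CPUID"

print(classify_flags(0x7202))
print(classify_eflags(0x00000202, 0x00040202))
```

Shifting right by 19 throws away bit 18, so anything left over means that
bit 21 (ID) toggled as well.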
CR0 Register - Bit 4
The 80386 DX may be differentiated from the other models by trying to clear bit
4 (ET) in the CR0 register. It can be toggled on the 80386 DX, while it is
hardwired to 1 on all the other family models. So all we have to do is try to
clear that bit and then see whether it was forced back to 1 or not.
; Test CR0 register -- bit 4 (ET)
mov eax, cr0
mov edx, eax ; save original CR0
and al, 11101111b ; clear bit 4
mov cr0, eax
mov eax, cr0
mov cr0, edx ; restore original CR0
test al, 00010000b ; check if bit 4 was forced high
jz isa80386DXmodel
jnz isnota80386DX_and_therefore_is_some_other_model
Note that I'm not sure whether this can be done safely and reliably under
protected mode!
Clockrate
Before the Pentium, it was difficult to determine the processor clockrate. It
was typically based on sophisticated timing loops, which were often unreliable.
With the Pentium, Intel introduced the RDTSC instruction, which returns the
number of clocks since the processor was started. The following code
illustrates how to use it.
; Determine RDTSC support (assuming that CPUID is supported)
mov eax, 1
cpuid
test edx, 10h ; bit 4 is set when RDTSC is supported
jz nordtsc
; Disable all interrupts but timer IRQ0
in al, 21h
mov ah, al
in al, 0A1h
push ax ; Save previous values
mov al, 0FEh
out 21h, al
mov al, 0FFh
out 0A1h, al
; Assuming that timer runs at 55ms periods, get the clockrate
hlt ; Wait for timer
rdtsc ; Read TSC
mov ebx, eax ; Save lo
mov ecx, edx ; Save hi
hlt ; Wait for timer
rdtsc ; Read TSC
sub eax, ebx ; Difference lo
sbb edx, ecx ; Difference hi
; Calculate clockrate in MHz
mov ecx, 54925
div ecx
mov [Clockrate], eax
; Restore interrupt states
pop ax
out 0A1h, al
mov al, ah
out 21h, al
The above code can be run in real mode, V86 mode or protected mode in ring 0.
In V86 mode it will hang Pentium and Pentium MMX processors, but on other
processors it will work OK.
In this code, the clockrate is determined as (T2-T1)*PIT/(D*M), where T1 and
T2 are the numbers of clocks returned by RDTSC, PIT is the input frequency of
the Programmable Interval Timer (0x1234DD Hz), D is the value by which that
frequency is divided (0x10000) and M is 1000000 (we want it in MHz). Since
D*M/PIT is about 54925, the code simply divides the difference by that.
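As a quick check of the constant used in the assembly, here is the same
formula worked through in Python. This is only a sketch: the names are made up
for the illustration, and the TSC difference in the example is an invented
value, not a measurement.

```python
PIT_HZ = 0x1234DD        # PIT input clock: 1193181 Hz
DIVISOR = 0x10000        # value the PIT frequency is divided by (65536)
M = 1000000              # we want the result in MHz

def tick_period_us():
    """Length of one timer tick in microseconds (about 54925)."""
    return DIVISOR * M / PIT_HZ

def clockrate_mhz(t1, t2):
    """(T2-T1)*PIT/(D*M), using integer division like the DIV instruction."""
    return (t2 - t1) * PIT_HZ // (DIVISOR * M)

print(int(tick_period_us()))
print(clockrate_mhz(0, 5500000))   # hypothetical TSC difference for one tick
```

The first value is where the 54925 in "mov ecx, 54925" comes from. Dividing
the TSC difference by it gives essentially the same result as evaluating the
full formula.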
Is This The End?
I think this is the end as far as old CPUs are concerned, since a lot of
techniques have already been covered here (though there are some more), but
not for other processors, like AMD and IBM and whatever else Chris and I think
up before the next article.
Take the time to visit Chris' web page, where you can find the source for his
CPU identification utility (for Netwide Assembler). His place is at:
{http://ams.ampr.org/cdragan/}
Also, here are some other sources of information that you might want to take a
look at (available somewhere on the net -- since I don't remember where I got
them from):
WHATCHIP.ASM (Christy Gemmell)
86BUGS.LST (Harald Feldmann/Hamarsoft)
[distributed with Ralf Brown's Interrupt list]
OPCODES.LST (Potemkin's Hackers Group)
[distributed with Ralf Brown's Interrupt list]
cpu.asm (Robert Mashlan)
WHATCPU.ASM (Dave M. Walker)
COMPTEST 2.60 (Norbert Juffa)
Ralf Brown's Interrupt List: {http://www.cs.cmu.edu/~ralf/files.html}
This, in addition to the ones already referenced in the first article of this
series.