@node Hacking
@chapter Implementation details
This part describes the GRUB internals so that developers can
understand the implementation and start to hack GRUB. Of course, the
source code has the complete information, so refer to it when you are
not satisfied with this documentation.
@node Memory map
@chapter The memory map of various components
GRUB is broken into 2 distinct components, or @dfn{stages}, which are
loaded at different times in the boot process. The Stage 1 has to know
where to find Stage 2, and the Stage 2 has to know where to find its
configuration file (if Stage 2 doesn't have a configuration file, it
drops into the command line interface and waits for a user command).
Here is the memory map of the various components
@footnote{Currently GRUB does not use the extended memory for itself,
since it is used to load an operating system. But we are planning to use
it for GRUB itself in the future by @dfn{lazy loading}. Ask okuji for
more information.}:
@table @asis
@item 0 to 4K-1
Interrupt & BIOS area
@item down from 8K-1
16-bit stack area
@item 8K to (ebss1.5)
Stage 1.5 (optionally) loaded here by Stage 1
@item 0x7c00 to 0x7dff
Stage 1 loaded here by the BIOS
@item 0x7e00 to 0x7e08
Scratch space used by Stage 1
@item 32K to (ebss2)
Stage 2 loaded here by Stage 1.5 or Stage 1
@item (middle area)
Heap used for random memory allocation
@item down from 416K-1
32-bit stack area
@item 416K to 448K-1
Filesystem info buffer (when reading a filesystem)
@item 448K to 479.5K-1
BIOS track read buffer
@item 479.5K to 480K-1
512 byte fixed SCRATCH area
@item 480K to 511K-1
General storage heap
@end table
See the file @file{stage2/shared.h}, for more information.
@node Embedded data
@chapter Embedded variables in GRUB
GRUB's @dfn{stage1} and @dfn{stage2} have embedded variables whose
locations are well-defined, so that the installation can patch the
binary file directly without recompilation of the modules.
In @dfn{stage1}, these are defined (the number in parentheses after
each entry is its offset):
@table @asis
@item @dfn{stage1 version} (0x3e)
These are the version bytes (should be 03:00).
@item @dfn{loading drive} (0x40)
This is the BIOS drive number to load the block from. If the number
is 0xff, then load from the booting drive.
@item @dfn{stage2 sector} (0x41)
This is the location of the first sector of the @dfn{stage2}.
@item @dfn{stage2 address} (0x45)
This is the data for the @code{jmp} command to the starting address of
the component loaded by the stage1.
A @dfn{stage1.5} should be loaded at address 0x2000, and a @dfn{stage2}
should be loaded at address 0x8000. Both use a CS of 0.
@item @dfn{stage2 segment} (0x47)
This is the segment of the starting address of the component loaded by
the @dfn{stage1}.
@end table
In the first sector of @dfn{stage1.5} and @dfn{stage2}, the blocklists
are recorded between @dfn{firstlist} (0x200) and @dfn{lastlist}
(determined when assembling the file @file{stage2/start.S}).
The trick here is that it is actually read backward, and the first
8-byte blocklist is not read here, but after the pointer is decremented
8 bytes, then after reading it, it decrements again, reads, decrements,
reads, etc. until it is finished. The terminating condition is when the
number of sectors to be read in the next blocklist is 0.
The format of a blocklist can be seen from the example in the code just
before the @code{firstlist} label. Note that it is always from the
beginning of the disk, and @emph{not} relative to the partition
boundaries.
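
The backward walk described above can be sketched in C. This is only an
illustration, not the code in @file{stage2/start.S}: the field layout of an
entry (a 32-bit start sector, a 16-bit sector count and a 16-bit load
segment, all little-endian) is an assumption, and the names
@code{blocklist_entry}, @code{decode_entry} and @code{walk_blocklists} are
hypothetical. Only the 8-byte entry size, the 0x200 offset of
@dfn{firstlist} and the zero-length terminator come from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define FIRSTLIST_OFF 0x200  /* offset of firstlist, as given in the text */
#define ENTRY_SIZE    8      /* size of one blocklist entry */

/* Hypothetical decoded view of one 8-byte blocklist entry. */
struct blocklist_entry {
    uint32_t start;  /* first sector, counted from the start of the disk */
    uint16_t len;    /* number of sectors to read; 0 ends the list */
    uint16_t seg;    /* load segment (assumed field) */
};

/* Decode one little-endian entry from raw bytes. */
static struct blocklist_entry decode_entry(const uint8_t *p)
{
    struct blocklist_entry e;
    e.start = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
            | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    e.len = (uint16_t)(p[4] | (p[5] << 8));
    e.seg = (uint16_t)(p[6] | (p[7] << 8));
    return e;
}

/* Walk the entries backward from firstlist. Returns the number of
   non-terminating entries and stores the sum of their lengths. */
size_t walk_blocklists(const uint8_t *sector, uint32_t *total_sectors)
{
    size_t off = FIRSTLIST_OFF;
    size_t count = 0;

    *total_sectors = 0;
    while (off >= ENTRY_SIZE) {
        off -= ENTRY_SIZE;               /* decrement first, then read */
        struct blocklist_entry e = decode_entry(sector + off);
        if (e.len == 0)
            break;                       /* terminating condition */
        *total_sectors += e.len;
        count++;
    }
    return count;
}
```

Given a 512-byte copy of the first sector, @code{walk_blocklists} visits the
entry at offset 0x1f8 first, then 0x1f0, and so on, which mirrors the
decrement-then-read order described above.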
In @dfn{stage1.5} and @dfn{stage2} (these are all defined at the
beginning of @file{shared_src/asm.S}):
@table @asis
@item @dfn{major version} (0x6)
This is the major version byte (should be 3).
@item @dfn{minor version} (0x7)
This is the minor version byte (should be 0).
@item @dfn{install_partition} (0x8)
This is an unsigned long representing the partition on the currently
booted disk in which GRUB should expect to find its data files and which
it treats as the default root partition.
The format is exactly the same as the @dfn{partition} part (the
@dfn{disk} part is ignored) of the data passed to an OS by a
Multiboot-compliant boot loader in the @dfn{boot_device} data element,
with one exception.
The exception is that if the first level of disk partitioning is left as
0xFF (decimal 255, which is marked as no partitioning being used), but
the second level does have a partition number, it looks for the first
BSD-style PC partition, and finds the numbered BSD sub-partition in it.
The default @dfn{install_partition}, 0xFF00FF, would then find the first
BSD-style PC partition and use the @samp{a} partition in it,
0xFF01FF would use the @samp{b} partition, etc.
If an explicit first-level partition is given, then no search is
performed, and it will expect that the BSD-style PC partition is in the
appropriate location, else a @samp{no such partition} error will be
returned.
If a @dfn{stage1.5} is being used, it will pass its own
@dfn{install_partition} to any @dfn{stage2} it loads, therefore
overwriting the one present in the @dfn{stage2}.
@item @dfn{stage2_id} (0xc)
This is the @dfn{stage1.5} or @dfn{stage2} identifier.
@item @dfn{version_string} (0xd)
This is the @dfn{stage1.5} or @dfn{stage2} version string. It isn't
meant to be changed, simply easy to find.
@item @dfn{config_file} (after the terminating zero of @dfn{version_string})
This is the location, using the GRUB filesystem syntax, of the config
file. It will, by default, look in the @dfn{install_partition} of the
disk GRUB was loaded from, though one can use any valid GRUB filesystem
string, up to and including making it look on other disks.
The boot loader itself doesn't search for the end of
@dfn{version_string}; it simply knows where @dfn{config_file} is, so the
beginning of the string cannot be moved after compile-time. This should
be OK, since the @dfn{version_string} is meant to be static.
The code of @dfn{stage2} starts again at offset 0x70, so
@dfn{config_file} string obviously can't go past there. Also, remember
to terminate the string with a 0.
Note that @dfn{stage1.5} uses a tricky internal representation for
@dfn{config_file}, which has the format
@code{@var{device}:@var{filename}} (the @samp{:} is not actually present).
@var{device} is an unsigned long like @dfn{install_partition}, and
@var{filename} is an absolute filename or a blocklist. If @var{device}
is disabled, that is, the drive number is 0xff, then @dfn{stage1.5} uses
the @dfn{boot drive} and the @dfn{install partition} instead.
@end table
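
As an illustration of the @dfn{install_partition} encoding described above,
here is a minimal C sketch that splits the value into its partition levels.
It assumes the byte order of the Multiboot @dfn{boot_device} field, with the
first partition level in bits 16 to 23 and the second level in bits 8 to 15,
which agrees with the 0xFF00FF and 0xFF01FF examples. The type and function
names are hypothetical and are not taken from the GRUB sources.

```c
#include <stdint.h>

#define PART_UNUSED 0xFF  /* marks a partition level that is not used */

/* Hypothetical decoded form of install_partition. */
struct partition_ref {
    uint8_t pc_slice;  /* first level: PC partition number, or 0xFF */
    uint8_t bsd_part;  /* second level: 0 is 'a', 1 is 'b', or 0xFF */
    uint8_t third;     /* third level, normally 0xFF */
};

/* Split the low three bytes into the three partition levels. */
struct partition_ref decode_install_partition(uint32_t value)
{
    struct partition_ref r;
    r.pc_slice = (uint8_t)((value >> 16) & 0xFF);
    r.bsd_part = (uint8_t)((value >> 8) & 0xFF);
    r.third    = (uint8_t)(value & 0xFF);
    return r;
}

/* True when the first BSD-style PC partition has to be searched for:
   no explicit first level, but a BSD sub-partition number is given. */
int needs_bsd_search(struct partition_ref r)
{
    return r.pc_slice == PART_UNUSED && r.bsd_part != PART_UNUSED;
}

/* BSD sub-partition letter, or 0 when no second level is given. */
char bsd_letter(struct partition_ref r)
{
    return r.bsd_part == PART_UNUSED ? 0 : (char)('a' + r.bsd_part);
}
```

With the default value 0xFF00FF, @code{needs_bsd_search} is true and
@code{bsd_letter} yields @samp{a}.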
@node Filesystem interface
@chapter The generic interface for the fs code
For any particular partition, it is presumed that only one of the
@dfn{normal} filesystems such as FAT, FFS, or ext2fs can be used, so
there is a switch table managed by the functions in
@file{disk_io.c}. The notion is that you can only @dfn{mount} one at a
time.
The blocklist filesystem has a special place in the system. In addition
to the @dfn{normal} filesystem (or even without one mounted), you can
access disk blocks directly (in the indicated partition) via the
blocklist notation. Using the blocklist filesystem doesn't affect any
other filesystem mounts.
The variables which can be read by the filesystem backend are:
@vtable @code
@item current_drive
Contains the current BIOS drive number (numbered from 0, if a floppy,
and numbered from 0x80, if a hard disk).
@item current_partition
Contains the current partition number.
@item current_slice
Contains the current partition type.
@item saved_drive
Contains the @dfn{drive} part of the root device.
@item saved_partition
Contains the @dfn{partition} part of the root device.
@item part_start
Contains the current partition starting address.
@item part_length
Contains the current partition length, in sectors.
@item print_possibilities
True when the @code{dir} function should print the possible completions
of a file, and false when it should try to actually open a file of that
name.
@item FSYS_BUF
Points to a filesystem buffer which is 32K in size, to use in any way
which the filesystem backend desires.
@end vtable
The variables which need to be written by a filesystem backend are:
@vtable @code
@item filepos
Should be the current position in the file.
@strong{Caution:} the value of @var{filepos} can be changed out from
under the filesystem code in the current implementation. Don't depend on
it being the same for later calls into the back-end code!
@item filemax
Should be the length of the file.
@item disk_read_func
Should be set to the value of @samp{disk_read_hook} @emph{only} during
reading of data for the file, not any other fs data, inodes, FAT tables,
whatever, then set to @code{NULL} at all other times (it will be
@code{NULL} by default). If this isn't done correctly, then the
@command{testload} and @command{install} commands won't work
correctly.
@end vtable
The functions expected to be used by the filesystem backend are:
@ftable @code
@item devread
Only read sectors from within a partition. Sector 0 is the first sector
in the partition.
@item grub_read
If the backend uses the blocklist code (like the FAT filesystem backend
does), then @code{grub_read} can be used, after setting @var{block_file}
to 1.
@end ftable
The functions expected to be defined by the filesystem backend are
described at least moderately in the file @file{filesys.h}. Their usage
is fairly evident from their use in the functions in @file{disk_io.c};
look for the use of the @var{fsys_table} array.
@strong{Caution:} The semantics are such that when @samp{mount}ing the
filesystem, presume the filesystem buffer @var{FSYS_BUF} is corrupted,
and (re-)load all important contents. When opening and reading a file,
presume that the data from the @samp{mount} is available, and doesn't
get corrupted by the open/read (i.e. multiple opens and/or reads will be
done with only one mount if in the same filesystem).
@node Bootstrap tricks
@chapter The bootstrap mechanism used in GRUB
The disk space available to a boot loader is very restricted, because
an MBR (@pxref{MBR}) is only 512 bytes and it also contains a partition
table (@pxref{Partition table}) and a BPB. So the question is how to
make the boot loader code small enough to fit in an MBR.
However, GRUB is a very large program, so we break GRUB into 2 (or 3)
distinct components, @dfn{stage1} and @dfn{stage2} (and optionally
@dfn{stage1.5}). @xref{Memory map}, for more information.
We embed @dfn{stage1} in an MBR or in the boot sector of a partition,
and place @dfn{stage2} in a filesystem. The optional
@dfn{stage1.5} can be installed in a filesystem, in the @dfn{boot loader}
area in an FFS, or in the sectors right after an MBR, because
@dfn{stage1.5} is small enough and the sectors right after an MBR are
normally an unused region. The size of this region is the number of
sectors per track minus 1.
Thus, all that @dfn{stage1} must do is load a @dfn{stage2} or
@dfn{stage1.5}. But even though @dfn{stage1} need not support the user
interface or the filesystem interface, it is impossible to make
@dfn{stage1} less than 400 bytes, because GRUB should support both the
CHS mode and the LBA mode (@pxref{Low-level disk I/O}).
The solution used by GRUB is that @dfn{stage1} loads only the first
sector of a @dfn{stage2} (or a @dfn{stage1.5}) and @dfn{stage2} itself
loads the rest. The flow of @dfn{stage1} is:
@enumerate
@item
Initialize the system briefly.
@item
Detect the geometry and the accessing mode of the @dfn{loading drive}.
@item
Load the first sector of the @dfn{stage2}.
@item
Jump to the starting address of the @dfn{stage2}.
@end enumerate
The flow of @dfn{stage2} (and @dfn{stage1.5}) is:
@enumerate
@item
Load the rest of itself to the real starting address, that is, the
starting address plus 512 bytes. The blocklists are stored in the last
part of the first sector.
@item
Long jump to the real starting address.
@end enumerate
Note that @dfn{stage2} (or @dfn{stage1.5}) does not probe the geometry
or the accessing mode of the @dfn{loading drive}, since @dfn{stage1} has
already probed them.
@node I/O ports detection
@chapter How to detect I/O ports used for a BIOS drive
In the @sc{pc} world, BIOS cannot detect if a hard disk drive is SCSI or
IDE, generally speaking. Thus, it is not trivial to know which BIOS
drive corresponds to an OS device. So the Multiboot Specification
describes some techniques on how to guess mappings (@pxref{BIOS device
mapping techniques, Multiboot Specification, BIOS device mapping
techniques, multiboot, The Multiboot Specification}).
However, the techniques described are unreliable or difficult to
implement, so GRUB uses a different one, the
@dfn{INT 13H tracking technique}. More precisely, it runs
the INT 13 call (@pxref{Low-level disk I/O}) in single-step mode just
like a debugger and parses the instructions.
To execute the call one instruction at a time, set the TF (trap flag)
in the register @dfn{FLAGS}. With TF set, the CPU generates a @dfn{Break
Point Trap} after each instruction is executed and calls INT 1. On entry
to the interrupt handler, the stack holds the interrupted code's FLAGS and
the far pointer to the next instruction to be executed, so we can know
which instruction will be executed next and the current
contents of all the registers. If the next instruction is an I/O
operation, the interrupt handler adds the I/O port to the @dfn{I/O
map}.
When the INT 13 handler returns, the TF flag is cleared automatically by
the instruction @code{iret}, and GRUB then outputs the I/O map on the screen.
See the source code for the command @command{ioprobe}
(@pxref{Command-line-specific commands}), for more information.
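
The check made by the trap handler, whether the next instruction is an I/O
operation and which port it touches, could look roughly like the following C
sketch. It is not the @command{ioprobe} source. It relies only on the
standard x86 opcodes for port I/O (E4h to E7h for @code{in}/@code{out} with
an immediate port, ECh to EFh for @code{in}/@code{out} through @code{DX},
and 6Ch to 6Fh for the string forms); the function names and the way the
instruction bytes and @code{DX} are passed in are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Prefix bytes that may precede the opcode: operand/address size,
   REP/REPNE and segment overrides. */
static int is_prefix(uint8_t b)
{
    switch (b) {
    case 0x66: case 0x67: case 0xF2: case 0xF3:
    case 0x26: case 0x2E: case 0x36: case 0x3E: case 0x64: case 0x65:
        return 1;
    default:
        return 0;
    }
}

/* Return the I/O port accessed by the instruction at code[0..len-1],
   or -1 if it is not a port I/O instruction. dx is the value of the
   DX register at the time of the trap. */
long io_port_of(const uint8_t *code, size_t len, uint16_t dx)
{
    size_t i = 0;

    while (i < len && is_prefix(code[i]))
        i++;
    if (i >= len)
        return -1;

    uint8_t op = code[i];
    if (op >= 0xE4 && op <= 0xE7)          /* in/out with imm8 port */
        return (i + 1 < len) ? (long)code[i + 1] : -1;
    if (op >= 0xEC && op <= 0xEF)          /* in/out through DX */
        return dx;
    if (op >= 0x6C && op <= 0x6F)          /* ins/outs through DX */
        return dx;
    return -1;
}
```

A handler built this way would call @code{io_port_of} on the bytes at the
far pointer found on its stack and record every result other than -1.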
@node Memory detection
@chapter How to detect all installed @sc{ram}
There are three BIOS calls which return information about installed
@sc{ram}. GRUB uses these calls to detect all installed @sc{ram} and
to learn how each address range should be treated by operating systems.
@menu
* Query System Address Map:: INT 15H, AX=E820h interrupt call
* Get Large Memory Size:: INT 15H, AX=E801h interrupt call
* Get Extended Memory Size:: INT 15H, AH=88h interrupt call
@end menu
@node Query System Address Map
@section INT 15H, AX=E820h interrupt call
Real mode only.
This call returns a memory map of all the installed @sc{ram}, and of
physical memory ranges reserved by the BIOS. The address map is returned
by making successive calls to this API, each returning one "run" of
physical address information. Each run has a type which dictates how
this run of physical address range should be treated by the operating
system.
If the information returned from INT 15h, AX=E820h in some way differs
from INT 15h, AX=E801h (@pxref{Get Large Memory Size}) or INT 15h AH=88h
(@pxref{Get Extended Memory Size}), then the information returned from
E820h supersedes what is returned from these older interfaces. This
allows the BIOS to return whatever information it wishes to for
compatibility reasons.
Input:
@multitable @columnfractions .15 .25 .6
@item @code{EAX} @tab Function Code @tab E820h
@item @code{EBX} @tab Continuation @tab Contains the @dfn{continuation
value} to get the next run of physical memory. This is the value
returned by a previous call to this routine. If this is the first call,
@code{EBX} must contain zero.
@item @code{ES:DI} @tab Buffer Pointer @tab Pointer to an Address Range
Descriptor structure which the BIOS is to fill in.
@item @code{ECX} @tab Buffer Size @tab The length in bytes of the
structure passed to the BIOS. The BIOS will fill in at most @code{ECX}
bytes of the structure or however much of the structure the BIOS
implements. The minimum size which must be supported by both the BIOS
and the caller is 20 bytes. Future implementations may extend this
structure.
@item @code{EDX} @tab Signature @tab @samp{SMAP} - Used by the BIOS to
verify the caller is requesting the system map information to be
returned in @code{ES:DI}.
@end multitable
Output:
@multitable @columnfractions 0.15 0.25 0.6
@item @code{CF} @tab Carry Flag @tab Non-Carry - indicates no error
@item @code{EAX} @tab Signature @tab @samp{SMAP} - Signature to verify
correct BIOS revision.
@item @code{ES:DI} @tab Buffer Pointer @tab Returned Address Range
Descriptor pointer. Same value as on input.
@item @code{ECX} @tab Buffer Size @tab Number of bytes returned by the
BIOS in the address range descriptor. The minimum size structure
returned by the BIOS is 20 bytes.
@item @code{EBX} @tab Continuation @tab Contains the continuation value
to get the next address descriptor. The actual significance of the
continuation value is up to the discretion of the BIOS. The caller must
pass the continuation value unchanged as input to the next iteration of
the E820h call in order to get the next Address Range Descriptor. A
return value of zero means that this is the last descriptor. Note that
the BIOS indicates that the last valid descriptor has been returned by
either returning a zero as the continuation value, or by returning
carry.
@end multitable
The Address Range Descriptor Structure is:
@multitable @columnfractions 0.25 0.3 0.45
@item Offset in Bytes @tab Name @tab Description
@item 0 @tab @dfn{BaseAddrLow} @tab Low 32 Bits of Base Address
@item 4 @tab @dfn{BaseAddrHigh} @tab High 32 Bits of Base Address
@item 8 @tab @dfn{LengthLow} @tab Low 32 Bits of Length in Bytes
@item 12 @tab @dfn{LengthHigh} @tab High 32 Bits of Length in Bytes
@item 16 @tab @dfn{Type} @tab Address type of this range
@end multitable
The @dfn{BaseAddrLow} and @dfn{BaseAddrHigh} together are the 64 bit
@dfn{BaseAddress} of this range. The @dfn{BaseAddress} is the physical
address of the start of the range being specified.
The @dfn{LengthLow} and @dfn{LengthHigh} together are the 64 bit
@dfn{Length} of this range. The @dfn{Length} is the physical contiguous
length in bytes of a range being specified.
The @dfn{Type} field describes the usage of the described address range
as defined in the table below:
@multitable @columnfractions 0.1 0.35 0.55
@item Value @tab Mnemonic @tab Description
@item 1 @tab @dfn{AddressRangeMemory} @tab This run is available
@sc{ram} usable by the operating system.
@item 2 @tab @dfn{AddressRangeReserved} @tab This run of addresses is in
use or reserved by the system, and must not be used by the operating
system.
@item Other @tab @dfn{Undefined} @tab Undefined - Reserved for future
use. Any range of this type must be treated by the OS as if the type
returned was @dfn{AddressRangeReserved}.
@end multitable
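
The descriptor layout and the rule for unknown types can be written down in
C as follows. This is a sketch under the assumptions that the structure is
the minimum 20 bytes and that the code runs on a little-endian machine; the
identifiers are hypothetical and only mirror the field names in the tables
above.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    ADDRESS_RANGE_MEMORY   = 1,
    ADDRESS_RANGE_RESERVED = 2
};

/* The minimum 20-byte Address Range Descriptor. */
struct address_range_descriptor {
    uint32_t base_addr_low;   /* offset 0 */
    uint32_t base_addr_high;  /* offset 4 */
    uint32_t length_low;      /* offset 8 */
    uint32_t length_high;     /* offset 12 */
    uint32_t type;            /* offset 16 */
};

_Static_assert(sizeof(struct address_range_descriptor) == 20,
               "descriptor must be exactly 20 bytes");
_Static_assert(offsetof(struct address_range_descriptor, type) == 16,
               "type must be at offset 16");

/* Combine the two halves into the 64-bit BaseAddress. */
uint64_t ard_base(const struct address_range_descriptor *d)
{
    return ((uint64_t)d->base_addr_high << 32) | d->base_addr_low;
}

/* Combine the two halves into the 64-bit Length. */
uint64_t ard_length(const struct address_range_descriptor *d)
{
    return ((uint64_t)d->length_high << 32) | d->length_low;
}

/* Any type other than AddressRangeMemory is treated as reserved. */
uint32_t ard_effective_type(const struct address_range_descriptor *d)
{
    return d->type == ADDRESS_RANGE_MEMORY ? ADDRESS_RANGE_MEMORY
                                           : ADDRESS_RANGE_RESERVED;
}
```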
The BIOS can use the @dfn{AddressRangeReserved} address range type to
block out various addresses as @emph{not suitable} for use by a
programmable device.
Some of the reasons a BIOS would do this are:
@itemize @bullet
@item
The address range contains system @sc{rom}.
@item
The address range contains @sc{ram} in use by the @sc{rom}.
@item
The address range is in use by a memory mapped system device.
@item
The address range is, for whatever reason, unsuitable for a
standard device to use as a device memory space.
@end itemize
Here is the list of assumptions and limitations:
@enumerate
@item
The BIOS will return address ranges describing base board memory and ISA
or PCI memory that is contiguous with that base board memory.
@item
The BIOS @emph{will not} return a range description for the memory
mapping of PCI devices, ISA Option @sc{rom}s, and ISA Plug and Play
cards. This is because the OS has mechanisms available to detect them.
@item
The BIOS will return chipset defined address holes that are not being
used by devices as reserved.
@item
Address ranges defined for base board memory mapped I/O devices (for
example APICs) will be returned as reserved.
@item
All occurrences of the system BIOS will be mapped as reserved. This
includes the area below 1 MB, at 16 MB (if present) and at end of the
address space (4 GB).
@item
Standard PC address ranges will not be reported. For example, video memory at
A0000 to BFFFF physical will not be described by this function. The
range from E0000 to EFFFF is base board specific and will be reported as
suits the base board.
@item
All of lower memory is reported as normal memory. It is the OS's
responsibility to handle standard @sc{ram} locations reserved for
specific uses, for example: the interrupt vector table (0:0) and the
BIOS data area (40:0).
@end enumerate
Here we explain an example address map. This sample address map
describes a machine which has 128 MB @sc{ram}, 640K of base memory and
127 MB extended. The base memory has 639K available for the user and 1K
for an extended BIOS data area. There is a 4 MB Linear Frame Buffer
(LFB) based at 12 MB. The memory hole created by the chipset is from 8
M to 16 M. There are memory mapped APIC devices in the system. The IO
Unit is at FEC00000 and the Local Unit is at FEE00000. The system BIOS
is remapped to 4G - 64K.
Note that the 639K endpoint of the first memory range is also the base
memory size reported in the BIOS data segment at 40:13.
Key to types: @dfn{ARM} is AddressRangeMemory, @dfn{ARR} is
AddressRangeReserved.
@multitable @columnfractions 0.15 0.1 0.1 0.65
@item Base (Hex) @tab Length @tab Type @tab Description
@item 0000 0000 @tab 639K @tab ARM @tab Available Base memory -
typically the same value as is returned via the INT 12 function.
@item 0009 FC00 @tab 1K @tab ARR @tab Memory reserved for use by the
BIOS(s). This area typically includes the Extended BIOS data area.
@item 000F 0000 @tab 64K @tab ARR @tab System BIOS.
@item 0010 0000 @tab 7M @tab ARM @tab Extended memory, this is not
limited to the 64MB address range.
@item 0080 0000 @tab 8M @tab ARR @tab Chipset memory hole required to
support the LFB mapping at 12 MB.
@item 0100 0000 @tab 120M @tab ARM @tab Base board @sc{ram} relocated
above a chipset memory hole.
@item FEC0 0000 @tab 4K @tab ARR @tab IO APIC memory mapped I/O at
FEC00000. Note the range of addresses required for an APIC device may
vary from OEM to OEM.
@item FEE0 0000 @tab 4K @tab ARR @tab Local APIC memory mapped I/O at
FEE00000.
@item FFFF 0000 @tab 64K @tab ARR @tab Remapped System BIOS at end of
address space.
@end multitable
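
To check the arithmetic of the sample map, the following C sketch encodes
the table and adds up the ranges an OS may use. The bases and lengths are
the ones listed above (the IO APIC entry uses the FEC00000 address given in
its description); the structure and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024ULL * 1024ULL)

enum { TYPE_ARM = 1, TYPE_ARR = 2 };

/* Simplified descriptor with the 64-bit values already combined. */
struct range {
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

/* The sample address map from the table above. */
const struct range sample_map[] = {
    { 0x00000000ULL, 639 * KIB, TYPE_ARM },  /* available base memory */
    { 0x0009FC00ULL,   1 * KIB, TYPE_ARR },  /* extended BIOS data area */
    { 0x000F0000ULL,  64 * KIB, TYPE_ARR },  /* system BIOS */
    { 0x00100000ULL,   7 * MIB, TYPE_ARM },  /* extended memory */
    { 0x00800000ULL,   8 * MIB, TYPE_ARR },  /* chipset memory hole */
    { 0x01000000ULL, 120 * MIB, TYPE_ARM },  /* RAM above the hole */
    { 0xFEC00000ULL,   4 * KIB, TYPE_ARR },  /* IO APIC */
    { 0xFEE00000ULL,   4 * KIB, TYPE_ARR },  /* local APIC */
    { 0xFFFF0000ULL,  64 * KIB, TYPE_ARR },  /* remapped system BIOS */
};
const size_t sample_map_count = sizeof sample_map / sizeof sample_map[0];

/* Sum of all AddressRangeMemory runs, in KB. Every other type is
   treated as reserved and therefore skipped. */
uint64_t usable_kib(const struct range *map, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        if (map[i].type == TYPE_ARM)
            total += map[i].length / KIB;
    return total;
}
```

For the sample map the usable total is 639K + 7M + 120M, that is, the 640K
of base memory minus the 1K extended BIOS data area plus the 127 MB of
extended memory.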
The following code segment is intended to describe the algorithm needed
when calling the Query System Address Map function. It is an
implementation example and uses non standard mechanisms.
@example
E820Present = FALSE;
Regs.ebx = 0;
do
  @{
    Regs.eax = 0xE820;
    Regs.es = SEGMENT (&Descriptor);
    Regs.di = OFFSET (&Descriptor);
    Regs.ecx = sizeof (Descriptor);
    Regs.edx = 'SMAP';
    _int (0x15, Regs);
    if ((Regs.eflags & EFLAGS_CARRY) || Regs.eax != 'SMAP')
      @{
        break;
      @}
    if (Regs.ecx < 20 || Regs.ecx > sizeof (Descriptor))
      @{
        /* bug in bios - all returned descriptors must be at
           least 20 bytes long, and can not be larger than
           the input buffer. */
        break;
      @}
    E820Present = TRUE;
    .
    .
    .
    Add address range Descriptor.BaseAddress through
    Descriptor.BaseAddress + Descriptor.Length
    as type Descriptor.Type
    .
    .
    .
  @}
while (Regs.ebx != 0);
if (! E820Present)
  @{
    .
    .
    .
    call INT 15H, AX=E801h and/or INT 15H, AH=88h to obtain old style
    memory information
    .
    .
    .
  @}
@end example
@node Get Large Memory Size
@section INT 15H, AX=E801h interrupt call
Real mode only.
Originally defined for EISA servers, this interface is capable of
reporting up to 4 GB of @sc{ram}. While not nearly as flexible as
E820h, it is present in many more systems.
Input:
@multitable @columnfractions 0.15 0.25 0.6
@item @code{AX} @tab Function Code @tab E801h.
@end multitable
Output:
@multitable @columnfractions 0.15 0.25 0.6
@item @code{CF} @tab Carry Flag @tab Non-Carry - indicates no error.
@item @code{AX} @tab Extended 1 @tab Number of contiguous KB between 1
and 16 MB, maximum 0x3C00 = 15 MB.
@item @code{BX} @tab Extended 2 @tab Number of contiguous 64KB blocks
between 16 MB and 4GB.
@item @code{CX} @tab Configured 1 @tab Number of contiguous KB between 1
and 16 MB, maximum 0x3c00 = 15 MB.
@item @code{DX} @tab Configured 2 @tab Number of contiguous 64KB blocks
between 16 MB and 4 GB.
@end multitable
It is not clear what the difference between the @dfn{Extended} and
@dfn{Configured} numbers is, but they appear to be identical, as
reported from the BIOS.
It is possible for a machine using this interface to report a memory
hole just under 16 MB (Count 1 is less than 15 MB, but Count 2 is
non-zero).
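
One way to turn the two counts into a single figure is sketched below in C.
This is an illustration of the units and of the hole case only, not
necessarily how GRUB combines the values; the function names are
hypothetical. The first count is in KB and the second in 64KB blocks, and
the second range is only contiguous with the first when the first reports
its full 15 MB (0x3C00).

```c
#include <stdint.h>

#define E801_FULL_LOW_KIB 0x3C00u  /* 15 MB: the range from 1 MB to 16 MB */

/* True when memory exists above 16 MB but the range below it is not
   full, i.e. there is a hole just under 16 MB. */
int e801_has_hole_below_16mb(uint16_t count1_kib, uint16_t count2_64k)
{
    return count1_kib < E801_FULL_LOW_KIB && count2_64k != 0;
}

/* Contiguous memory above 1 MB, in KB. The blocks above 16 MB are
   added only when they directly follow a full first range. */
uint32_t e801_contiguous_extended_kib(uint16_t count1_kib,
                                      uint16_t count2_64k)
{
    uint32_t kib = count1_kib;
    if (count1_kib == E801_FULL_LOW_KIB)
        kib += (uint32_t)count2_64k * 64u;
    return kib;
}
```

For a machine with 127 MB of contiguous extended memory the BIOS would
report 0x3C00 and 0x0700, giving 15360 + 1792 * 64 = 130048 KB.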
@node Get Extended Memory Size
@section INT 15H, AH=88h interrupt call
Real mode only.
This interface is quite primitive. It returns a single value for
contiguous memory above 1 MB. The biggest limitation is that the value
returned is a 16-bit value, in KB, so it has a maximum saturation of
just under 64 MB even presuming it returns as much as it can. On some
systems, it won't return anything above the 16 MB boundary.
The one useful point is that it works on every PC available.
Input:
@multitable @columnfractions 0.15 0.25 0.6
@item @code{AH} @tab Function Code @tab 88h
@end multitable
Output:
@multitable @columnfractions 0.15 0.25 0.6
@item @code{CF} @tab Carry Flag @tab Non-Carry - indicates no error.
@item @code{AX} @tab Memory Count @tab Number of contiguous KB above 1
MB.
@end multitable
@node Low-level disk I/O
@chapter INT 13H disk I/O interrupts
In the PC world, living with the BIOS disk interface is definitely a
nightmare. This section documents how awful the chaos is and how GRUB
deals with the BIOS disks.
@menu
* CHS Translation:: CHS addressing and LBA addressing
* CHS mode disk I/O:: INT 13H, AH=0xh interrupt call
* LBA mode disk I/O:: INT 13H, AH=4xh interrupt call
@end menu
@node CHS Translation
@section CHS addressing and LBA addressing
CHS --- Cylinder/Head/Sector --- is the traditional way to address
sectors on a disk. There are at least two types of CHS addressing; the
CHS that is used at the INT 13H interface and the CHS that is used at
the ATA device interface. In the MFM/RLL/ESDI and early ATA days the CHS
used at the INT 13H interface was the same as the CHS used at the device
interface.
Today we have CHS translating BIOS types that can use one CHS at the INT
13H interface and a different CHS at the device interface. These two
types of CHS will be called the logical CHS or @dfn{L-CHS} and the
physical CHS or @dfn{P-CHS} in this section. L-CHS is the CHS used at
the INT 13H interface and P-CHS is the CHS used at the device interface.
The L-CHS used at the INT 13 interface allows up to 256 heads, up to
1024 cylinders and up to 63 sectors. This allows support of up to 8GB
drives. This scheme started with either ESDI or SCSI adapters many years
ago.
The P-CHS used at the device interface allows up to 16 heads, up to 65535
cylinders, and up to 63 sectors. This allows access to about 2^26 sectors
(32GB) on an ATA device. When a P-CHS is used at the INT 13H interface
it is limited to 1024 cylinders, 16 heads and 63 sectors. This is where
the old 528MB limit originated.
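
The capacity limits quoted above follow directly from multiplying the
maximum cylinder, head and sector counts by the sector size. A small C
sketch, assuming 512-byte sectors (the function name is hypothetical):

```c
#include <stdint.h>

#define SECTOR_BYTES 512ULL  /* assumed sector size */

/* Number of bytes addressable with the given CHS limits. */
uint64_t chs_capacity_bytes(uint32_t cylinders, uint32_t heads,
                            uint32_t sectors_per_track)
{
    return (uint64_t)cylinders * heads * sectors_per_track * SECTOR_BYTES;
}
```

With the L-CHS limits, @code{chs_capacity_bytes (1024, 256, 63)} is
8455716864 bytes, the 8GB limit. With the P-CHS limits at the device
interface, @code{chs_capacity_bytes (65535, 16, 63)} is 33822351360 bytes
(66059280 sectors, close to 2^26). With a P-CHS passed through the INT 13H
interface, @code{chs_capacity_bytes (1024, 16, 63)} is 528482304 bytes, the
528MB limit.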
LBA --- Logical Block Address --- is another way of addressing sectors
that uses a simple numbering scheme starting with zero as the address of
the first sector on a device. The ATA standard requires that cylinder 0,
head 0, sector 1 address the same sector as addressed by LBA 0. LBA
addressing can be used at the ATA interface if the ATA device supports
it. LBA addressing is also used at the INT 13H interface by the AH=4xH
read/write calls.
ATA devices may also support LBA at the device interface. LBA allows
access to approximately 2^28 sectors (137GB) on an ATA device.
A SCSI host adapter can convert an L-CHS directly to an LBA used in the
SCSI read/write commands. On a PC today, SCSI is also limited to 8GB
when CHS addressing is used at the INT 13H interface.
First, all OS's that want to be co-resident with another OS (and that is
all of the PC based OS's that I know of) @emph{must} use INT 13H to
determine the capacity of a hard disk. And that capacity information
@emph{must} be determined in L-CHS mode. Why is this? Because:
@enumerate
@item
FDISK and the partition tables are really L-CHS based.
@item
MS/PC DOS uses INT 13H AH=02H and AH=03H to read and write the disk and
these BIOS calls are L-CHS based.
@item
The boot processing done by the BIOS is all L-CHS based.
@end enumerate
During the boot processing, all of the disk read accesses are done in
L-CHS mode via INT 13H and this includes loading the first of the OS's
kernel code or boot manager's code.
Second, because there can be multiple BIOS types in any one system, each
drive may be under the control of a different type of BIOS. For example,
drive 80H (the first hard drive) could be controlled by the original
system BIOS, drive 81H (the second drive) could be controlled by an
option @sc{rom} BIOS and drive 82H (the third drive) could be controlled
by a software driver. Also, be aware that each drive could be a
different type, for example, drive 80H could be an MFM drive, drive 81H
could be an ATA drive, drive 82H could be a SCSI drive.
Third, not all OS's understand or use BIOS drive numbers greater than
81H. Even if there is INT 13H support for drives 82H or greater, the OS
may not use that support.
Fourth, the BIOS INT 13H configuration calls are:
@table @asis
@item AH=08H, Get Drive Parameters
This call is restricted to drives up to 528MB without CHS translation
and to drives up to 8GB with CHS translation. For older BIOS with no
support for >1024 cylinders or >528MB, this call returns the same CHS as
is used at the ATA interface (the P-CHS). For newer BIOS's that do
support >1024 cylinders or >528MB, this call returns a translated CHS
(the L-CHS). The CHS returned by this call is used by FDISK to build
partition records.
@item AH=41H, Get BIOS Extensions Support
This call is used to determine if the IBM/Microsoft Extensions or if the
Phoenix Enhanced INT 13H calls are supported for the BIOS drive number.
@item AH=48H, Extended Get Drive Parameters
This call is used to determine the CHS geometries, LBA information and
other data about the BIOS drive number.
@end table
An ATA disk must implement both CHS and LBA addressing and must at any
given time support only one P-CHS at the device interface. And, the
drive must maintain a strict relationship between the sector addressing
in CHS mode and LBA mode. Quoting @cite{the ATA-2 document}:
@example
@group
LBA = ( (cylinder * heads_per_cylinder + heads )
* sectors_per_track ) + sector - 1
where heads_per_cylinder and sectors_per_track are the current
translation mode values.
@end group
@end example
This algorithm can also be used by a BIOS or an OS to convert a L-CHS to
an LBA.
This algorithm can be reversed such that an LBA can be converted to a
CHS:
@example
@group
cylinder = LBA / (heads_per_cylinder * sectors_per_track)
temp = LBA % (heads_per_cylinder * sectors_per_track)
head = temp / sectors_per_track
sector = temp % sectors_per_track + 1
@end group
@end example
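
The two conversions above translate directly into C. The sketch below uses
hypothetical names and assumes that the inputs are already valid for the
current translation mode (sector numbers start at 1, heads and cylinders at
0).

```c
#include <stdint.h>

/* One CHS address; sector is 1-based, cylinder and head are 0-based. */
struct chs {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector;
};

/* CHS to LBA, following the ATA-2 formula quoted above. */
uint32_t chs_to_lba(struct chs a, uint32_t heads_per_cylinder,
                    uint32_t sectors_per_track)
{
    return (a.cylinder * heads_per_cylinder + a.head) * sectors_per_track
           + a.sector - 1;
}

/* LBA to CHS, the reverse conversion. */
struct chs lba_to_chs(uint32_t lba, uint32_t heads_per_cylinder,
                      uint32_t sectors_per_track)
{
    struct chs a;
    uint32_t per_cylinder = heads_per_cylinder * sectors_per_track;
    uint32_t temp = lba % per_cylinder;

    a.cylinder = lba / per_cylinder;
    a.head = temp / sectors_per_track;
    a.sector = temp % sectors_per_track + 1;
    return a;
}
```

With 16 heads and 63 sectors per track, cylinder 0, head 0, sector 1 maps to
LBA 0, and cylinder 1023, head 15, sector 63 maps to LBA 1032191, the last
sector reachable below the 528MB limit.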
While most OS's compute disk addresses in an LBA scheme, an OS like DOS
must convert that LBA to a CHS in order to call INT 13H.
The basic problem is that there is no requirement that a CHS translating
BIOS follow these rules. There are many other algorithms that can be
implemented to perform a similar function. Today, there are at least two
popular implementations: the Phoenix implementation (described above) and
the non-Phoenix implementations. This matters because a protected mode OS
that does not want to use INT 13H must implement the same CHS translation
algorithm as the BIOS. If it doesn't, your data gets scrambled.
In the perfect world of tomorrow, maybe only LBA will be used. But today
we are faced with the following problems:
@itemize @bullet
@item
Some drives >528MB don't implement LBA.
@item
Some drives are optimized for CHS and may have lower performance when
given commands in LBA mode. Don't forget that LBA is something new for
the ATA disk designers who have worked very hard for many years to
optimize CHS address handling. And not all drive designs require the use
of LBA internally.
@item
The L-CHS to LBA conversion is more complex and slower than the bit
shifting L-CHS to P-CHS conversion.
@item
DOS, FDISK and the MBR are still CHS based --- they use the CHS returned
by INT 13H AH=08H. Any OS that can be installed on the same disk with
DOS must understand CHS addressing.
@item
The BIOS boot processing and loading of the first OS kernel code is done
in CHS mode --- the CHS returned by INT 13H AH=08H is used.
@item
Microsoft has said that their OS's will not use any disk capacity that
cannot also be accessed by INT 13H AH=0xH.
@end itemize
These are difficult problems to overcome in today's industry
environment. The result: chaos.
@node CHS mode disk I/O
@section INT 13H, AH=0xh interrupt call
Real mode only. These functions are the traditional CHS mode disk
interface. GRUB calls them only if LBA mode is not available.
INT 13H, AH=02h reads sectors into memory.
Input:
@multitable @columnfractions .15 .85
@item @code{AH} @tab 02h
@item @code{AL} @tab The number of sectors to read (must be non-zero).
@item @code{CH} @tab Low 8 bits of cylinder number.
@item @code{CL} @tab Sector number in bits 0-5, and high 2 bits of
cylinder number in bits 6-7.
@item @code{DH} @tab Head number.
@item @code{DL} @tab Drive number (bit 7 set for hard disk).
@item @code{ES:BX} @tab Data buffer.
@end multitable
Output:
@multitable @columnfractions .15 .85
@item @code{CF} @tab Set on error.
@item @code{AH} @tab Status.
@item @code{AL} @tab The number of sectors transferred (only valid if CF
set for some BIOSes).
@end multitable
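The CHS register encoding listed above can be sketched as follows. The
function name is hypothetical and the code only illustrates the bit
layout of @code{CH}, @code{CL} and @code{DH}; it does not call the BIOS.

```python
def pack_chs_registers(cylinder, head, sector):
    # Limits of the traditional interface: 10-bit cylinder,
    # 8-bit head, 6-bit sector (sectors are numbered from 1).
    if not (0 <= cylinder <= 1023 and 0 <= head <= 255 and 1 <= sector <= 63):
        raise ValueError("CHS address out of range for INT 13H")
    ch = cylinder & 0xFF                          # low 8 bits of cylinder
    cl = (((cylinder >> 8) & 0x03) << 6) | sector  # high 2 bits + sector
    dh = head
    return ch, cl, dh


regs = pack_chs_registers(0x294, 0x0E, 0x3E)
```

For cylinder 294h, head 0Eh and sector 3Eh this yields CH=94h, CL=BEh
and DH=0Eh.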
INT 13H, AH=03h writes disk sectors.
Input:
@multitable @columnfractions .15 .85
@item @code{AH} @tab 03h
@item @code{AL} @tab The number of sectors to write (must be non-zero).
@item @code{CH} @tab Low 8 bits of cylinder number.
@item @code{CL} @tab Sector number in bits 0-5, and high 2 bits of
cylinder number in bits 6-7.
@item @code{DH} @tab Head number.
@item @code{DL} @tab Drive number (bit 7 set for hard disk).
@item @code{ES:BX} @tab Data buffer.
@end multitable
Output:
@multitable @columnfractions .15 .85
@item @code{CF} @tab Set on error.
@item @code{AH} @tab Status.
@item @code{AL} @tab The number of sectors transferred (only valid if CF
set for some BIOSes).
@end multitable
INT 13H, AH=08h returns drive parameters. For systems predating the IBM
PC/AT, this call is only valid for hard disks.
Input:
@multitable @columnfractions .15 .85
@item @code{AH} @tab 08h
@item @code{DL} @tab Drive number (bit 7 set for hard disk).
@end multitable
Output:
@multitable @columnfractions .15 .85
@item @code{CF} @tab Set on error.
@item @code{AH} @tab 0.
@item @code{AL} @tab 0 on at least some BIOSes.
@item @code{BL} @tab Drive type (AT/PS2 floppies only).
@item @code{CH} @tab Low 8 bits of maximum cylinder number.
@item @code{CL} @tab Maximum sector number in bits 0-5, and high 2 bits
of maximum cylinder number in bits 6-7.
@item @code{DH} @tab Maximum head number.
@item @code{DL} @tab The number of drives.
@item @code{ES:DI} @tab Drive parameter table (floppies only).
@end multitable
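As a sketch, the geometry returned by AH=08h can be decoded like this.
The function name and the sample register values are assumptions made
for the example. Note that the call returns @emph{maximum} numbers, so
the cylinder and head counts are one larger than the returned values.

```python
def decode_drive_parameters(ch, cl, dh):
    max_cylinder = ((cl & 0xC0) << 2) | ch   # 2 high bits from CL, 8 low bits from CH
    max_sector = cl & 0x3F                   # sectors are numbered from 1
    max_head = dh
    # Cylinders and heads are numbered from 0, so add 1 to get counts.
    return max_cylinder + 1, max_head + 1, max_sector


# Hypothetical return values: the largest cylinder and sector fields,
# and a maximum head number of 254.
cylinders, heads, sectors = decode_drive_parameters(0xFF, 0xFF, 0xFE)
capacity_bytes = cylinders * heads * sectors * 512
```

With these sample values the geometry is 1024 cylinders, 255 heads and
63 sectors per track, which is about 8.4 billion bytes with 512 byte
sectors.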
@node LBA mode disk I/O
@section INT 13H, AH=4xh interrupt call
Real mode only. These functions are the IBM/MS INT 13 Extensions, which
support LBA mode. GRUB uses them if available so that it can read and
write disk areas beyond 8GB.
INT 13H, AH=41h checks if LBA is supported.
Input:
@multitable @columnfractions 0.15 0.85
@item @code{AH} @tab 41h.
@item @code{BX} @tab 55AAh.
@item @code{DL} @tab Drive number.
@end multitable
Output:
@multitable @columnfractions 0.15 0.85
@item @code{CF} @tab Set on error.
@item @code{AH} @tab Major version of extensions (01h for 1.x, 20h for
2.0 / EDD-1.0, 21h for 2.1 / EDD-1.1 and 30h for EDD-3.0) if successful,
otherwise 01h (the error code of @dfn{invalid function}).
@item @code{BX} @tab AA55h if installed.
@item @code{AL} @tab Internal use.
@item @code{CX} @tab API subset support bitmap (see below).
@item @code{DH} @tab Extension version.
@end multitable
The bitfields for the API subset support bitmap are@footnote{It is known
that (at least) the AMI BIOS in SuperMicro P6SBA motherboard
(AMIBIOSC0631) does @emph{not} return the bitfields correctly.}:
@multitable @columnfractions 0.15 0.85
@item Bit(s) @tab Description
@item 0 @tab Extended disk access functions (AH=42h-44h, 47h, 48h)
supported.
@item 1 @tab Removable drive controller functions (AH=45h, 46h, 48h,
49h, INT 15H, AH=52h) supported.
@item 2 @tab Enhanced disk drive (EDD) functions (AH=48h, 4Eh)
supported.
@item 3-15 @tab Reserved (0).
@end multitable
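A caller might interpret the result of AH=41h roughly as follows. This
is a sketch under the assumption that the BIOS fills in the bitmap
correctly, which the footnote above shows is not always the case; the
constant and function names are hypothetical.

```python
EXTENDED_ACCESS = 1 << 0    # AH=42h-44h, 47h, 48h
REMOVABLE_CONTROL = 1 << 1  # AH=45h, 46h, 48h, 49h, INT 15H AH=52h
EDD_FUNCTIONS = 1 << 2      # AH=48h, 4Eh


def lba_supported(carry_flag, bx, cx):
    # The extensions are present only if the call succeeded and BX came
    # back as AA55h; bit 0 of CX then reports the extended disk access
    # functions.
    return (not carry_flag) and bx == 0xAA55 and bool(cx & EXTENDED_ACCESS)


ok = lba_supported(False, 0xAA55, 0x0005)
```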
INT 13H, AH=42h reads sectors into memory.
Input:
@multitable @columnfractions .15 .85
@item @code{AH} @tab 42h.
@item @code{DL} @tab Drive number.
@item @code{DS:SI} @tab Disk Address Packet (see below).
@end multitable
Output:
@multitable @columnfractions .15 .85
@item @code{CF} @tab Set on error.
@item @code{AH} @tab 0 if successful, otherwise error code.
@end multitable
The format of the @dfn{Disk Address Packet} is:
@multitable @columnfractions 0.15 0.15 0.7
@item Offset (hex) @tab Size (byte) @tab Description
@item 00 @tab 1 @tab 10h (The size of packet).
@item 01 @tab 1 @tab Reserved (0).
@item 02 @tab 2 @tab The number of blocks to transfer (max 007F for
Phoenix EDD).
@item 04 @tab 4 @tab Transfer buffer (SEGMENT:OFFSET).
@item 08 @tab 8 @tab Starting absolute block number.
@end multitable
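The packet layout above can be sketched with Python's @code{struct}
module. The function name is hypothetical. The example assumes the
usual x86 conventions: all fields are little-endian, and the
SEGMENT:OFFSET buffer pointer is stored as a real-mode far pointer, that
is, the offset word first and the segment word second.

```python
import struct


def build_disk_address_packet(block_count, segment, offset, start_block):
    # "<" selects little-endian byte order with no padding.
    # B: packet size (10h), B: reserved (0), H: number of blocks,
    # H, H: buffer offset then segment, Q: 64-bit starting block number.
    return struct.pack("<BBHHHQ", 0x10, 0, block_count, offset, segment, start_block)


# Read one block, starting at block 3Eh, into 0000:7C00.
packet = build_disk_address_packet(1, 0x0000, 0x7C00, 0x3E)
```

The resulting packet is 16 bytes long, matching the size stored in its
first byte.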
INT 13H, AH=43h writes disk sectors.
Input:
@multitable @columnfractions 0.15 0.85
@item @code{AH} @tab 43h.
@item @code{AL} @tab Write flags. In versions 1.0 and 2.0, bit 0 is the
flag for @dfn{verify write} and other bits are reserved (0). In version
2.1, 00h and 01h indicate @dfn{write without verify}, and 02h indicates
@dfn{write with verify}.
@item @code{DL} @tab Drive number.
@item @code{DS:SI} @tab Disk Address Packet (see above).
@end multitable
Output:
@multitable @columnfractions 0.15 0.85
@item @code{CF} @tab Set on error.
@item @code{AH} @tab 0 if successful, otherwise error code.
@end multitable
INT 13H, AH=48h returns drive parameters. GRUB only makes use of the
total number of sectors, and ignores the CHS information, because only
L-CHS makes sense. @xref{CHS Translation}, for more information.
Input:
@multitable @columnfractions 0.15 0.85
@item @code{AH} @tab 48h.
@item @code{DL} @tab Drive number.
@item @code{DS:SI} @tab Buffer for drive parameters (see below).
@end multitable
Output:
@multitable @columnfractions 0.15 0.85
@item @code{CF} @tab Set on error.
@item @code{AH} @tab 0 if successful, otherwise error code.
@end multitable
The format of drive parameters is:
@multitable @columnfractions 0.25 0.15 0.6
@item Offset (hex) @tab Size (byte) @tab Description
@item 00 @tab 2 @tab The size of buffer. Before calling this function,
set to the maximum buffer size, at least 1Ah. The size actually filled
is returned (1Ah for version 1.0, 1Eh for 2.x and 42h for 3.0).
@item 02 @tab 2 @tab Information flags (see below).
@item 04 @tab 4 @tab The number of physical cylinders.
@item 08 @tab 4 @tab The number of physical heads.
@item 0C @tab 4 @tab The number of physical sectors per track.
@item 10 @tab 8 @tab The total number of sectors.
@item 18 @tab 2 @tab The bytes per sector.
@comment Add an empty row for readability...
@item @tab @tab
@item @strong{v2.0 and later} @tab @tab
@item 1A @tab 4 @tab EDD configuration parameters.
@comment Add an empty row for readability...
@item @tab @tab
@item @strong{v3.0} @tab @tab
@item 1E @tab 2 @tab Signature BEDD to indicate presence of Device Path
information.
@item 20 @tab 1 @tab The length of Device Path information, including
signature and this byte (24h for version 3.0).
@item 21 @tab 3 @tab Reserved (0).
@item 24 @tab 4 @tab ASCIZ name of host bus (@samp{ISA} or @samp{PCI}).
@item 28 @tab 8 @tab ASCIZ name of interface type (@samp{ATA},
@samp{ATAPI}, @samp{SCSI}, @samp{USB}, @samp{1394} or @samp{FIBRE}).
@item 30 @tab 8 @tab Interface Path.
@item 38 @tab 8 @tab Device Path.
@item 40 @tab 1 @tab Reserved (0).
@item 41 @tab 1 @tab Checksum of bytes 1Eh-40h (2's complement of sum,
which makes the 8 bit sum of bytes 1Eh-41h equal to 00h).
@end multitable
The information flags are:
@multitable @columnfractions 0.15 0.85
@item Bit(s) @tab Description
@item 0 @tab DMA boundary errors handled transparently.
@item 1 @tab CHS information is valid.
@item 2 @tab Removable drive.
@item 3 @tab Write with verify supported.
@item 4 @tab Drive has change-line support (required if drive is
removable).
@item 5 @tab Drive can be locked (required if drive is removable).
@item 6 @tab CHS information set to maximum supported values, not
current media.
@item 7-15 @tab Reserved (0).
@end multitable
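The following sketch decodes the part of the drive parameter buffer that
is common to all versions (the first 1Ah bytes), lists the information
flags that are set, and checks the EDD 3.0 checksum. The function names
and the sample buffer are assumptions made for the example; the fields
are taken to be little-endian, as is usual on x86.

```python
import struct

FLAG_NAMES = (
    "DMA boundary errors handled transparently",
    "CHS information is valid",
    "removable drive",
    "write with verify supported",
    "change-line support",
    "drive can be locked",
    "CHS information set to maximum supported values",
)


def parse_drive_parameters(buf):
    # Offsets 00h-19h: size, flags, cylinders, heads, sectors per track,
    # total number of sectors, bytes per sector.
    (size, flags, cylinders, heads, sectors,
     total, bytes_per_sector) = struct.unpack_from("<HHIIIQH", buf, 0)
    names = [name for bit, name in enumerate(FLAG_NAMES) if flags & (1 << bit)]
    return size, names, total, bytes_per_sector


def device_path_checksum_ok(buf):
    # EDD 3.0 only: the 8-bit sum of bytes 1Eh-41h must be 00h.
    return len(buf) >= 0x42 and (sum(buf[0x1E:0x42]) & 0xFF) == 0


# Hypothetical version 1.0 result for a small disk.
sample = struct.pack("<HHIIIQH", 0x1A, 0x0002, 1024, 16, 63, 1032192, 512)
info = parse_drive_parameters(sample)
```

Only the total number of sectors and the sector size are needed to
compute the capacity; the CHS fields are decoded here but not returned.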
@node MBR
@chapter The structure of Master Boot Record
A Master Boot Record (@dfn{MBR}) is the sector at cylinder 0, head 0,
sector 1 of a hard disk. An MBR-like structure must be created in each
partition by the FDISK program.
At the completion of your system's Power On Self Test (@dfn{POST}), INT
19H is called. Usually INT 19 tries to read a boot sector from the first
floppy drive@footnote{Which drive is read first depends on your BIOS
settings.}. If a boot sector is found on the floppy disk, that boot
sector is read into memory at location 0000:7C00 and INT 19H jumps to
memory location 0000:7C00. However, if no boot sector is found on the
first floppy drive, INT 19H tries to read the MBR from the first hard
drive. If an MBR is found it is read into memory at location 0000:7C00
and INT 19H jumps to memory location 0000:7C00. The small program in the
MBR will attempt to locate an active (bootable) partition in its
partition table@footnote{This is the behavior of the DOS MBR; GRUB
ignores the active flag.}. The small program in the boot sector must locate the
first part of the operating system's kernel loader program (or perhaps
the kernel itself or perhaps a @dfn{boot manager program}) and read that
into memory.
INT 19H is also called when the @key{CTRL}-@key{ALT}-@key{DEL} keys are
used. On most systems, @key{CTRL}-@key{ALT}-@key{DEL} causes a short
version of the POST to be executed before INT 19H is called.
The layout of the MBR is:
@table @asis
@item Offset 0000
The address where the MBR code starts.
@item Offset 01BE
The address where the partition table starts (@pxref{Partition table}).
@item Offset 01FE
The signature, AA55.
@end table
However, the first 62 bytes of a boot sector are known as the BIOS
Parameter Block (@dfn{BPB}), so GRUB cannot use these bytes for its own
purpose.
If an active partition is found, that partition's boot record is read
into 0000:7C00 and the MBR code jumps to 0000:7C00 with @code{SI}
pointing to the partition table entry that describes the partition being
booted. The boot record program uses this data to determine the drive
being booted from and the location of the partition on the disk.
The first byte of an active partition table entry is 80. This byte is
loaded into the @code{DL} register before INT 13H is called to read the
boot sector. When INT 13H is called, @code{DL} is the BIOS device
number. Because of this, the boot sector read by this MBR program can
only be read from BIOS device number 80 (the first hard disk). This is
one of the reasons why it is usually not possible to boot from any other
hard disk.
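The MBR layout and the search for an active partition described in this
chapter can be sketched as follows. This illustrates the DOS MBR
behavior, not GRUB's, and the function name is hypothetical. It assumes
a 512 byte sector holding four 16 byte table entries
(@pxref{Partition table}), and that the signature word AA55 is stored
little-endian, so the bytes on disk are 55 AA.

```python
PARTITION_TABLE_OFFSET = 0x1BE
SIGNATURE_OFFSET = 0x1FE


def find_active_entry(sector):
    # Reject anything that is not a 512 byte sector ending in 55h AAh.
    signature = sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2]
    if len(sector) != 512 or signature != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * index
        entry = sector[start:start + 16]
        if entry[0] == 0x80:  # first byte 80h marks the active entry
            return index, entry
    return None


# Build a hypothetical MBR image with one active entry.
image = bytearray(512)
image[0x1BE:0x1CE] = bytes.fromhex("80 01 01 00 06 0e be 94 3e 00 00 00 0c 61 09 00")
image[0x1FE:0x200] = b"\x55\xaa"
active = find_active_entry(bytes(image))
```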
@node Partition table
@chapter The format of partition table
@menu
* Partition basics:: Overview of the partition table
* Partition types:: The list of the @dfn{type} code
* Partition entry format:: The format of the table entry
* Partition table rules:: Some basic rules for partition table
@end menu
@node Partition basics
@section Overview of the partition table
FDISK creates all partition records (sectors). The primary purpose of a
partition record is to hold a partition table. The rules for how FDISK
works are unwritten but so far most FDISK programs seem to follow the
same basic idea.
First, all partition table records (sectors) have the same format. This
includes the partition table record at cylinder 0, head 0, sector 1 --
what is known as the Master Boot Record (MBR). The last 66 bytes of a
partition table record contain a partition table and a 2 byte
signature. The first 446 bytes of these sectors usually contain a
program but only the program in the MBR is ever executed (so extended
partition table records could contain something other than a program in
the first 446 bytes). For more information, see @ref{MBR}.
Second, extended partitions are @emph{nested} inside one another and
extended partition table records form a @dfn{linked list}. I will
attempt to show this in a diagram at @ref{Partition entry format}.
Each partition table entry is 16 bytes and contains things like the
start and end location of a partition in CHS, the start in LBA, the size
in sectors, the partition @dfn{type} and the @dfn{active} flag. Older
versions of FDISK may compute incorrect LBA or size values. And when
your computer boots itself, only the CHS fields of the partition table
entries are used (another reason LBA doesn't solve the >528MB
problem). The CHS fields in the partition tables are in L-CHS format;
see @ref{CHS Translation}.
@node Partition types
@section The list of the @dfn{type} code
There is no central clearing house to assign the codes used in the one
byte @dfn{type} field. But codes are assigned (or used) to define almost
every type of file system that anyone has ever implemented on the x86
PC: 12-bit FAT, 16-bit FAT, HPFS, NTFS, etc. Plus, an extended partition
also has a unique type code.
In the FDISK program @samp{sfdisk}, the following list is assumed:
@table @asis
@item 00
Empty
@item 01
DOS 12-bit FAT
@item 02
XENIX /
@item 03
XENIX /usr
@item 04
DOS 16-bit FAT <32M
@item 05
DOS Extended
@item 06
DOS 16-bit FAT >=32M
@item 07
HPFS / NTFS
@item 08
AIX boot or SplitDrive
@item 09
AIX data or Coherent
@item 0A
OS/2 Boot Manager
@item 0B
Windows95 FAT32
@item 0C
Windows95 FAT32 (LBA)
@item 0E
Windows95 FAT16 (LBA)
@item 0F
Windows95 Extended (LBA)
@item 10
OPUS
@item 11
Hidden DOS FAT12
@item 12
Compaq diagnostics
@item 14
Hidden DOS FAT16
@item 16
Hidden DOS FAT16 (big)
@item 17
Hidden HPFS/NTFS
@item 18
AST Windows swapfile
@item 24
NEC DOS
@item 3C
PartitionMagic recovery
@item 40
Venix 80286
@item 41
Linux/MINIX (sharing disk with DRDOS)
@item 42
SFS or Linux swap (sharing disk with DRDOS)
@item 43
Linux native (sharing disk with DRDOS)
@item 50
DM (disk manager)
@item 51
DM6 Aux1 (or Novell)
@item 52
CP/M or Microsoft SysV/AT
@item 53
DM6 Aux3
@item 54
DM6
@item 55
EZ-Drive (disk manager)
@item 56
Golden Bow (disk manager)
@item 5C
Priam Edisk (disk manager)
@item 61
SpeedStor
@item 63
GNU Hurd or Mach or Sys V/386 (such as ISC UNIX)@footnote{But the reason
why they decided that 63 means GNU Hurd is not known. Do not use 63 for
GNU Hurd.}
@item 64
Novell Netware 286
@item 65
Novell Netware 386
@item 70
DiskSecure Multi-Boot
@item 75
PC/IX
@item 77
QNX4.x
@item 78
QNX4.x 2nd part
@item 79
QNX4.x 3rd part
@item 80
MINIX until 1.4a
@item 81
MINIX / old Linux
@item 82
Linux swap
@item 83
Linux native@footnote{This is not true. Use 83 for ext2fs even if the
owner OS is GNU/Hurd.}
@item 84
OS/2 hidden C: drive
@item 85
Linux extended
@item 86
NTFS volume set
@item 87
NTFS volume set
@item 93
Amoeba
@item 94
Amoeba BBT
@item A0
IBM Thinkpad hibernation
@item A5
BSD/386
@item A7
NeXTSTEP 486
@item B7
BSDI fs
@item B8
BSDI swap
@item C1
DRDOS/sec (FAT-12)
@item C4
DRDOS/sec (FAT-16, < 32M)
@item C6
DRDOS/sec (FAT-16, >= 32M)
@item C7
Syrinx
@item DB
CP/M or Concurrent CP/M or Concurrent DOS or CTOS
@item E1
DOS access or SpeedStor 12-bit FAT extended partition
@item E3
DOS R/O or SpeedStor
@item E4
SpeedStor 16-bit FAT extended partition < 1024 cyl.
@item F1
SpeedStor
@item F2
DOS 3.3+ secondary
@item F4
SpeedStor large partition
@item FE
SpeedStor >1024 cyl. or LANstep
@item FF
Xenix Bad Block Table
@end table
@node Partition entry format
@section The format of the table entry
The 16 bytes of a partition table entry are used as follows:
@example
@group
+--- Bit 7 is the active partition flag, bits 6-0 are zero.
|
|     +--- Starting CHS in INT 13 call format.
|     |
|     |     +--- Partition type byte.
|     |     |
|     |     |     +--- Ending CHS in INT 13 call format.
|     |     |     |
|     |     |     |     +-- Starting LBA.
|     |     |     |     |
|     |     |     |     |        +-- Size in sectors.
|     |     |     |     |        |
v  <--+---> v  <--+-->  v        v
0  1  2  3  4  5  6  7  8 9 A B  C D E F
DL DH CL CH TB DH CL CH LBA..... SIZE....
80 01 01 00 06 0e be 94 3e000000 0c610900  1st entry
00 00 81 95 05 0e fe 7d 4a610900 724e0300  2nd entry
00 00 00 00 00 00 00 00 00000000 00000000  3rd entry
00 00 00 00 00 00 00 00 00000000 00000000  4th entry
@end group
@end example
Bytes 0-3 are used by the small program in the Master Boot Record to
read the first sector of an active partition into memory. The @dfn{DL},
@dfn{DH}, @dfn{CL} and @dfn{CH} above show which x86 register is loaded
when the MBR program calls INT 13H AH=02h to read the active partition's
boot sector. For more information, see @ref{MBR}.
These entries define the following partitions:
@enumerate
@item
The first partition, a primary DOS FAT partition, starts at CHS 0H,1H,1H
(LBA 3EH) and ends at CHS 294H,EH,3EH with a size of 9610CH sectors.
@item
The second partition, an extended partition, starts at CHS 295H,0H,1H
(LBA 9614AH) and ends at CHS 37DH,EH,3EH with a size of 34E72H sectors.
@item
The third and fourth table entries are unused.
@end enumerate
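The decoding of the two example entries can be reproduced with a short
sketch. The function name is hypothetical. The code assumes the byte
order implied by the example values: byte 0 is the active flag, each CHS
triple is stored as head, then sector with the high cylinder bits, then
the low cylinder byte, and the LBA and size fields are little-endian 32
bit values.

```python
import struct


def parse_entry(entry):
    # Byte 0: active flag; bytes 1-3: starting CHS; byte 4: type;
    # bytes 5-7: ending CHS; bytes 8-11: starting LBA; bytes 12-15: size.
    (flag, head0, cl0, ch0, ptype,
     head1, cl1, ch1, lba, size) = struct.unpack("<8BII", entry)
    start = (((cl0 & 0xC0) << 2) | ch0, head0, cl0 & 0x3F)
    end = (((cl1 & 0xC0) << 2) | ch1, head1, cl1 & 0x3F)
    return flag == 0x80, ptype, start, end, lba, size


first = parse_entry(bytes.fromhex("80 01 01 00 06 0e be 94 3e 00 00 00 0c 61 09 00"))
second = parse_entry(bytes.fromhex("00 00 81 95 05 0e fe 7d 4a 61 09 00 72 4e 03 00"))
```

Each result is a tuple of the active flag, the type byte, the starting
and ending CHS as (cylinder, head, sector), the starting LBA and the
size in sectors.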
@node Partition table rules
@section Some basic rules for partition table
Keep in mind that there are @emph{no} written rules and @emph{no}
industry standards on how FDISK should work but here are some basic
rules that seem to be followed by most versions of FDISK:
@enumerate
@item
In the MBR there can be 0-4 @dfn{primary} partitions, OR, 0-3 primary
partitions and 0-1 extended partition entry.
@item
In an extended partition there can be 0-1 @dfn{secondary} partition
entries and 0-1 extended partition entries.
@item
Only 1 primary partition in the MBR can be marked @dfn{active} at any
given time.
@item
In most versions of FDISK, the first sector of a partition will be
aligned such that it is at head 0, sector 1 of a cylinder. This means
that there may be unused sectors on the track(s) prior to the first
sector of a partition and that there may be unused sectors following a
partition table sector.
For example, most new versions of FDISK start the first partition
(primary or extended) at cylinder 0, head 1, sector 1. This leaves the
sectors at cylinder 0, head 0, sectors 2...n as unused sectors. This
same layout may be seen on the first track of an extended partition.
The first entry in the example in @ref{Partition entry format} starts at
this location.
Also note that software drivers like Ontrack's Disk Manager depend on
these unused sectors because these drivers will @dfn{hide} their code
there (in cylinder 0, head 0, sectors 2...n). This is also a good place
for boot sector virus programs to hang out.
@item
The partition table entries (slots) can be used in any order. Some
versions of FDISK fill the table from the bottom up and some versions of
FDISK fill the table from the top down. Deleting a partition can leave
an unused entry (slot) in the middle of a table.
@item
And then there is the @dfn{hack} that some newer OS's (OS/2 and Linux)
use in order to place a partition spanning or past cylinder 1024 on a
system that does not have a CHS translating BIOS. These systems create a
partition table entry with the partition's starting and ending CHS
information set to all FFH. The starting and ending LBA information is
used to describe the location of the partition. The LBA can be converted
back to a CHS --- most likely a CHS with more than 1024 cylinders. Since
such a CHS can't be used by the system BIOS, these partitions can not be
booted or accessed until the OS's kernel and hard disk device drivers
are loaded. It is not known if the systems using this @dfn{hack} follow
the same rules for the creation of this type of partition.
@end enumerate
There are @emph{no} written rules as to how an OS scans the partition
table entries so each OS can have a different method. For DOS, this
means that different versions could assign different drive letters to
the same FAT file system partitions.