| @node Hacking |
| @chapter Implementation details |
| |
| This part describes the GRUB internals so that developers can |
| understand the implementation and start to hack GRUB. Of course, the |
| source code has the complete information, so refer to it when you are |
| not satisfied with this documentation. |
| |
| |
| @node Memory map |
| @chapter The memory map of various components |
| |
| GRUB is broken into 2 distinct components, or @dfn{stages}, which are |
| loaded at different times in the boot process. The Stage 1 has to know |
| where to find Stage 2, and the Stage 2 has to know where to find its |
| configuration file (if Stage 2 doesn't have a configuration file, it |
| drops into the command line interface and waits for a user command). |
| |
| Here is the memory map of the various components |
| @footnote{Currently GRUB does not use the extended memory for itself, |
| since it is used to load an operating system. But we are planning to use |
| it for GRUB itself in the future by @dfn{lazy loading}. Ask okuji for |
| more information.}: |
| |
| @table @asis |
| @item 0 to 4K-1 |
| Interrupt & BIOS area |
| |
| @item down from 8K-1 |
| 16-bit stack area |
| |
| @item 8K to (ebss1.5) |
| Stage 1.5 (optionally) loaded here by Stage 1 |
| |
| @item 0x7c00 to 0x7dff |
| Stage 1 loaded here by the BIOS |
| |
| @item 0x7e00 to 0x7e08 |
| Scratch space used by Stage 1 |
| |
| @item 32K to (ebss2) |
| Stage 2 loaded here by Stage 1.5 or Stage 1 |
| |
| @item (middle area) |
| Heap used for random memory allocation |
| |
| @item down from 416K-1 |
| 32-bit stack area |
| |
| @item 416K to 448K-1 |
| Filesystem info buffer (when reading a filesystem) |
| |
| @item 448K to 479.5K-1 |
| BIOS track read buffer |
| |
| @item 479.5K to 480K-1 |
| 512 byte fixed SCRATCH area |
| |
| @item 480K to 511K-1 |
| General storage heap |
| @end table |
| |
| See the file @file{stage2/shared.h}, for more information. |
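The boundaries in the table are given in units of K (1024 bytes). As a rough sketch, they can be turned into byte addresses as follows. The constant names here are hypothetical and chosen only for illustration; the authoritative definitions are the ones in @file{stage2/shared.h}.

```c
#include <stdint.h>

#define KIB 1024u

/* Hypothetical names; values are derived from the table above. */
enum
{
  MAP_STAGE1_5     = 8 * KIB,             /* 0x2000  */
  MAP_STAGE1       = 0x7c00,              /* loaded by the BIOS */
  MAP_STAGE2       = 32 * KIB,            /* 0x8000  */
  MAP_FSYS_BUF     = 416 * KIB,           /* 0x68000, 32-bit stack grows down from here */
  MAP_TRACK_BUF    = 448 * KIB,           /* 0x70000 */
  MAP_SCRATCH      = 479 * KIB + KIB / 2, /* 0x77e00 */
  MAP_STORAGE_HEAP = 480 * KIB,           /* 0x78000 */
  MAP_STORAGE_END  = 511 * KIB            /* 0x7fc00, exclusive end */
};

/* Size in bytes of a region given its start and exclusive end. */
static uint32_t
map_region_size (uint32_t start, uint32_t end_exclusive)
{
  return end_exclusive - start;
}
```

With these values the scratch area comes out at exactly 512 bytes and the filesystem info buffer at 32K, which agrees with the sizes quoted in the table.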
| |
| |
| @node Embedded data |
| @chapter Embedded variables in GRUB |
| |
| GRUB's @dfn{stage1} and @dfn{stage2} have embedded variables whose |
| locations are well-defined, so that the installation can patch the |
| binary file directly without recompilation of the modules. |
| |
| In @dfn{stage1}, these are defined (the number in parentheses after |
| each entry is its offset): |
| |
| @table @asis |
| @item @dfn{stage1 version} (0x3e) |
| These are the version bytes (should be 03:00). |
| |
| @item @dfn{loading drive} (0x40) |
| This is the BIOS drive number to load the block from. If the number |
| is 0xff, then load from the booting drive. |
| |
| @item @dfn{stage2 sector} (0x41) |
| This is the location of the first sector of the @dfn{stage2}. |
| |
| @item @dfn{stage2 address} (0x45) |
| This is the data for the @code{jmp} command to the starting address of |
| the component loaded by the stage1. |
| |
| A @dfn{stage1.5} should be loaded at address 0x2000, and a @dfn{stage2} |
| should be loaded at address 0x8000. Both use a CS of 0. |
| |
| @item @dfn{stage2 segment} (0x47) |
| This is the segment of the starting address of the component loaded by |
| the @dfn{stage1}. |
| @end table |
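To make the idea of patching the binary concrete, here is a minimal sketch that writes the @dfn{stage1} variables into a 512-byte image held in memory. It is not GRUB's installer code. The function name is hypothetical, and the field widths (1, 4, 2 and 2 bytes, little-endian) are an assumption inferred from the gaps between the offsets listed above, as is the order of the two version bytes.

```c
#include <stdint.h>

/* Offsets quoted in the table above. */
#define STAGE1_OFF_VERSION        0x3e
#define STAGE1_OFF_LOADING_DRIVE  0x40
#define STAGE1_OFF_STAGE2_SECTOR  0x41
#define STAGE1_OFF_STAGE2_ADDRESS 0x45
#define STAGE1_OFF_STAGE2_SEGMENT 0x47

/* Store little-endian values byte by byte, since the fields are unaligned. */
static void
put_le16 (uint8_t *p, uint16_t v)
{
  p[0] = (uint8_t) (v & 0xff);
  p[1] = (uint8_t) (v >> 8);
}

static void
put_le32 (uint8_t *p, uint32_t v)
{
  put_le16 (p, (uint16_t) (v & 0xffff));
  put_le16 (p + 2, (uint16_t) (v >> 16));
}

/* Patch the embedded variables of a stage1 image (hypothetical helper).
   Returns 0 on success, or -1 if the version bytes are not 03:00.
   ASSUMPTION: field widths are inferred from the offset gaps. */
static int
patch_stage1 (uint8_t image[512], uint8_t drive, uint32_t stage2_sector,
              uint16_t address, uint16_t segment)
{
  if (image[STAGE1_OFF_VERSION] != 3 || image[STAGE1_OFF_VERSION + 1] != 0)
    return -1;

  image[STAGE1_OFF_LOADING_DRIVE] = drive;  /* 0xff means the booting drive */
  put_le32 (image + STAGE1_OFF_STAGE2_SECTOR, stage2_sector);
  put_le16 (image + STAGE1_OFF_STAGE2_ADDRESS, address);
  put_le16 (image + STAGE1_OFF_STAGE2_SEGMENT, segment);
  return 0;
}
```

For a @dfn{stage2} one would pass the address 0x8000, and for a @dfn{stage1.5} the address 0x2000, as described above.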
| |
| In the first sector of @dfn{stage1.5} and @dfn{stage2}, the blocklists |
| are recorded between @dfn{firstlist} (0x200) and @dfn{lastlist} |
| (determined when assembling the file @file{stage2/start.S}). |
| |
| The trick here is that the list is actually read backward. The first |
| 8-byte blocklist is not read at @dfn{firstlist} itself; instead, the |
| pointer is first decremented by 8 bytes and the entry is read from |
| there. The pointer is then decremented again, the next entry is read, |
| and so on until the list is finished. The terminating condition is that |
| the number of sectors to be read in the next blocklist is 0. |
| |
| The format of a blocklist can be seen from the example in the code just |
| before the @code{firstlist} label. Note that it is always from the |
| beginning of the disk, and @emph{not} relative to the partition |
| boundaries. |
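The backward walk can be sketched in C as follows. This is only an illustration of the traversal order and the terminating condition. The entry layout used here (a 32-bit start sector, a 16-bit sector count and a 16-bit load segment, little-endian) is an assumption; the real layout should be checked against the code before the @code{firstlist} label. All function and type names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE      512
#define FIRSTLIST_OFFSET 0x200  /* firstlist, as stated in the text */
#define BLOCKLIST_SIZE   8

/* ASSUMED entry layout, for illustration only. */
struct blocklist
{
  uint32_t start;  /* absolute sector number, counted from the start of the disk */
  uint16_t len;    /* number of sectors; 0 terminates the list */
  uint16_t seg;    /* segment to load the sectors at */
};

static struct blocklist
decode_blocklist (const uint8_t *p)
{
  struct blocklist bl;
  bl.start = (uint32_t) p[0] | ((uint32_t) p[1] << 8)
             | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
  bl.len = (uint16_t) (p[4] | (p[5] << 8));
  bl.seg = (uint16_t) (p[6] | (p[7] << 8));
  return bl;
}

static void
encode_blocklist (uint8_t *p, const struct blocklist *bl)
{
  p[0] = (uint8_t) bl->start;
  p[1] = (uint8_t) (bl->start >> 8);
  p[2] = (uint8_t) (bl->start >> 16);
  p[3] = (uint8_t) (bl->start >> 24);
  p[4] = (uint8_t) bl->len;
  p[5] = (uint8_t) (bl->len >> 8);
  p[6] = (uint8_t) bl->seg;
  p[7] = (uint8_t) (bl->seg >> 8);
}

/* Walk the entries downward from firstlist.  The pointer is decremented
   before each read, and the walk stops at an entry whose length is 0 or
   when lastlist_offset is reached.  Returns the number of entries seen
   and stores the total number of sectors they describe. */
static size_t
walk_blocklists (const uint8_t sector[SECTOR_SIZE], size_t lastlist_offset,
                 uint32_t *total_sectors)
{
  size_t off = FIRSTLIST_OFFSET;
  size_t count = 0;

  *total_sectors = 0;
  while (off >= lastlist_offset + BLOCKLIST_SIZE)
    {
      struct blocklist bl;

      off -= BLOCKLIST_SIZE;
      bl = decode_blocklist (sector + off);
      if (bl.len == 0)
        break;
      *total_sectors += bl.len;
      count++;
    }
  return count;
}
```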
| |
| In @dfn{stage1.5} and @dfn{stage2} (these are all defined at the |
| beginning of @file{shared_src/asm.S}): |
| |
| @table @asis |
| @item @dfn{major version} (0x6) |
| This is the major version byte (should be 3). |
| |
| @item @dfn{minor version} (0x7) |
| This is the minor version byte (should be 0). |
| |
| @item @dfn{install_partition} (0x8) |
| This is an unsigned long representing the partition on the currently |
| booted disk in which GRUB should expect to find its data files, and |
| which it treats as the default root partition. |
| |
| The format is exactly the same as the @dfn{partition} part (the |
| @dfn{disk} part is ignored) of the data passed to an OS by a |
| Multiboot-compliant boot loader in the @dfn{boot_device} data element, |
| with one exception. |
| |
| The exception is that if the first level of disk partitioning is left as |
| 0xFF (decimal 255, which is marked as no partitioning being used), but |
| the second level does have a partition number, it looks for the first |
| BSD-style PC partition, and finds the numbered BSD sub-partition in it. |
| The default @dfn{install_partition}, 0xFF00FF, would then find the first |
| BSD-style PC partition, and use the @samp{a} partition in it, and |
| 0xFF01FF would use the @samp{b} partition, etc. |
| |
| If an explicit first-level partition is given, then no search is |
| performed, and it will expect that the BSD-style PC partition is in the |
| appropriate location, else a @samp{no such partition} error will be |
| returned. |
| |
| If a @dfn{stage1.5} is being used, it will pass its own |
| @dfn{install_partition} to any @dfn{stage2} it loads, therefore |
| overwriting the one present in the @dfn{stage2}. |
| |
| @item @dfn{stage2_id} (0xc) |
| This is the @dfn{stage1.5} or @dfn{stage2} identifier. |
| |
| @item @dfn{version_string} (0xd) |
| This is the @dfn{stage1.5} or @dfn{stage2} version string. It isn't |
| meant to be changed, simply easy to find. |
| |
| @item @dfn{config_file} (after the terminating zero of @dfn{version_string}) |
| This is the location, using the GRUB filesystem syntax, of the config |
| file. It will, by default, look in the @dfn{install_partition} of the |
| disk GRUB was loaded from, though one can use any valid GRUB filesystem |
| string, up to and including making it look on other disks. |
| |
| The boot loader itself doesn't search for the end of |
| @dfn{version_string}; it simply knows where @dfn{config_file} is, so the |
| beginning of the string cannot be moved after compile-time. This should |
| be OK, since the @dfn{version_string} is meant to be static. |
| |
| The code of @dfn{stage2} starts again at offset 0x70, so the |
| @dfn{config_file} string obviously can't go past there. Also, remember |
| to terminate the string with a 0. |
| |
| Note that @dfn{stage1.5} uses a tricky internal representation for |
| @dfn{config_file}, which has the form |
| @code{@var{device}:@var{filename}} (the @samp{:} is not actually present). |
| @var{device} is an unsigned long like @dfn{install_partition}, and |
| @var{filename} is an absolute filename or a blocklist. If @var{device} |
| is disabled, that is, the drive number is 0xff, then @dfn{stage1.5} uses |
| the @dfn{boot drive} and the @dfn{install partition} instead. |
| @end table |
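The @dfn{install_partition} encoding described in the table above can be illustrated with a small decoder. This is a sketch, not GRUB's own code. It assumes the byte layout of the Multiboot @dfn{boot_device} field that the text refers to: the first-level partition in bits 16 to 23, the second-level (BSD) partition in bits 8 to 15, and 0xFF meaning unused. The type and function names are hypothetical.

```c
#include <stdint.h>

#define PART_UNUSED 0xFFu

/* Decoded view of install_partition (hypothetical type). */
struct install_partition
{
  uint8_t pc_slice;           /* first-level (PC) partition, or 0xFF */
  uint8_t bsd_part;           /* second-level (BSD) partition, or 0xFF */
  int search_first_bsd_slice; /* nonzero when the special case applies */
};

static struct install_partition
decode_install_partition (uint32_t value)
{
  struct install_partition ip;

  ip.pc_slice = (uint8_t) ((value >> 16) & 0xFF);
  ip.bsd_part = (uint8_t) ((value >> 8) & 0xFF);
  /* The exception from the text: no first-level partition is given,
     but a BSD sub-partition number is present. */
  ip.search_first_bsd_slice =
    (ip.pc_slice == PART_UNUSED && ip.bsd_part != PART_UNUSED);
  return ip;
}

/* Letter of the BSD sub-partition ('a' for 0, 'b' for 1, ...), or 0 if unused. */
static char
bsd_partition_letter (uint8_t bsd_part)
{
  return bsd_part == PART_UNUSED ? '\0' : (char) ('a' + bsd_part);
}
```

Decoding the default value 0xFF00FF this way yields no first-level partition and BSD sub-partition @samp{a}, so the search for the first BSD-style PC partition is triggered.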
| |
| |
| @node Filesystem interface |
| @chapter The generic interface for the fs code |
| |
| For any particular partition, it is presumed that only one of the |
| @dfn{normal} filesystems such as FAT, FFS, or ext2fs can be used, so |
| there is a switch table managed by the functions in |
| @file{disk_io.c}. The notion is that you can only @dfn{mount} one at a |
| time. |
| |
| The blocklist filesystem has a special place in the system. In addition |
| to the @dfn{normal} filesystem (or even without one mounted), you can |
| access disk blocks directly (in the indicated partition) via the |
| blocklist notation. Using the blocklist filesystem doesn't affect any |
| other filesystem mounts. |
| |
| The variables which can be read by the filesystem backend are: |
| |
| @vtable @code |
| @item current_drive |
| Contains the current BIOS drive number (numbered from 0 for a floppy, |
| and from 0x80 for a hard disk). |
| |
| @item current_partition |
| Contains the current partition number. |
| |
| @item current_slice |
| Contains the current partition type. |
| |
| @item saved_drive |
| Contains the @dfn{drive} part of the root device. |
| |
| @item saved_partition |
| Contains the @dfn{partition} part of the root device. |
| |
| @item part_start |
| Contains the current partition starting address. |
| |
| @item part_length |
| Contains the current partition length, in sectors. |
| |
| @item print_possibilities |
| True when the @code{dir} function should print the possible completions |
| of a file, and false when it should try to actually open a file of that |
| name. |
| |
| @item FSYS_BUF |
| Points to a filesystem buffer which is 32K in size, to use in any way |
| which the filesystem backend desires. |
| @end vtable |
| |
| The variables which need to be written by a filesystem backend are: |
| |
| @vtable @code |
| @item filepos |
| Should be the current position in the file. |
| |
| @strong{Caution:} the value of @var{filepos} can be changed out from |
| under the filesystem code in the current implementation. Don't depend on |
| it being the same for later calls into the back-end code! |
| |
| @item filemax |
| Should be the length of the file. |
| |
| @item disk_read_func |
| Should be set to the value of @samp{disk_read_hook} @emph{only} during |
| reading of data for the file, not any other fs data, inodes, FAT tables, |
| whatever, then set to @code{NULL} at all other times (it will be |
| @code{NULL} by default). If this isn't done correctly, then the |
| @command{testload} and @command{install} commands won't work |
| correctly. |
| @end vtable |
| |
| The functions expected to be used by the filesystem backend are: |
| |
| @ftable @code |
| @item devread |
| Only read sectors from within a partition. Sector 0 is the first sector |
| in the partition. |
| |
| @item grub_read |
| If the backend uses the blocklist code (like the FAT filesystem backend |
| does), then @code{grub_read} can be used, after setting @var{block_file} |
| to 1. |
| @end ftable |
| |
| The functions expected to be defined by the filesystem backend are |
| described at least moderately in the file @file{filesys.h}. Their usage |
| is fairly evident from their use in the functions in @file{disk_io.c}; |
| look for the use of the @var{fsys_table} array. |
| |
| @strong{Caution:} The semantics are such that, when @samp{mount}ing the |
| filesystem, you must presume the filesystem buffer @var{FSYS_BUF} is |
| corrupted, and (re-)load all important contents. When opening and |
| reading a file, presume that the data from the @samp{mount} is available |
| and doesn't get corrupted by the open/read (i.e. multiple opens and/or |
| reads will be done with only one mount if in the same filesystem). |
| |
| |
| @node Bootstrap tricks |
| @chapter The bootstrap mechanism used in GRUB |
| |
| The disk space available to a boot loader is very restricted, because |
| an MBR (@pxref{MBR}) is only 512 bytes and it also contains a partition |
| table (@pxref{Partition table}) and a BPB. So the question is how to |
| make the boot loader code small enough to fit in an MBR. |
| |
| However, GRUB is a very large program, so we break GRUB into 2 (or 3) |
| distinct components, @dfn{stage1} and @dfn{stage2} (and optionally |
| @dfn{stage1.5}). @xref{Memory map}, for more information. |
| |
| We embed @dfn{stage1} in an MBR or in the boot sector of a partition, |
| and place @dfn{stage2} in a filesystem. The optional |
| @dfn{stage1.5} can be installed in a filesystem, in the @dfn{boot loader} |
| area in an FFS, or in the sectors right after an MBR, because |
| @dfn{stage1.5} is small enough and the sectors right after an MBR are |
| normally an unused region. The size of this region is the number of |
| sectors per head minus 1. |
| |
| Thus, all that @dfn{stage1} must do is load a @dfn{stage2} or |
| @dfn{stage1.5}. But even though @dfn{stage1} need not support the user |
| interface or the filesystem interface, it is impossible to make |
| @dfn{stage1} less than 400 bytes, because GRUB should support both the |
| CHS mode and the LBA mode (@pxref{Low-level disk I/O}). |
| |
| The solution used by GRUB is that @dfn{stage1} loads only the first |
| sector of a @dfn{stage2} (or a @dfn{stage1.5}) and @dfn{stage2} itself |
| loads the rest. The flow of @dfn{stage1} is: |
| |
| @enumerate |
| @item |
| Initialize the system briefly. |
| |
| @item |
| Detect the geometry and the accessing mode of the @dfn{loading drive}. |
| |
| @item |
| Load the first sector of the @dfn{stage2}. |
| |
| @item |
| Jump to the starting address of the @dfn{stage2}. |
| @end enumerate |
| |
| The flow of @dfn{stage2} (and @dfn{stage1.5}) is: |
| |
| @enumerate |
| @item |
| Load the rest of itself to the real starting address, that is, the |
| starting address plus 512 bytes. The blocklists are stored in the last |
| part of the first sector. |
| |
| @item |
| Long jump to the real starting address. |
| @end enumerate |
| |
| Note that @dfn{stage2} (or @dfn{stage1.5}) does not probe the geometry |
| or the accessing mode of the @dfn{loading drive}, since @dfn{stage1} has |
| already probed them. |
| |
| |
| @node I/O ports detection |
| @chapter How to detect I/O ports used for a BIOS drive |
| |
| In the @sc{pc} world, BIOS cannot detect if a hard disk drive is SCSI or |
| IDE, generally speaking. Thus, it is not trivial to know which BIOS |
| drive corresponds to an OS device. So the Multiboot Specification |
| describes some techniques on how to guess mappings (@pxref{BIOS device |
| mapping techniques, Multiboot Specification, BIOS device mapping |
| techniques, multiboot, The Multiboot Specification}). |
| |
| However, the techniques described there are unreliable or difficult to |
| implement, so we use a different technique in GRUB. Our |
| technique is the @dfn{INT 13H tracking technique}. More precisely, it runs |
| the INT 13 call (@pxref{Low-level disk I/O}) in single-step mode, just |
| like a debugger, and parses the instructions. |
| |
| To execute the call one instruction at a time, set the TF (trap flag) |
| bit in the @dfn{FLAGS} register. The CPU then generates a @dfn{Break |
| Point Trap} after each instruction is executed and calls INT 1. When the |
| interrupt handler is entered, the interrupted code's FLAGS and the far |
| pointer to the next instruction to be executed have been pushed on the |
| stack, so we can know which instruction will be executed next and the |
| current contents of all the registers. If the next instruction is an I/O |
| operation, the interrupt handler adds the I/O port to the @dfn{I/O |
| map}. |
| |
| When the INT 13 handler returns, the TF flag is cleared automatically by |
| the instruction @code{iret}, and GRUB then outputs the I/O map on the screen. |
| See the source code for the command @command{ioprobe} |
| (@pxref{Command-line-specific commands}), for more information. |
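The "parse the next instruction" step can be sketched as below. This is not the @command{ioprobe} implementation; it is a simplified illustration that only recognizes the one-byte x86 port I/O opcodes and ignores instruction prefixes. The function and type names are hypothetical.

```c
#include <stdint.h>

/* If the instruction at IP is a port I/O instruction, store the port
   number in *PORT and return 1; otherwise return 0.  DX is the value of
   the DX register at that point.  Prefix bytes are not handled. */
static int
decode_io_port (const uint8_t *ip, uint16_t dx, uint16_t *port)
{
  switch (ip[0])
    {
    case 0xE4: case 0xE5:   /* IN  AL/AX, imm8 */
    case 0xE6: case 0xE7:   /* OUT imm8, AL/AX */
      *port = ip[1];
      return 1;

    case 0xEC: case 0xED:   /* IN  AL/AX, DX */
    case 0xEE: case 0xEF:   /* OUT DX, AL/AX */
    case 0x6C: case 0x6D:   /* INSB / INSW   */
    case 0x6E: case 0x6F:   /* OUTSB / OUTSW */
      *port = dx;
      return 1;

    default:
      return 0;
    }
}

#define IO_MAP_MAX 32

/* A tiny I/O map: the set of distinct ports seen so far. */
struct io_map
{
  uint16_t ports[IO_MAP_MAX];
  int count;
};

/* Add PORT to MAP unless it is already present or the map is full. */
static void
io_map_add (struct io_map *map, uint16_t port)
{
  int i;

  for (i = 0; i < map->count; i++)
    if (map->ports[i] == port)
      return;
  if (map->count < IO_MAP_MAX)
    map->ports[map->count++] = port;
}
```

An INT 1 handler would call something like @code{decode_io_port} with the far pointer found on the stack and, on a match, record the port with @code{io_map_add}.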
| |
| |
| @node Memory detection |
| @chapter How to detect all installed @sc{ram} |
| |
| There are three BIOS calls which return information about installed |
| @sc{ram}. GRUB uses these calls to detect all installed @sc{ram} and |
| to learn how each address range should be treated by operating systems. |
| |
| @menu |
| * Query System Address Map:: INT 15H, AX=E820h interrupt call |
| * Get Large Memory Size:: INT 15H, AX=E801h interrupt call |
| * Get Extended Memory Size:: INT 15H, AH=88h interrupt call |
| @end menu |
| |
| |
| @node Query System Address Map |
| @section INT 15H, AX=E820h interrupt call |
| |
| Real mode only. |
| |
| This call returns a memory map of all the installed @sc{ram}, and of |
| physical memory ranges reserved by the BIOS. The address map is returned |
| by making successive calls to this API, each returning one "run" of |
| physical address information. Each run has a type which dictates how |
| this run of physical address range should be treated by the operating |
| system. |
| |
| If the information returned from INT 15h, AX=E820h in some way differs |
| from INT 15h, AX=E801h (@pxref{Get Large Memory Size}) or INT 15h AH=88h |
| (@pxref{Get Extended Memory Size}), then the information returned from |
| E820h supersedes what is returned from these older interfaces. This |
| allows the BIOS to return whatever information it wishes to for |
| compatibility reasons. |
| |
| Input: |
| |
| @multitable @columnfractions .15 .25 .6 |
| @item @code{EAX} @tab Function Code @tab E820h |
| |
| @item @code{EBX} @tab Continuation @tab Contains the @dfn{continuation |
| value} to get the next run of physical memory. This is the value |
| returned by a previous call to this routine. If this is the first call, |
| @code{EBX} must contain zero. |
| |
| @item @code{ES:DI} @tab Buffer Pointer @tab Pointer to an Address Range |
| Descriptor structure which the BIOS is to fill in. |
| |
| @item @code{ECX} @tab Buffer Size @tab The length in bytes of the |
| structure passed to the BIOS. The BIOS will fill in at most @code{ECX} |
| bytes of the structure or however much of the structure the BIOS |
| implements. The minimum size which must be supported by both the BIOS |
| and the caller is 20 bytes. Future implementations may extend this |
| structure. |
| |
| @item @code{EDX} @tab Signature @tab @samp{SMAP} - Used by the BIOS to |
| verify the caller is requesting the system map information to be |
| returned in @code{ES:DI}. |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions 0.15 0.25 0.6 |
| @item @code{CF} @tab Carry Flag @tab Non-Carry - indicates no error |
| |
| @item @code{EAX} @tab Signature @tab @samp{SMAP} - Signature to verify |
| correct BIOS revision. |
| |
| @item @code{ES:DI} @tab Buffer Pointer @tab Returned Address Range |
| Descriptor pointer. Same value as on input. |
| |
| @item @code{ECX} @tab Buffer Size @tab Number of bytes returned by the |
| BIOS in the address range descriptor. The minimum size structure |
| returned by the BIOS is 20 bytes. |
| |
| @item @code{EBX} @tab Continuation @tab Contains the continuation value |
| to get the next address descriptor. The actual significance of the |
| continuation value is up to the discretion of the BIOS. The caller must |
| pass the continuation value unchanged as input to the next iteration of |
| the E820h call in order to get the next Address Range Descriptor. A |
| return value of zero means that this is the last descriptor. Note that |
| the BIOS indicates that the last valid descriptor has been returned by |
| either returning a zero as the continuation value, or by returning |
| carry. |
| @end multitable |
| |
| The Address Range Descriptor Structure is: |
| |
| @multitable @columnfractions 0.25 0.3 0.45 |
| @item Offset in Bytes @tab Name @tab Description |
| |
| @item 0 @tab @dfn{BaseAddrLow} @tab Low 32 Bits of Base Address |
| |
| @item 4 @tab @dfn{BaseAddrHigh} @tab High 32 Bits of Base Address |
| |
| @item 8 @tab @dfn{LengthLow} @tab Low 32 Bits of Length in Bytes |
| |
| @item 12 @tab @dfn{LengthHigh} @tab High 32 Bits of Length in Bytes |
| |
| @item 16 @tab @dfn{Type} @tab Address type of this range |
| @end multitable |
| |
| The @dfn{BaseAddrLow} and @dfn{BaseAddrHigh} together are the 64 bit |
| @dfn{BaseAddress} of this range. The @dfn{BaseAddress} is the physical |
| address of the start of the range being specified. |
| |
| The @dfn{LengthLow} and @dfn{LengthHigh} together are the 64 bit |
| @dfn{Length} of this range. The @dfn{Length} is the physical contiguous |
| length in bytes of a range being specified. |
| |
| The @dfn{Type} field describes the usage of the described address range |
| as defined in the table below: |
| |
| @multitable @columnfractions 0.1 0.35 0.55 |
| @item Value @tab Mnemonic @tab Description |
| |
| @item 1 @tab @dfn{AddressRangeMemory} @tab This run is available |
| @sc{ram} usable by the operating system. |
| |
| @item 2 @tab @dfn{AddressRangeReserved} @tab This run of addresses is in |
| use or reserved by the system, and must not be used by the operating |
| system. |
| |
| @item Other @tab @dfn{Undefined} @tab Undefined - Reserved for future |
| use. Any range of this type must be treated by the OS as if the type |
| returned was @dfn{AddressRangeReserved}. |
| @end multitable |
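The descriptor layout and the type rules above map naturally onto a C structure. The following is a minimal sketch; the structure and function names are chosen for illustration and are not taken from the GRUB sources.

```c
#include <stdint.h>

/* The 20-byte Address Range Descriptor, field by field as in the table. */
struct address_range_descriptor
{
  uint32_t base_addr_low;   /* offset 0  */
  uint32_t base_addr_high;  /* offset 4  */
  uint32_t length_low;      /* offset 8  */
  uint32_t length_high;     /* offset 12 */
  uint32_t type;            /* offset 16 */
};

enum
{
  ADDRESS_RANGE_MEMORY   = 1,
  ADDRESS_RANGE_RESERVED = 2
};

/* Combine the low and high halves into the 64-bit BaseAddress. */
static uint64_t
ard_base (const struct address_range_descriptor *d)
{
  return ((uint64_t) d->base_addr_high << 32) | d->base_addr_low;
}

/* Combine the low and high halves into the 64-bit Length. */
static uint64_t
ard_length (const struct address_range_descriptor *d)
{
  return ((uint64_t) d->length_high << 32) | d->length_low;
}

/* Only type 1 is usable RAM.  Every other value, including undefined
   ones, is treated as AddressRangeReserved. */
static int
ard_is_usable (const struct address_range_descriptor *d)
{
  return d->type == ADDRESS_RANGE_MEMORY;
}
```

Because all five fields are 32 bits wide, the structure has no padding and is exactly the 20-byte minimum size that the BIOS and the caller must support.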
| |
| The BIOS can use the @dfn{AddressRangeReserved} address range type to |
| block out various addresses as @emph{not suitable} for use by a |
| programmable device. |
| |
| Some of the reasons a BIOS would do this are: |
| |
| @itemize @bullet |
| @item |
| The address range contains system @sc{rom}. |
| |
| @item |
| The address range contains @sc{ram} in use by the @sc{rom}. |
| |
| @item |
| The address range is in use by a memory mapped system device. |
| |
| @item |
| The address range is, for whatever reason, unsuitable for a |
| standard device to use as a device memory space. |
| @end itemize |
| |
| Here is the list of assumptions and limitations: |
| |
| @enumerate |
| @item |
| The BIOS will return address ranges describing base board memory and ISA |
| or PCI memory that is contiguous with that base board memory. |
| |
| @item |
| The BIOS @emph{will not} return a range description for the memory |
| mapping of PCI devices, ISA Option @sc{rom}s, and ISA plug & play |
| cards. This is because the OS has mechanisms available to detect them. |
| |
| @item |
| The BIOS will return chipset defined address holes that are not being |
| used by devices as reserved. |
| |
| @item |
| Address ranges defined for base board memory mapped I/O devices (for |
| example APICs) will be returned as reserved. |
| |
| @item |
| All occurrences of the system BIOS will be mapped as reserved. This |
| includes the area below 1 MB, at 16 MB (if present) and at end of the |
| address space (4 GB). |
| |
| @item |
| Standard PC address ranges will not be reported. For example, video |
| memory at A0000 to BFFFF physical will not be described by this |
| function. The range from E0000 to EFFFF is base board specific and will |
| be reported as suits the base board. |
| |
| @item |
| All of lower memory is reported as normal memory. It is the OS's |
| responsibility to handle standard @sc{ram} locations reserved for |
| specific uses, for example: the interrupt vector table (0:0) and the |
| BIOS data area (40:0). |
| @end enumerate |
| |
| Here we explain an example address map. This sample address map |
| describes a machine which has 128 MB @sc{ram}, 640K of base memory and |
| 127 MB extended. The base memory has 639K available for the user and 1K |
| for an extended BIOS data area. There is a 4 MB Linear Frame Buffer |
| (LFB) based at 12 MB. The memory hole created by the chipset is from 8 |
| MB to 16 MB. There are memory mapped APIC devices in the system. The IO |
| Unit is at FEC00000 and the Local Unit is at FEE00000. The system BIOS |
| is remapped to 4G - 64K. |
| |
| Note that the 639K endpoint of the first memory range is also the base |
| memory size reported in the BIOS data segment at 40:13. |
| |
| Key to types: @dfn{ARM} is AddressRangeMemory, @dfn{ARR} is |
| AddressRangeReserved. |
| |
| @multitable @columnfractions 0.15 0.1 0.1 0.65 |
| @item Base (Hex) @tab Length @tab Type @tab Description |
| |
| @item 0000 0000 @tab 639K @tab ARM @tab Available Base memory - |
| typically the same value as is returned via the INT 12 function. |
| |
| @item 0009 FC00 @tab 1K @tab ARR @tab Memory reserved for use by the |
| BIOS(s). This area typically includes the Extended BIOS data area. |
| |
| @item 000F 0000 @tab 64K @tab ARR @tab System BIOS. |
| |
| @item 0010 0000 @tab 7M @tab ARM @tab Extended memory, this is not |
| limited to the 64MB address range. |
| |
| @item 0080 0000 @tab 8M @tab ARR @tab Chipset memory hole required to |
| support the LFB mapping at 12 MB. |
| |
| @item 0100 0000 @tab 120M @tab ARM @tab Base board @sc{ram} relocated |
| above a chipset memory hole. |
| |
| @item FEC0 0000 @tab 4K @tab ARR @tab IO APIC memory mapped I/O at |
| FEC00000. Note the range of addresses required for an APIC device may |
| vary from OEM to OEM. |
| |
| @item FEE0 0000 @tab 4K @tab ARR @tab Local APIC memory mapped I/O at |
| FEE00000. |
| |
| @item FFFF 0000 @tab 64K @tab ARR @tab Remapped System BIOS at end of |
| address space. |
| @end multitable |
| |
| The following code segment is intended to describe the algorithm needed |
| when calling the Query System Address Map function. It is an |
| implementation example and uses nonstandard mechanisms. |
| |
| @example |
| E820Present = FALSE; |
| Regs.ebx = 0; |
| do |
| @{ |
| Regs.eax = 0xE820; |
| Regs.es = SEGMENT (&Descriptor); |
| Regs.di = OFFSET (&Descriptor); |
| Regs.ecx = sizeof (Descriptor); |
| Regs.edx = 'SMAP'; |
| |
| _int (0x15, Regs); |
| |
| if ((Regs.eflags & EFLAGS_CARRY) || Regs.eax != 'SMAP') |
| @{ |
| break; |
| @} |
| |
| if (Regs.ecx < 20 || Regs.ecx > sizeof (Descriptor)) |
| @{ |
| /* bug in bios - all returned descriptors must be at |
| least 20 bytes long, and can not be larger than |
| the input buffer. */ |
| break; |
| @} |
| |
| E820Present = TRUE; |
| . |
| . |
| . |
| Add address range Descriptor.BaseAddress through |
| Descriptor.BaseAddress + Descriptor.Length |
| as type Descriptor.Type |
| . |
| . |
| . |
| @} |
| while (Regs.ebx != 0); |
| |
| if (! E820Present) |
| @{ |
| . |
| . |
| . |
| call INT 15H, AX=E801h and/or INT 15H, AH=88h to obtain old style |
| memory information |
| . |
| . |
| . |
| @} |
| @end example |
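Since the example above depends on real-mode BIOS access, it cannot be run directly. The sketch below shows the same continuation loop in portable C against a mock BIOS that replays the sample address map from this section. The function @code{mock_e820} and the other names are hypothetical stand-ins, and the mock uses a table index as its continuation value, which is just one possible choice since the meaning of that value is up to the BIOS.

```c
#include <stddef.h>
#include <stdint.h>

struct e820_entry
{
  uint64_t base;
  uint64_t length;
  uint32_t type;   /* 1 = AddressRangeMemory, 2 = AddressRangeReserved */
};

#define K 1024ULL
#define M (1024ULL * 1024ULL)

/* The sample address map from the table above. */
static const struct e820_entry sample_map[] = {
  { 0x00000000ULL, 639 * K, 1 },  /* available base memory */
  { 0x0009FC00ULL, 1 * K,   2 },  /* extended BIOS data area */
  { 0x000F0000ULL, 64 * K,  2 },  /* system BIOS */
  { 0x00100000ULL, 7 * M,   1 },  /* extended memory */
  { 0x00800000ULL, 8 * M,   2 },  /* chipset memory hole */
  { 0x01000000ULL, 120 * M, 1 },  /* RAM above the hole */
  { 0xFEC00000ULL, 4 * K,   2 },  /* IO APIC */
  { 0xFEE00000ULL, 4 * K,   2 },  /* local APIC */
  { 0xFFFF0000ULL, 64 * K,  2 },  /* remapped system BIOS */
};

/* Mock of the E820h call.  *EBX is the continuation value: 0 on the
   first call, and set to 0 again when the last descriptor is returned.
   Returns 0 on success, or -1 to model a set carry flag. */
static int
mock_e820 (uint32_t *ebx, struct e820_entry *out)
{
  size_t n = sizeof sample_map / sizeof sample_map[0];

  if (*ebx >= n)
    return -1;
  *out = sample_map[*ebx];
  *ebx = (*ebx + 1 < n) ? *ebx + 1 : 0;
  return 0;
}

/* Walk the whole map and add up the ranges usable by an OS. */
static uint64_t
usable_ram_bytes (void)
{
  uint32_t ebx = 0;
  uint64_t total = 0;
  struct e820_entry d;

  do
    {
      if (mock_e820 (&ebx, &d) != 0)
        break;
      if (d.type == 1)
        total += d.length;
    }
  while (ebx != 0);

  return total;
}
```

For the sample map the usable total is 639K + 7M + 120M, that is, 128 MB minus the 1K extended BIOS data area and the 384K between 640K and 1 MB.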
| |
| |
| @node Get Large Memory Size |
| @section INT 15H, AX=E801h interrupt call |
| |
| Real mode only. |
| |
| Originally defined for EISA servers, this interface is capable of |
| reporting up to 4 GB of @sc{ram}. While not nearly as flexible as |
| E820h, it is present in many more systems. |
| |
| Input: |
| |
| @multitable @columnfractions 0.15 0.25 0.6 |
| @item @code{AX} @tab Function Code @tab E801h. |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions 0.15 0.25 0.6 |
| @item @code{CF} @tab Carry Flag @tab Non-Carry - indicates no error. |
| |
| @item @code{AX} @tab Extended 1 @tab Number of contiguous KB between 1 |
| and 16 MB, maximum 0x3C00 = 15 MB. |
| |
| @item @code{BX} @tab Extended 2 @tab Number of contiguous 64KB blocks |
| between 16 MB and 4GB. |
| |
| @item @code{CX} @tab Configured 1 @tab Number of contiguous KB between 1 |
| and 16 MB, maximum 0x3c00 = 15 MB. |
| |
| @item @code{DX} @tab Configured 2 @tab Number of contiguous 64KB blocks |
| between 16 MB and 4 GB. |
| @end multitable |
| |
| It is not clear what the difference between the @dfn{Extended} and |
| @dfn{Configured} numbers is, but they appear to be identical as |
| reported by the BIOS. |
| |
| It is possible for a machine using this interface to report a memory |
| hole just under 16 MB (Count 1 is less than 15 MB, but Count 2 is |
| non-zero). |
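The two counts combine as in the following sketch, which assumes the values have already been read from @code{AX} and @code{BX} after a successful call. The function names are hypothetical.

```c
#include <stdint.h>

#define E801_COUNT1_MAX 0x3C00u  /* 15 MB expressed in KB */

/* Total extended memory reported, in KB: count 1 is already in KB,
   count 2 is in 64KB blocks. */
static uint32_t
e801_extended_kb (uint16_t ax, uint16_t bx)
{
  return (uint32_t) ax + (uint32_t) bx * 64u;
}

/* A hole just under 16 MB shows up as count 1 below its maximum while
   count 2 is non-zero. */
static int
e801_has_hole (uint16_t ax, uint16_t bx)
{
  return ax < E801_COUNT1_MAX && bx != 0;
}
```

For example, a 128 MB machine without a hole might report 0x3C00 in @code{AX} and 1792 in @code{BX}, which adds up to 127 MB of extended memory.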
| |
| |
| @node Get Extended Memory Size |
| @section INT 15H, AH=88h interrupt call |
| |
| Real mode only. |
| |
| This interface is quite primitive. It returns a single value for |
| contiguous memory above 1 MB. The biggest limitation is that the value |
| returned is a 16-bit value, in KB, so it saturates at just under 64 MB |
| even if the BIOS returns as much as it can. On some |
| systems, it won't return anything above the 16 MB boundary. |
| |
| The one useful point is that it works on every PC available. |
| |
| Input: |
| |
| @multitable @columnfractions 0.15 0.25 0.6 |
| @item @code{AH} @tab Function Code @tab 88h |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions 0.15 0.25 0.6 |
| @item @code{CF} @tab Carry Flag @tab Non-Carry - indicates no error. |
| |
| @item @code{AX} @tab Memory Count @tab Number of contiguous KB above 1 |
| MB. |
| @end multitable |
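The limits mentioned above follow directly from the 16-bit count, as this short sketch shows. The function names are hypothetical, and the saturation check is only a heuristic suggested by the text, not something the BIOS reports.

```c
#include <stdint.h>

/* Total memory in KB implied by the AH=88h result: the first 1 MB plus
   the contiguous KB above it. */
static uint32_t
int88_total_kb (uint16_t ax)
{
  return 1024u + ax;
}

/* Heuristic: the count may be clipped if it sits exactly at the 16-bit
   maximum or at the 16 MB boundary (15 MB above 1 MB). */
static int
int88_possibly_saturated (uint16_t ax)
{
  return ax == 0xFFFFu || ax == 15u * 1024u;
}
```

The largest possible result, 0xFFFF, corresponds to 64 MB minus 1K of extended memory, so a caller that sees a possibly saturated value would do well to prefer the E801h or E820h results when they are available.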
| |
| |
| @node Low-level disk I/O |
| @chapter INT 13H disk I/O interrupts |
| |
| In the PC world, living with the BIOS disk interface is definitely a |
| nightmare. This section documents how awful the chaos is and how GRUB |
| deals with the BIOS disks. |
| |
| @menu |
| * CHS Translation:: CHS addressing and LBA addressing |
| * CHS mode disk I/O:: INT 13H, AH=0xh interrupt call |
| * LBA mode disk I/O:: INT 13H, AH=4xh interrupt call |
| @end menu |
| |
| |
| @node CHS Translation |
| @section CHS addressing and LBA addressing |
| |
| CHS --- Cylinder/Head/Sector --- is the traditional way to address |
| sectors on a disk. There are at least two types of CHS addressing; the |
| CHS that is used at the INT 13H interface and the CHS that is used at |
| the ATA device interface. In the MFM/RLL/ESDI and early ATA days the CHS |
| used at the INT 13H interface was the same as the CHS used at the device |
| interface. |
| |
| Today we have CHS translating BIOS types that can use one CHS at the INT |
| 13H interface and a different CHS at the device interface. These two |
| types of CHS will be called the logical CHS or @dfn{L-CHS} and the |
| physical CHS or @dfn{P-CHS} in this section. L-CHS is the CHS used at |
| the INT 13H interface and P-CHS is the CHS used at the device interface. |
| |
| The L-CHS used at the INT 13 interface allows up to 256 heads, up to |
| 1024 cylinders and up to 63 sectors. This allows support of up to 8GB |
| drives. This scheme started with either ESDI or SCSI adapters many years |
| ago. |
| |
| The P-CHS used at the device interface allows up to 16 heads, up to 65535 |
| cylinders, and up to 63 sectors. This allows access to about 2^26 sectors |
| (32GB) on an ATA device. When a P-CHS is used at the INT 13H interface |
| it is limited to 1024 cylinders, 16 heads and 63 sectors. This is where |
| the old 528MB limit originated. |
| |
| LBA --- Logical Block Address --- is another way of addressing sectors |
| that uses a simple numbering scheme starting with zero as the address of |
| the first sector on a device. The ATA standard requires that cylinder 0, |
| head 0, sector 1 address the same sector as addressed by LBA 0. LBA |
| addressing can be used at the ATA interface if the ATA device supports |
| it. LBA addressing is also used at the INT 13H interface by the AH=4xH |
| read/write calls. |
| |
| ATA devices may also support LBA at the device interface. LBA allows |
| access to approximately 2^28 sectors (137GB) on an ATA device. |
| |
| A SCSI host adapter can convert a L-CHS directly to an LBA used in the |
| SCSI read/write commands. On a PC today, SCSI is also limited to 8GB |
| when CHS addressing is used at the INT 13H interface. |
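The size limits quoted in this section can be checked with a little arithmetic, assuming 512-byte sectors. The helper names below are hypothetical.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u

/* Capacity in bytes addressable with the given CHS limits. */
static uint64_t
chs_capacity_bytes (uint32_t cylinders, uint32_t heads, uint32_t sectors)
{
  return (uint64_t) cylinders * heads * sectors * SECTOR_BYTES;
}

/* Capacity in bytes addressable with an LBA of the given width in bits. */
static uint64_t
lba_capacity_bytes (unsigned address_bits)
{
  return ((uint64_t) 1 << address_bits) * SECTOR_BYTES;
}
```

With 1024 cylinders, 16 heads and 63 sectors this gives 528,482,304 bytes (the old 528MB limit). With 256 heads it gives 8,455,716,864 bytes (the 8GB limit), and a 28-bit LBA gives 137,438,953,472 bytes (137GB).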
| |
| First, all OS's that want to be co-resident with another OS (and that is |
| all of the PC based OS's that I know of) @emph{must} use INT 13H to |
| determine the capacity of a hard disk. And that capacity information |
| @emph{must} be determined in L-CHS mode. Why is this? Because: |
| |
| @enumerate |
| @item |
| FDISK and the partition tables are really L-CHS based. |
| |
| @item |
| MS/PC DOS uses INT 13H AH=02H and AH=03H to read and write the disk and |
| these BIOS calls are L-CHS based. |
| |
| @item |
| The boot processing done by the BIOS is all L-CHS based. |
| @end enumerate |
| |
| During the boot processing, all of the disk read accesses are done in |
| L-CHS mode via INT 13H and this includes loading the first of the OS's |
| kernel code or boot manager's code. |
| |
| Second, because there can be multiple BIOS types in any one system, each |
| drive may be under the control of a different type of BIOS. For example, |
| drive 80H (the first hard drive) could be controlled by the original |
| system BIOS, drive 81H (the second drive) could be controlled by an |
| option @sc{rom} BIOS and drive 82H (the third drive) could be controlled |
| by a software driver. Also, be aware that each drive could be a |
| different type, for example, drive 80H could be an MFM drive, drive 81H |
| could be an ATA drive, drive 82H could be a SCSI drive. |
| |
| Third, not all OS's understand or use BIOS drive numbers greater than |
| 81H. Even if there is INT 13H support for drives 82H or greater, the OS |
| may not use that support. |
| |
| Fourth, the BIOS INT 13H configuration calls are: |
| |
| @table @asis |
| @item AH=08H, Get Drive Parameters |
| This call is restricted to drives up to 528MB without CHS translation |
| and to drives up to 8GB with CHS translation. For older BIOS with no |
| support for >1024 cylinders or >528MB, this call returns the same CHS as |
| is used at the ATA interface (the P-CHS). For newer BIOS's that do |
| support >1024 cylinders or >528MB, this call returns a translated CHS |
| (the L-CHS). The CHS returned by this call is used by FDISK to build |
| partition records. |
| |
| @item AH=41H, Get BIOS Extensions Support |
| This call is used to determine if the IBM/Microsoft Extensions or if the |
| Phoenix Enhanced INT 13H calls are supported for the BIOS drive number. |
| |
| @item AH=48H, Extended Get Drive Parameters |
| This call is used to determine the CHS geometries, LBA information and |
| other data about the BIOS drive number. |
| @end table |
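
The 528MB and 8GB limits quoted for AH=08H can be reproduced with a little
arithmetic. The sketch below assumes 512-byte sectors, the INT 13H limits of
1024 cylinders and 63 sectors per track, 16 heads at the ATA interface when
no translation is done, and 256 heads with translation. These geometry
limits and the function names are assumptions made for illustration; they
are not stated in the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in bytes of a CHS geometry, assuming 512-byte sectors. */
uint64_t chs_capacity_bytes(uint32_t cylinders, uint32_t heads,
                            uint32_t sectors_per_track)
{
    assert(cylinders > 0 && heads > 0 && sectors_per_track > 0);
    return (uint64_t)cylinders * heads * sectors_per_track * 512u;
}

/* Capacity in bytes addressable with an LBA of the given bit width,
   again assuming 512-byte sectors. */
uint64_t lba_capacity_bytes(unsigned address_bits)
{
    assert(address_bits < 55);  /* keep the result inside 64 bits */
    return ((uint64_t)1 << address_bits) * 512u;
}
```

Under these assumptions, 1024 x 16 x 63 sectors is 528,482,304 bytes (the
528MB limit), 1024 x 256 x 63 sectors is 8,455,716,864 bytes (the 8GB
limit), and 2^28 sectors is 137,438,953,472 bytes (the 137GB LBA limit).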
| |
| An ATA disk must implement both CHS and LBA addressing and must at any |
| given time support only one P-CHS at the device interface. And, the |
| drive must maintain a strict relationship between the sector addressing |
| in CHS mode and LBA mode. Quoting @cite{the ATA-2 document}: |
| |
| @example |
| @group |
| LBA = ( (cylinder * heads_per_cylinder + heads ) |
| * sectors_per_track ) + sector - 1 |
| |
| where heads_per_cylinder and sectors_per_track are the current |
| translation mode values. |
| @end group |
| @end example |
| |
| This algorithm can also be used by a BIOS or an OS to convert an L-CHS to |
| an LBA. |
| |
| This algorithm can be reversed such that an LBA can be converted to a |
| CHS: |
| |
| @example |
| @group |
| cylinder = LBA / (heads_per_cylinder * sectors_per_track) |
| temp = LBA % (heads_per_cylinder * sectors_per_track) |
| head = temp / sectors_per_track |
| sector = temp % sectors_per_track + 1 |
| @end group |
| @end example |
| |
| While most OS's compute disk addresses in an LBA scheme, an OS like DOS |
| must convert that LBA to a CHS in order to call INT 13H. |
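
The two conversions above can be sketched in C as follows. This is a minimal
illustration, not GRUB's own code: the type @code{struct chs_addr} and the
function names are hypothetical, and the caller is assumed to supply the
heads-per-cylinder and sectors-per-track values of the current translation
mode.

```c
#include <assert.h>
#include <stdint.h>

struct chs_addr {
    uint32_t cylinder;  /* zero-based */
    uint32_t head;      /* zero-based */
    uint32_t sector;    /* one-based, as in INT 13H */
};

/* CHS to LBA, following the ATA-2 formula quoted above. */
uint32_t chs_to_lba(struct chs_addr a, uint32_t heads_per_cylinder,
                    uint32_t sectors_per_track)
{
    assert(a.head < heads_per_cylinder);
    assert(a.sector >= 1 && a.sector <= sectors_per_track);
    return (a.cylinder * heads_per_cylinder + a.head) * sectors_per_track
           + a.sector - 1;
}

/* LBA to CHS, the reverse conversion. */
struct chs_addr lba_to_chs(uint32_t lba, uint32_t heads_per_cylinder,
                           uint32_t sectors_per_track)
{
    struct chs_addr a;
    uint32_t temp;

    assert(heads_per_cylinder > 0 && sectors_per_track > 0);
    a.cylinder = lba / (heads_per_cylinder * sectors_per_track);
    temp = lba % (heads_per_cylinder * sectors_per_track);
    a.head = temp / sectors_per_track;
    a.sector = temp % sectors_per_track + 1;
    return a;
}
```

For example, with 15 heads and 62 sectors per track, cylinder 0, head 0,
sector 1 maps to LBA 0 and cylinder 0, head 1, sector 1 maps to LBA 3EH.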
| |
| The basic problem is that there is no requirement that a CHS translating |
| BIOS follow these rules. There are many other algorithms that can be |
| implemented to perform a similar function. Today, there are at least two |
| popular implementations: the Phoenix implementation (described above) and |
| the non-Phoenix implementations. This matters because a protected mode OS |
| that does not want to use INT 13H must implement the same CHS translation |
| algorithm as the BIOS. If it doesn't, your data gets scrambled. |
| |
| In the perfect world of tomorrow, maybe only LBA will be used. But today |
| we are faced with the following problems: |
| |
| @itemize @bullet |
| @item |
| Some drives >528MB don't implement LBA. |
| |
| @item |
| Some drives are optimized for CHS and may have lower performance when |
| given commands in LBA mode. Don't forget that LBA is something new for |
| the ATA disk designers who have worked very hard for many years to |
| optimize CHS address handling. And not all drive designs require the use |
| of LBA internally. |
| |
| @item |
| The L-CHS to LBA conversion is more complex and slower than the bit |
| shifting L-CHS to P-CHS conversion. |
| |
| @item |
| DOS, FDISK and the MBR are still CHS based --- they use the CHS returned |
| by INT 13H AH=08H. Any OS that can be installed on the same disk with |
| DOS must understand CHS addressing. |
| |
| @item |
| The BIOS boot processing and loading of the first OS kernel code is done |
| in CHS mode --- the CHS returned by INT 13H AH=08H is used. |
| |
| @item |
| Microsoft has said that their OS's will not use any disk capacity that |
| can not also be accessed by INT 13H AH=0xH. |
| @end itemize |
| |
| These are difficult problems to overcome in today's industry |
| environment. The result: chaos. |
| |
| |
| @node CHS mode disk I/O |
| @section INT 13H, AH=0xh interrupt call |
| |
| Real mode only. These functions are the traditional CHS mode disk |
| interface. GRUB calls them only if LBA mode is not available. |
| |
| INT 13H, AH=02h reads sectors into memory. |
| |
| Input: |
| |
| @multitable @columnfractions .15 .85 |
| @item @code{AH} @tab 02h |
| |
| @item @code{AL} @tab The number of sectors to read (must be non-zero). |
| |
| @item @code{CH} @tab Low 8 bits of cylinder number. |
| |
| @item @code{CL} @tab Sector number in bits 0-5, and high 2 bits of |
| cylinder number in bits 6-7. |
| |
| @item @code{DH} @tab Head number. |
| |
| @item @code{DL} @tab Drive number (bit 7 set for hard disk). |
| |
| @item @code{ES:BX} @tab Data buffer. |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions .15 .85 |
| @item @code{CF} @tab Set on error. |
| |
| @item @code{AH} @tab Status. |
| |
| @item @code{AL} @tab The number of sectors transferred (only valid if CF |
| set for some BIOSes). |
| @end multitable |
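
The way a cylinder and sector number share @code{CH} and @code{CL} can be
sketched in C. The structure and function names below are hypothetical; the
bit layout is the one given in the input table above, and the same packing
applies to the AH=03h write call.

```c
#include <assert.h>
#include <stdint.h>

/* Register values carrying the CHS address for INT 13H, AH=02h/03h. */
struct int13_chs_regs {
    uint8_t ch;  /* low 8 bits of the cylinder number */
    uint8_t cl;  /* sector in bits 0-5, cylinder bits 8-9 in bits 6-7 */
    uint8_t dh;  /* head number */
};

struct int13_chs_regs int13_pack_chs(uint16_t cylinder, uint8_t head,
                                     uint8_t sector)
{
    struct int13_chs_regs r;

    assert(cylinder <= 1023);            /* only 10 cylinder bits exist */
    assert(sector >= 1 && sector <= 63); /* sectors are one-based, 6 bits */
    r.ch = (uint8_t)(cylinder & 0xFF);
    r.cl = (uint8_t)((((cylinder >> 8) & 0x03) << 6) | (sector & 0x3F));
    r.dh = head;
    return r;
}
```

For cylinder 294H, head EH, sector 3EH this gives @code{CH}=94h,
@code{CL}=BEh and @code{DH}=0Eh.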
| |
| INT 13H, AH=03h writes disk sectors. |
| |
| Input: |
| |
| @multitable @columnfractions .15 .85 |
| @item @code{AH} @tab 03h |
| |
| @item @code{AL} @tab The number of sectors to write (must be non-zero). |
| |
| @item @code{CH} @tab Low 8 bits of cylinder number. |
| |
| @item @code{CL} @tab Sector number in bits 0-5, and high 2 bits of |
| cylinder number in bits 6-7. |
| |
| @item @code{DH} @tab Head number. |
| |
| @item @code{DL} @tab Drive number (bit 7 set for hard disk). |
| |
| @item @code{ES:BX} @tab Data buffer. |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions .15 .85 |
| @item @code{CF} @tab Set on error. |
| |
| @item @code{AH} @tab Status. |
| |
| @item @code{AL} @tab The number of sectors transferred (only valid if CF |
| set for some BIOSes). |
| @end multitable |
| |
| INT 13H, AH=08h returns drive parameters. For systems predating the IBM |
| PC/AT, this call is only valid for hard disks. |
| |
| Input: |
| |
| @multitable @columnfractions .15 .85 |
| @item @code{AH} @tab 08h |
| |
| @item @code{DL} @tab Drive number (bit 7 set for hard disk). |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions .15 .85 |
| @item @code{CF} @tab Set on error. |
| |
| @item @code{AH} @tab 0. |
| |
| @item @code{AL} @tab 0 on at least some BIOSes. |
| |
| @item @code{BL} @tab Drive type (AT/PS2 floppies only). |
| |
| @item @code{CH} @tab Low 8 bits of maximum cylinder number. |
| |
| @item @code{CL} @tab Maximum sector number in bits 0-5, and high 2 bits |
| of maximum cylinder number in bits 6-7. |
| |
| @item @code{DH} @tab Maximum head number. |
| |
| @item @code{DL} @tab The number of drives. |
| |
| @item @code{ES:DI} @tab Drive parameter table (floppies only). |
| @end multitable |
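
Because AH=08h returns maximum values rather than counts, a caller has to
add one to the cylinder and head numbers to obtain the geometry, while the
maximum sector number is already the sector count (sectors are numbered from
1). The following sketch shows that decoding; the names are hypothetical and
the caller is assumed to have checked @code{CF} first.

```c
#include <assert.h>
#include <stdint.h>

struct int13_geometry {
    uint32_t cylinders;
    uint32_t heads;
    uint32_t sectors_per_track;
};

/* Turn the CH, CL and DH values returned by INT 13H, AH=08h into counts. */
struct int13_geometry int13_decode_geometry(uint8_t ch, uint8_t cl, uint8_t dh)
{
    struct int13_geometry g;
    uint32_t max_cylinder = ((uint32_t)(cl & 0xC0) << 2) | ch;

    assert((cl & 0x3F) != 0);  /* a maximum sector number of 0 is invalid */
    g.cylinders = max_cylinder + 1;
    g.heads = (uint32_t)dh + 1;
    g.sectors_per_track = cl & 0x3F;
    return g;
}
```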
| |
| |
| @node LBA mode disk I/O |
| @section INT 13H, AH=4xh interrupt call |
| |
| Real mode only. These functions are IBM/MS INT 13 Extensions to support |
| LBA mode. GRUB uses them if available so that it can read and write |
| areas beyond the 8GB limit of the CHS interface. |
| |
| INT 13, AH=41h checks if LBA is supported. |
| |
| Input: |
| |
| @multitable @columnfractions 0.15 0.85 |
| @item @code{AH} @tab 41h. |
| |
| @item @code{BX} @tab 55AAh. |
| |
| @item @code{DL} @tab Drive number. |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions 0.15 0.85 |
| @item @code{CF} @tab Set on error. |
| |
| @item @code{AH} @tab Major version of extensions (01h for 1.x, 20h for |
| 2.0 / EDD-1.0, 21h for 2.1 / EDD-1.1 and 30h for EDD-3.0) if successful, |
| otherwise 01h (the error code of @dfn{invalid function}). |
| |
| @item @code{BX} @tab AA55h if installed. |
| |
| @item @code{AL} @tab Internal use. |
| |
| @item @code{CX} @tab API subset support bitmap (see below). |
| |
| @item @code{DH} @tab Extension version. |
| @end multitable |
| |
| The bitfields for the API subset support bitmap are@footnote{It is known |
| that (at least) the AMI BIOS in SuperMicro P6SBA motherboard |
| (AMIBIOSC0631) does @emph{not} return the bitfields correctly.}: |
| |
| @multitable @columnfractions 0.15 0.85 |
| @item Bit(s) @tab Description |
| |
| @item 0 @tab Extended disk access functions (AH=42h-44h, 47h, 48h) |
| supported. |
| |
| @item 1 @tab Removable drive controller functions (AH=45h, 46h, 48h, |
| 49h, INT 15H, AH=52h) supported. |
| |
| @item 2 @tab Enhanced disk drive (EDD) functions (AH=48h, 4Eh) |
| supported. |
| |
| @item 3-15 @tab Reserved (0). |
| @end multitable |
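
A caller might interpret the results of AH=41h along these lines. This is
only a sketch with hypothetical names, not the check GRUB performs; given
the footnote above about BIOSes that report the bitmap incorrectly, real
code may need to be more lenient than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of the API subset support bitmap returned in CX. */
#define EDD_SUBSET_EXTENDED_ACCESS 0x0001u  /* AH=42h-44h, 47h, 48h */
#define EDD_SUBSET_REMOVABLE       0x0002u  /* AH=45h, 46h, 48h, 49h */
#define EDD_SUBSET_EDD             0x0004u  /* AH=48h, 4Eh */

/* Decide whether the extended read/write calls look usable, given the
   carry flag and the BX and CX values returned by INT 13H, AH=41h. */
bool int13_lba_usable(bool carry_flag, uint16_t bx, uint16_t cx)
{
    if (carry_flag)
        return false;           /* the call failed */
    if (bx != 0xAA55u)
        return false;           /* extensions are not installed */
    return (cx & EDD_SUBSET_EXTENDED_ACCESS) != 0;
}
```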
| |
| INT 13, AH=42h reads sectors into memory. |
| |
| Input: |
| |
| @multitable @columnfractions .15 .85 |
| @item @code{AH} @tab 42h. |
| |
| @item @code{DL} @tab Drive number. |
| |
| @item @code{DS:SI} @tab Disk Address Packet (see below). |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions .15 .85 |
| @item @code{CF} @tab Set on error. |
| |
| @item @code{AH} @tab 0 if successful, otherwise error code. |
| @end multitable |
| |
| The format of @dfn{Disk Address Packet} is: |
| |
| @multitable @columnfractions 0.15 0.15 0.7 |
| @item Offset (hex) @tab Size (byte) @tab Description |
| |
| @item 00 @tab 1 @tab 10h (The size of packet). |
| |
| @item 01 @tab 1 @tab Reserved (0). |
| |
| @item 02 @tab 2 @tab The number of blocks to transfer (max 007F for |
| Phoenix EDD). |
| |
| @item 04 @tab 4 @tab Transfer buffer (SEGMENT:OFFSET). |
| |
| @item 08 @tab 8 @tab Starting absolute block number. |
| @end multitable |
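
The Disk Address Packet maps naturally onto a C structure. The sketch below
uses hypothetical names and assumes a little-endian x86 host, where the
4-byte SEGMENT:OFFSET pointer is stored with the offset word first. The
limit of 7Fh blocks is the Phoenix EDD maximum from the table, applied here
as a conservative bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct disk_address_packet {
    uint8_t  size;         /* 00h: size of the packet, 10h */
    uint8_t  reserved;     /* 01h: reserved, 0 */
    uint16_t blocks;       /* 02h: number of blocks to transfer */
    uint16_t buf_offset;   /* 04h: transfer buffer, offset part */
    uint16_t buf_segment;  /* 06h: transfer buffer, segment part */
    uint64_t start_block;  /* 08h: starting absolute block number */
};

static_assert(sizeof(struct disk_address_packet) == 0x10,
              "packet must be exactly 16 bytes");

/* Fill in a packet for a transfer of `blocks` sectors starting at `lba`. */
struct disk_address_packet dap_make(uint16_t blocks, uint16_t segment,
                                    uint16_t offset, uint64_t lba)
{
    struct disk_address_packet dap;

    assert(blocks >= 1 && blocks <= 0x7F);
    dap.size = 0x10;
    dap.reserved = 0;
    dap.blocks = blocks;
    dap.buf_offset = offset;
    dap.buf_segment = segment;
    dap.start_block = lba;
    return dap;
}
```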
| |
| INT 13, AH=43h writes disk sectors. |
| |
| Input: |
| |
| @multitable @columnfractions 0.15 0.85 |
| @item @code{AH} @tab 43h. |
| |
| @item @code{AL} @tab Write flags. In versions 1.0 and 2.0, bit 0 is the |
| flag for @dfn{verify write} and other bits are reserved (0). In version |
| 2.1, 00h and 01h indicate @dfn{write without verify}, and 02h indicates |
| @dfn{write with verify}. |
| |
| @item @code{DL} @tab Drive number. |
| |
| @item @code{DS:SI} @tab Disk Address Packet (see above). |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions 0.15 0.85 |
| @item @code{CF} @tab Set on error. |
| |
| @item @code{AH} @tab 0 if successful, otherwise error code. |
| @end multitable |
| |
| INT 13, AH=48h returns drive parameters. GRUB only makes use of the |
| total number of sectors, and ignores the CHS information, because only |
| L-CHS makes sense. @xref{CHS Translation}, for more information. |
| |
| Input: |
| |
| @multitable @columnfractions 0.15 0.85 |
| @item @code{AH} @tab 48h. |
| |
| @item @code{DL} @tab Drive number. |
| |
| @item @code{DS:SI} @tab Buffer for drive parameters (see below). |
| @end multitable |
| |
| Output: |
| |
| @multitable @columnfractions 0.15 0.85 |
| @item @code{CF} @tab Set on error. |
| |
| @item @code{AH} @tab 0 if successful, otherwise error code. |
| @end multitable |
| |
| The format of drive parameters is: |
| |
| @multitable @columnfractions 0.25 0.15 0.6 |
| @item Offset (hex) @tab Size (byte) @tab Description |
| |
| @item 00 @tab 2 @tab The size of buffer. Before calling this function, |
| set to the maximum buffer size, at least 1Ah. The size actually filled |
| is returned (1Ah for version 1.0, 1Eh for 2.x and 42h for 3.0). |
| |
| @item 02 @tab 2 @tab Information flags (see below). |
| |
| @item 04 @tab 4 @tab The number of physical cylinders. |
| |
| @item 08 @tab 4 @tab The number of physical heads. |
| |
| @item 0C @tab 4 @tab The number of physical sectors per track. |
| |
| @item 10 @tab 8 @tab The total number of sectors. |
| |
| @item 18 @tab 2 @tab The bytes per sector. |
| |
| @comment Add an empty row for readability... |
| @item @tab @tab |
| |
| @item @strong{v2.0 and later} @tab @tab |
| |
| @item 1A @tab 4 @tab EDD configuration parameters. |
| |
| @comment Add an empty row for readability... |
| @item @tab @tab |
| |
| @item @strong{v3.0} @tab @tab |
| |
| @item 1E @tab 2 @tab Signature BEDD to indicate presence of Device Path |
| information. |
| |
| @item 20 @tab 1 @tab The length of Device Path information, including |
| signature and this byte (24h for version 3.0). |
| |
| @item 21 @tab 3 @tab Reserved (0). |
| |
| @item 24 @tab 4 @tab ASCIZ name of host bus (@samp{ISA} or @samp{PCI}). |
| |
| @item 28 @tab 8 @tab ASCIZ name of interface type (@samp{ATA}, |
| @samp{ATAPI}, @samp{SCSI}, @samp{USB}, @samp{1394} or @samp{FIBRE}). |
| |
| @item 30 @tab 8 @tab Interface Path. |
| |
| @item 38 @tab 8 @tab Device Path. |
| |
| @item 40 @tab 1 @tab Reserved (0). |
| |
| @item 41 @tab 1 @tab Checksum of bytes 1Eh-40h (2's complement of sum, |
| which makes the 8 bit sum of bytes 1Eh-41h equal to 00h). |
| @end multitable |
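
The checksum rule for the version 3.0 Device Path information can be checked
as shown below. The function name is hypothetical, and the buffer is assumed
to hold the drive parameters exactly as returned by AH=48h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if the 8-bit sum of bytes 1Eh-41h is zero, which is the
   case when byte 41h holds the 2's complement of the sum of 1Eh-40h. */
bool edd_device_path_checksum_ok(const uint8_t *params, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    assert(params != NULL);
    if (len < 0x42)
        return false;  /* too short to contain Device Path information */
    for (i = 0x1E; i <= 0x41; i++)
        sum = (uint8_t)(sum + params[i]);
    return sum == 0;
}
```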
| |
| The information flags are: |
| |
| @multitable @columnfractions 0.15 0.85 |
| @item Bit(s) @tab Description |
| |
| @item 0 @tab DMA boundary errors handled transparently. |
| |
| @item 1 @tab CHS information is valid. |
| |
| @item 2 @tab Removable drive. |
| |
| @item 3 @tab Write with verify supported. |
| |
| @item 4 @tab Drive has change-line support (required if drive is |
| removable). |
| |
| @item 5 @tab Drive can be locked (required if drive is removable). |
| |
| @item 6 @tab CHS information set to maximum supported values, not |
| current media. |
| |
| @item 7-15 @tab Reserved (0). |
| @end multitable |
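
Reading fields out of the drive parameter buffer can be sketched as follows.
The helper names are hypothetical; the offsets and flag bits come from the
tables above, and the multi-byte fields are assumed to be little-endian, as
is usual for x86 BIOS structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Selected bits of the information flags at offset 02h. */
#define EDD_INFO_DMA_TRANSPARENT 0x0001u
#define EDD_INFO_CHS_VALID       0x0002u
#define EDD_INFO_REMOVABLE       0x0004u

/* Read a little-endian value of `len` bytes (at most 8) starting at p. */
uint64_t edd_read_le(const uint8_t *p, size_t len)
{
    uint64_t v = 0;

    assert(p != NULL && len <= 8);
    while (len-- > 0)
        v = (v << 8) | p[len];
    return v;
}

/* Information flags, offset 02h, 2 bytes. */
uint16_t edd_info_flags(const uint8_t *params)
{
    return (uint16_t)edd_read_le(params + 0x02, 2);
}

/* Total number of sectors, offset 10h, 8 bytes. */
uint64_t edd_total_sectors(const uint8_t *params)
{
    return edd_read_le(params + 0x10, 8);
}
```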
| |
| |
| @node MBR |
| @chapter The structure of Master Boot Record |
| |
| A Master Boot Record (@dfn{MBR}) is the sector at cylinder 0, head 0, |
| sector 1 of a hard disk. An MBR-like structure must be created in each |
| partition by the FDISK program. |
| |
| At the completion of your system's Power On Self Test (@dfn{POST}), INT |
| 19H is called. Usually INT 19 tries to read a boot sector from the first |
| floppy drive@footnote{Which drive is read first depends on your BIOS |
| settings.}. If a boot sector is found on the floppy disk, that boot |
| sector is read into memory at location 0000:7C00 and INT 19H jumps to |
| memory location 0000:7C00. However, if no boot sector is found on the |
| first floppy drive, INT 19H tries to read the MBR from the first hard |
| drive. If an MBR is found it is read into memory at location 0000:7C00 |
| and INT 19H jumps to memory location 0000:7C00. The small program in the |
| MBR will attempt to locate an active (bootable) partition in its |
| partition table@footnote{This is the behavior of the DOS MBR; GRUB ignores |
| the active flag.}. The small program in the boot sector must locate the |
| first part of the operating system's kernel loader program (or perhaps |
| the kernel itself or perhaps a @dfn{boot manager program}) and read that |
| into memory. |
| |
| INT 19H is also called when the @key{CTRL}-@key{ALT}-@key{DEL} keys are |
| used. On most systems, @key{CTRL}-@key{ALT}-@key{DEL} causes a short |
| version of the POST to be executed before INT 19H is called. |
| |
| The layout of the MBR is: |
| |
| @table @asis |
| @item Offset 0000 |
| The address where the MBR code starts. |
| |
| @item Offset 01BE |
| The address where the partition table starts (@pxref{Partition table}). |
| |
| @item Offset 01FE |
| The signature, AA55. |
| @end table |
| |
| However, the first 62 bytes of a boot sector are known as the BIOS |
| Parameter Block (@dfn{BPB}), so GRUB cannot use these bytes for its own |
| purpose. |
| |
| If an active partition is found, that partition's boot record is read |
| into 0000:7C00 and the MBR code jumps to 0000:7C00 with @code{SI} |
| pointing to the partition table entry that describes the partition being |
| booted. The boot record program uses this data to determine the drive |
| being booted from and the location of the partition on the disk. |
| |
| The first byte of an active partition table entry is 80. This byte is |
| loaded into the @code{DL} register before INT 13H is called to read the |
| boot sector. When INT 13H is called, @code{DL} is the BIOS device |
| number. Because of this, the boot sector read by this MBR program can |
| only be read from BIOS device number 80 (the first hard disk). This is |
| one of the reasons why it is usually not possible to boot from any other |
| hard disk. |
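
The search for an active partition described above can be sketched in C.
This illustrates the DOS MBR behavior only (GRUB ignores the active flag),
and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBR_TABLE_OFFSET 0x1BE
#define MBR_ENTRY_SIZE   16
#define MBR_ENTRY_COUNT  4

/* Return the index (0-3) of the first partition table entry whose first
   byte is 80h, or -1 if no entry is marked active. */
int mbr_find_active_entry(const uint8_t *sector)
{
    int i;

    assert(sector != NULL);
    for (i = 0; i < MBR_ENTRY_COUNT; i++) {
        if (sector[MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE] == 0x80)
            return i;
    }
    return -1;
}
```

Since the flag byte itself is what ends up in @code{DL}, an entry found this
way always refers to BIOS device 80h.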
| |
| |
| @node Partition table |
| @chapter The format of partition table |
| |
| @menu |
| * Partition basics:: Overview of the partition table |
| * Partition types:: The list of the @dfn{type} code |
| * Partition entry format:: The format of the table entry |
| * Partition table rules:: Some basic rules for partition table |
| @end menu |
| |
| |
| @node Partition basics |
| @section Overview of the partition table |
| |
| FDISK creates all partition records (sectors). The primary purpose of a |
| partition record is to hold a partition table. The rules for how FDISK |
| works are unwritten but so far most FDISK programs seem to follow the |
| same basic idea. |
| |
| First, all partition table records (sectors) have the same format. This |
| includes the partition table record at cylinder 0, head 0, sector 1 -- |
| what is known as the Master Boot Record (MBR). The last 66 bytes of a |
| partition table record contain a partition table and a 2 byte |
| signature. The first 446 bytes of these sectors usually contain a |
| program but only the program in the MBR is ever executed (so extended |
| partition table records could contain something other than a program in |
| the first 446 bytes). For more information, see @ref{MBR}. |
| |
| Second, extended partitions are @emph{nested} inside one another and |
| extended partition table records form a @dfn{linked list}. I will |
| attempt to show this in a diagram at @ref{Partition entry format}. |
| |
| Each partition table entry is 16 bytes and contains things like the |
| start and end location of a partition in CHS, the start in LBA, the size |
| in sectors, the partition @dfn{type} and the @dfn{active} flag. Older |
| versions of FDISK may compute incorrect LBA or size values. And when |
| your computer boots itself, only the CHS fields of the partition table |
| entries are used (another reason LBA doesn't solve the >528MB |
| problem). The CHS fields in the partition tables are in L-CHS format, |
| see @ref{CHS Translation}. |
| |
| |
| @node Partition types |
| @section The list of the @dfn{type} code |
| |
| There is no central clearing house to assign the codes used in the one |
| byte @dfn{type} field. But codes are assigned (or used) to define most |
| every type of file system that anyone has ever implemented on the x86 |
| PC: 12-bit FAT, 16-bit FAT, HPFS, NTFS, etc. Plus, an extended partition |
| also has a unique type code. |
| |
| In the FDISK program @samp{sfdisk}, the following list is assumed: |
| |
| @table @asis |
| @item 00 |
| Empty |
| |
| @item 01 |
| DOS 12-bit FAT |
| |
| @item 02 |
| XENIX / |
| |
| @item 03 |
| XENIX /usr |
| |
| @item 04 |
| DOS 16-bit FAT <32M |
| |
| @item 05 |
| DOS Extended |
| |
| @item 06 |
| DOS 16-bit FAT >=32M |
| |
| @item 07 |
| HPFS / NTFS |
| |
| @item 08 |
| AIX boot or SplitDrive |
| |
| @item 09 |
| AIX data or Coherent |
| |
| @item 0A |
| OS/2 Boot Manager |
| |
| @item 0B |
| Windows95 FAT32 |
| |
| @item 0C |
| Windows95 FAT32 (LBA) |
| |
| @item 0E |
| Windows95 FAT16 (LBA) |
| |
| @item 0F |
| Windows95 Extended (LBA) |
| |
| @item 10 |
| OPUS |
| |
| @item 11 |
| Hidden DOS FAT12 |
| |
| @item 12 |
| Compaq diagnostics |
| |
| @item 14 |
| Hidden DOS FAT16 |
| |
| @item 16 |
| Hidden DOS FAT16 (big) |
| |
| @item 17 |
| Hidden HPFS/NTFS |
| |
| @item 18 |
| AST Windows swapfile |
| |
| @item 24 |
| NEC DOS |
| |
| @item 3C |
| PartitionMagic recovery |
| |
| @item 40 |
| Venix 80286 |
| |
| @item 41 |
| Linux/MINIX (sharing disk with DRDOS) |
| |
| @item 42 |
| SFS or Linux swap (sharing disk with DRDOS) |
| |
| @item 43 |
| Linux native (sharing disk with DRDOS) |
| |
| @item 50 |
| DM (disk manager) |
| |
| @item 51 |
| DM6 Aux1 (or Novell) |
| |
| @item 52 |
| CP/M or Microsoft SysV/AT |
| |
| @item 53 |
| DM6 Aux3 |
| |
| @item 54 |
| DM6 |
| |
| @item 55 |
| EZ-Drive (disk manager) |
| |
| @item 56 |
| Golden Bow (disk manager) |
| |
| @item 5C |
| Priam Edisk (disk manager) |
| |
| @item 61 |
| SpeedStor |
| |
| @item 63 |
| GNU Hurd or Mach or Sys V/386 (such as ISC UNIX)@footnote{But the reason |
| why they decided that 63 means GNU Hurd is not known. Do not use 63 for |
| GNU Hurd.} |
| |
| @item 64 |
| Novell Netware 286 |
| |
| @item 65 |
| Novell Netware 386 |
| |
| @item 70 |
| DiskSecure Multi-Boot |
| |
| @item 75 |
| PC/IX |
| |
| @item 77 |
| QNX4.x |
| |
| @item 78 |
| QNX4.x 2nd part |
| |
| @item 79 |
| QNX4.x 3rd part |
| |
| @item 80 |
| MINIX until 1.4a |
| |
| @item 81 |
| MINIX / old Linux |
| |
| @item 82 |
| Linux swap |
| |
| @item 83 |
| Linux native@footnote{This is not true. Use 83 for ext2fs even if the |
| owner OS is GNU/Hurd.} |
| |
| @item 84 |
| OS/2 hidden C: drive |
| |
| @item 85 |
| Linux extended |
| |
| @item 86 |
| NTFS volume set |
| |
| @item 87 |
| NTFS volume set |
| |
| @item 93 |
| Amoeba |
| |
| @item 94 |
| Amoeba BBT |
| |
| @item A0 |
| IBM Thinkpad hibernation |
| |
| @item A5 |
| BSD/386 |
| |
| @item A7 |
| NeXTSTEP 486 |
| |
| @item B7 |
| BSDI fs |
| |
| @item B8 |
| BSDI swap |
| |
| @item C1 |
| DRDOS/sec (FAT-12) |
| |
| @item C4 |
| DRDOS/sec (FAT-16, < 32M) |
| |
| @item C6 |
| DRDOS/sec (FAT-16, >= 32M) |
| |
| @item C7 |
| Syrinx |
| |
| @item DB |
| CP/M or Concurrent CP/M or Concurrent DOS or CTOS |
| |
| @item E1 |
| DOS access or SpeedStor 12-bit FAT extended partition |
| |
| @item E3 |
| DOS R/O or SpeedStor |
| |
| @item E4 |
| SpeedStor 16-bit FAT extended partition < 1024 cyl. |
| |
| @item F1 |
| SpeedStor |
| |
| @item F2 |
| DOS 3.3+ secondary |
| |
| @item F4 |
| SpeedStor large partition |
| |
| @item FE |
| SpeedStor >1024 cyl. or LANstep |
| |
| @item FF |
| Xenix Bad Block Table |
| @end table |
| |
| @node Partition entry format |
| @section The format of the table entry |
| |
| The 16 bytes of a partition table entry are used as follows: |
| |
| @example |
| @group |
| +--- Bit 7 is the active partition flag, bits 6-0 are zero. |
| | |
| |     +--- Starting CHS in INT 13 call format. |
| |     | |
| |     |     +--- Partition type byte. |
| |     |     | |
| |     |     |     +--- Ending CHS in INT 13 call format. |
| |     |     |     | |
| |     |     |     |        +-- Starting LBA. |
| |     |     |     |        | |
| |     |     |     |        |        +-- Size in sectors. |
| |     |     |     |        |        | |
| v  <--+---> v  <--+--->    v        v |
| |
| 0  1  2  3  4  5  6  7  8 9 A B  C D E F |
| DL DH CL CH TB DH CL CH LBA..... SIZE.... |
| |
| 80 01 01 00 06 0e be 94 3e000000 0c610900  1st entry |
| |
| 00 00 81 95 05 0e fe 7d 4a610900 724e0300  2nd entry |
| |
| 00 00 00 00 00 00 00 00 00000000 00000000  3rd entry |
| |
| 00 00 00 00 00 00 00 00 00000000 00000000  4th entry |
| @end group |
| @end example |
| |
| Bytes 0-3 are used by the small program in the Master Boot Record to |
| read the first sector of an active partition into memory. The @dfn{DL}, |
| @dfn{DH}, @dfn{CL} and @dfn{CH} above show which x86 register is loaded |
| when the MBR program calls INT 13H AH=02h to read the active partition's |
| boot sector. For more information, see @ref{MBR}. |
| |
| These entries define the following partitions: |
| |
| @enumerate |
| @item |
| The first partition, a primary partition DOS FAT, starts at CHS 0H,1H,1H |
| (LBA 3EH) and ends at CHS 294H,EH,3EH with a size of 9610CH sectors. |
| |
| @item |
| The second partition, an extended partition, starts at CHS 295H,0H,1H |
| (LBA 9614AH) and ends at CHS 37DH,EH,3EH with a size of 34E72H sectors. |
| |
| @item |
| The third and fourth table entries are unused. |
| @end enumerate |
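
The decoding of the example entries can be reproduced with a short C sketch.
The structure and function names are hypothetical. The LBA and size fields
are little-endian, and the CHS bytes use the INT 13H layout: head, then
sector in bits 0-5 with the high cylinder bits in bits 6-7, then the low 8
bits of the cylinder.

```c
#include <assert.h>
#include <stdint.h>

struct ptable_entry {
    int      active;        /* non-zero if bit 7 of byte 0 is set */
    uint8_t  type;          /* partition type byte */
    uint16_t start_cylinder, end_cylinder;
    uint8_t  start_head, end_head;
    uint8_t  start_sector, end_sector;
    uint32_t start_lba;     /* starting LBA */
    uint32_t size_sectors;  /* size in sectors */
};

/* Read a 32-bit little-endian value. */
uint32_t ptable_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one 16-byte partition table entry. */
struct ptable_entry ptable_decode(const uint8_t *raw)
{
    struct ptable_entry e;

    assert(raw != 0);
    e.active         = (raw[0] & 0x80) != 0;
    e.start_head     = raw[1];
    e.start_sector   = raw[2] & 0x3F;
    e.start_cylinder = (uint16_t)(((raw[2] & 0xC0) << 2) | raw[3]);
    e.type           = raw[4];
    e.end_head       = raw[5];
    e.end_sector     = raw[6] & 0x3F;
    e.end_cylinder   = (uint16_t)(((raw[6] & 0xC0) << 2) | raw[7]);
    e.start_lba      = ptable_le32(raw + 8);
    e.size_sectors   = ptable_le32(raw + 12);
    return e;
}
```

Applied to the first entry above, this yields type 06, a start of CHS
0H,1H,1H at LBA 3EH, an end of CHS 294H,EH,3EH and a size of 9610CH sectors.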
| |
| |
| @node Partition table rules |
| @section Some basic rules for partition table |
| |
| Keep in mind that there are @emph{no} written rules and @emph{no} |
| industry standards on how FDISK should work but here are some basic |
| rules that seem to be followed by most versions of FDISK: |
| |
| @enumerate |
| @item |
| In the MBR there can be 0-4 @dfn{primary} partitions, OR, 0-3 primary |
| partitions and 0-1 extended partition entry. |
| |
| @item |
| In an extended partition there can be 0-1 @dfn{secondary} partition |
| entries and 0-1 extended partition entries. |
| |
| @item |
| Only 1 primary partition in the MBR can be marked @dfn{active} at any |
| given time. |
| |
| @item |
| In most versions of FDISK, the first sector of a partition will be |
| aligned such that it is at head 0, sector 1 of a cylinder. This means |
| that there may be unused sectors on the track(s) prior to the first |
| sector of a partition and that there may be unused sectors following a |
| partition table sector. |
| |
| For example, most new versions of FDISK start the first partition |
| (primary or extended) at cylinder 0, head 1, sector 1. This leaves the |
| sectors at cylinder 0, head 0, sectors 2...n as unused sectors. This |
| same layout may be seen on the first track of an extended partition. |
| See example 2 below. |
| |
| Also note that software drivers like Ontrack's Disk Manager depend on |
| these unused sectors because these drivers will @dfn{hide} their code |
| there (in cylinder 0, head 0, sectors 2...n). This is also a good place |
| for boot sector virus programs to hang out. |
| |
| @item |
| The partition table entries (slots) can be used in any order. Some |
| versions of FDISK fill the table from the bottom up and some versions of |
| FDISK fill the table from the top down. Deleting a partition can leave |
| an unused entry (slot) in the middle of a table. |
| |
| @item |
| And then there is the @dfn{hack} that some newer OS's (OS/2 and Linux) |
| use in order to place a partition spanning or past cylinder 1024 on a |
| system that does not have a CHS translating BIOS. These systems create a |
| partition table entry with the partition's starting and ending CHS |
| information set to all FFH. The starting and ending LBA information is |
| used to describe the location of the partition. The LBA can be converted |
| back to a CHS --- most likely a CHS with more than 1024 cylinders. Since |
| such a CHS can't be used by the system BIOS, these partitions can not be |
| booted or accessed until the OS's kernel and hard disk device drivers |
| are loaded. It is not known if the systems using this @dfn{hack} follow |
| the same rules for the creation of this type of partition. |
| @end enumerate |
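
Two of these rules lend themselves to a mechanical check: at most one entry
marked active, and at most one extended partition entry per table. The
sketch below is an illustration with hypothetical names, not a validator
used by any FDISK. It treats types 05, 0F and 85 from the type list as
extended partitions, which is an assumption about how a given tool
classifies them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes listed as extended partitions in the type table. */
bool ptable_is_extended_type(uint8_t type)
{
    return type == 0x05 || type == 0x0F || type == 0x85;
}

/* Check a 64-byte partition table (four 16-byte entries) against the
   "at most one active" and "at most one extended" rules. */
bool ptable_follows_basic_rules(const uint8_t *table)
{
    int active = 0, extended = 0, i;

    assert(table != NULL);
    for (i = 0; i < 4; i++) {
        const uint8_t *entry = table + i * 16;

        if (entry[0] & 0x80)
            active++;
        if (ptable_is_extended_type(entry[4]))
            extended++;
    }
    return active <= 1 && extended <= 1;
}
```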
| |
| There are @emph{no} written rules as to how an OS scans the partition |
| table entries so each OS can have a different method. For DOS, this |
| means that different versions could assign different drive letters to |
| the same FAT file system partitions. |