Tuesday, March 13, 2012

Device Drivers, Part 14: A Dive Inside the Hard Disk for Understanding Partitions



Inside the hard drive
This article, which is part of the 
This article, which is part of the series on Linux device drivers, takes you on a tour inside a hard disk.
“Doesn’t it sound like a mechanical engineering subject: The design of the hard disk?” questioned Shweta. “Yes, it does. But understanding it gives us an insight into its programming aspect,” reasoned Pugs, while waiting for the commencement of the seminar on storage systems.
The seminar started with a few hard disks in the presenter’s hand and then a dive into her system, showing the output of fdisk -l (Figure 1).
Partition listing by fdisk
Figure 1: Partition listing by fdisk
The first line shows the hard disk size in human-friendly format and in bytes. The second line mentions the number of logical heads, logical sectors per track, and the actual number of cylinders on the disk — together known as the geometry of the disk.
The 255 heads indicate the number of platters or disks, as one read-write head is needed per disk. Let’s number them, say D1, D2, … D255. Now, each disk would have the same number of concentric circular tracks, starting from the outside to the inside. In the above case, there are 60,801 such tracks per disk. Let’s number them, say T1, T2, … T60801. And a particular track number from all the disks forms a cylinder of the same number. For example, tracks T2 from D1, D2, … D255 will together form the cylinder C2. Now, each track has the same number of logical sectors — 63 in our case, say S1, S2, … S63. And each sector is typically 512 bytes. Given this data, one can actually compute the total usable hard disk size, using the following formula:
Usable hard disk size in bytes = (Number of heads or disks) * (Number of tracks per disk) * (Number of sectors per track) * (Number of bytes per sector, i.e. sector size)
For the disk under consideration, it would be: 255 * 60801 * 63 * 512 bytes = 500105249280 bytes.
Note that this number may be slightly less than the actual hard disk (500107862016 bytes, in our case). The reason is that the formula doesn’t consider the bytes in the last partial or incomplete cylinder. The primary reason for that is the difference between today’s technology of organising the actual physical disk geometry and the traditional geometry representation using heads, cylinders and sectors.
Note that in the fdisk output, we referred to the heads and sectors per track as logical not physical. One may ask that if today’s disks don’t have such physical geometry concepts, then why still maintain them and represent them in a logical form? The main reason is to be able to continue with the same concepts of partitioning, and be able to maintain the same partition table formats, especially for the most prevalent DOS-type partition tables, which heavily depend on this simplistic geometry. Note the computation of cylinder size (255 heads * 63 sectors / track * 512 bytes / sector = 8225280 bytes) in the third line and then the demarcation of partitions in units of complete cylinders.

DOS-type partition tables

This brings us to the next important topic: understanding DOS-type partition tables. But first, what is a partition, and why should we partition? A hard disk can be divided into one or more logical disks, each of which is called a partition. This is useful for organising different types of data separately, for example, different operating system data, user data, temporary data, etc.
So, partitions are basically logical divisions and need to be maintained by metadata, which is the partition table. A DOS-type partition table contains four partition entries, each a 16-byte entry. Each of these entries can be depicted by the following ‘C’ structure:
typedef struct
{
    unsigned char boot_type; // 0x00 - Inactive; 0x80 - Active (Bootable)
    unsigned char start_head;
    unsigned char start_sec:6;
    unsigned char start_cyl_hi:2;
    unsigned char start_cyl;
    unsigned char part_type;
    unsigned char end_head;
    unsigned char end_sec:6;
    unsigned char end_cyl_hi:2;
    unsigned char end_cyl;
    unsigned long abs_start_sec;
    unsigned long sec_in_part;
} PartEntry;
This partition table, followed by the two-byte signature 0xAA55, resides at the end of the disk’s first sector, commonly known as the Master Boot Record (MBR). Hence, the starting offset of this partition table within the MBR is 512 - (4 * 16 + 2) = 446. Also, a 4-byte disk signature is placed at offset 440.
The remaining top 440 bytes of the MBR are typically used to place the first piece of boot code, that is loaded by the BIOS to boot the system from the disk. The part_info.c listing contains these various definitions, along with code for parsing and printing a formatted output of the partition table.
From the partition table entry structure, it could be noted that the start and end cylinder fields are only 10 bits long, thus allowing a maximum of 1023 cylinders only. However, for today’s huge hard disks, this is in no way sufficient. Hence, in overflow cases, the corresponding <head, cylinder, sector> triplet in the partition table entry is set to the maximum value, and the actual value is computed using the last two fields: the absolute start sector number (abs_start_sec) and the number of sectors in this partition (sec_in_part).
The code for this too is in part_info.c:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
 
#define SECTOR_SIZE 512
#define MBR_SIZE SECTOR_SIZE
#define MBR_DISK_SIGNATURE_OFFSET 440
#define MBR_DISK_SIGNATURE_SIZE 4
#define PARTITION_TABLE_OFFSET 446
#define PARTITION_ENTRY_SIZE 16 // sizeof(PartEntry)
#define PARTITION_TABLE_SIZE 64 // sizeof(PartTable)
#define MBR_SIGNATURE_OFFSET 510
#define MBR_SIGNATURE_SIZE 2
#define MBR_SIGNATURE 0xAA55
#define BR_SIZE SECTOR_SIZE
#define BR_SIGNATURE_OFFSET 510
#define BR_SIGNATURE_SIZE 2
#define BR_SIGNATURE 0xAA55
 
typedef struct {
    unsigned char boot_type; // 0x00 - Inactive; 0x80 - Active (Bootable)
    unsigned char start_head;
    unsigned char start_sec:6;
    unsigned char start_cyl_hi:2;
    unsigned char start_cyl;
    unsigned char part_type;
    unsigned char end_head;
    unsigned char end_sec:6;
    unsigned char end_cyl_hi:2;
    unsigned char end_cyl;
    unsigned long abs_start_sec;
    unsigned long sec_in_part;
} PartEntry;
 
typedef struct {
    unsigned char boot_code[MBR_DISK_SIGNATURE_OFFSET];
    unsigned long disk_signature;
    unsigned short pad;
    unsigned char pt[PARTITION_TABLE_SIZE];
    unsigned short signature;
} MBR;
 
void print_computed(unsigned long sector) {
    unsigned long heads, cyls, tracks, sectors;
 
    sectors = sector % 63 + 1 /* As indexed from 1 */;
    tracks = sector / 63;
    cyls = tracks / 255 + 1 /* As indexed from 1 */;
    heads = tracks % 255;
    printf("(%3d/%5d/%1d)", heads, cyls, sectors);
}
 
int main(int argc, char *argv[]) {
    char *dev_file = "/dev/sda";
    int fd, i, rd_val;
    MBR m;
    PartEntry *p = (PartEntry *)(m.pt);
 
    if (argc == 2) {
        dev_file = argv[1];
    }
    if ((fd = open(dev_file, O_RDONLY)) == -1) {
        fprintf(stderr, "Failed opening %s: ", dev_file);
        perror("");
        return 1;
    }
    if ((rd_val = read(fd, &m, sizeof(m))) != sizeof(m)) {
        fprintf(stderr, "Failed reading %s: ", dev_file);
        perror("");
        close(fd);
        return 2;
    }
    close(fd);
    printf("\nDOS type Partition Table of %s:\n", dev_file);
    printf("  B Start (H/C/S)   End (H/C/S) Type  StartSec    TotSec\n");
    for (i = 0; i < 4; i++) {
        printf("%d:%d (%3d/%4d/%2d) (%3d/%4d/%2d)  %02X %10d %9d\n",
            i + 1, !!(p[i].boot_type & 0x80),
            p[i].start_head,
            1 + ((p[i].start_cyl_hi << 8) | p[i].start_cyl),
            p[i].start_sec,
            p[i].end_head,
            1 + ((p[i].end_cyl_hi << 8) | p[i].end_cyl),
            p[i].end_sec,
            p[i].part_type,
            p[i].abs_start_sec, p[i].sec_in_part);
    }
    printf("\nRe-computed Partition Table of %s:\n", dev_file);
    printf("  B Start (H/C/S)   End (H/C/S) Type  StartSec    TotSec\n");
    for (i = 0; i < 4; i++) {
        printf("%d:%d ", i + 1, !!(p[i].boot_type & 0x80));
        print_computed(p[i].abs_start_sec);
        printf(" ");
        print_computed(p[i].abs_start_sec + p[i].sec_in_part - 1);
        printf(" %02X %10d %9d\n", p[i].part_type,
            p[i].abs_start_sec, p[i].sec_in_part);
    }
    printf("\n");
    return 0;
}
As the above is an application, compile it with gcc part_info.c -o part_info, and then run ./part_info /dev/sda to check out your primary partitioning information on /dev/sda. Figure 2 shows the output of ./part_info on the presenter’s system. Compare it with the fdisk output in Figure 1.
Output of ./part_info
Figure 2: Output of ./part_info

Partition types and boot records

Now, as this partition table is hard-coded to have four entries, that’s the maximum number of partitions you can have. These are called primary partitions, each having an associated type in the corresponding partition table entry. These types are typically coined by various OS vendors, and hence sort of map to various OSs like DOS, Minix, Linux, Solaris, BSD, FreeBSD, QNX, W95, Novell Netware, etc., to be used for/with the particular OS. However, this is more a formality than a real requirement.
Besides this, one of the four primary partitions can be labelled as something called an extended partition, which has a special significance. As the name suggests, it is used to further extend hard disk division, i.e., to have more partitions. These are called logical partitions and are created within the extended partition. The metadata of these is maintained in a linked-list format, allowing an unlimited number of logical partitions (at least theoretically).
For that, the first sector of the extended partition, commonly called the Boot Record (BR), is used like the MBR to store (the linked-list head of) the partition table for the logical partitions. Subsequent linked-list nodes are stored in the first sector of the subsequent logical partitions, referred to as the Logical Boot Record (LBR). Each linked-list node is a complete 4-entry partition table, though only the first two entries are used — the first for the linked-list data, namely, information about the immediate logical partition, and the second as the linked list’s next pointer, pointing to the list of remaining logical partitions.
To compare and understand the primary partitioning details on your system’s hard disk, follow the steps (as the root user — hence with care) given below:
./part_info /dev/sda ## Displays the partition table on /dev/sda
fdisk -l /dev/sda ## To display and compare the partition table entries with the above
In case you have multiple hard disks (/dev/sdb, …), hard disk device files with other names (/dev/hda, ……), or an extended partition, you may try ./part_info <device_file_name> on them as well. Trying on an extended partition would give you the information about the starting partition table of the logical partitions.
Right now, we have carefully and selectively played (read-only) with the system’s hard disk. Why carefully? Since otherwise, we may render our system non-bootable. But no learning is complete without a total exploration. Hence, in our next session, we will create a dummy disk in RAM and do destructive exploration on it.

No comments:

Post a Comment

Thank you for your comment