This page is now located at https://syntactically.org/~lucy/curiosities/feat-d128.html

Missing FEATs: D128

The 2022 and 2023 ARM architecture extensions introduced quite a lot of interesting features, but many of them sadly remain without good documentation in the official architecture reference manual. Luckily, we don't just have the manual: we also have ASL and FVPs.

FEAT_D128 introduces a new form of 128-bit (rather than 64-bit) wide memory descriptors, as part of the «VMSAv9-128». What's in those new descriptor formats?

General Outline

The use of the new descriptor format is controlled by TCR2_ELx.D128 for Stage 1 translations, except that it cannot be enabled for the EL2 translation regime (i.e. the single-stage, single-privilege-level translation regime used by nVHE EL2 code). Enabling it also requires several other translation features, which the architecture expresses by making the corresponding TCR2_ELx bits RES1 while D128 is set. These bits are PnCH (part of the translation hardening extension, which I will discuss in another post); AIE, which widens memory attribute indices from 3 bits to 4; and, most interestingly, PIE. The new descriptor formats rely on the permission indirection extensions and no longer contain the legacy AP[2:1] bits.
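As a rough model of these constraints, the following Python sketch computes the Stage 1 configuration that takes effect once D128 is set. The `Tcr2Stage1` class, the regime strings, and the choice to reject D128 in the EL2 regime with an exception are illustrative assumptions. The fields are plain booleans and do not follow the real TCR2_ELx bit layout.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tcr2Stage1:
    # Hypothetical boolean view of the TCR2_ELx controls discussed above,
    # not the architectural bit layout.
    d128: bool
    pnch: bool
    aie: bool
    pie: bool

    @property
    def attr_index_bits(self) -> int:
        # AIE widens memory attribute indices from 3 bits to 4.
        return 4 if self.aie else 3


def effective_tcr2(regime: str, programmed: Tcr2Stage1) -> Tcr2Stage1:
    # "EL2" names the single-stage nVHE EL2 regime, where D128 is unavailable;
    # this sketch simply rejects such a configuration.
    if programmed.d128 and regime == "EL2":
        raise ValueError("D128 cannot be enabled for the EL2 translation regime")
    if programmed.d128:
        # PnCH, AIE and PIE are RES1 while D128 is set, so treat them as 1.
        return replace(programmed, pnch=True, aie=True, pie=True)
    return programmed


cfg = effective_tcr2("EL1&0", Tcr2Stage1(d128=True, pnch=False, aie=False, pie=False))
print(cfg.pie, cfg.attr_index_bits)  # True 4
```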

For Stage 2 translations, the use of the new descriptor format is controlled by VTCR_EL2.D128. As at Stage 1, VTCR_EL2.S2PIE is RES1 when this is enabled. Strangely, VTCR_EL2.AssuredOnly is RES0 when D128 is enabled, yet the AssuredOnly bit (bit 114 rather than bit 58, where it sits in 64-bit descriptors) is nevertheless unconditionally in effect when D128 is enabled.
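The asymmetry can be sketched as a small Python predicate. It assumes that with 64-bit descriptors, bit 58 only has its AssuredOnly meaning when VTCR_EL2.AssuredOnly is set, which is what the contrast above suggests. The function name and its boolean parameters are hypothetical.

```python
AO_BIT_64 = 58    # AssuredOnly position in 64-bit Stage 2 descriptors
AO_BIT_128 = 114  # AssuredOnly position in 128-bit Stage 2 descriptors


def stage2_assured_only(desc: int, vtcr_d128: bool, vtcr_assured_only: bool) -> bool:
    if vtcr_d128:
        # VTCR_EL2.AssuredOnly is RES0 here, yet bit 114 always takes effect.
        return bool((desc >> AO_BIT_128) & 1)
    # Assumption: with 64-bit descriptors, bit 58 counts only if
    # VTCR_EL2.AssuredOnly is set.
    return vtcr_assured_only and bool((desc >> AO_BIT_64) & 1)


print(stage2_assured_only(1 << 114, vtcr_d128=True, vtcr_assured_only=False))  # True
print(stage2_assured_only(1 << 58, vtcr_d128=False, vtcr_assured_only=False))  # False
```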

Before considering the actual details of the descriptor formats, it is worth noting that doubling the size of each descriptor also halves the number of descriptors that fit within one page. Rather than make a table at a given level take up two pages, the architecture reduces the number of bits resolved by each table lookup. For example, a configuration with 48-bit input addresses and 4k pages would usually use four levels of tables. There are \(36 = 48 - 12\) bits of input address to resolve, and each 4k table can contain \(2^9 = 512 = 4096/8\) 8-byte descriptors, so each table resolves 9 bits and four levels are needed. With D128 enabled, however, each table contains only \(256 = 4096/16\) 16-byte descriptors. Each level of lookup therefore resolves only 8 bits of the input address, and \(\lceil 36/8 \rceil = 5\) table lookups are required to resolve the full address.
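The same arithmetic works for other granule and descriptor sizes. This Python sketch assumes that every level is a single granule-sized table (no skl, no concatenated tables) and that the last lookup is at level 3, so the starting level is 4 minus the number of lookups. `lookup_levels` is a hypothetical helper.

```python
import math


def lookup_levels(ia_bits: int, granule_bytes: int, desc_bytes: int) -> tuple:
    """Return (bits resolved per level, number of lookups, starting level)."""
    page_shift = granule_bytes.bit_length() - 1          # 4096 -> 12
    entries = granule_bytes // desc_bytes                 # 512 or 256 for 4k
    bits_per_level = entries.bit_length() - 1             # 9 or 8 for 4k
    levels = math.ceil((ia_bits - page_shift) / bits_per_level)
    # Assumes the final lookup is always at level 3.
    return bits_per_level, levels, 4 - levels


print(lookup_levels(48, 4096, 8))   # (9, 4, 0)
print(lookup_levels(48, 4096, 16))  # (8, 5, -1)
```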

Table Descriptors

Table Descriptor Format: NSTable (bit 127), APTable (bits 126:125), *XNTable (bits 124:123), Protected or AssuredOnly (bit 114), DisCH, skl (bits 110:109), next-level table address (bits 55:\(m\)), A (bit 10), nT (bit 6), and a 1 in bit 0.

Turning our attention first to the table descriptor format, the most obvious change, beyond the extension of the next-level table's physical address to 56 bits, is that bit 1 no longer indicates whether a descriptor points to a table or describes a block. That role has instead been taken on by the two-bit skl field, which specifies how many lookup levels to skip after this one. A descriptor is treated as a block descriptor if its skip-level field indicates that the next descriptor would be at level 4 (n.b. as in the VMSAv8-64, the maximum lookup level is 3, and translation configurations which require more than four levels begin at levels below 0).
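A minimal Python sketch of this rule follows. It takes the skl field from bits 110:109 as read off the descriptor figures, which is an assumption because the prose does not state the position. Treating an skl value that would skip past level 3 as invalid is a simplification made by this sketch, not a statement of the architected behaviour, and `classify` is a hypothetical name.

```python
from typing import Optional, Tuple

SKL_SHIFT = 109  # skl occupies bits 110:109 (read off the figures; an assumption)


def classify(desc: int, level: int) -> Tuple[str, Optional[int]]:
    """Return ("table", next_level), ("leaf", None) or ("invalid", None)."""
    if not desc & 1:
        return ("invalid", None)
    skl = (desc >> SKL_SHIFT) & 0b11
    next_level = level + 1 + skl
    if next_level == 4:
        # The walk would continue at level 4, which does not exist: a block/page.
        return ("leaf", None)
    if next_level > 4:
        # Simplification of this sketch: skipping past level 3 is rejected.
        return ("invalid", None)
    return ("table", next_level)


print(classify(1, 3))                # ('leaf', None)
print(classify(1, 2))                # ('table', 3)
print(classify(1 | (1 << 109), 2))   # ('leaf', None)
print(classify(1 | (2 << 109), -1))  # ('table', 2)
```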

Unlike the traditional block/table descriptor dichotomy, however, table descriptors at earlier levels can also contain non-zero skl fields without ending the walk. In these cases, the number of bits of input address resolved by the next table is increased, with a concomitant increase in table size beyond one page. TTBRx_ELx (and VTTBR_EL2) also contain SKLx (resp. SKL) fields which behave in a similar way to skip some number of initial levels of lookup.
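If a table reached through a non-zero skl must resolve the bits of every level it skips, its size follows directly. This Python sketch assumes exactly that reading, with 16-byte descriptors and a 4k granule by default. `next_table_geometry` is a hypothetical helper.

```python
def next_table_geometry(skl: int, granule_bytes: int = 4096, desc_bytes: int = 16) -> tuple:
    """Return (input-address bits resolved, table size in bytes) for the next table.

    Assumes the next table absorbs the index bits of every skipped level.
    """
    bits_per_level = (granule_bytes // desc_bytes).bit_length() - 1  # 8 for 4k/16-byte
    bits = bits_per_level * (1 + skl)
    return bits, (1 << bits) * desc_bytes


for skl in range(3):
    print(skl, next_table_geometry(skl))
# 0 (8, 4096)
# 1 (16, 1048576)
# 2 (24, 268435456)
```

With skl = 1, for instance, the next table would take up 1 MiB rather than a single 4k page.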

The assignment of bits 123 to 126 as various XNTable and APTable bits is unclear: the ASL (in AArch64.S1ApplyTablePerms) clearly intentionally saves these bits as APTable/XNTable/PXNTable/UXNTable, but the saved values are never used, since D128 implies PIE in every regime.

The FEAT_THE Protected (Stage 1)/AssuredOnly (Stage 2) bit is moved from bit 52 to bit 114, presumably to make room for the larger physical addresses.

The nT bit, used to avoid break-before-make when replacing a block descriptor with a table descriptor or vice versa, was previously bit 16 of leaf descriptors. It is now given bit 6 of table descriptors, presumably because the skl field makes «blocks of tables» possible.

The DisCH bit is entirely new, and, for Stage 1 translations only, disables the effect of the Contiguous bit in leaf descriptors under this table. Similarly, the A bit is new and, for both Stage 1 and Stage 2 translations, provides a table-level Access flag, which can be managed by hardware.
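The DisCH rule can be written as a check over the walk. This Python sketch assumes that the DisCH bits of every table descriptor above a leaf are collected during the walk and that any one of them being set suppresses the leaf's Contiguous bit. The function and its parameters are hypothetical.

```python
from typing import Iterable


def contiguous_hint_effective(leaf_contiguous: bool,
                              table_disch: Iterable[bool],
                              stage: int) -> bool:
    # DisCH only affects Stage 1 translations.
    if stage == 1 and any(table_disch):
        return False
    return leaf_contiguous


print(contiguous_hint_effective(True, [False, True, False], stage=1))  # False
print(contiguous_hint_effective(True, [False, True, False], stage=2))  # True
```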

Leaf Descriptors

Leaf Descriptor Format: NS (bit 127), POI (bits 124:121), PII (bits 118:115), P (bit 114), G (bit 113), C (bit 111), skl (bits 110:109), base address (bits 55:\(m\)), NSE or nG or FnXS (bit 11), AF (bit 10), SH (bits 9:8), nDirty (bit 7), nT (bit 6), AttrIndex (bits 5:2), and a 1 in bit 0.

The changes to the leaf descriptors follow a similar pattern, with fewer feature additions. The skip-level bits are in the same position and continue to replace the function of bit 1. The NS bit is moved to bit 127, consistent with the table descriptor format above. Since the 128-bit descriptors are used only with permission indirection enabled, the AP[2:1] bits are no longer necessary, and bit 6 is reassigned to nT while bit 7 unconditionally reprises its role as the dirty bit. The memory attribute index is extended to 4 bits at Stage 1, as well as Stage 2, taking over the NS bit's former position, and the permission overlay index is also extended to 4 bits. The permission indirection bits are consolidated into bits 115 to 118, rather than being spread across bits 6, 51, 53, and 54 as they were before. The Protected bit occupies the same position as it does for table descriptors; the Guarded Page bit (part of FEAT_BTI) is next to it in bit 113, and the Contiguous bit is moved to bit 111. Altogether rather pedestrian changes in return for doubling the size of descriptors!
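Putting these positions together, this Python sketch decodes the main fields of a 128-bit leaf descriptor held in a Python integer. The positions come from the prose and the leaf figure. The output address assumes a 4k page, so it uses bits 55:12. `LEAF_FIELDS`, `field`, and `decode_leaf` are hypothetical names.

```python
# (high bit, low bit) of each field, taken from the prose and the leaf figure.
LEAF_FIELDS = {
    "NS": (127, 127),
    "POI": (124, 121),
    "PII": (118, 115),
    "P": (114, 114),
    "G": (113, 113),
    "C": (111, 111),
    "skl": (110, 109),
    "AF": (10, 10),
    "SH": (9, 8),
    "nDirty": (7, 7),
    "nT": (6, 6),
    "AttrIndex": (5, 2),
}


def field(desc: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo (inclusive) of desc."""
    return (desc >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_leaf(desc: int) -> dict:
    if not desc & 1:
        raise ValueError("bit 0 is clear, so this is not a valid descriptor")
    fields = {name: field(desc, hi, lo) for name, (hi, lo) in LEAF_FIELDS.items()}
    # Output address for a 4k page: bits 55:12, kept in place.
    fields["OA"] = field(desc, 55, 12) << 12
    return fields


desc = ((1 << 127) | (0b0110 << 115) | (1 << 111) | 0x8000_0000
        | (1 << 10) | (0b11 << 8) | (0b1010 << 2) | 1)
d = decode_leaf(desc)
print(d["PII"], d["AttrIndex"], hex(d["OA"]))  # 6 10 0x80000000
```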