1. Overview
2. Tagged Words
3. Basic Structure Layouts
4. Numbers
5. Procedures, code, and constants
6. Extensions
Larceny bases all its data types on a few basic representations. Type tags accessible to Scheme code distinguish between actual data types which are mapped onto the basic representations.
There are six basic representations: fixnums, immediates, and pointers to pairs, vector-like structures, bytevector-like structures, and procedures. Each representation consists of a tagged word and, in the case of pointer types, heap-allocated memory.
A tagged word has a 2 or 3 bit type tag in the low order bits:
xxxx xxxx xxxx xxxx xxxx xxxx xxxx xx00 fixnum xxxx xxxx xxxx xxxx xxxx xxxx xxxx xx10 immediate pppp pppp pppp pppp pppp pppp pppp p001 pointer to pair pppp pppp pppp pppp pppp pppp pppp p011 pointer to vector struct pppp pppp pppp pppp pppp pppp pppp p101 pointer to bytevector struct pppp pppp pppp pppp pppp pppp pppp p111 pointer to procedure struct
Immediates are further distinguished by more bits in the low-order byte;
they are roughly divided into constants (booleans, empty list, characters,
and miscellaneous system constants) and headers:
0000 0000 0000 0000 0000 0000 0000 0010 #f 0000 0000 0000 0000 0000 0000 0000 0110 #t 0000 0000 0000 0000 0000 0000 0000 1010 empty list xxxx xxxx xxxx xxxx xxxx xxxx 0001 0110 miscellaneous 0000 0000 cccc cccc 0000 0000 0010 0110 character 0sss ssss ssss ssss ssss ssss 100x xx10 reserved header 0sss ssss ssss ssss ssss ssss 101x xx10 vector-like header 0sss ssss ssss ssss ssss ssss 110x xx10 bytevector-like header 0sss ssss ssss ssss ssss ssss 1111 1110 procedure header
The bits marked xxx
in the low byte of a header are fields
for type tags: they are used by Scheme code to distinguish between
different data types mapped onto the basic structures. The type tags
can be manipulated with the typetag
and
typetag-set!
primitives.
The bits marked s
in a header contain the size of the data
structure in bytes, not including the header word, and not including
any padding. For vector-like structures and procedures, the two low
bits of the length field will be 0.
Miscellaneous constants include #!unspecified,
,
#!undefined
, and #!eof
.
+------------------------------+--+ | fixnum |00| +------------------------------+--+Figure 1: fixnum.
Bignums (large exact integers) are bytevector-like with the sign in the first two bytes (0 for positive, 1 for negative), followed by a digit count (two bytes) and then base-$2^{32}$ digits in the next words, with the least significant word first; each word is stored big-endian (figure 2). While bignums cannot be 0 in user code, system code sometimes creates bignums of value 0 in an internal calculation. A bignum with value 0 is distinguished by a digitcount of 0; the sign is immaterial.
+------------------------+--------+ | length | hdrtag | +------------------------+--------+ | sign | digitcount | +---------------------------------+ | lsd | +---------------------------------+ ...Figure 2: Bignum with 32-bit bigits.
The rationale for keeping a digit count which is different from the vector length is that while (in particular) the multiplication routine must create a vector whose length is the sum of the digit counts, some of the leading digits may be 0, and hence we can have a lower digit count without having to reallocate memory or use a temporary buffer.
Bignums can also be considered with a different logical layout: Each 32-bit digit can be interpreted as two 16-bit digits, also in big-endian fashion within the word; interpreted this way, the bignum gets a funny access pattern (figure 3). The digit count is still the number of 32-bit digits used; see above discussion for sign and bignums of value 0.
+------------------------+--------+ | length | hdrtag | +------------------------+--------+ | sign | digitcount | +---------------------------------+ | nlsd | lsd | +---------------------------------+ ...Figure 3: Bignum with 16-bit bigits.
Ratnums (exact rationals), shown in figure 4, are vector-like, with the first word of the vector being the numerator as a Scheme object (fixnum or bignum), and the second word being the denominator (greater than 1), also a Scheme object.
+------------------------+--------+ | vectorlength | hdrtag | +------------------------+--------+ | numerator | +---------------------------------+ | denominator | +---------------------------------+Figure 4: ratnum.
Rectnums (exact complexes), shown in figure 5, look like ratnums, except that the first word is the real part (an integer or ratnum) and the second word is the imaginary part (ditto). Both parts are exact reals, and the imaginary part is nonzero.
+------------------------+--------+ | vectorlength | hdrtag | +------------------------+--------+ | real-part | +---------------------------------+ | imag-part | +---------------------------------+Figure 5: Rectnum.
Flonums (inexact reals) are bytevector-like. The first word is unused, and the two next words contain the double in IEEE double precision format. The rationale for the unused word is this: since everything in the system is aligned on an 8-byte boundary, one word will be wasted anyway. By putting it before the number rather than after, the number becomes 8-byte aligned, and we can use a ``load double'' instruction to load it. (Figure 6.)
+------------------------+--------+ | length | hdrtag | +------------------------+--------+ | unused | +---------------------------------+ | IEEE double precision | | | +---------------------------------+Figure 6: Flonum.
Compnums (inexact complexes) are bytevector-like. The first word is unused (see the description of the flonum for a rationale). The two next words contain the real part. The two last words contain the imaginary part. (Figure 7.) Both parts are IEEE double precision numbers.
+------------------------+--------+ | length | hdrtag | +------------------------+--------+ | unused | +---------------------------------+ | IEEE double precision | | (real part) | +---------------------------------+ | IEEE double precision | | (imaginary part) | +---------------------------------+Figure 8: Compnum.