Contended memory
Revision as of 09:58, 20 October 2010
Contended memory is a quirk of the ZX Spectrum's hardware design: on average, it is slower to access memory areas which are shared with the ULA than it is to access other memory areas. This occurs because the RAM cannot be read by two devices (the ULA and the processor) at once, and the ULA is given higher priority so that it can draw the screen correctly. Therefore, programs which access this "contended memory" (0x4000 to 0x7fff on the 48K machine, that is, the RAM present in the 16K version) or which read from an IO port whose result is provided by the ULA (any port with the low bit reset on the 48K machine) will be slowed if the ULA is reading the screen. This effect occurs only when the actual screen is being drawn; when the border is being drawn or the TV is in either horizontal or vertical refresh, the ULA does not need to access memory and therefore no delays occur.
So that the ULA can access the memory it needs without interference from the Z80, the ULA arranges for the Z80 to be temporarily paused whenever the Z80 attempts to access the affected memory or IO ports; the exact details of which memory is affected, and at which times, are given in the Details section below. For memory access, this happens on the first tstate (T1) of any instruction fetch, memory read or memory write operation. For IO operations, this can happen on all tstates; see the Contended IO article for the specifics. The table below gives the pattern of contention that is applied for each opcode, which is essentially equivalent to when T1 operations happen in each instruction.
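As a rough illustration of which accesses are candidates for contention on the 48K machine, the two conditions described above can be sketched in Python. The function names are hypothetical and exist only for this example; whether a delay is actually applied also depends on the tstate at which the access happens.

```python
def is_contended_address_48k(address):
    """True if a 16-bit address lies in the 48K machine's contended RAM."""
    return 0x4000 <= (address & 0xFFFF) <= 0x7FFF


def is_ula_port_48k(port):
    """True if the port has its low bit reset, so the ULA supplies the result."""
    return (port & 0x0001) == 0
```

For instance, address 25000 is in the contended range while address 40000 is not, and port 0xFE has its low bit reset while port 0xFF does not.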
On the 48K machine, the memory from 0x4000 to 0x7fff is contended. The first contended access occurs 14335 or 14336 tstates after an interrupt, depending on the machine (see the timing differences section below for information on the 14335/14336 issue); an access at that point delays the Z80 by 6 tstates. One tstate later (14336 on an early timing machine), the delay is 5 tstates. The pattern continues as follows:
|Tstates after interrupt||Delay in tstates|
|14335||6 (until 14341)|
|14336||5 (until 14341)|
|14337||4 (until 14341)|
|14338||3 (until 14341)|
|14339||2 (until 14341)|
|14340||1 (until 14341)|
|14341||No delay|
|14342||No delay|
|14343||6 (until 14349)|
|14344||5 (until 14349)|
|14345||4 (until 14349)|
|14346||3 (until 14349)|
|14347||2 (until 14349)|
|14348||1 (until 14349)|
This pattern (6,5,4,3,2,1,0,0) continues until 14463 tstates after interrupt, at which point there is no delay for 96 tstates while the border and horizontal refresh are drawn. The pattern starts again at 14559 tstates and continues for all 192 lines of screen data. After this, there is no delay until the end of the frame as the bottom border and vertical refresh happen, and no delay until 14335 tstates after the start of the next frame as the top border is drawn.
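The 48K pattern described above can be sketched as a small lookup function. This is a minimal sketch for an early timing machine, not a reference implementation: the function and constant names are made up for this example, and `tstate` is assumed to be counted from the interrupt within a single frame.

```python
# Values taken from the description above (early timing 48K machine).
FIRST_CONTENDED_TSTATE = 14335      # first contended tstate after the interrupt
TSTATES_PER_LINE = 224              # 14559 - 14335: the pattern restarts every line
CONTENDED_TSTATES_PER_LINE = 128    # 14463 - 14335: part of each line with delays
SCREEN_LINES = 192
PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)


def contention_delay_48k(tstate):
    """Delay (in tstates) for a contended access at `tstate` within one frame."""
    offset = tstate - FIRST_CONTENDED_TSTATE
    if offset < 0:
        return 0  # top border: no contention yet
    line, position = divmod(offset, TSTATES_PER_LINE)
    if line >= SCREEN_LINES or position >= CONTENDED_TSTATES_PER_LINE:
        return 0  # bottom border, side border or horizontal refresh
    return PATTERN[position % 8]
```

Calling `contention_delay_48k(14335)` gives 6 and `contention_delay_48k(14341)` gives 0, matching the table above.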
128K / +2
On the 128K and +2 Spectrums, memory pages 1, 3, 5 and 7 are contended. This means that RAM from 0x4000 to 0x7fff is always contended (as memory page 5 is always mapped in there) and RAM from 0xc000 to 0xffff can be contended if page 1, 3, 5 or 7 is paged in there. The 128K and +2 Spectrums also have a different timing pattern from the 48K machine due to their different line and frame lengths: the 6,5,4,3,2,1,0,0 pattern starts 14361 tstates after interrupt, and repeats every 228 tstates rather than 224.
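The paging rule and the adjusted timing for the 128K and +2 can be sketched as follows. The names are hypothetical. The text above gives only the start tstate and the 228-tstate repeat, so the 128 contended tstates per line and the 192 screen lines are assumptions carried over from the 48K description.

```python
CONTENDED_PAGES = frozenset({1, 3, 5, 7})
FIRST_CONTENDED_TSTATE = 14361
TSTATES_PER_LINE = 228
CONTENDED_TSTATES_PER_LINE = 128    # assumption: same as on the 48K machine
SCREEN_LINES = 192                  # assumption: same as on the 48K machine
PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)


def is_contended_128k(address, page_at_c000):
    """True if `address` is in contended RAM, given the page mapped at 0xc000."""
    if 0x4000 <= address <= 0x7FFF:
        return True  # page 5 is always mapped here
    if 0xC000 <= address <= 0xFFFF:
        return page_at_c000 in CONTENDED_PAGES
    return False


def contention_delay_128k(tstate):
    """Delay (in tstates) for a contended access at `tstate` within one frame."""
    offset = tstate - FIRST_CONTENDED_TSTATE
    if offset < 0:
        return 0
    line, position = divmod(offset, TSTATES_PER_LINE)
    if line >= SCREEN_LINES or position >= CONTENDED_TSTATES_PER_LINE:
        return 0
    return PATTERN[position % 8]
```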
+2A / +3
The +2A / +3 ULA differs more significantly in that it applies less contention than the 48K or 128K ULAs. Specifically, it applies contention only if the MREQ line is active, whereas the 48K ULA applies it under all circumstances. In the table below, contention patterns which differ on the +3 are shown in italics. The timing pattern also differs significantly:
|Tstates after interrupt||Delay in tstates|
|14361||1 (until 14362)|
|14362||No delay|
|14363||7 (until 14370)|
|14364||6 (until 14370)|
|14365||5 (until 14370)|
|14366||4 (until 14370)|
|14367||3 (until 14370)|
|14368||2 (until 14370)|
|14369||1 (until 14370)|
|14370||No delay|
|14371||7 (until 14378)|
|14372||6 (until 14378)|
|14373||5 (until 14378)|
|14374||4 (until 14378)|
|14375||3 (until 14378)|
|14376||2 (until 14378)|
|14377||1 (until 14378)|
The pattern repeats until 14490 tstates, when the first scanline has been finished, after which no delays are inserted until 14589 tstates (228 tstates after the start of the first scanline), when the pattern begins again.
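A sketch of the +2A / +3 timing pattern, written in the same style as a per-tstate lookup. The names are hypothetical. The pattern (1, 0, 7, 6, 5, 4, 3, 2) is read off the table above and the 228-tstate line length follows from 14589 - 14361; the 129 contended tstates per line and the 192 screen lines are assumptions of this example.

```python
FIRST_CONTENDED_TSTATE = 14361
TSTATES_PER_LINE = 228              # 14589 - 14361
CONTENDED_TSTATES_PER_LINE = 129    # assumption: delays apply for 129 tstates per line
SCREEN_LINES = 192                  # assumption: same as on the 48K machine
PATTERN = (1, 0, 7, 6, 5, 4, 3, 2)  # as listed from tstate 14361 onwards


def contention_delay_plus3(tstate):
    """Delay (in tstates) for a contended access at `tstate` within one frame."""
    offset = tstate - FIRST_CONTENDED_TSTATE
    if offset < 0:
        return 0
    line, position = divmod(offset, TSTATES_PER_LINE)
    if line >= SCREEN_LINES or position >= CONTENDED_TSTATES_PER_LINE:
        return 0
    return PATTERN[position % 8]
```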
The NTSC Spectrum has the same 6,5,4,3,2,1,0,0 contention pattern as the 48K machine, but starting at tstate 8959 rather than 14335.
Both of the following examples refer to a 48K Spectrum.
Example 1: if PC = 25000, HL = 26000, the instruction at address 25000 is LD (HL),A and we're at tstate 14335:
- Delay for 6 tstates (the contention delay for tstate 14335); now at tstate 14341.
- 4 tstates fetching the opcode; now at tstate 14345.
- Delay for 4 tstates (delay for tstate 14345); now at tstate 14349.
- 3 tstates storing the byte; now at tstate 14352.
The next opcode will then be read at tstate 14352.
Example 2: the same setup as example 1, except with PC=40000 (not contended):
- No delay because PC is not contended.
- 4 tstates fetching the opcode; now at tstate 14339.
- Delay for 2 tstates (the contention delay for tstate 14339); now at tstate 14341.
- 3 tstates storing the byte; now at tstate 14344.
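The two examples can be replayed with a short simulation. This is a minimal sketch for an early timing 48K machine: the helper names are invented for this example, only the first 192 screen lines are modelled, and each access is given as an (address, tstates) pair corresponding to the pc:4 and hl:3 steps of LD (HL),A.

```python
PATTERN = (6, 5, 4, 3, 2, 1, 0, 0)


def delay_48k(tstate):
    """Contention delay on an early timing 48K machine (see the pattern above)."""
    offset = tstate - 14335
    if offset < 0:
        return 0
    line, position = divmod(offset, 224)
    if line >= 192 or position >= 128:
        return 0
    return PATTERN[position % 8]


def is_contended_48k(address):
    return 0x4000 <= address <= 0x7FFF


def run_accesses(start_tstate, accesses):
    """Apply a list of (address, tstates) accesses and return the final tstate."""
    tstate = start_tstate
    for address, length in accesses:
        if is_contended_48k(address):
            tstate += delay_48k(tstate)  # the delay is applied before the access
        tstate += length
    return tstate


# Example 1: PC = 25000, HL = 26000, both contended.
print(run_accesses(14335, [(25000, 4), (26000, 3)]))  # 14352
# Example 2: PC = 40000 (not contended), HL = 26000.
print(run_accesses(14335, [(40000, 4), (26000, 3)]))  # 14344
```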
On some machines, the timings have been observed to be consistently one tstate later than on other machines. All timings given in this document are for "early timing" machines; for late timing machines, simply add one to all tstate counts given.
The physical reason for this difference is not well understood; in some emulators, the option for changing this behaviour refers to Zilog or clone CPUs, but both behaviours have been seen with both genuine Zilog and clone CPUs.
In the table below:
- dd is any of the registers BC,DE,HL,SP
- qq is any of the registers BC,DE,HL,AF
- ss is any of the registers BC,DE,HL
- ii is either of the index registers IX or IY.
- ir is the IR (Interrupt and Refresh) register pair
- cc is any (applicable) condition NZ,Z,NC,C,PO,PE,P,M
- nn is a 16-bit number
- n is an 8-bit number
- b is a number from 0 to 7 (BIT/SET/RES instructions)
- r and r' are any of the registers A,B,C,D,E,H,L
- alo is an arithmetic or logical operation: ADD/ADC/SUB/SBC/AND/XOR/OR and CP
- sro is a shift/rotate operation: RLC/RRC/RL/RR/SLA/SRA/SRL and SLL (undocumented)
- The values for the registers listed in the table below are relative to the starting value of the register when the instruction is about to be executed.
- For conditional instructions, entries in [square brackets] are applied only if the condition is met. If the instruction is not conditional (e.g. CALL nn) the entries in [square brackets] always apply.
- The replacement of HL by either IX or IY does not affect the timings, except for the addition of an initial pc:4 for the DD or FD prefix; similarly, a DD or FD prefix on an instruction which does not involve HL just adds an initial pc:4.
- The undocumented variants of the doubly shifted DDCB and FDCB opcodes have the same timings as the documented versions.
- In some read-modify-write operations (like INC (HL)), the write operation is always the last one. This can matter when you need to know the exact point at which video memory is updated, for example. In such instructions, that point is annotated for clarity as "(write)" after the address.
- Access to I/O ports is treated differently to access to memory; full details are given in the Contended IO article. The delays specified there should be applied when an I/O port is accessed; this is designated by "IO" in the table below.
|Opcode||48K/128 ULA||+3 ULA|
|INC/DEC dd||pc:4,ir:1 x 2||pc:6|
|ADD HL,dd||pc:4,ir:1 x 7||pc:11|
|ADC HL,dd||pc:4,pc+1:4,ir:1 x 7||pc:4,pc+1:11|
|LD r,(ii+n)||pc:4,pc+1:4,pc+2:3,pc+2:1 x 5,ii+n:3||pc:4,pc+1:4,pc+2:8,ii+n:3|
|BIT b,(ii+n)||pc:4,pc+1:4,pc+2:3,pc+3:3,pc+3:1 x 2,ii+n:3,ii+n:1||pc:4,pc+1:4,pc+2:3,pc+3:5,ii+n:4|
|LD (ii+n),n||pc:4,pc+1:4,pc+2:3,pc+3:3,pc+3:1 x 2,ii+n:3||pc:4,pc+1:4,pc+2:3,pc+3:5,ii+n:3|
|INC/DEC (ii+n)||pc:4,pc+1:4,pc+2:3,pc+2:1 x 5,ii+n:3,ii+n:1,ii+n(write):3||pc:4,pc+1:4,pc+2:8,ii+n:4,ii+n(write):3|
|SET b,(ii+n)||pc:4,pc+1:4,pc+2:3,pc+3:3,pc+3:1 x 2,ii+n:3,ii+n:1,ii+n(write):3||pc:4,pc+1:4,pc+2:3,pc+3:5,ii+n:4,ii+n(write):3|
|JR n||pc:4,pc+1:3,[pc+1:1 x 5]||pc:4,pc+1:3,|
|DJNZ n||pc:4,ir:1,pc+1:3,[pc+1:1 x 5]||pc:5,pc+1:3,|
|RLD||pc:4,pc+1:4,hl:3,hl:1 x 4,hl(write):3||pc:4,pc+1:4,hl:7,hl(write):3|
|EX (SP),HL||pc:4,sp:3,sp+1:3,sp+1:1,sp+1(write):3,sp(write):3,sp(write):1 x 2||pc:4,sp:3,sp+1:4,sp+1(write):3,sp(write):5|
|LDI/LDIR||pc:4,pc+1:4,hl:3,de:3,de:1 x 2,[de:1 x 5]||pc:4,pc+1:4,hl:3,de:5,|
|CPI/CPIR||pc:4,pc+1:4,hl:3,hl:1 x 5,[hl:1 x 5]||pc:4,pc+1:4,hl:8,|
|INI/INIR||pc:4,pc+1:4,ir:1,IO,hl:3,[hl:1 x 5]||pc:4,pc+1:5,IO,hl:3,|
|OUTI/OTIR||pc:4,pc+1:4,ir:1,hl:3,IO,[bc:1 x 5]||pc:4,pc+1:5,hl:3,IO,|
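The notation used in the table can be expanded mechanically. The following is a sketch of one possible parser, assuming entries have the form `target:tstates`, optionally followed by ` x N`, optionally marked `(write)` and optionally wrapped in [square brackets]; the function names and the tuple layout are choices made for this example. IO entries are kept as placeholders with no length, since their delays are described in the Contended IO article.

```python
import re

# One entry looks like "pc:4", "ir:1 x 2" or "hl(write):3".
_ENTRY = re.compile(r"^([a-z0-9+]+)(\(write\))?:(\d+)(?: x (\d+))?$")


def parse_pattern(text):
    """Expand a table cell into (target, tstates, is_write, conditional) steps."""
    steps = []
    for raw in text.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty entries such as a trailing comma
        conditional = raw.startswith("[") and raw.endswith("]")
        if conditional:
            raw = raw[1:-1]
        if raw == "IO":
            steps.append(("IO", None, False, conditional))  # length not modelled
            continue
        match = _ENTRY.match(raw)
        if match is None:
            raise ValueError(f"unrecognised entry: {raw!r}")
        target, write, length, repeat = match.groups()
        for _ in range(int(repeat or 1)):
            steps.append((target, int(length), write is not None, conditional))
    return steps


def uncontended_tstates(steps, condition_met=True):
    """Sum the tstates of all steps, ignoring IO entries and any delays."""
    total = 0
    for _target, length, _is_write, conditional in steps:
        if length is None or (conditional and not condition_met):
            continue
        total += length
    return total
```

As a sanity check, the RLD entry `pc:4,pc+1:4,hl:3,hl:1 x 4,hl(write):3` sums to 18 tstates, and the LDI/LDIR entry sums to 16 tstates without the bracketed part and 21 with it.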
- In this document, we label the first tstate which begins with /INT low as tstate 0; some other resources label this tstate as tstate 1, which means that all tstate counts will be one greater. Note that this is purely a notational difference, and is not the same as the effect described in the timing differences section, which is an actual difference in behaviour between different machines; when using the notation which labels the first /INT low tstate as tstate 1, the first contended memory cycle is at either 14336 or 14337 tstates.
- Applies to the unprefixed version of these opcodes (22 and 2A)
- Applies to the prefixed version of these opcodes (ED43, ED4B, ED53, ED5B, ED63, ED6B, ED73 and ED7B)