Sunday, 29 January 2017

Frame Rates and I76 Nitro

A Problem with Old Games on Modern Systems

One of the persistent problems the I76 & Nitro games have had on modern systems is the frame rate.
The game uses the graphics frame rate to control physics and timing. Unfortunately on a modern system the frame rate is much higher than the original authors expected and the assumptions of the engine break. This makes for some odd behaviours, and impossible jumps.

However the I76 community have solved this problem by manually modifying the Nitro binaries to add a frame rate limiter.

Wait, they've done what?!

The Files

There are a few different patch files available on the interstate 76 site.

  • Nitro Patch I
  • Nitro Patch II
  • Nitro Patch III
  • Nitro 30fps Patch

Patches I - III provide updated executables and libraries; they're replacements for the versions shipped with the game.

The 30fps patch file is slightly different; it contains a single replacement nitro executable provided by the developer known as BofH, who is is also responsible for the Patch III release.

The executables in patch II & III are identical, and the 30fps version of executable is based on these.

The Change

There's a couple of changes in the binary, but the particularly interesting one for the frame rate affects the main render loop and modifying some zero data at the end of the file into a block of code, effectively implementing a binary trampoline. This change by BofH, welding into the existing binary, is (IMO) incredibly clever.

Looking at the decompiled binary code using Snowman then the giveaway to locating the completed frame is the frame dump function. This is located at 0x488ed0 and is called from 0x433069. It's easy to spot by the reference to the string literal "SCRDUMP.BMP". It's part of a monster of a function starting at 0x431760, that includes the main render loop.

In this section there's a change at 0x4327c8, where we redirect to the the trampoline.

The original code has a compare followed by a conditional jump

4327c8: cmp dword [0x4f30cc], 0x5
4327cf: jnz dword 0x43311d

The new code is different - it has an unconditional call to 0x4c02a0 replacing the compare.

4327c8: jmp dword 0x4c02a0
4327cd: nop
4327ce: nop
4327cf: jnz dword 0x43311d

So, what's at 0x4c02a0? In the original nitro executable Snowman thinks this is filled with "add [eax], al", which is basically a chunk of binary filled with 0 data.

In the new nitro executable this has been replaced with something more interesting. Snowman actually messes up the opcodes slightly - it should look like this though...

4c02a0: push ebx
4c02a1: push ecx
4c02a2: push edx
4c02a3: call dword [0x4c111c]
4c02a9: mov ebx, [0x4cc4f8]
4c02af: test ebx, ebx
4c02b1: jnz 0x4c02b5
4c02b3: jmp 0x4c02de
4c02b5: mov edx, eax
4c02b7: sub edx, ebx
4c02b9: mov ecx, [0x4c02f4]
4c02bf: sub ecx, edx
4c02c1: mov ebx, [0x4cc4f4]
4c02c7: add ebx, ecx
4c02c9: test ebx, ebx
4c02cb: js 0x4c0300
4c02cd: mov [0x4cc4f4], ebx
4c02d3: push ebx
4c02d4: mov ebx, eax
4c02d6: call dword [0x675139]
4c02dc: mov eax, ebx
4c02de: mov [0x4cc4f8], eax
4c02e3: cmp dword [0x4f30cc], 0x5
4c02ea: pop edx
4c02eb: pop ecx
4c02ec: pop ebx
4c02ed: jmp dword 0x4327cf
4c02f2: add [eax], al
4c02f4: and [eax], eax

The short version:

This loop makes two external calls; one to GetTickCount() and one to Sleep() - it's basically a time check and delay calculation which inserts a delay between frame renders.

The value at 0x4c02f4 is actually a data value of "33" - this sets the optimal frame delay of 33mS, or ~30fps. Changing this value winds the delay target value up or down.

The tail end of this code restores the context and duplicates the test that this modification removed before returning to the jnz, so the subsequent graphics loop processing is otherwise unchanged.

The long version

Let's break this down:

4c02a0: push ebx
4c02a1: push ecx
4c02a2: push edx

Entry context save

4c02a3: call dword [0x4c111c]

This is a call to GetTickCount() - so we retrieve the number of milliseconds.

4c02a9: mov ebx, [0x4cc4f8]
4c02af: test ebx, ebx
4c02b1: jnz 0x4c02b5
4c02b3: jmp 0x4c02de

This recovers a value from 0x4cc4f8 and skips the next part of the processing if it's zero; tracing through the code we can see this value gets the latest timer tick count stored back in it, so it's a simple test for a previous tick value, and obviously if there isn't one then the frame delay calculation is pointless.

4c02b5: mov edx, eax
4c02b7: sub edx, ebx

And this subtracts the current from the previous tick value - i.e. it tells us how many mS between renders.

4c02b9: mov ecx, [0x4c02f4]
4c02bf: sub ecx, edx
4c02c1: mov ebx, [0x4cc4f4]
4c02c7: add ebx, ecx

Looking forwards then there are two values here - 0x4cc4f4 is set to the "previous sleep time", and 0x4c02f4 is a constant value, 33. So the calculation is (mS throughout): "new sleep" = "previous sleep" + (33 - "time difference").
So, ideally "previous sleep" will stabilise at the target inter-frame delay for a given frame render time: think of it as "new delay = delay + error". Carrying on:

4c02c9: test ebx, ebx
4c02cb: js 0x4c0300

This tests for a delay below zero, in which case it jumps to this code fragment:

...
4c0300: mov ebx, 0x0
4c0305: jmp 0x4c02cd

This sets a zero delay value and jumps straight back. In this case we want to render as fast as possible because we're under the target frame rate.

4c02cd: mov [0x4cc4f4], ebx

Stores the requested delay value in 0x4cc4f4 for the next loop.

4c02d3: push ebx
4c02d4: mov ebx, eax

Sets up the delay and stores off the tick count.

4c02d6: call dword [0x675139]

This is a call to sleep, and so we delay for a time that limits the outgoing frame rate.

4c02dc: mov eax, ebx
4c02de: mov [0x4cc4f8], eax

restores the tick count and saves it to 0x4cc4f8

4c02e3: cmp dword [0x4f30cc], 0x5

This is the test we replaced in the original binary

4c02ea: pop edx
4c02eb: pop ecx
4c02ec: pop ebx
4c02ed: jmp dword 0x4327cf

Restore context, and return to the main loop at the jnz, which leaves the code back in the state we started.

Hacking up the binary some more

The magic number is the data value of "33". We can just change this directly in the new nitro.exe to modify the frame rate up or down.

So opening the nitro.exe in a hex editor and going to offset 784116 we can tune the value to verify we're understanding this correctly.

  • 0x21 => 33mS, or 30 fps
  • 0x28 => 40mS, or 25 fps
  • 0x64 => 100mS, or 10 fps
  • 0xFF => 256mS or ~4 fps

Annoyingly this doesn't appear to affect the -recordLoads debug values, but you can't have everything. For the 10fps values & below the lower frame rate is very visible.

Monday, 23 January 2017

I76 Parser Code

I'd been planning to do a little more to clean this up, but keep getting distracted, so for now here's some code that can do I76 game file translation.

This is a dump of the current I76 exporter as it's sitting on my hard disk. It compiles with Qt5.6 (or better), lzo and opencv on Linux. LZO is needed to unpack the compressed file resources in the Nitro Pack, and OpenCV is used to handle some of the higher bitdepth images that are created when level dumps are processed. Qt 5.6 is needed because of some of the utility bitmap functions I used in QImage. These dependencies should be replaceable for the motivated who only need a subset of the functionality.

It's been built up as I went along figuring out the formats, so it's very hacky in places, and could be much nicer if rewritten from scratch and knowing what's available in which files. However it works for me as is (tm).

Just qmake in the source directory, then make and the resulting binary is i76_converter. Run this with the name of a file to process on the command line.

Run it with the ZFS file from the game itself and it will unpack the individual resource files from the ZFS. If run with one of the resource files from the game (either from the ZFS or game disks) then it will report information and possibly create a more usable export, depending on the type of file (and how much of the debug is commented out).

Monday, 2 January 2017

I76: The Rest of the FSM Processor Op-Codes

The Rest of the Op-Codes

We can run through the rest of the op-codes in the way we described in the previous posting. Skipping over some of the detail the results are:

The Jumps

Op-code 8 (Case 7)

This was the operation we confirmed last time. There are two other jump instructions that our initial guesswork picked up: 9 & 10.

Op-code 10 (Case 9)

This is also an unconditional jump, and the assembler sequence is pretty much identical to op-code 8, but this also stops the running machine by returning from the execution function.

When we halt the execution we appear to go to the next available machine as described in the "Block 1" Machine list we pulled out of the mission file.
So for A01 then the first machine described starts at Location 2180, and is the machine we looked at last time. As soon as we break out of the loop running this machine then we go to run the next machine in the list (which starts at 2238), and following that the next (also at 2238 with a different stack initialiser) etc. The game runs around the list, and then returns to the first running machine and resumes execution. Basically the machines are co-operatively multitasking.

Op-code 9 (Case 8)

We thought this was a conditional jump, and this is confirmed in the code. Examining this code shows us that the value held in "edi+0x10a0" is a reference to the return value location used by actions.

Stack Pop

Op-code 7 (Case 6) is a Stack pop, with the parameter providing the depth to pop. It simply subtracts the argument from the SP. It only changes the pointer value though, and will not clean up the associated memory.

Modify SP

Op-code 6 (Case 5) is a stack modify - the stack pointer is adjusted by the argument value.

Negate

This is Op-code 14 (Case 13) and seems to be a complement of the result code. The argument isn't used.

Copy Operations

Op-code 4 and 5 are copy operations, from the "n'th" stack element to a temporary stack, referenced at "edi+0x10b4", and then the "edi+0x10b4" head is moved forward.
Actions consume values from this stack reference, and the value is reset after action calls, so I believe this is actually used as the parameter stack for the action executor.
The main difference between op-codes 4 & 5 is that 5 copes with signed parameters, which loads from below the stack "base". So let's cover the stack layout.

Stack Layout

As mentioned the running machine has a reference to the stack base, and a current stack pointer. In the initial state stack item #0 is set to the first bytecode location for this machine, and stack push moves the pointer upward, pop back downward.

The "below the stack" values are populated using a combination of the Block1 and Block2 tables. The values from block 2 are placed in memory, and the block 1 values are used to select from the list of block 2 entries and stored below the stack base.

As an example a01 has the following machine definition in Block 1:

Block1/1 Start Address: 2238 Initial stack Sz: 6 Stack: [1 7 8 9 10 14 ]

And the corresponding Block 2 in this file is:

Block2 table has 16 (0x10) Entries
0:1 1:2 2:3 3:4 4:5 5:6 6:7 7:3 8:4 9:5 10:6 11:7 12:8 13:9 14:0 15:0

Now when running this machine we see an SP base @ 0x05f62e84, and dumping the memory around this point gets:
Wine-dbg>x/16x 0x05f62e64
0x05f62e64: 04455355 05f6c574 05f6c58c 05f6c590
0x05f62e74: 05f6c594 05f6c598 05f6c5a8 00000000
0x05f62e84: 05f70ba8 00000000 00000000 00000000
0x05f62e94: 00000000 00000000 00000000 00000000

And dereferencing through these numbers we should see the values for indexes "1, 7, 8, 9, 10, 14 ", or looking those up in Block 2: "2,3,4,5,6,0"
Wine-dbg>x 0x05f6c574
00000002
Wine-dbg>x 0x05f6c58c
00000003
Wine-dbg>x 0x05f6c590
00000004
Wine-dbg>x 0x05f6c594
00000005
Wine-dbg>x 0x05f6c598
00000006
Wine-dbg>x 0x05f6c5a8
00000000

Now when we go into the next machine we have the definition
Block1/2 Start Address: 2238 Initial stack Sz: 6 Stack: [2 8 9 10 7 14 ]

And we have the stack in memory:
Wine-dbg>x/16x 0x05f63f2c
0x05f63f2c: 04455355 05f6c578 05f6c590 05f6c594
0x05f63f3c: 05f6c598 05f6c58c 05f6c5a8 00000000
0x05f63f4c: 05f70ba8 00000000 00000000 00000000
0x05f63f5c: 00000000 00000000 00000000 00000000

And de-referencing again we should see the index values of "2, 8, 9, 10, 7, 14" which works out as "3,4,5,6,3,0"
Wine-dbg>x 0x05f6c578
00000003
Wine-dbg>x 0x05f6c590
00000004
Wine-dbg>x 0x05f6c594
00000005
Wine-dbg>x 0x05f6c598
00000006
Wine-dbg>x 0x05f6c58c
00000003
Wine-dbg>x 0x05f6c5a8
00000000

Ta-da!

Obviously there's an off-by one thing, where the stack is placed one location in memory below where the values would be if things were tightly packed, but it's close enough that I won't worry (for now anyway).

When looking at these copy operations it's worth mentioning that operation 4 copies a pointer reference to the stack location (not a value), and 5 copies the value, but it expects this to be a pointer (and invariant, since there's no way to change it). The result of this is that the copy operations are pass by reference, not value. So pointers are being passed to the actions, rather than a mix of values and pointers. I don't currently know if any bytecode/actions try to exploit this.

Actions

The action executor is op-code 13 (Case 12). This code marshals parameters to the execution function - in this case 0x40d0d0. Inside this function we move forward the bytecode instruction pointer and also reset the "+0x10b4" (parameter) stack.

One important note is that the case numbers for actions don't (directly) match the values supplied in the bytecode. There appears to be a function at 0x0040bf40 called during the mission load which maps the action names to a jump table, and the values in the case reflect the mapped values in this table. So for example the op-code:arg pair of "13:52" represents "ACTION(true)", but this is actually mapped to and handled in case 25 of the execution function.
(This function was probably called match_prototype(), given that's in the error string it reports when it fails).

The internals of the action executor are a little involved, so any more detail will have to wait for a separate post.

Op-code 12: RST

This is a slightly unexpected/odd op-code; it appears to change the running bytecode pointer to the default location and reset the stack pointer & base, and can also break out of the running loop. It looks to be used to halt the execution of a running machine.

My immediate suspicions are that this is more often a Halt than a Reset. More investigation into the use cases is required for this one....

Summaries

Memory Layout

We have a stack based machine which has loaded the microcode into memory as 4 byte opcode & 4 byte argument pairs and is used to run the mission scripts.

The Stacks themselves are 4 bytes per entry.

The main stack has a base location, containing the machine start address, and grows upward. Initial parameter values are loaded below the stack base and can be referenced with a stack copy taking a negative argument. Stack Pop and Push is non destructive; the pointers are adjusted but any values in memory are not cleared.

There is an argument stack for passing data into actions, and a dedicated Result value which can be tested.

Each running machine tracks:

  • The Last Action Result (AR)
  • The Base of the loaded microcode (IB)
  • The Current Instruction Pointer (IP)
  • The Base of the Stack (SB)
  • The Current Stack Pointer (SP)
  • An Argument Stack for passing to Actions (AS)

The game keeps a list of machine details loaded from the mission file, and loops through this list, with each machine being run in turn. A machine relinquishes control using a suitable op-code, but the engine also has a harcoded upper limit on the number of instruction loops a machine can perform without being de-scheduled.

OpCode Summaries

What we have so far:
Opcode Name Description
1 PUSH Stack Push. Argument written at SP, Increment SP
2 Unused
3 Unused
4 COPY_S Copy Argument From Stack. Copy from "BP+argument" to AS
5 COPY_B Copy Argument from Parameter. Copy from "BP + argument-1" to AS. Negative Arguments Expected.
6 STACK_MOD Add Elements to stack. SP = SP + argument.
7 POP POP Elements from stack. SP = SP - argument
8 JMP Unconditional Jump. Set IP to argument
9 JZ Conditional Jump. If AR is 0 then set IP to argument
10 JMP_I Unconditional Jump and De-schedule. Set IP to argument, then halt the running instance
11 Unused
12 RST Reload IP and Reset SP
13 ACTION Perform Action given by argument. Consumes AS and Modifies AR
14 NEG Complement AR
15 Unused

A View from Tutorial Land

The first machine in the tutorial mission, A01, sets up the camera views for the introduction: Let's break that down some.

Initial Stack State

Initially the stack has the base of this machine bytecode, but is otherwise empty.
[0:0x05f709d8]
>>

The sub-stack region is
[-1: 0]
[-2: "idx 0"]

Where "idx 0" is a pointer to the value "1", from the initial value list.

Running the bytecode

The bytecode opens:
Block3/2180 PUSH: 0x0:[0]
Block3/2181 PUSH: 0x0:[0]
Block3/2182 PUSH: 0x1:[1]
Block3/2183 STACK_MOD: 0x1:[1]
Block3/2184 JMP: 2185

So after this we have the following stack:
[0: 0x05f709d8]
[1: 0]
[2: 0]
[3: 1]
[4: 0]
>>

Next up:
Block3/2185 ACTION(true)
Block3/2186 JZ: 2191

i.e. A call to true, followed by a JZ. The true will set the Action Result register to "1" and therefore the JZ will not trip and we will not take the jump.

The logic of this instruction sequence escapes me, since we just seem to be introducing an expensive NOP and there's no way to hit the JZ without having called true. However having seen the underling engine ASM in action I'm pretty confident this is what it does.

Next we make an argument copy:
Block3/2187 COPY_S: 4

And as a result have the argument stack:
[0: Stack #4]
>>

This is followed by an action call:
Block3/2188 ACTION(startTimer)

This will start the timer referenced by the argument stack (Timer 0?), and following this the argument stack is cleared.

And next
Block3/2189 ACTION(pushCam)
Block3/2190 JMP: 2193
which is an action to setup a Camera, and an unconditional JMP to 2193

Checking for Timers and Keypresses

Going to 2193 we see
Block3/2193 COPY_S: 4
Block3/2194 PUSH: 0xa:[10]
Block3/2195 COPY_S: 5

Which leaves us with the stack of:
[0: 0x05f709d8]
[1: 0]
[2: 0]
[3: 1]
[4: 0]
[5: 10]
>>

And the arg stack of
[0: Stack #4]
[1: Stack #5]
>>

Using this setup we check the time
Block3/2196 ACTION(timeGreater)

This will adjust the result register. Based on the parameter stack it looks like we're waiting for 10 seconds.

Next
Block3/2197 POP: 0x1:[1]

This is a bit of clean-up which removes the timer parameter from the top of the stack. It does not affect the result register and the stack is now:
[0: 0x05f709d8]
[1: 0]
[2: 0]
[3: 1]
[4: 0]
>>

For now following the case that the timer has not expired the result register will be zero, and the next instruction:
Block3/2198 JZ: 2202

Is triggered and goes to...
Block3/2202 ACTION(isKeypress)
Block3/2203 JZ: 2205

A keypress check. If we trip this (i.e. do not take the jump) then the next instruction is a "RST", so this is the "interrupt the view" handler.

Setting up the View

However if we go with a zero result, and have not pressed the key, then next up is a process of push/copies
Block3/2205 COPY_S: 1
Block3/2206 PUSH: 0x118:[280]
Block3/2207 COPY_S: 5
Block3/2208 PUSH: 0xff38:[-200]
Block3/2209 COPY_S: 6
Block3/2210 PUSH: 0xfa:[250]
Block3/2211 COPY_S: 7
Block3/2212 PUSH: 0xf830:[-2000]
Block3/2213 COPY_S: 8
Block3/2214 PUSH: 0x0:[0]
Block3/2215 COPY_S: 9
Block3/2216 PUSH: 0x38a4:[14500]
Block3/2217 COPY_S: 10

So after this lot we have a stack of:
[0: 0x05f709d8]
[1: 0]
[2: 0]
[3: 1]
[4: 0]
[5: 280]
[6: -200]
[7: 250]
[8: -2000]
[9: 0]
[10: 14500]
>>

And the arg stack of
[0: Stack #1]
[1: Stack #5]
[2: Stack #6]
[3: Stack #7]
[4: Stack #8]
[5: Stack #9]
[6: Stack #10]
>>


And it passes this to:
Block3/2218 ACTION(camObjDir)

Which looks a lot like a camera view setup. Basically a camera position and camera view as a pair of triplets.

Clean-up, and sleep

Next is clean-up. We drop the view parameters we pushed onto the stack in the last step:
Block3/2219 POP: 0x6:[6]

So the stack goes back to
[0: 0x05f709d8]
[1: 0]
[2: 0]
[3: 1]
[4: 0]
>>


And we have the "halting jump"
Block3/2220 JMP_I: 2193
This goes back to the timer check location, and pauses this machine while the game engine goes on to the next one.

What happens on timer expiry

The logic for this case is similar to the logic we have followed so far; When the initial timer expires (the check at 2198) the bytecode calls into the startTimer action to ensure the timer is running and then checks for the timer value against a 15 second limit or a keypress to halt. Otherwise it uses the camPosObj action to focus the Camera view onto the Piranha and loops again. The arguments are marshalled using the copy operations we've seen so far, and also there's an instance of COPY_B being used to grab the supplied argument to determine the target object ID.

However this post is getting a little long, and the detail isn't particularly informative over the instructions we've seen so far, so I'll skip it for now.

So, In Summary

This code sets an initial camera view on the level, and runs a timer. It then holds the view until either the timer expires, or a keypress comes in.

When the initial timer expires then the bytecode changes the camera view to focus on the Piranha, and holds this view until the new timer limit (at 15 seconds) or a keypress is seen.

When a keypress is seen, or all the timer expiry cases have been run through, this machine halts.

All of this ties up with what we initially see in terms of cutscene camera management: the view of the sign being held for ~10 seconds, then a timed switch to a view of the Piranha for ~5 seconds. The views either timeout, or we hit a key at which point we wind up in car.

To verify some of this we can copy over a01.msn to ADDONS, edit the parameters pushed to camObjDir, and change the initial views around. For example: Edit the a01.msn at 79268, and change 0x38 to 0x28 for the initial view to the left of the sign and 0x48 to move right. This is changing the argument pushed at Block3/2216. Or raise or lower the value at 79019 (0xa) to change the timeout on the initial view, etc.

So where does all that leave us?

The remaining big unknown it the behaviour of the actions, and I need to dig into the actions list more to try and get a basic understanding of the API available there and the parameters. But this is a lot of the information we need to edit missions and debug them, or at least start to build the tools to do so.