This project requires you to implement the simulator for an out-of-order processor using register renaming, similar to that of Variation 2 in the lecture notes. The ISA of this processor is identical to the processor for Project 1.
The datapath of the simulated processor, excluding the physical and architectural register files, is as follows:
Note again that the above diagram does not show other components like the architectural and physical register files, branch instruction stack, rename table etc. and their relevant connections, as well as forwarding paths to the inputs of the function units (for supporting back-to-back execution).
The specific details of the datapath are as follows:
The instruction fetch stage and the decoding/renaming stage (D/RN) each have a delay of one cycle.
The issue queue (IQ) has a maximum of 8 entries. Register operands are read at the time of issue. It takes one cycle for an IQ entry to wake up, be granted a FU and move to the required function unit.
The LSQ has a maximum capacity of 6 entries.
The physical register file has 24 registers. Each physical register has an extension that holds any flag values that were generated along with the contents of that register. When the simulation starts
physical registers are allocated. Physical registers are always allocated in increasing order of their addresses. Assume that a physical register that is freed up in a cycle can be reallocated in the same cycle.
The ROB has a capacity of 12 entries.
The function unit for the multiply operations is pipelined into three stages (each with a delay of a single cycle).
The integer function unit is pipelined into two stages (each with a delay of single cycle). All integer operations (excluding multiplication), logical operations and address calculations for loads and stores are performed in the integer function unit.
The branch function unit has its own adder to calculate a target address and has a single stage (one cycle delay). The issue of a branch instruction checkpoints the allocation list of physical registers and the rename table to speed up recovery on a branch misprediction.
Memory operations, once initiated, take 3 cycles to complete and memory and is implemented by a memory function unit as shown. Addresses for the loads and stores are calculated in the integer function unit and written directly to the associated LSQ entry. The write of a calculated memory address to the LSQ takes one cycle, after which the LSQ can begin memory operation as long other conditions for starting memory operations are valid. Memory operations cannot be overlapped.
Instructions can be issued from the issue queue to multiple function units simultaneously and the physical register file has sufficient number of ports to allow all register operands for simultaneously-issued instructions to be read out in parallel.
The processor supports back-to-back execution by implementing wakeup tag broadcasts (from all FUs, including the memory FU) one cycle before the availability of the corresponding result, which can be forwarded to an instruction as it issues.
The processor supports speculative execution. A speculation depth of 2 is supported, allowing for at most two unresolved branch instructions at any time. As soon as the branch instruction discovers a mispredicted branch, it takes one cycle to flush all instructions and resume execution along the correct path. A JUMP instruction requires all following instructions that entered the pipeline to be flushed.
Some other assumptions to be made for designing the simulator are as follows:
A sufficient number of forwarding buses exist to permit simultaneous forwarding from all function units that can forward a
If two or more instructions become eligible for issue to the same FU in the same cycle, the instruction with the lowest associated PC value (that is, address) is chosen for issue. (This tie breaking solution is complex to implement in real designs, but we need this to facilitate grading!)
A physical register can be freed up in the same cycle it is written to (and reallocated in the same cycle), as long as all conditions for freeing it up are
An IQ entry is freed up at the end of the cycle in which the instruction it held was
Loads cannot bypass earlier
At the time of dispatching, IQ, ROB and LSQ entries can be set up
Instruction commitment takes one
Updates to the rename table take place at the beginning of the cycle in which a physical register is being updated, so the dispatching logic is aware of the validity of the physical
A few things to note:
You need to think of an appropriate way of determining when a physical register can be freed up. For the purpose of simulation, even complex things that are difficult to implement in reality can be implemented in your simulator!
The simulation of a HALT is easy with a ROB in place: when D/RF encounters a HALT, it stalls the D/RF stage and adds ONLY a ROB entry for the HALT at the end of the ROB. An IQ entry is not needed. When the ROB entry for the HALT reaches the head of the ROB, simply return control to the user prompt!
On stalls in the decode/rename stage due to the lack of any of the resources needed for dispatch, a similar approach is needed.
NOTES
The “simulate” command have to be modified if needed to support simulation for N cycles (“simulate N”) and then enable the display of the simulated processor/memory status using “Display”. A subsequent “simulate M”, say, should be able to continue simulation for the next M cycles from the state where it left off at the end of the N-th cycle and then have the ability to display the state etc.
I SA FROM PROJ 1
Registers and Memory:
Assume that there are 16 architectural registers, R0 through R15. The code to be simulated is stored in a text file with one ASCII string representing an instruction (in the symbolic form, such as ADD R1, R4, R6 or ADDL R2, R1, #10) in each line of the file. Memory for data is viewed as a linear array of integer values (4 Bytes wide). Data memory ranges from 0 through 3999 and memory addresses correspond to a Byte address that begins the first Byte of the 4-Byte group that makes up a 4 Byte data item. Instructions are also 4 Bytes wide, but stored as strings in a file, as noted above for this project. Registers are also 4 Bytes wide.
Instruction Set:
The instructions supported are:
Register-to-register instructions: ADD, ADDL (add register with a literal), SUB, SUBL (subtract literal from a register), MOVC (move constant or literal value into a register), AND, OR, EX-OR and MUL. As stated earlier, you can assume that the result of multiplying two registers will fit into a single register.
MOVC <register> <literal>, moves literal value into specified register. The MOVC uses the EX1 and EX2 stages to add 0 to the literal and updates the destination register from the WB
stage.
Memory instructions: LOAD, LDR, STORE, STR: LOAD and STORE both include a literal value whose content is added to a register to compute the memory address, while LDR and STR instructions calculate a memory address by adding the contents of two registers. The semantics of LDR and STR are as follows, using the notation described in the class:
LDR <dest>, <src1>, <src2>: <dest> ← Mem[<src1> +
<src2>]
STR <src1>, <src2>, <src3>: Mem[<src2> + <src3>] ←
<src1>
Control flow instructions: BZ, BNZ, JUMP and HALT. Instructions following a BZ, BNZ and JUMP instruction in the pipeline should be flushed on a taken branch. The zero flag (Z) is set only by arithmetic instructions when they are in the WB stage. The dependency that the BZ or BNZ instruction has with the immediately prior arithmetic instruction (ADD, MUL, SUB) that sets the Z flag has to be implemented
The semantics of BZ, BNZ, JUMP and HALT instructions are as follows:
The BZ <literal> instruction cause in a control transfer to the address specified using PC-relative addressing if the Z flag is set. The decision to take a branch is made in the Integer ALU stage. BNZ <literal> is similar but causes branching if the Z flag is not set. BZ and BNZ target 4-Byte boundaries (see example below).
JUMP specifies a register and a literal and transfers control to the address obtained by adding the contents of the register to the literal. The decision to take a branch is made in the Integer ALU stage. JUMP also targets a 4-Byte
The HALT instruction stops instruction fetching as soon as it is decoded but allows all prior instructions in the pipeline to complete before returning to the command line for the simulator.
As soon as a target memory address is calculated, instruction following a taken branch or a JUMP are squashed immediately, in the same cycle as the address calculation is completed in the EX2 stage.
The instruction memory starts at Byte address 4000. You need to handle target addresses of JUMP correctly - what these instructions compute is a memory address. However, all your instructions are stored as ASCII strings, one instruction per line in a SINGLE text file and there is no concept of an instruction memory that can be directly accessed using a computed address. To get the instruction at the target of a BZ, BNZ or JUMP, a fixed mapping is defined between an instruction address and a line number in the text file containing ALL instructions:
Physical Line 1 (the very first line) in the text file contains a 4 Byte instruction that is addressed with the Byte address 4000 and occupies Bytes 4000, 4001, 4002,
Physical Line 2 in the text file contains a 4 Byte instruction that is addressed with the Byte address 4004 and occupies Bytes 4004, 4005, 4006,
Physical Line 3 in the text file contains a 4 Byte instruction that is addressed with the Byte address 4008 and occupies Bytes 4008, 4009, 4010, 4011
3
The targets of all control flow instructions thus have to target a 4_byte boundary. So when you simulate a JUMP instruction whose computed target has the address 4012, you are jumping to the instruction at physical line 4 in the text file for the code to be simulated. Register contents and literals used for computing the target of a branch should therefore target one of the lines in the text file. Your text input file should also be designed to have instructions at the target to start on the appropriate line in the text file.
Instructions are stored in the following format in the text file, one per line:
<OPCODE characters><comma><argument1><comma><argument2>
<comma><argument3>
Where arguments are registers or literals. Registers are specified using two or three characters (for example, R5 or R14). Literal operands, if any, appear at the end, preceded by an optional negative sign. This format is different from the notation used elsewhere (and in this problem description), as it uses commas to separate arguments.
SimulatorCommands:
Your simulator is invoked by specifying the name of the executable file for the simulator and the name of the ASCII file that contains the code to be simulated. Your simulator should have a command interface that allows users to execute the following commands:
Initialize: Initializes the simulator state, sets the PC of the fetch stage to point to the first instruction in the ASCII code file, which is assumed to be at address 4000. Each instruction takes 4 bytes of space, so the next instruction is at address 4004, as memory words are 4 Bytes long, just like the integer data items.
Simulate <n>: simulates the number of cycles specified as <n> and waits. Simulation can stop earlier if a HALT instruction is encountered and when the HALT instruction is in the WB stage.
Display: Displays the contents of each stage in the pipeline, all registers, the flag register and
the contents of the first 100 memory locations containing data, starting with address 0.
A skeleton code in C for this simulator that simulates only two instructions is included in a separate file. This simulator has most of the major data structures that you will need to use and also implements the simulated memory for instructions and data. You can extend this code by modifying it appropriately for this assignment.
For the simulated pipeline, add the complete set of necessary forwarding logic to forward register values. Assume that forwarding can take place only from the output of EX2 and MEM2 stages and that any dependency stall causes the dependent instruction to wait in the D/RF stage, however, a stalled instruction can pick up a forwarded value as it is entering the EX1 stage.
DescriptionIn this final assignment, the students will demonstrate their ability to apply two ma
Path finding involves finding a path from A to B. Typically we want the path to have certain properties,such as being the shortest or to avoid going t
Develop a program to emulate a purchase transaction at a retail store. Thisprogram will have two classes, a LineItem class and a Transaction class. Th
1 Project 1 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of
1 Project 2 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of