- Logic elements and Boolean Algebra
- Types of circuits:
click here
- Intro to Digital circuits:
click here
- Elementary digital circuits:
click here
The Logic Simulator used in this course
(by Richard Reid of Michigan State Univ):
- How to set up your account to use logic-sim:
click here
- How to define a digital circuit for logic-sim:
click here
- Grid coordinate system used in logic-sim:
click here
- Intro to combinatorial circuits:
click here
- Boolean Algebra and digital circuits:
click here
- Combinatorial circuit design:
click here
Assign project 1 to reinforce circuit design:
- You will need to learn about the logic-sim simulator from this handout:
click here
- And do the project assignment 1:
click here...
- Example circuit design: add 1 to a two-bit binary number -
click here
- Example circuit design: adding two 2-bit binary numbers -
click here
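The incrementer circuit above can be sketched in software. This is an illustrative Python model (not the circuit from the linked handout): each sum bit is computed with the same gates the hardware would use.

```python
def add_one_2bit(b1, b0):
    """Add 1 to a two-bit number (b1 b0) using only logic gates:
    bit 0 always flips, bit 1 flips when bit 0 carries,
    and the carry-out is 1 only when both input bits are 1."""
    s0 = 1 - b0          # NOT gate
    s1 = b1 ^ b0         # XOR gate
    carry = b1 & b0      # AND gate
    return carry, s1, s0

# Exhaustive check against ordinary arithmetic:
for n in range(4):
    b1, b0 = (n >> 1) & 1, n & 1
    c, s1, s0 = add_one_2bit(b1, b0)
    assert (c << 2) | (s1 << 1) | s0 == n + 1
```

The exhaustive loop is exactly how you would verify the truth table of the circuit: four input combinations, four checks.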
- Switching Circuits... how is the CPU wired...
- You may have heard that a computer is one big switch...
you will soon find out why.....
- Many-to-one digital switching circuit - multiplexor:
click here
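The many-to-one behavior can be sketched as a truth function. This is an illustrative sum-of-products model of a 4-to-1 multiplexor (the gate-level circuit is wired the same way: one AND term per data input, OR-ed together):

```python
def mux4(d, s1, s0):
    """4-to-1 multiplexor: route one of the four data inputs d[0..3]
    to the single output, chosen by the select lines s1 s0."""
    not_s1, not_s0 = 1 - s1, 1 - s0    # inverters on the select lines
    return (d[0] & not_s1 & not_s0 |   # selected when s1 s0 = 00
            d[1] & not_s1 & s0     |   # selected when s1 s0 = 01
            d[2] & s1 & not_s0     |   # selected when s1 s0 = 10
            d[3] & s1 & s0)            # selected when s1 s0 = 11
```

For example, `mux4([0, 1, 0, 0], 0, 1)` selects input 1 and returns 1.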
- Example use of a (many-to-one) multiplexor:
switching registers to ALU -
click here
- One-to-many digital switching circuit - decoder/demultiplexor:
click here
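The one-to-many direction can be sketched the same way. This is an illustrative model of a 2-to-4 decoder: exactly one output goes high, the one whose index equals the binary input.

```python
def decoder2to4(a1, a0):
    """2-to-4 decoder: one AND gate per output; output i is 1
    exactly when the input bits a1 a0 encode i in binary."""
    not_a1, not_a0 = 1 - a1, 1 - a0   # inverters on the inputs
    return [not_a1 & not_a0,          # output 0: input 00
            not_a1 & a0,              # output 1: input 01
            a1 & not_a0,              # output 2: input 10
            a1 & a0]                  # output 3: input 11
```

Used as a demultiplexor, each output line would additionally be AND-ed with the single data input.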
- Example use of a decoder: select a register for writing:
click here
- Register → ALU → Register transfer:
click here
- Little detour, the real one-to-many switching circuit:
click here
- Arithmetic (and Logic) Circuits.... see what the ALU looks like
- Sequential Circuits
- Finite State Machines (class notes, not in book)
- Constructing FSA with digital circuitry:
click here
- Side note:
- A computer is a FSA, with a huge number of states
- The number of states is equal to 2^N
where N is the number of bits of memory in the main memory
and other storage (disks, tapes, CD-ROMs, etc.)
- Bi-directional Transfer (class notes, not in book)
- Memory Organization
- CPU Micro Architecture
- Micro-programming:
- Introduction to micro-programming:
click here
- Micro-program flow control:
click here
- Register numbering:
click here
- The complete micro-program:
click here
- Step 1 of the instruction execution cycle (fetch instruction):
click here
- Steps 2, 3 and 4 of the instruction execution cycle (decode to execute):
click here
- Demo: run the complete computer with command
"cs355-demo-computer"
- Final notes:
click here
If you understand everything so far, congrats...
You now know exactly how the computer works when
it executes a program.
The only thing that you don't know about the computer is how
the CPU and memory (and IO devices) communicate with each other.
We will fix that next....
- The system bus:
- IO Communication
You should now know how the entire computer works.
The only things left to discuss are the bells and whistles that
make the computer run faster (and safer)...
- Pipeline design
- The RISC philosophy
- Modification to simplify pipeline design:
- Fixed size instruction
- Limited number of instructions that access memory.
Only the following instructions will access memory:
- ld: load (read data from memory)
- st: store (write data to memory)
- Reduced number of addressing modes; supports only immediate,
direct, and indexed addressing with a single index.
Multiple indices can't be used.
- Computer instructions can be broadly categorized as follows:
- Memory instructions: ld and st (see above)
- Distinguishing feature: they access the memory
- Branching instructions: bra, bne, call, ret, etc., etc.
- Distinguishing feature: they change the PC so that
the next instruction fetch is not the one at the
"next memory location".
- ALU instructions: add, sub, mult, div, and, or, etc.
- Distinguishing feature: they can complete execution
without accessing memory and the next instruction is
located right after the current one.
- The Basic Pipelined CPU:
click here
- How the Basic Pipelined CPU executes an ALU instruction:
- Notice the speed-up:
click here
- How the Basic Pipelined CPU executes a Memory access instruction:
- How the Basic Pipelined CPU executes a Branching instruction:
click here
- Some problems you see in the Basic Pipelined CPU:
- Data Hazard: stale (old) register values can be used
in a computation.
- Control Hazard: branch delay of three instructions
is unacceptable
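The data hazard can be sketched with a toy timing model. This is illustrative only (the register names and values are made up): it mimics a pipeline where the second instruction reads its operands before the first one has written its result back.

```python
def run(forwarding):
    """Execute i1: R1 = R2 + 5, then i2: R3 = R1 + 1, modeling the
    pipeline timing: i2 reads its operand BEFORE i1 writes R1 back."""
    regs = {"R1": 0, "R2": 7, "R3": 0}
    i1_result = regs["R2"] + 5                         # i1 in the ALU stage
    operand = i1_result if forwarding else regs["R1"]  # i2 reads its operand
    regs["R1"] = i1_result                             # i1 writeback (too late)
    regs["R3"] = operand + 1                           # i2 in the ALU stage
    return regs["R3"]

# run(False) -> 1  (hazard: the stale R1 = 0 was used)
# run(True)  -> 13 (forwarding hardware supplies the fresh ALU result)
```

Data forwarding is exactly the `forwarding=True` path: wiring the ALU output of the previous instruction directly into the operand latch of the next one.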
- Read after Write Data Hazard in ALU instructions:
- The Read after Write Data Hazard phenomenon in ALU instructions:
click here
- Solving the Read after Write Data Hazard in ALU instructions
with Data Forwarding hardware:
click here
- The Read after Write Data Hazard in LOAD instructions:
- Recall how the load instruction is executed:
click here
- The Read after Write Data Hazard phenomenon in LOAD instructions:
click here
- Solving the Read after Write Data Hazard in LOAD instructions:
click here
- Control Hazard:
- Recall that the basic pipeline will fetch (and execute)
three instructions before it actually branches:
click here
- Reducing the branch delay:
click here
- Executing unconditional branch instructions:
click here
Demo: /home/cs355000/bin/sparc-bra.s
- Executing conditional branch instructions:
click here
- Recall from CS255 techniques to remove the dummy NOP
instructions:
click here
- An abbreviated version for CS355:
click here
- Cache memory:
- Cache: a very fast memory module placed near the CPU:
click here.
- The cache memory can hold a part of the content of the memory.
- Cache can speed up program execution significantly
due to program locality.
- Example speedup:
click here.
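The speedup can be quantified with the average-access-time formula. The timings below (2 ns cache, 50 ns memory, 95% hit rate) are illustrative numbers, not figures from the linked example:

```python
def avg_access_time(hit_rate, cache_ns=2, memory_ns=50):
    """Average memory access time with a cache in front of memory:
    a hit costs cache_ns; a miss costs cache_ns + memory_ns
    (check the cache first, then go to memory)."""
    return hit_rate * cache_ns + (1 - hit_rate) * (cache_ns + memory_ns)

# With a 95% hit rate (typical, thanks to program locality):
# avg_access_time(0.95) is about 4.5 ns, versus 52 ns without a cache --
# roughly an 11x speedup on memory accesses.
```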
- Types of caches:
- Associative: very flexible and very expensive to make.
- Direct-Mapped: cheap, but absolutely no flexibility.
- Set-Associative: economical and some flexibility.
- The Associative Cache:
- The memory is divided up into "words".
- each "word" is 32 bits.
- The address of a "word" is called a "block number".
- Each entry or "slot" can cache a word (32 bits).
- Use the "block number" to identify entries cached:
click here
for an example.
- When the CPU sends out a read request, the cache uses
the address sent by the CPU to look the block number
up in the cache.
If the block number is found, the cache returns the value
to the CPU; otherwise, the cache starts a memory read
cycle to get the data for the CPU.
- One problem: a serial search will take too long.
- Architecture of the Associative Cache:
click here.
- Strength: a slot can cache any word from
any memory location (flexible).
- Weakness: uses one compare circuit per slot.
This makes the Associative Cache very expensive....
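The lookup behavior can be sketched in Python. This is an illustrative model (slot count and replacement policy are made up, not from the notes); the key point is that a hit requires comparing the block number against every slot, which the hardware does in parallel with one comparator per slot:

```python
class AssociativeCache:
    """Fully associative cache sketch: any slot may hold any block."""
    def __init__(self, num_slots, memory):
        self.slots = {}            # block number -> cached word
        self.num_slots = num_slots
        self.memory = memory       # backing store, indexed by block number

    def read(self, block_num):
        if block_num in self.slots:       # hit: block number matched a slot
            return self.slots[block_num]
        word = self.memory[block_num]     # miss: start a memory read cycle
        if len(self.slots) >= self.num_slots:
            self.slots.pop(next(iter(self.slots)))  # evict the oldest entry
        self.slots[block_num] = word
        return word
```

The dictionary lookup hides the cost: in hardware, `block_num in self.slots` is one comparator per slot, all firing at once, which is why this design is so expensive.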
- The Direct-Mapped Cache:
- The memory is also divided into "pages"
- But each "page" is the same size as the entire
direct-mapped cache.
- Each entry or "slot" in cache can only cache a word
(32 bits) at the same location in a page:
click here.
- Click here
for an example.
- Architecture of the Direct-Mapped Cache:
click here.
- Strength: lower cost. Main cost is the multiplexor circuits,
which are relatively cheap to make.
- Weakness: a slot in the cache can only cache a set of
specific words from memory (not flexible).
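The direct-mapped lookup can be sketched the same way (again an illustrative model, with made-up sizes). Block B may live only in slot `B % num_slots`, so one tag comparison suffices, but two blocks that map to the same slot keep evicting each other:

```python
class DirectMappedCache:
    """Direct-mapped cache sketch: each block has exactly one legal slot."""
    def __init__(self, num_slots, memory):
        self.tags = [None] * num_slots   # which block each slot holds
        self.data = [None] * num_slots   # the cached word in each slot
        self.num_slots = num_slots
        self.memory = memory

    def read(self, block_num):
        slot = block_num % self.num_slots   # the ONLY slot this block may use
        if self.tags[slot] == block_num:    # a single tag compare
            return self.data[slot]
        self.tags[slot] = block_num         # miss: fetch and replace
        self.data[slot] = self.memory[block_num]
        return self.data[slot]
```

One comparator instead of one per slot is where the cost saving comes from; the forced `block % num_slots` placement is where the flexibility goes.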
- The Set-Associative Cache: combination of Associative & Direct-Mapped
- Consists of K Direct-Mapped caches.
- Each entry or "slot" in cache can cache K words
at the same location in K different pages:
click here.
- Click here.
for an example.
- Architecture of the Set-Associative Cache:
click here.
- Strength: lower cost. Main cost is the multiplexor circuits
plus K compare circuits, which is far cheaper than the
Associative Cache.
- Weakness: a slot in the cache can only cache a set of
specific words from memory (less flexible than the
Associative Cache).
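The compromise can be sketched by combining the two previous ideas (illustrative model, made-up sizes): block B maps to set `B % num_sets`, and within that set any of the K ways may hold it, so a lookup needs K tag comparators instead of one per slot:

```python
class SetAssociativeCache:
    """K-way set-associative cache sketch: K direct-mapped caches in parallel."""
    def __init__(self, num_sets, k, memory):
        self.sets = [{} for _ in range(num_sets)]  # per set: tag -> word
        self.num_sets, self.k = num_sets, k
        self.memory = memory

    def read(self, block_num):
        s = self.sets[block_num % self.num_sets]  # pick the set directly
        if block_num in s:                        # compare against K tags
            return s[block_num]
        if len(s) >= self.k:                      # set full: evict one way
            s.pop(next(iter(s)))
        s[block_num] = self.memory[block_num]     # miss: memory read cycle
        return s[block_num]
```

With `num_sets = 1` this degenerates into the fully associative cache; with `k = 1` it degenerates into the direct-mapped cache.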
- The Virtual Memory technique
- Parallel Computers
- Introduction:
click here
- SIMD - Single Instruction (stream) and Multiple Data (stream) computers:
click here
- Interconnecting MIMD computers:
click here
- Shared-Memory MIMD Interconnection networks:
- Cross Bar Switch:
click here
- The Delta Multi-stage Interconnection Network (MIN):
click here
- The Omega Multi-stage Interconnection Network:
click here
- Programming Shared Memory MIMD - posix threads:
- Introduction to parallel programming:
click here
- Introduction to threads:
click here
- Creating posix-threads:
click here
- A commonly used parallel program structure (find min):
click here
- Accessing shared memory among threads (compute Pi):
click here
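The course material uses POSIX threads in C; the same compute-Pi structure can be sketched with Python's `threading` module as an analogue (the thread count, term count, and variable names here are illustrative). Each thread sums its share of the rectangle-rule terms for pi = the integral of 4/(1+x^2) over [0,1], then adds into a shared total under a mutex, the counterpart of `pthread_mutex_lock`/`unlock`:

```python
import threading

total = 0.0                 # shared accumulator
lock = threading.Lock()     # mutex protecting the shared update

def partial_pi(start, step, n):
    """Sum every step-th midpoint-rule term, starting at index start,
    then add the partial sum into the shared total under the lock."""
    global total
    local = 0.0
    width = 1.0 / n
    for i in range(start, n, step):
        x = (i + 0.5) * width
        local += 4.0 / (1.0 + x * x) * width
    with lock:                       # critical section on shared memory
        total += local

NUM_THREADS, N = 4, 100000
threads = [threading.Thread(target=partial_pi, args=(t, NUM_THREADS, N))
           for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# total now approximates pi
```

Note the pattern: compute into a thread-local variable, and take the lock only once at the end, so the critical section stays short.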
- The readers and writers problem
(semaphores; advanced material - skip):
click here
- Programming Shared Memory MIMD - OpenMP API:
- Message-Passing MIMD Interconnection networks:
click here
- Algorithms on Message Passing MIMD:
click here
- Programming Message Passing MIMD - MPI:
- Intro to MPI:
click here
- Blocking send and receive operations:
click here
- Non-blocking send and receive operations:
click here
- Group communication
- broadcast, scatter, reduce and gather operations:
click here
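The semantics of the four group operations can be sketched without an MPI installation. This is a plain-Python simulation of what each call delivers to each rank (the function names mirror the MPI operations but are not the MPI API):

```python
def broadcast(data, num_ranks):
    """Broadcast: every rank receives a copy of the root's data."""
    return [data for _ in range(num_ranks)]

def scatter(data, num_ranks):
    """Scatter: rank i receives the i-th equal-sized piece of the
    root's data (assumes len(data) divides evenly by num_ranks)."""
    chunk = len(data) // num_ranks
    return [data[i * chunk:(i + 1) * chunk] for i in range(num_ranks)]

def gather(pieces):
    """Gather: the root receives every rank's piece, in rank order."""
    return [x for piece in pieces for x in piece]

def reduce(values, op):
    """Reduce: the root receives op applied across all ranks' values."""
    result = values[0]
    for v in values[1:]:
        result = op(result, v)
    return result

# scatter([1, 2, 3, 4, 5, 6], 3) -> [[1, 2], [3, 4], [5, 6]]
# reduce([1, 2, 3, 4], lambda a, b: a + b) -> 10
```

A typical data-parallel program is scatter, compute locally on each rank, then reduce or gather, which is exactly the find-min and compute-Pi structure from the threads section, restated with messages instead of shared memory.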
The end.... hope that you have learned a lot in this course... don't stop here,
keep learning... knowledge is power...