CS355 Syllabus & Progress

Material covered is displayed in blue.

You can run the circuit examples by first saving the file in your own directory and then using logic-sim to run the circuit. To save an example from a webpage, do the following in Netscape:

     Click File -> Save As
     Complete the file name in the "Selection" box
       (Make sure you specify the right directory name and file name!)
     Click OK when the file name is right

The entire cs355 syllabus in a zip file: click here

Logic elements and Boolean Algebra

Types of circuits: click here ---- slides
Intro to Digital circuits: click here ---- slides
Elementary digital circuits: click here ---- slides
The Emory Digital Simulator (EDiSim) used in this course is based on the simulator by Richard Reid of Michigan State University:
- The EDiSim website: http://cs.emory.edu/~edisim/
- How to use EDiSim: ---- slides
Intro to combinatorial circuits: click here ---- slides
Combinatorial circuit design: click here ---- slides
Another example of circuit design: adding 1 to a two-bit binary number - click here ---- slides
Yet another example of circuit design: adding two 2-bit binary numbers - click here ---- slides
Assign project 1 to reinforce circuit design:
- You will need to use the basic components of the logic simulator: click here
- And do the project assignment 1: click here...

Switching Circuits... how the CPU transports values from one place to another...

Comment: you may have heard that a computer is one big switch... you will soon find out why.....
Filtering property of the AND gate: click here ---- slides
Collating property of the OR gate: click here ---- slides
Multiplexor circuit - the many-to-one digital switching circuit: click here ---- slides
Defining your own components in EDiSim: click here ---- slides
Using multiplexors to switch registers to ALU's input: click here ---- slides
One-from-many digital selection circuit - the decoder: click here ---- slides
Using decoder to select a register for writing: click here ---- slides
Register → ALU → Register transfer: click here ---- slides
Enrichment material: Demultiplexor - the one-to-many switching circuit: click here ---- slides
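
The filtering (AND) and collating (OR) properties listed above combine into a multiplexor. A minimal C sketch of a 2-to-1 multiplexor follows, modelling 1-bit signals as ints. The function names are hypothetical and chosen only for illustration; this is not EDiSim notation.

```c
#include <assert.h>

/* 1-bit signals are modelled as ints holding 0 or 1.
   All names below are hypothetical, not part of EDiSim. */

/* Filtering: an AND gate passes `data` only while `enable` is 1. */
int and_filter(int enable, int data) { return enable & data; }

/* Collating: an OR gate merges lines of which at most one is active. */
int or_collate(int a, int b) { return a | b; }

/* 2-to-1 multiplexor: select = 0 picks in0, select = 1 picks in1. */
int mux2(int select, int in0, int in1)
{
    assert(select == 0 || select == 1);   /* select must be a single bit */
    int not_select = select ^ 1;          /* inverter on the select line */
    return or_collate(and_filter(not_select, in0),
                      and_filter(select, in1));
}
```

For example, mux2(1, 0, 1) filters out in0 and lets in1 through, so the output follows in1.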

Arithmetic (and Logic) Circuits.... see what the ALU looks like

Intro to arithmetic circuits (circuits that perform computations): click here ---- slides
The "ripple-carry" adder using "full" adder circuits: click here ---- slides
Array notation in EDiSim: click here ---- slides
The subtract circuit: click here ---- slides
The multiply circuit: click here ---- slides
Assign project 2 to reinforce arithmetic circuits: click here
Postscript - what about more complex functions like sin(x): click here ---- slides
Finally, I can now show you what the ALU looks like: click here ---- slides ---- a simplified ALU slides
Shifter: click here ---- slides
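
The "ripple-carry" adder listed above chains full adders so that each stage's carry-out feeds the next stage's carry-in. A minimal C sketch of that idea follows. The function names and the 16-bit width limit are assumptions made for illustration, not taken from the course material.

```c
#include <assert.h>

/* Full adder on single bits: stores the sum bit in *sum and
   returns the carry-out. */
int full_adder(int a, int b, int carry_in, int *sum)
{
    *sum = a ^ b ^ carry_in;
    return (a & b) | (a & carry_in) | (b & carry_in);
}

/* Ripple-carry adder: runs `width` full adders from bit 0 upward,
   passing each carry-out to the next stage as carry-in.
   Returns the width-bit sum; the final carry goes to *carry_out. */
unsigned ripple_add(unsigned x, unsigned y, int width, int *carry_out)
{
    assert(width > 0 && width <= 16);     /* assumed limit for this sketch */
    unsigned result = 0;
    int carry = 0;
    for (int i = 0; i < width; i++) {
        int sum_bit;
        carry = full_adder((x >> i) & 1, (y >> i) & 1, carry, &sum_bit);
        result |= (unsigned)sum_bit << i;
    }
    *carry_out = carry;
    return result;
}
```

With width 4, adding 9 and 8 yields the 4-bit sum 1 with carry-out 1, since 17 does not fit in four bits.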

Sequential Circuits

Introduction: click here ---- slides
The SR-latch: click here ---- slides
1-bit-memory and register: click here ---- slides
Circuit timing and D-flipflops: click here ---- slides

Finite State Machines

Constructing FSA with digital circuitry: click here ---- slides
Side note:
- A computer is an FSA, with a huge number of states
- The number of states equals 2^N, where N is the total number of bits in main memory and other storage (disks, tapes, CD-ROMs, etc.). For example, a machine with only 3 bits of storage has 2^3 = 8 possible states.

Bi-directional Transfer

Review of the rules to connect (digital) circuit components: click here ---- slides
From tri-state-buffer to a bi-directional bus: click here ---- slides
A more efficient way to multiplex registers to the ALU input (using tri-state-buffers): click here ---- slides

Memory Organization

Structure of a computer memory made with D-latches/Dff's: click here ---- slides
Building larger memories: click here ---- slides

CPU Architecture: data path

Introduction: click here ---- slides
The various data forwarding paths within the simple datapath: click here ---- slides
Controlling the datapath: click here ---- slides
Controlling the ordering of events/actions in the datapath: click here ---- slides 1 ---- slides 2
Demo: /home/cs355001/demo/datapath/cs355-demo-dp1
Demo: /home/cs355001/demo/datapath/cs355-demo-dp2

How the CPU communicates with (reads/writes) the memory - the system bus and its bus protocol:

Introduction: click here --- slides
Closer look at the CPU: click here --- slides
Computer buses: click here --- slides
Asynchronous Buses: click here --- slides
Synchronous Buses: click here --- slides
Bus Arbitration: click here --- slides
M68000 data sheet (for background info only - not part of syllabus): click here

How computer programs perform IO operations - IO communication:

Introduction: click here --- slides
Addressing IO devices: click here --- slides
How to transfer data between I/O device and memory:
- IO Data Transfer using the CPU ("programmed IO"): click here --- slides
- IO Data transfer using a DMA: click here --- slides
IO Synchronization when we use DMA:
- Intro: process (= running program) and process state click here --- slides
- Part 1: releasing the CPU: click here --- slides
- Part 2: reclaiming the CPU: click here --- slides
Other uses of interrupt: multi-programming - click here --- slides
Identifying the interrupting device: vector interrupt - click here --- slides

The remaining material discusses how to make a computer run faster...

Midterm covers material up to this point

Cache memory:

Intro: click here --- slides
Motivation for using caches: click here --- slides
Types of cache architectures:
1. Associative: very flexible and very expensive to make.
2. Direct-Mapped: cheap, but absolutely no flexibility.
3. Set-Associative: economical and some flexibility.
The Associative Cache: click here --- slides 1 (intro) --- slides 2 (circuit)
The Direct-Mapped Cache: click here --- slides 1 (intro) --- slides 2 (circuit)
The Set-Associative Cache: Hybrid of Associative & Direct-Mapped: click here --- slides
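
The three cache architectures above differ in how many slots a given memory block may occupy. A minimal C sketch follows that splits an address into tag, set number, and byte offset. The 16-byte block size, the 256-line cache size, and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 4     /* assumed 16-byte blocks */
#define NUM_LINES   256   /* assumed total number of cache lines */

typedef struct { uint32_t tag; uint32_t set; uint32_t offset; } CacheAddr;

/* Split an address for a cache with `ways` lines per set.
   ways == 1         : direct-mapped (a block has exactly one possible slot)
   ways == NUM_LINES : associative (one set, a block may go anywhere)
   in between        : set-associative */
CacheAddr split_address(uint32_t addr, uint32_t ways)
{
    assert(ways >= 1 && ways <= NUM_LINES && NUM_LINES % ways == 0);
    uint32_t num_sets = NUM_LINES / ways;
    uint32_t block = addr >> OFFSET_BITS;          /* block number in memory */
    CacheAddr r;
    r.offset = addr & ((1u << OFFSET_BITS) - 1);   /* byte within the block */
    r.set = block % num_sets;                      /* which set to search */
    r.tag = block / num_sets;                      /* identifies the block in its set */
    return r;
}
```

The more ways per set, the fewer set bits and the longer the tag, which is one way to see why the associative cache needs the most comparison hardware.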

Pipeline design

The pipelined CPU approach: click here --- slides
Instruction encoding (to make things more concrete): click here --- slides
The Basic Pipelined CPU: click here --- slides
How the Basic Pipelined CPU executes an ALU instruction:
- Two register operands: click here --- slides
- 1 register and 1 constant operand: click here --- slides
- Notice the speed-up: click here --- slides
How the Basic Pipelined CPU executes a Memory access instruction:
- Load: click here --- slides
- Store: click here --- slides
How the Basic Pipelined CPU executes a Branching instruction: click here --- slides
Some problems you see in the Basic Pipelined CPU:
1. Data Hazard: old value of registers can be used in computation.
2. Control Hazard: branch delay of three instructions is unacceptable.
The Read after Write Data Hazard in ALU instructions:
- The Read after Write Data Hazard phenomenon in ALU instructions: click here --- slides
- Solving the Read after Write Data Hazard in ALU instructions with Data Forwarding hardware: click here --- slides#1 --- slides#2
The Read after Write Data Hazard in LOAD instructions:
- Recall how the load instruction is executed by the basic pipelined CPU: click here --- slides
- The Read after Write Data Hazard phenomenon in LOAD instructions: click here --- slides
- Solving the Read after Write Data Hazard in LOAD instructions: click here --- slides
Control Hazard:
- Recall that the basic pipeline will fetch (and execute) three instructions before it actually branches: click here --- slides
- Reducing the branch delay: click here --- slides
- Executing unconditional branch instructions: click here --- slides
- Executing conditional branch instructions: click here --- slides
- How to program using a delayed branch instruction: click here --- skipped (no existing CPU uses delayed branching any more...)
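
The Read after Write data hazard described above arises when a later instruction reads a register that an earlier, still unfinished instruction writes. A minimal C sketch of the dependency check follows. The Instr record and the function name are hypothetical, and the sketch only detects the dependency; whether it causes a stale read depends on how far apart the two instructions are in the pipeline.

```c
#include <assert.h>

/* Hypothetical instruction record: one destination register and two
   source registers. A register number of -1 means "not used". */
typedef struct { int dest; int src1; int src2; } Instr;

/* Returns 1 if `later` reads a register that `earlier` writes
   (a Read after Write dependency), otherwise 0. */
int has_raw_dependency(Instr earlier, Instr later)
{
    assert(earlier.dest >= -1);           /* -1 or a valid register number */
    if (earlier.dest < 0)
        return 0;                         /* earlier writes no register */
    return later.src1 == earlier.dest || later.src2 == earlier.dest;
}
```

For example, an instruction writing register 1 followed by one reading register 1 is flagged, which is the case data forwarding hardware has to handle.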

SIMD Parallel Computers

Overview of computer processors: click here --- slides
SIMD - Single Instruction (stream) and Multiple Data (stream) computers:
- Intro: click here --- slides
- The SIMD Vector Processor: click here --- slides
- The SIMD Graphics Processor (GPU): click here --- slides

GPU-programming using the CUDA programming language:

The GPU-programming environment: click here --- slides

The CUDA architecture:

Intro (general GPU architecture): click here --- slides
The NVidia GPU architecture: click here --- slides
Thread organization (and how thread, thread block, and grid map to the GPU architecture): click here --- slides

Intro to CUDA C-programming:

My "Hello World" CUDA program: click here --- slides
The different kinds of functions in a CUDA program: click here --- slides
Synchronous and Asynchronous kernel calls: click here --- slides

Execution configuration - how to specify thread, block, and grid dimensions: click here --- slides

More complicated execution configurations - a 2-dimensional configuration: click here --- slides

Error handling in CUDA kernel launch: click here --- slides

Sharing variables between CPU and GPU functions using CUDA unified memory:

CUDA Unified Memory: click here --- slides
Sharing global and allocated variables (using Unified Memory): click here --- globals --- C:malloc --- C:pointer arith --- C:[ ] op --- allocated

A simple parallel (vector addition) CUDA program: click here --- slides
Hand out CUDA project

Matrix multiplication algorithm in CUDA C: click here --- slides --- (2 dim grid) slides

Execution configuration and performance: click here --- slides

A problem with parallel execution: simultaneous updates to same variable -- click here --- slides
Another problem with parallel execution: when threads must cooperate --- click here --- slides
A parallel sort algorithm (odd-even sort) --- __shared__ variables: --- slides 1 --- slides 2

Helpful CUDA material:

CUDA C programming guide from NVidia: click here
An Even Easier Introduction to CUDA: click here
Intro tutorial: click here
Tutorial on CUDA unified memory (starting CUDA 6):

Tutorial 1: click here
Tutorial 2: click here
MIMD (general multi-processor) computers
- The two kinds of MIMD computers (shared memory MIMD and message-passing MIMD): click here --- slides
- CPU-to-Memory interconnection networks of Shared Memory MIMD computers:
  - Cross Bar Switch: click here --- slides
  - The Delta Multi-stage Interconnection Network (MIN): click here --- slides
  - The Omega Multi-stage Interconnection Network: click here --- slides
Programming Shared Memory MIMD using Posix threads:
- Introduction to MIMD (parallel) programming: click here
- Introduction to threads: click here
- Creating posix-threads: click here
- Passing a parameter to a thread: click here
- Waiting for a thread to terminate: click here
- A commonly used parallel program structure (find min):
  - Intro: click here
  - Finding the min in an array: click here
  - Another way to distribute the work load: click here
- Accessing shared memory among threads (compute Pi):
  - Synchronization between threads: click here
  - The mutex-lock synchronization primitive: click here
  - Computing Pi (with numeric integration): click here
  - The read/write-lock synchronization primitive: click here
- The readers and writers problem (Semaphore): click here --- Skipped
Programming Shared Memory MIMD using the OpenMP API:
- Intro to the OpenMP API: click here
- Shared/non-shared variables in OpenMP: click here
- The OpenMP support functions: click here
- Example OpenMP prog: find min in array --- (no synchronization): click here
- The OpenMP synchronization structure (pragma critical) and example (compute Pi): click here
- Other (advanced) OpenMP parallel structures (for loop): click here
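
The "compute Pi" example with pragma critical listed above can be sketched as follows. This is a minimal illustration, not the course's own program: it assumes Pi is approximated by midpoint-rule integration of 4/(1+x^2) over [0,1], and the function name is hypothetical. It is meant to be compiled with an OpenMP-enabled compiler (for example gcc -fopenmp); without OpenMP the pragmas are ignored and the loop runs sequentially with the same result.

```c
#include <assert.h>

/* Approximate Pi by integrating 4/(1+x^2) over [0,1] with n intervals. */
double compute_pi(int n)
{
    assert(n > 0);
    double w = 1.0 / n;                   /* width of one interval */
    double pi = 0.0;                      /* shared among all threads */

    #pragma omp parallel
    {
        double local = 0.0;               /* private partial sum per thread */

        #pragma omp for
        for (int i = 0; i < n; i++) {
            double x = (i + 0.5) * w;     /* midpoint of interval i */
            local += 4.0 / (1.0 + x * x);
        }

        /* Only one thread at a time may update the shared variable. */
        #pragma omp critical
        pi += local * w;
    }
    return pi;
}
```

Each thread accumulates into its own variable and enters the critical section only once, which keeps the synchronization cost small compared with protecting every addition.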
Message-Passing MIMD computers:
- Interconnection networks for message-passing MIMD computers: click here
- Programming Message Passing MIMD using MPI:
  - Intro to MPI: click here
  - Blocking send and receive operations: click here
    - Advanced sending and receiving operations: click here
  - NON-Blocking send and receive operations: click here
  - Group communication - broadcast, scatter, reduce and gather operations: click here

The END...