CS355 Syllabus & Progress
Material that has been covered is displayed in
blue.
You can run the circuit examples by first saving the file in your own directory
and then using logic-sim to run the circuit.
To save an example from a webpage, do the following in Netscape:
Click File -> Save As
Complete the file name in the "Selection" box
(Make sure you specify the right directory name and file name!)
Click OK when the file name is right
The entire cs355 syllabus in a zip file:
click here
- Logic elements and Boolean Algebra
- Switching Circuits...
how the CPU transports values from one place to another...
- Comment: you may have heard that a computer is one big switch...
you will soon find out why.....
- Filtering property of the AND gate:
click here
---- slides
- Collating property of the OR gates:
click here
---- slides
- Multiplexor circuit -
the many-to-one digital switching circuit:
click here
---- slides
- Defining your own components in EDiSim:
click here
---- slides
- Using multiplexors to switch registers to ALU's input:
click here
---- slides
- One-from-many digital selection circuit - the decoder:
click here
---- slides
- Using decoder to select a register for writing:
click here
---- slides
- Register → ALU → Register transfer:
click here
---- slides
- Enrichment material: Demultiplexor -
the one-to-many switching circuit:
click here
---- slides
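The switching circuits listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's logic-sim or EDiSim circuits: the function names (and_gate, or_gate, not_gate, mux2, decoder2) are hypothetical, and signals are modeled as the integers 0 and 1. The sketch shows how the AND gate filters a signal (it passes the input only when its control input is 1), how the OR gate collates the filtered signals, and how the two combine into a multiplexor and a decoder.

```python
def and_gate(a, b):
    """AND gate: passes a through only when b is 1 (the filtering property)."""
    return a & b


def or_gate(a, b):
    """OR gate: merges two signals into one (the collating property)."""
    return a | b


def not_gate(a):
    """Inverter for a single bit."""
    return 1 - a


def mux2(d0, d1, sel):
    """2-to-1 multiplexor: outputs d0 when sel is 0 and d1 when sel is 1."""
    pass0 = and_gate(d0, not_gate(sel))  # d0 gets through only if sel == 0
    pass1 = and_gate(d1, sel)            # d1 gets through only if sel == 1
    return or_gate(pass0, pass1)         # at most one branch is live, so OR merges them


def decoder2(sel1, sel0):
    """2-to-4 decoder: exactly one of the four outputs is 1."""
    return [
        and_gate(not_gate(sel1), not_gate(sel0)),  # output 0 (select = 00)
        and_gate(not_gate(sel1), sel0),            # output 1 (select = 01)
        and_gate(sel1, not_gate(sel0)),            # output 2 (select = 10)
        and_gate(sel1, sel0),                      # output 3 (select = 11)
    ]


# Print the multiplexor truth table.
for sel in (0, 1):
    for d1 in (0, 1):
        for d0 in (0, 1):
            print("sel", sel, "d1", d1, "d0", d0, "out", mux2(d0, d1, sel))
```

A wider multiplexor (for example, one that switches several registers to the ALU input) could be built the same way by using the decoder outputs as the filter controls, one AND gate per data input.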
- Arithmetic (and Logic) Circuits.... see what the ALU looks like
- Sequential Circuits
- Finite State Machines
- Constructing FSA with digital circuitry:
click here
---- slides
- Side note:
- A computer is an FSA, with a huge number of states
- The number of states is equal to 2^N
where N is the number of bits of memory in the main memory
and other storage (disks, tapes, CD-roms, etc)
- Bi-directional Transfer
- Review of the rules to connect (digital) circuit components:
click here
---- slides
- From tri-state-buffer to a bi-directional bus:
click here
---- slides
- A more efficient way to multiplex registers to the ALU input
(using tri-state-buffers):
click here
---- slides
- Memory Organization
- CPU Architecture: data path
- How the CPU communicates with (reads/writes) the memory -
the system bus and its bus protocol:
- How computer programs perform IO operations -
IO communication:
The remaining material discusses how to make a computer run
faster...
Midterm covers
material up to this point
- Cache memory:
- Pipeline design
- The pipelined CPU approach:
click here
--- slides
- Instruction encoding
(to make things more concrete):
click here
--- slides
- The Basic Pipelined CPU:
click here
--- slides
- How the Basic Pipelined CPU executes an ALU instruction:
- How the Basic Pipelined CPU executes a Memory access instruction:
- How the Basic Pipelined CPU executes a Branching instruction:
click here
--- slides
- Some problems you see in the Basic Pipelined CPU:
- Data Hazard: a stale (old) register value can be used
in a computation.
- Control Hazard: a branch delay of three instructions
is unacceptable
- The Read after Write Data Hazard
in ALU instructions:
- The Read after Write Data Hazard phenomenon in ALU instructions:
click here
--- slides
- Solving the Read after Write Data Hazard in ALU instructions
with Data Forwarding hardware:
click here
--- slides#1
--- slides#2
- The Read after Write Data Hazard
in LOAD instructions:
- Recall how the load instruction is executed
by the basic pipelined CPU:
click here
--- slides
- The Read after Write Data Hazard phenomenon in LOAD instructions:
click here
--- slides
- Solving the Read after Write Data Hazard in LOAD instructions:
click here
--- slides
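To make the Read after Write data hazard concrete, here is a minimal Python sketch that scans a short instruction sequence and reports where an instruction reads a register that a recent earlier instruction writes. Everything in it is an assumption made for illustration: the Instr record, the raw_hazards function, the register names, the assembly-like text, and the window of 2 instructions (the real distance at which a stale value can be read depends on the pipeline described in the slides). It only detects the dependences; it does not model the forwarding hardware itself.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                 # human-readable form (hypothetical syntax)
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read


def raw_hazards(program: List[Instr], window: int = 2):
    """Return (producer_index, consumer_index, register) triples.

    A triple is reported when the consumer reads a register whose most
    recent writer is at most `window` instructions earlier.
    """
    hazards = []
    for i, consumer in enumerate(program):
        for reg in dict.fromkeys(consumer.srcs):      # de-duplicate, keep order
            # Walk backwards to find the most recent writer within the window.
            for j in range(i - 1, max(-1, i - window - 1), -1):
                if program[j].dest == reg:
                    hazards.append((j, i, reg))
                    break
    return hazards


example = [
    Instr("add r1, r2, r3", "r1", ("r2", "r3")),
    Instr("sub r4, r1, r5", "r4", ("r1", "r5")),   # reads r1 right after it is written
    Instr("lw  r6, 0(r4)", "r6", ("r4",)),         # reads r4 right after it is written
    Instr("add r7, r1, r6", "r7", ("r1", "r6")),   # reads r6 right after the load
]

for producer, consumer, reg in raw_hazards(example):
    print(example[consumer].text, "needs", reg, "from", example[producer].text)
```

Each reported pair is a place where the consumer would pick up the old register value unless the result is forwarded to it or the pipeline waits.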
- Control Hazard:
- Recall that the basic pipeline will fetch (and execute)
three instructions before it actually branches:
click here
--- slides
- Reducing the branch delay:
click here
--- slides
- Executing unconditional branch instructions:
click here
--- slides
- Executing conditional branch instructions:
click here
--- slides
- How to program using a delayed branch instruction:
click here
--- skipped (no existing CPU uses delayed branching any more...)
- SIMD Parallel Computers
- Overview of computer processors:
click here
--- slides
- SIMD - Single Instruction (stream) and Multiple Data (stream) computers:
- GPU-programming using the CUDA programming language:
- The GPU-programming environment:
click here
--- slides
- The CUDA architecture:
- Intro to CUDA C-programming:
- Sharing variables between CPU and GPU functions using
CUDA unified memory:
A simple parallel (vector addition) CUDA program:
click here
--- slides
Handout CUDA project
Matrix multiplication algorithm in CUDA C:
click here
--- slides
--- (2 dim grid) slides
Execution configuration and performance:
click here
--- slides
A problem with parallel execution: simultaneous updates
to same variable --
click here
--- slides
Another problem with parallel execution: when threads
must cooperate ---
click here
--- slides
A parallel sort algorithm (odd-even sort) ---
__shared__ variables:
--- slides 1
--- slides 2
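As a rough illustration of the odd-even sort named above, the following Python sketch simulates the algorithm sequentially. It is not the CUDA C code from the slides, and the function name odd_even_sort is hypothetical. The inner loop stands in for the work that would be spread over GPU threads: within one phase every compare-and-swap touches a different pair of elements, so all of them could run at the same time. The phases themselves must run one after another, which is where the threads have to cooperate (in CUDA, for threads of one block, a barrier such as __syncthreads() between phases).

```python
def odd_even_sort(values):
    """Odd-even transposition sort; returns a new sorted list."""
    a = list(values)
    n = len(a)
    for phase in range(n):            # n phases are enough for n elements
        start = phase % 2             # even phase: pairs (0,1), (2,3), ...
                                      # odd phase:  pairs (1,2), (3,4), ...
        # Each iteration below works on its own pair, so on a GPU
        # one thread per pair could execute them simultaneously.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
        # A parallel version needs a barrier here before the next phase starts.
    return a


print(odd_even_sort([5, 1, 4, 2, 3]))
```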
Helpful CUDA material:
- CUDA C programming guide from NVIDIA:
click here
- An Even Easier Introduction to CUDA:
click here
- Intro tutorial:
click here
- Tutorial on CUDA unified memory (starting CUDA 6):
MIMD (general multi-processors) computers
- The 2 different MIMD computers (shared memory MIMD and
message-passing MIMD):
click here
--- slides
- CPU-to-Memory interconnection networks of
Shared Memory MIMD computers:
Programming Shared Memory MIMD using Posix threads:
- Introduction to MIMD (parallel) programming:
click here
- Introduction to threads:
click here
- Creating posix-threads:
click here
- Passing a parameter to a thread:
click here
- Waiting for a thread to terminate:
click here
- A commonly used parallel program structure (find min):
- Accessing shared memory among threads (compute Pi):
- Synchronization between threads:
click here
- The mutex-lock synchronization primitive:
click here
- Computing Pi (with numeric integration):
click here
- The read/write-lock synchronization primitive:
click here
- The readers and writers problem
(Semaphore):
click here --- Skipped
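The thread topics above (creating threads, passing a parameter, waiting for termination, and protecting a shared variable with a mutex) can be put together in one small sketch. The course material uses Posix threads; the version below is a Python stand-in using the standard threading module, so Thread(...).start() plays the role of pthread_create, join() the role of pthread_join, and the Lock the role of a pthread mutex. The function name compute_pi, the integrand 4/(1+x^2) on [0, 1], and the midpoint rule are assumptions; the syllabus only says that Pi is computed with numeric integration.

```python
import threading


def compute_pi(num_intervals=100_000, num_threads=4):
    """Approximate Pi as the integral of 4 / (1 + x^2) over [0, 1]."""
    width = 1.0 / num_intervals
    total = 0.0                      # shared variable updated by every thread
    lock = threading.Lock()          # mutex that protects `total`

    def worker(tid):
        nonlocal total
        local_sum = 0.0
        # Thread `tid` handles intervals tid, tid + num_threads, ...
        for i in range(tid, num_intervals, num_threads):
            x = (i + 0.5) * width    # midpoint of interval i
            local_sum += 4.0 / (1.0 + x * x)
        # Only one thread at a time may update the shared total.
        with lock:
            total += local_sum

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                     # wait for every thread to terminate
    return total * width


print(compute_pi())
```

Accumulating into a private local_sum and taking the lock once per thread keeps the critical section short; locking around every single addition would also be correct but would serialize most of the work.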
Programming Shared Memory MIMD using the OpenMP API:
- Intro to the OpenMP API:
click here
- Shared/non-shared variables in OpenMP:
click here
- The OpenMP support functions:
click here
- Example OpenMP prog: find min in array --- (no synchronization):
click here
- The OpenMP synchronization structure (pragma critical) and
example (compute Pi):
click here
- Other (advanced) OpenMP parallel structures (for loop):
click here
Message-Passing MIMD computers:
- Interconnection networks for message-passing MIMD computers:
click here
- Programming Message Passing MIMD using MPI:
- Intro to MPI:
click here
- Blocking send and receive operations:
click here
- Advanced sending and receiving operations:
click here
- NON-Blocking send and receive operations:
click here
- Group communication
- broadcast, scatter, reduce and gather operations:
click here
The END...