CS355 Syllabus

The Direct Memory Access device

The DMA is a simple special-purpose device, i.e., a specialized device that can only do one specific task.
It can be manufactured very inexpensively.
Its sole purpose is to transfer data between the memory and an IO device.
You have to remember that the DMA is a helper processor for the CPU. Thus, the CPU must be able to tell the DMA what data to transfer. The only way the CPU can do that is if the DMA is accessible on the system bus.
It is therefore no surprise that the DMA is connected to the system bus.
The internal structure of a DMA is as follows:

The structure somewhat resembles that of an IO device (with the command, data and status registers), but it has two additional registers that allow the DMA to perform the data transfer operation.
The registers in the DMA are used as follows:
- Status register: readable by the CPU to determine the status of the DMA device (idle, busy, etc)
- Command register: writable by the CPU to issue a command to the DMA
- Data register: readable and writable. It buffers the data that is being transferred between the memory and the IO device.
- Address register: contains the starting memory location that the data will be transferred from or to.
  The Address register must be programmed by the CPU before issuing a "start" command to the DMA.
- Count register: contains the number of bytes that need to be transferred.
  Together, the address register and the count register specify exactly what information needs to be transferred.
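To make the register set concrete, here is a minimal sketch of how it could be modelled in C. The names (`dma_regs`, `dma_end_address`) and the 32-bit register width are assumptions made for this illustration; they do not describe any particular DMA chip.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the DMA's registers (names and 32-bit widths are
   assumptions made for this sketch, not taken from a real device). */
typedef struct {
    uint32_t status;   /* read by the CPU: is the DMA idle or busy? */
    uint32_t command;  /* written by the CPU to start a transfer */
    uint32_t data;     /* holds the item currently being transferred */
    uint32_t address;  /* starting memory address of the transfer */
    uint32_t count;    /* number of bytes to transfer */
} dma_regs;

/* First address past the end of the transfer: the bytes moved are exactly
   those at addresses address, address+1, ..., address+count-1. */
uint32_t dma_end_address(const dma_regs *r)
{
    assert(r->count <= UINT32_MAX - r->address);  /* range must not wrap */
    return r->address + r->count;
}
```

For example, with the address register set to 5000 and the count register set to 1000, the transfer covers addresses 5000 through 5999, and `dma_end_address` returns 6000.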
The IO communication using DMA will proceed as follows:
1. The CPU first programs the DMA's address register and count register
2. The CPU then writes a read or a write command into the DMA's command register
3. Let's assume the CPU wants to perform an IO write operation, which transfers data from the memory to the IO device. (An IO read operation, which reads data from an IO device, is similar, except that the DMA performs write operations to memory.)
4. At this moment, the DMA becomes active and will repeatedly execute the following steps:
  - Acquire (through the bus arbiter) the right to use the system bus
  - Send out a read request for the address given by the value in the address register
  - Go through the system bus cycle to obtain a word from memory into the DMA's data register.
  - Write the data from the data register to the IO device
  - Increment the address register
  - Decrement the count register
  - Repeat as long as count register is not zero
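The loop that the DMA executes can be sketched as a small C simulation. Everything here is an assumption made for illustration: `dma_sim` and `dma_write_to_device` are hypothetical names, main memory and the IO device are plain arrays, bus arbitration is not modelled, and one byte is moved per iteration to keep the count register in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA state: only the registers the loop needs. */
typedef struct {
    uint8_t data;     /* data register */
    size_t  address;  /* address register: index into simulated memory */
    size_t  count;    /* count register: bytes still to transfer */
} dma_sim;

/* Simulate a memory-to-device transfer. `memory` stands in for main
   memory and `device` for the IO device. Returns the bytes moved. */
size_t dma_write_to_device(dma_sim *dma, const uint8_t *memory,
                           size_t memory_size, uint8_t *device)
{
    size_t moved = 0;
    while (dma->count != 0) {
        assert(dma->address < memory_size);  /* stay inside memory */
        dma->data = memory[dma->address];    /* bus cycle: memory -> data register */
        device[moved++] = dma->data;         /* data register -> IO device */
        dma->address++;                      /* increment the address register */
        dma->count--;                        /* decrement the count register */
    }
    return moved;
}
```

Unlike the step list, this sketch tests the count register before the first transfer, so a count of zero moves nothing.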
After the CPU has programmed the DMA with the necessary information to perform the transfer, and started the DMA, the DMA will perform the transfer for the CPU.
So you clearly see that the DMA is a helper device to the CPU.
Let me first give you an example of IO communication using a DMA device, before going into the more difficult issues surrounding IO communication with a DMA device.
First, I have to make some assumptions on how the DMA is connected to the system bus:
- I will again use the M68000, with memory-mapped IO, to illustrate
- I will assume that the registers of the DMA are mapped into the following addresses:
  - Status register: address 2000
  - Command register: address 2004
  - Data register: address 2008
  - Address register: address 2012
  - Count register: address 2016
The following is an assembler code fragment that instructs the DMA to transfer 1000 bytes of data from memory to the IO device:
```
    (Assume "buffer" = starting address of the data in memory)

    move.l  #buffer, 2012       // Program Address register of DMA
    move.l  #1000, 2016         // Program Count register of DMA
    move.l  #1, 2004            // Assumed: command code for IO write
                                // operation is 1
```
After these 3 simple move assembler instructions, the CPU has programmed the DMA and started the DMA to perform the IO transfer operation.
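For readers more comfortable with C, the same three stores could be sketched as follows. This is an illustration under stated assumptions: `dma_regs`, `dma_start_write` and `DMA_CMD_IO_WRITE` are hypothetical names, and the struct simply mirrors the register addresses assumed above (five 32-bit registers at consecutive 4-byte offsets). On real memory-mapped hardware the pointer would refer to the device's base address; here it can point at any `dma_regs` variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register layout assumed in the text: five 32-bit registers at
   consecutive 4-byte offsets from the base address 2000. */
typedef struct {
    uint32_t status;    /* base + 0  (address 2000) */
    uint32_t command;   /* base + 4  (address 2004) */
    uint32_t data;      /* base + 8  (address 2008) */
    uint32_t address;   /* base + 12 (address 2012) */
    uint32_t count;     /* base + 16 (address 2016) */
} dma_regs;

/* Check at compile time that the struct matches the assumed offsets. */
static_assert(offsetof(dma_regs, command) == 4, "command register offset");
static_assert(offsetof(dma_regs, address) == 12, "address register offset");
static_assert(offsetof(dma_regs, count) == 16, "count register offset");

enum { DMA_CMD_IO_WRITE = 1 };  /* assumed command code, as in the text */

/* Program the DMA and start a memory-to-device transfer. The pointer is
   volatile so the compiler performs every store, in the order written. */
void dma_start_write(volatile dma_regs *dma, uint32_t buffer, uint32_t nbytes)
{
    dma->address = buffer;            /* like: move.l #buffer, 2012 */
    dma->count   = nbytes;            /* like: move.l #1000, 2016   */
    dma->command = DMA_CMD_IO_WRITE;  /* like: move.l #1, 2004      */
}
```

The command register is written last on purpose: the address and count registers must hold their final values before the DMA is started.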
OK, now you know what must be done to get the DMA going. The 6-million dollar (i.e., very important) question that remains is:
- What is the CPU going to do now?
Maybe you did not get the issue addressed by the above question. So, to make the issue more explicit, let me summarize all the relevant facts:
- The (only) job that the CPU does is:
  - to fetch the next instruction at the location given by the program counter and execute it (by decoding it, fetching the operands, computing the result and updating the destination), i.e., the CPU always performs the "instruction execution cycle".
- The program (whose assembler instructions are being executed by the CPU) is performing an IO operation.
- And in the process, the DMA is instructed to do the IO transfer
- The CPU is now free to continue executing instructions (this is a huge difference from the programmed IO technique where we tie up the very expensive CPU to do the IO operation).
The 6-million dollar question posed above is asking the following:
- Can the CPU continue to execute instructions of the currently running program that has performed an IO operation?
The answer to this question is a resounding no...
Consider the following simple program that reads in 1000 grade values and computes their average:
```
     int main(void)
     {
        int grades[1000];
        int i;
        double sum, average;

        // read() and grade_file are illustrative names, not a real API
        read(grades, grade_file, 1000);  // Start DMA to do transfer
        sum = 0;
        for (i = 0; i < 1000; i++)
           sum = sum + grades[i];
        average = sum/1000;
     }
```
The read() subroutine will translate into assembler code that instructs the DMA to transfer the data to the memory location given by the variable "grades". But it takes some time for the data to be transferred to memory. So if the CPU proceeds, without waiting, with the instructions following the read() subroutine, it will use incorrect values (because the data has not yet been transferred into the "grades" array in memory)!
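The effect of reading too early can be shown with a small C sketch. The names `partial_transfer` and `average` are hypothetical, and the partly finished DMA transfer is only imitated by copying the first few values; in a real system the untransferred slots would hold whatever happened to be in memory, not necessarily zeros.

```c
#include <assert.h>
#include <string.h>

/* Copy the first `arrived` of the file's values into `grades`,
   standing in for a DMA transfer that is only partly finished. */
void partial_transfer(int *grades, const int *file, size_t arrived)
{
    memcpy(grades, file, arrived * sizeof grades[0]);
}

/* Average of the n values currently in the array, whether or not
   the transfer that fills the array has completed. */
double average(const int *grades, size_t n)
{
    assert(n > 0);
    double sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += grades[i];
    return sum / n;
}
```

With the file values 80, 90, 70, 100 and an array that starts out as all zeros, computing the average after only two values have arrived gives 42.5 instead of the correct 85.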
So now that you know the problem, what is the solution to the 6-million dollar question?
Well, one straightforward solution is to execute the following program after the CPU has started the DMA:
```
  WaitLoop:
            move.l  2000, D0     // Read status register
            cmp.l   #0, D0       // Test if DMA is idle (done); assumes status 0 = idle
            bne     WaitLoop     // Not idle yet: keep polling
```
This solution is clearly unattractive: we are again making the very expensive CPU do nothing at all (the loop accomplishes absolutely nothing useful).
In fact, the solution is dumb: why would you want to make an extra helper device (the DMA) for the CPU and then make the CPU wait for the DMA to finish? You could just as well make the CPU do the transfer, and then you don't need an extra DMA device.
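To see how much work the polling loop throws away, here is a minimal C sketch. It is a simulation under assumed conventions: `dma_sim`, `dma_read_status` and `wait_for_dma` are hypothetical names, status 0 means idle, and the simulated DMA stays busy for a fixed number of status reads.

```c
#include <assert.h>
#include <stdint.h>

enum { DMA_IDLE = 0, DMA_BUSY = 1 };  /* assumed status codes */

/* Simulated DMA: stays busy for `ticks_left` more status reads. */
typedef struct {
    uint32_t ticks_left;
} dma_sim;

/* Read the status register; each read lets simulated time advance a tick. */
uint32_t dma_read_status(dma_sim *dma)
{
    if (dma->ticks_left == 0)
        return DMA_IDLE;
    dma->ticks_left--;
    return DMA_BUSY;
}

/* Busy-wait until the DMA is idle. Returns the number of loop
   iterations in which the CPU did nothing useful. */
uint32_t wait_for_dma(dma_sim *dma)
{
    uint32_t wasted = 0;
    while (dma_read_status(dma) != DMA_IDLE)
        wasted++;
    assert(dma->ticks_left == 0);  /* the transfer has finished */
    return wasted;
}
```

If the transfer takes 1000 ticks, `wait_for_dma` returns 1000: every one of those iterations is CPU time that another program could have used.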
The real answer to the above 6-million dollar question is to keep the CPU doing useful work.
Clearly, the CPU may not be able to continue with the currently running program.
Fear not, there are other users and programs running on the computer (just telnet to dooley.cc.emory.edu and execute a "w" command and you will see tons of other users).
So the answer to the 6-million dollar question is:
- Make the current program give up the CPU
- Make the CPU run another program (that is not waiting on an IO operation to complete)
- And when the IO transfer operation finishes, switch the CPU back to the original program.
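The three steps can be sketched as a tiny state machine in C. This is only a toy model under stated assumptions: `scheduler`, `block_on_io` and `io_complete` are hypothetical names, there are exactly three programs, and neither the saving of a program's state nor the way the end of the IO transfer is signalled is modelled.

```c
#include <assert.h>

typedef enum { READY, RUNNING, BLOCKED } proc_state;

#define NPROC 3

typedef struct {
    proc_state state[NPROC];
    int running;              /* index of the running program, -1 if none */
} scheduler;

/* Give the CPU to the first READY program; running stays -1 if none. */
static void run_next(scheduler *s)
{
    s->running = -1;
    for (int i = 0; i < NPROC; i++) {
        if (s->state[i] == READY) {
            s->state[i] = RUNNING;
            s->running = i;
            return;
        }
    }
}

/* Steps 1 and 2: the running program has started an IO transfer,
   so it gives up the CPU and another program runs. */
void block_on_io(scheduler *s)
{
    assert(s->running >= 0);
    s->state[s->running] = BLOCKED;
    run_next(s);
}

/* Step 3: the transfer for program `pid` has finished, so the CPU is
   switched back to it; the program that was running becomes READY. */
void io_complete(scheduler *s, int pid)
{
    assert(pid >= 0 && pid < NPROC && s->state[pid] == BLOCKED);
    if (s->running >= 0)
        s->state[s->running] = READY;
    s->state[pid] = RUNNING;
    s->running = pid;
}
```

Starting with program 0 running and programs 1 and 2 ready, `block_on_io` hands the CPU to program 1, and a later `io_complete(&s, 0)` hands it back to program 0.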
To realise the solution, we need to solve two major problems:
- How can a program give up the CPU in such a way that it can be restarted at a later time from the same state it was in when it was halted?
- How can the stopped program reclaim the CPU when the IO operation is completed? (It needs to reclaim the CPU so that it can make progress with its execution.)
From the point of view of the programs, the releasing and reclaiming of the CPU is the act of one program giving the control of the CPU to another program.
In Computer Science jargon, when two (or more) programs try to interact with one another, we say that the programs must synchronize with each other. The technique used to make them agree is called a synchronization method.