Python Pandas - DataFrame  

  • A DataFrame is a two-dimensional, tabular data structure with labeled rows and columns.

  • Features of DataFrames:

    • Columns can be of different data types
    • Size is mutable
    • Axes (rows and columns) are labeled   (rows default to a positional index: 0, 1, 2, ...)
    • Can perform arithmetic operations on rows and columns
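
 The features listed above can be seen in a few lines. This is only a minimal sketch: the column names (name, age, score, bonus, total) and the values are made up for illustration, and pandas is assumed to be imported as pd, as in the rest of these examples.

```python
import pandas as pd

# Columns can hold different data types: strings, integers, floats
df = pd.DataFrame({'name': ['Alex', 'Bob', 'Clarke'],
                   'age': [10, 12, 13],
                   'score': [1.5, 2.5, 3.5]})

# Size is mutable: a column can be added after construction
df['bonus'] = [1, 2, 3]

# Arithmetic on whole columns is applied element by element
df['total'] = df['score'] + df['bonus']

# Both axes carry labels
row_labels = list(df.index)      # default positional row labels
col_labels = list(df.columns)    # column names

print(df)
```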

  Constructor  

  • A DataFrame is created using the following constructor:

     pandas.DataFrame( data, index, columns, dtype, copy)
    
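        data:    the input data, in various forms,
                 e.g.: ndarray, Series, list, dict, or another DataFrame.
    
        index:   a list of row labels.
                 Default is np.arange(n), where n is the number of rows.
    
        columns: a list of column labels.
                 Default is np.arange(n), where n is the number of columns.
    
        dtype:   a single data type to force on all columns.
                 Default is None (the types are inferred from the data).
    
        copy:    a boolean flag that controls whether the input data is copied.
                 Default is False (None in recent pandas versions, where the
                 behavior depends on the type of data).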
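
 The index, columns and dtype parameters described above can be combined in one call. A minimal sketch, with made-up numbers and labels; the copy parameter is left at its default here because its effect depends on the pandas version and on the type of data.

```python
import pandas as pd

data = [[1, 2], [3, 4], [5, 6]]

df = pd.DataFrame(data,
                  index=['a', 'b', 'c'],   # row labels
                  columns=['x', 'y'],      # column labels
                  dtype=float)             # force one data type on all columns

print(df)
print(df.dtypes)
```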
    


  • The examples that follow show the effects of the various parameters.

  Examples of the use of the DataFrame( ) constructor  

 Data = a list of numbers

    data = [10,20,30]

    df = pd.DataFrame(data)

        0
    0  10
    1  20
    2  30


 Data = a list of pairs

    data = [ ['Alex',10], ['Bob',12], ['Clarke',13] ]

    df = pd.DataFrame(data)

            0   1
    0    Alex  10
    1     Bob  12
    2  Clarke  13


DEMO: progs/df01.py

  Examples of the use of the DataFrame( ) constructor  

 Data = a list of lists

    data = [ ['Alex',10, 1, 1], ['Bob',12, 1, 1], ['Clarke',13, 1, 1] ]

    df = pd.DataFrame(data)

            0   1  2  3
    0    Alex  10  1  1
    1     Bob  12  1  1
    2  Clarke  13  1  1


DEMO: progs/df01.py

  Examples of the use of the DataFrame( ) constructor  

 index specified:

    data = [ ['Alex',10], ['Bob',12], ['Clarke',13] ]

    df = pd.DataFrame(data, index=['a','b','c'])

            0   1
    a    Alex  10
    b     Bob  12
    c  Clarke  13


 columns specified:

    data = [ ['Alex',10], ['Bob',12], ['Clarke',13] ]

    df = pd.DataFrame(data, columns=['x','y'])

            x   y
    0    Alex  10
    1     Bob  12
    2  Clarke  13


DEMO: progs/df01.py

  Create a DataFrame from Dict of Lists  

 No index specified:

    data = {'Name':['Tom', 'Jack', 'Steve'], 'Age':[28,34,29]}    # Dict of lists

    df = pd.DataFrame(data)

        Name  Age
    0    Tom   28
    1   Jack   34
    2  Steve   29


 With index specified:

    data = {'Name':['Tom', 'Jack', 'Steve'], 'Age':[28,34,29]}    # Dict of lists

    df = pd.DataFrame(data, index=['x','y','z'])

        Name  Age
    x    Tom   28
    y   Jack   34
    z  Steve   29


DEMO: progs/df02.py

  Create a DataFrame from Dict of Lists with a columns parameter  

 The columns parameter can change the column order:

    data = {'Name':['Tom', 'Jack', 'Steve'], 'Age':[28,34,29]}    # Dict of lists

    df = pd.DataFrame(data, columns=['Age','Name'])

       Age   Name
    0   28    Tom
    1   34   Jack
    2   29  Steve


 Or select some columns:

    data = {'Name':['Tom', 'Jack', 'Steve'], 'Age':[28,34,29]}    # Dict of lists

    df = pd.DataFrame(data, columns=['Name'])

        Name
    0    Tom
    1   Jack
    2  Steve


DEMO: progs/df02.py

  Create a DataFrame from a List of Dicts  

 No index specified:

    data = [ {'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20} ]   # List of dict

    df = pd.DataFrame(data)

       a     b     c
    0  1     2   NaN
    1  5    10  20.0



 With index specified:

    data = [ {'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20} ]   # List of dict

    df = pd.DataFrame(data, index=['x','y'])

       a     b     c
    x  1     2   NaN
    y  5    10  20.0

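
 In the list-of-dicts examples, a key that is missing from one dict shows up as NaN in that row, and a column that contains NaN is stored as floating point (which is why c prints as 20.0). A small sketch to check this, reusing the same data:

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]   # first dict has no 'c'

df = pd.DataFrame(data)

# True where a value is missing (NaN)
missing_c = df['c'].isna().tolist()

# 'i' = integer column, 'f' = floating-point column
kind_a = df['a'].dtype.kind
kind_c = df['c'].dtype.kind

print(missing_c, kind_a, kind_c)
```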

DEMO: progs/df03.py

  Create a DataFrame from a List of Dicts with a columns parameter  

 The columns parameter can change the column order:

    data = [ {'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20} ]   # List of dict

    df = pd.DataFrame(data, columns=['c','b','a'])

          c    b    a
    0   NaN    2    1
    1  20.0   10    5



 Or select some columns:

    data = [ {'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20} ]   # List of dict

    df = pd.DataFrame(data, columns=['c','a'])

          c    a
    0   NaN    1
    1  20.0    5


DEMO: progs/df03.py

  Selecting one or more columns  

 Use: df['colName']

    data = [ ['Alex',10], ['Bob',12], ['Clarke',13] ]

    df = pd.DataFrame(data, columns=['x','y'])

    print( df['x'] )

    0      Alex
    1       Bob
    2    Clarke


 Or: df[['col1', 'col2', ..]]

    print( df[ ['y','x'] ] )

        y       x
    0  10    Alex
    1  12     Bob
    2  13  Clarke

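
 The two selection forms return different types: a single column label gives a Series, while a list of labels gives a DataFrame with the columns in the requested order. A minimal sketch using the same data; the variable names one and many are only for illustration.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['x', 'y'])

one = df['x']            # single label   -> Series
many = df[['y', 'x']]    # list of labels -> DataFrame, columns in the given order

print(type(one).__name__)
print(type(many).__name__)
```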

DEMO: progs/df04.py

  Adding and deleting a column  

 Add a column:   df['newCol'] = [ x1, x2, ... ]  # Must have correct length

    data = [ ['Alex',10], ['Bob',12], ['Clarke',13] ]

    df = pd.DataFrame(data, columns=['x','y'])

    df['z'] = [10,20,30]

            x   y   z
    0    Alex  10  10
    1     Bob  12  20
    2  Clarke  13  30


 Delete a column:   del df['colName']   or   df.pop('colName')

    del df['y']

            x   z
    0    Alex  10
    1     Bob  20
    2  Clarke  30

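
 The two ways of deleting a column differ in one respect: df.pop( ) hands back the removed column as a Series, while del simply discards it. A short sketch on the same data; the variable name removed is only for illustration.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['x', 'y'])
df['z'] = [10, 20, 30]      # add a column (length must match the number of rows)

removed = df.pop('z')       # removes column 'z' and returns it as a Series
del df['y']                 # removes column 'y' and returns nothing

print(removed)
print(df)
```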

DEMO: progs/df04.py

  Selecting a row by label  

 Use: df.loc['rowIndex']   or  df.loc[['row1', 'row2', ...]]

    data = [ ['Alex',10], ['Bob',12], ['Clarke',13] ]

    df = pd.DataFrame(data, columns=['x','y'], index=['a','b','c'])

    df:
            x   y
    a    Alex  10
    b     Bob  12
    c  Clarke  13

    print( df.loc['b'] )

    x    Bob
    y     12

    print( df.loc[ ['b','a'] ] )

          x   y
    b   Bob  12
    a  Alex  10


  Selecting a row by position (iloc)  

 Use: df.iloc[n]    or   df.iloc[[n1, n2, ...]]

    data = [ ['Alex',10], ['Bob',12], ['Clarke',13] ]

    df = pd.DataFrame(data, columns=['x','y'], index=['a','b','c'])

    df:
            x   y
    a    Alex  10
    b     Bob  12
    c  Clarke  13

    print( df.iloc[1] )

    x    Bob
    y     12

    print( df.iloc[ [1,0] ] )

          x   y
    b   Bob  12
    a  Alex  10


 Rows can also be sliced by position (the end position is excluded):

    df.iloc[0:2]:
          x   y
    a  Alex  10
    b   Bob  12

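
 Slicing is not limited to iloc: loc accepts a slice of labels as well, with the difference that the end label is included. A minimal sketch comparing the two on the same DataFrame; the variable names by_pos and by_label are only for illustration.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['x', 'y'], index=['a', 'b', 'c'])

by_pos = df.iloc[0:2]         # positions 0 and 1 (end position excluded)
by_label = df.loc['a':'b']    # labels 'a' through 'b' (end label included)

print(by_pos)
print(by_label)
```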

  Adding and deleting a row  

 Add a new row to a DataFrame:

    (1) Create a new DataFrame:

         a = {'x':['SY'], 'y':[99]}          # New row
         b = pd.DataFrame(a, index=['d'])    # Create DataFrame with row

    (2) Concat to existing DataFrame:

         df = pd.concat([df, b])

    df:
            x   y
    a    Alex  10
    b     Bob  12
    c  Clarke  13
    d      SY  99


 Delete a row from a DataFrame:

    (1) Use the row label:

         df = df.drop( 'b' )

    (2) Use the label found at a position with iloc[ ]:

         df = df.drop( df.iloc[1].name )
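
 The add and delete steps can be run together as one script. This is a sketch on the same data as above; the variable name new_row is only for illustration. Note that pd.concat( ) and df.drop( ) return a new DataFrame, so the result is assigned back to df.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['x', 'y'], index=['a', 'b', 'c'])

# Add a row: build a one-row DataFrame and concatenate it
new_row = pd.DataFrame({'x': ['SY'], 'y': [99]}, index=['d'])
df = pd.concat([df, new_row])          # rows are now a, b, c, d

# Delete a row by label
df = df.drop('b')                      # rows are now a, c, d

# Delete a row by position: look up the label at position 1, then drop it
df = df.drop(df.iloc[1].name)          # position 1 holds label 'c'

print(df)
```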