Indexing and Selecting Data  

  • Operators used for indexing and selecting data in a DataFrame:

      [  ]   =   indexing operator
      .      =   attribute operator
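
The two operators can be tried on a small frame. This is a minimal sketch: the integer data and the variable names (col_by_index, col_by_attr, two_cols) are illustrative assumptions, not part of the notes above.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): integers 0..11 so results are easy to check
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['A', 'B', 'C', 'D'])

col_by_index = df['B']        # indexing operator: select column 'B' (a Series)
col_by_attr = df.B            # attribute operator: the same column
two_cols = df[['A', 'C']]     # a list of labels returns a DataFrame

print(col_by_index)
print(two_cols)
```

The attribute form only works when the column label is a valid Python identifier that does not clash with an existing DataFrame attribute or method, so the indexing operator is the safer general choice.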
    

  • Pandas supports 2 types of multi-axis indexing:

    • dfObj.loc[ .. ]:     label based indexing

    • dfObj.iloc[ .. ]:   integer rank (position) based indexing

  .loc[ ]: indexing using labels - selecting rows  

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 4),
                  index = ['a','b','c'],
                  columns = ['A', 'B', 'C', 'D'])

print( df )

               A         B         C         D
      a -0.605034 -0.739834 -0.251246  0.585301
      b  1.522427 -0.322365 -0.168772  1.641832
      c -0.521770 -1.004734  0.757724 -0.698973


Selecting rows:

  1 row:

      df.loc['b'] =

            A    1.522427
            B   -0.322365
            C   -0.168772
            D    1.641832

  Slices of rows (ending index ≤ 'c' !!):

      df.loc['b':'c'] =

                     A         B         C         D
            b  1.522427 -0.322365 -0.168772  1.641832
            c -0.521770 -1.004734  0.757724 -0.698973

  Selected rows (use a list):

      df.loc[ ['a','c'] ] =

                     A         B         C         D
            a -0.605034 -0.739834 -0.251246  0.585301
            c -0.521770 -1.004734  0.757724 -0.698973
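
The three row selections can be checked with deterministic data. This is a minimal sketch: the integer values and the names one_row, row_slice and picked_rows are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): integers instead of random numbers
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['A', 'B', 'C', 'D'])

one_row = df.loc['b']               # single label -> Series indexed by column labels
row_slice = df.loc['b':'c']         # label slice -> DataFrame, end label 'c' is included
picked_rows = df.loc[['a', 'c']]    # list of labels -> DataFrame with exactly those rows

print(one_row)
print(row_slice)
print(picked_rows)
```

Note that a single label returns a Series, while a slice or a list returns a DataFrame.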

  .loc[ ]: indexing using labels - selecting columns  

df = pd.DataFrame(np.random.randn(3, 4),
                  index = ['a','b','c'],
                  columns = ['A', 'B', 'C', 'D'])

print( df )

               A         B         C         D
      a  0.126892 -0.207192 -0.024788  0.463379
      b -0.270875  0.393623  0.292516  0.192060
      c -0.241555 -0.105059  0.220437 -0.517377


Selecting columns:

  1 column:

      df.loc[:,'B'] =

            a   -0.207192
            b    0.393623
            c   -0.105059

  Slices of columns:

      df.loc[:,'B':'D'] =

                     B         C         D
            a -0.207192 -0.024788  0.463379
            b  0.393623  0.292516  0.192060
            c -0.105059  0.220437 -0.517377

  Selected columns (use a list):

      df.loc[ :,['B','D'] ] =

                     B         D
            a -0.207192  0.463379
            b  0.393623  0.192060
            c -0.105059 -0.517377
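
The column selections follow the same pattern, with `:` in the row position meaning "all rows". This is a minimal sketch with assumed integer data; the names one_col, col_slice and picked_cols are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): integers instead of random numbers
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['A', 'B', 'C', 'D'])

one_col = df.loc[:, 'B']              # all rows, one column -> Series
col_slice = df.loc[:, 'B':'D']        # all rows, columns 'B' through 'D' (end included)
picked_cols = df.loc[:, ['B', 'D']]   # all rows, only the listed columns

print(one_col)
print(col_slice)
print(picked_cols)
```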

  .iloc[ ]: indexing using integer ranks - selecting rows  

df = pd.DataFrame(np.random.randn(3, 4),
                  index = ['a','b','c'],
                  columns = ['A', 'B', 'C', 'D'])

print( df )

               A         B         C         D
      a -0.605034 -0.739834 -0.251246  0.585301
      b  1.522427 -0.322365 -0.168772  1.641832
      c -0.521770 -1.004734  0.757724 -0.698973


Selecting rows:

  1 row:

      df.iloc[ 1 ] =

            A    1.522427
            B   -0.322365
            C   -0.168772
            D    1.641832

  Slices of rows (ending index < 3 !!):

      df.iloc[ 1 : 3 ] =

                     A         B         C         D
            b  1.522427 -0.322365 -0.168772  1.641832
            c -0.521770 -1.004734  0.757724 -0.698973

  Selected rows (use a list):

      df.iloc[ [ 0 , 2 ] ] =

                     A         B         C         D
            a -0.605034 -0.739834 -0.251246  0.585301
            c -0.521770 -1.004734  0.757724 -0.698973
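
The difference between the two slice conventions (end included for .loc, end excluded for .iloc) can be verified directly. This is a minimal sketch with assumed integer data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): integers instead of random numbers
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['A', 'B', 'C', 'D'])

one_row = df.iloc[1]              # row at position 1 (label 'b') -> Series
row_slice = df.iloc[1:3]          # positions 1 and 2; position 3 is excluded
picked_rows = df.iloc[[0, 2]]     # list of positions -> rows 'a' and 'c'

# The positional slice 1:3 selects the same rows as the label slice 'b':'c'
same_rows = row_slice.equals(df.loc['b':'c'])
print(same_rows)
```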

  .iloc[ ]: indexing using integer rank - selecting columns  

df = pd.DataFrame(np.random.randn(3, 4),
                  index = ['a','b','c'],
                  columns = ['A', 'B', 'C', 'D'])

print( df )

               A         B         C         D
      a  0.126892 -0.207192 -0.024788  0.463379
      b -0.270875  0.393623  0.292516  0.192060
      c -0.241555 -0.105059  0.220437 -0.517377


Selecting columns:

  1 column:

      df.iloc[:, 1 ] =

            a   -0.207192
            b    0.393623
            c   -0.105059

  Slices of columns (ending index < 4 !!):

      df.iloc[:, 1 : 4 ] =

                     B         C         D
            a -0.207192 -0.024788  0.463379
            b  0.393623  0.292516  0.192060
            c -0.105059  0.220437 -0.517377

  Selected columns (use a list):

      df.iloc[ :,[ 1 , 3 ] ] =

                     B         D
            a -0.207192  0.463379
            b  0.393623  0.192060
            c -0.105059 -0.517377
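
Positional column selection can be sketched the same way. The last line combines a row slice and a column list in one call, which is not shown in the notes above but follows from the two-axis form. The integer data and all variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): integers instead of random numbers
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['A', 'B', 'C', 'D'])

one_col = df.iloc[:, 1]               # all rows, column at position 1 ('B') -> Series
col_slice = df.iloc[:, 1:4]           # positions 1, 2, 3; position 4 is excluded
picked_cols = df.iloc[:, [1, 3]]      # list of positions -> columns 'B' and 'D'

# Rows and columns can be restricted together: first two rows, columns 'B' and 'D'
block = df.iloc[0:2, [1, 3]]
print(block)
```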