Series
You can convert a list, NumPy array, or dictionary to a Series:
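import numpy as np
import pandas as pd
labels = ['a', 'b', 'c']
my_list = [10, 20, 30]
arr = np.array([10, 20, 30])
d = {'a': 10, 'b': 20, 'c': 30}
pd.Series(data=my_list)                # default integer index 0, 1, 2
pd.Series(data=my_list, index=labels)  # 'a', 'b', 'c' become the index
pd.Series(arr, index=labels)           # a NumPy array works the same way
pd.Series(d)                           # dict keys become the index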
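As a quick sanity check of the constructors above, this sketch builds the same Series from the list, the array, and the dict and compares their labels and values. The names `default_idx`, `from_list`, `from_array` and `from_dict` are only for illustration.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']
my_list = [10, 20, 30]
arr = np.array([10, 20, 30])
d = {'a': 10, 'b': 20, 'c': 30}

# Without an index argument, pandas assigns the default integer index
default_idx = pd.Series(data=my_list)
print(default_idx.index.tolist())  # [0, 1, 2]

# With an index, the labels replace the integer positions
from_list = pd.Series(data=my_list, index=labels)
from_array = pd.Series(arr, index=labels)
# A dict supplies both parts: keys become the index, values become the data
from_dict = pd.Series(d)

# Compare values as plain Python lists so the check ignores int32/int64 differences
print(from_list.tolist() == from_array.tolist() == from_dict.tolist())  # True
print(from_dict['b'])  # 20
```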
# A Series can even hold functions, though it's very unlikely we will use that.
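Functions are ordinary Python objects, so a Series can store them just like numbers. This small sketch (the choice of `sum`, `len` and `max` is arbitrary) shows that pandas falls back to the generic `object` dtype and that each element stays callable.

```python
import pandas as pd

# Built-in functions stored as values; pandas uses the 'object' dtype for them
funcs = pd.Series([sum, len, max], index=['sum', 'len', 'max'])
print(funcs.dtype)  # object

# Looking up a label returns the function itself, which can then be called
print(funcs['len']([1, 2, 3]))  # 3
```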
Using an index
ser1 = pd.Series([1,2,3,4],index = ['USA', 'Germany','USSR', 'Japan'])
USA 1
Germany 2
USSR 3
Japan 4
dtype: int64
ser2 = pd.Series([1,2,5,4],index = ['USA', 'Germany','Italy', 'Japan'])
ser1 + ser2
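Germany 4.0
Italy NaN
Japan 8.0
USA 2.0
USSR NaN
dtype: float64
Values are matched by index label, not by position: a label that appears in only one Series (Italy, USSR) produces NaN, which is also why the result is float64 rather than int64.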
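If the NaN results are unwanted, `Series.add` accepts a `fill_value` that stands in for a missing label before adding. The sketch below reuses `ser1` and `ser2` from above and checks both behaviours.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# + aligns on the union of labels; labels missing from either side give NaN
total = ser1 + ser2

# add() with fill_value=0 treats a label missing from one side as 0
total_filled = ser1.add(ser2, fill_value=0)
print(total_filled)
```

With `fill_value=0`, Italy and USSR keep their single value (5.0 and 3.0) instead of becoming NaN.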
Data Frames
We can think of a DataFrame as a collection of Series objects that share one index, one Series per column.
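from numpy.random import randn
df = pd.DataFrame(randn(5, 4), index='A B C D E'.split(), columns='W X Y Z'.split())
# randn draws random numbers, so your values will differ from the table below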
| W | X | Y | Z |
A | 2.706850 | 0.628133 | 0.907969 | 0.503826 |
B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
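To see the "collection of Series" idea concretely, this sketch selects columns and checks their types. It assumes a seeded `np.random.default_rng(101)` generator so the run is repeatable; its numbers will not match the table above, which came from `randn`.

```python
import numpy as np
import pandas as pd

# Seeded generator for repeatability (values differ from the randn table)
rng = np.random.default_rng(101)
df = pd.DataFrame(rng.standard_normal((5, 4)),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

# One column comes back as a Series that shares the DataFrame's index
col = df['W']
print(type(col).__name__)          # Series
print(col.index.equals(df.index))  # True

# A list of columns comes back as a smaller DataFrame
print(type(df[['W', 'Z']]).__name__)  # DataFrame
print(df.shape)                       # (5, 4)
```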
df['new'] = df['W'] + df['Y']
| W | X | Y | Z | new |
A | 2.706850 | 0.628133 | 0.907969 | 0.503826 | 3.614819 |
B | 0.651118 | -0.319318 | -0.848077 | 0.605965 | -0.196959 |
C | -2.018168 | 0.740122 | 0.528813 | -0.589001 | -1.489355 |
D | 0.188695 | -0.758872 | -0.933237 | 0.955057 | -0.744542 |
E | 0.190794 | 1.978757 | 2.605967 | 0.683509 | 2.796762 |
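The new column is computed row by row from the two existing columns. This sketch (using a seeded generator, so values differ from the table) confirms that and shows how `drop` removes a column again; `without_new` is an illustrative name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 4)),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

# Assigning to a key that does not exist yet creates a new column
df['new'] = df['W'] + df['Y']

# drop returns a copy by default; axis=1 means "drop a column", not a row
without_new = df.drop('new', axis=1)
print('new' in df.columns)           # True
print('new' in without_new.columns)  # False
```

Passing `inplace=True` to `drop` would modify `df` directly instead of returning a copy.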
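df.drop('new', axis=1, inplace=True)  # remove the helper column again
df > 0      # boolean DataFrame: True where a value is positive
df[df > 0]  # keeps the positive values and puts NaN everywhere else
| W | X | Y | Z |
A | 2.706850 | 0.628133 | 0.907969 | 0.503826 |
B | 0.651118 | NaN | NaN | 0.605965 |
C | NaN | 0.740122 | 0.528813 | NaN |
D | 0.188695 | NaN | NaN | 0.955057 |
E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |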
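Indexing a DataFrame with a boolean DataFrame is the same operation as `DataFrame.where`. The sketch uses a tiny hand-made frame (hypothetical values) so the NaN positions are easy to verify.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, -2.0], [-3.0, 4.0]], index=['A', 'B'], columns=['W', 'X'])

mask = df > 0          # same shape as df, filled with True/False
masked = df[mask]      # keeps values where mask is True, NaN elsewhere
same = df.where(mask)  # where() performs the same operation explicitly

print(mask)
print(masked.equals(same))  # True
```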
df[df['W']>0]
| W | X | Y | Z |
A | 2.706850 | 0.628133 | 0.907969 | 0.503826 |
B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
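Here the mask is a single boolean Series rather than a whole DataFrame, so it selects entire rows. This sketch uses a few values rounded from the tables above and shows that `.loc` accepts the same mask.

```python
import pandas as pd

df = pd.DataFrame({'W': [2.7, 0.65, -2.0, 0.19], 'Y': [0.9, -0.85, 0.53, 2.6]},
                  index=['A', 'B', 'C', 'D'])

# Comparing one column gives a boolean Series indexed like df
positive_w = df['W'] > 0
print(positive_w.tolist())  # [True, True, False, True]

# Rows whose label maps to True are kept; .loc is the more explicit spelling
rows = df[positive_w]
rows_loc = df.loc[positive_w]
print(rows.index.tolist())  # ['A', 'B', 'D']
```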
df[df['W']>0][['Y','X']]
| Y | X |
A | 0.907969 | 0.628133 |
B | -0.848077 | -0.319318 |
D | -0.933237 | -0.758872 |
E | 2.605967 | 1.978757 |
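The chained form filters rows and then picks columns in two steps; `.loc[rows, columns]` does both in one call. Both give the same result for reading, but pandas recommends `.loc` when assigning, because chained assignment may modify a temporary copy instead of `df`. The values below are rounded from the tables above.

```python
import pandas as pd

df = pd.DataFrame({'W': [2.7, -2.0, 0.19],
                   'X': [0.63, 0.74, 1.98],
                   'Y': [0.91, 0.53, 2.61]},
                  index=['A', 'C', 'E'])

# Two steps: filter rows, then select columns in the listed order
chained = df[df['W'] > 0][['Y', 'X']]

# One step: .loc[row_selector, column_selector]
single = df.loc[df['W'] > 0, ['Y', 'X']]

print(single.columns.tolist())  # ['Y', 'X']
print(chained.equals(single))   # True
```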
df[(df['W']>0) & (df['Y'] > 1)]
| W | X | Y | Z |
E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
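Conditions are combined with `&` (and), `|` (or) and `~` (not), which work element by element. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` in Python. The plain `and` keyword fails, since it asks a whole Series for a single True/False. The values are rounded from the tables above.

```python
import pandas as pd

df = pd.DataFrame({'W': [2.7, 0.65, 0.19], 'Y': [0.91, -0.85, 2.61]},
                  index=['A', 'B', 'E'])

# Element-wise AND / OR of two boolean Series
both = df[(df['W'] > 0) & (df['Y'] > 1)]
either = df[(df['W'] > 1) | (df['Y'] > 1)]

# `and` calls bool() on a Series, which pandas refuses with a ValueError
try:
    df[(df['W'] > 0) and (df['Y'] > 1)]
    raised = False
except ValueError:
    raised = True

print(both.index.tolist())    # ['E']
print(either.index.tolist())  # ['A', 'E']
print(raised)                 # True
```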
# Reset to the default integer index 0, 1, ..., n-1; the old labels move into an 'index' column
df.reset_index()
| index | W | X | Y | Z |
0 | A | 2.706850 | 0.628133 | 0.907969 | 0.503826 |
1 | B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
2 | C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
3 | D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
4 | E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
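`reset_index` returns a new DataFrame, so `df` keeps its letter labels unless the result is reassigned. This sketch (with a small hypothetical frame) also shows `drop=True`, which discards the old labels instead of turning them into a column.

```python
import pandas as pd

df = pd.DataFrame({'W': [2.7, 0.65, -2.0]}, index=['A', 'B', 'C'])

# Default: the old index becomes a column named 'index'
moved = df.reset_index()
# drop=True throws the old labels away
dropped = df.reset_index(drop=True)

print(moved.columns.tolist())    # ['index', 'W']
print(dropped.columns.tolist())  # ['W']
print(df.index.tolist())         # ['A', 'B', 'C']
```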
newind = 'CA NY WY OR CO'.split()
df['States'] = newind
df
| W | X | Y | Z | States |
A | 2.706850 | 0.628133 | 0.907969 | 0.503826 | CA |
B | 0.651118 | -0.319318 | -0.848077 | 0.605965 | NY |
C | -2.018168 | 0.740122 | 0.528813 | -0.589001 | WY |
D | 0.188695 | -0.758872 | -0.933237 | 0.955057 | OR |
E | 0.190794 | 1.978757 | 2.605967 | 0.683509 | CO |
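A plain list is assigned to a column by position, so it must have exactly one entry per row. This sketch (small hypothetical frame; the column name `Short` is made up) shows that a list of the wrong length is rejected with a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({'W': [2.7, 0.65, -2.0, 0.19, 0.19]}, index=list('ABCDE'))

# Five rows, five entries: assigned in order A..E
df['States'] = 'CA NY WY OR CO'.split()

# Two entries for five rows: pandas raises ValueError and adds nothing
try:
    df['Short'] = ['CA', 'NY']
    length_error = False
except ValueError:
    length_error = True

print(df.loc['C', 'States'])  # WY
print(length_error)           # True
```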
df.set_index('States')  # returns a new DataFrame; df itself is unchanged
df.set_index('States', inplace=True)  # modifies df in place (permanent change)
| W | X | Y | Z |
States | | | | |
CA | 2.706850 | 0.628133 | 0.907969 | 0.503826 |
NY | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
WY | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
OR | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
CO | 0.190794 | 1.978757 | 2.605967 | 0.683509 |
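Besides `inplace=True`, a common way to make the change permanent is to reassign the returned DataFrame. This sketch (small hypothetical frame) checks that `set_index` leaves the original alone, moves `States` out of the columns, and replaces the old A/B/C labels.

```python
import pandas as pd

df = pd.DataFrame({'W': [2.7, 0.65, -2.0], 'States': ['CA', 'NY', 'WY']},
                  index=['A', 'B', 'C'])

# set_index returns a new DataFrame; df still has its letter labels
by_state = df.set_index('States')
print(df.index.tolist())        # ['A', 'B', 'C']
print(by_state.index.tolist())  # ['CA', 'NY', 'WY']

# Reassignment makes the change stick without inplace=True
df = df.set_index('States')
print(df.loc['NY', 'W'])        # 0.65
```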
Multi-Index and Index Hierarchy