Auto-generated via `{sandpaper}`
Source : e8ab4fb
Branch : main
Author : Andrew Gait <andrew.gait@manchester.ac.uk>
Time : 2026-04-13 10:26:37 +0000
Message : Merge pull request #58 from UoMResearchIT/39-typo-in-units-deg-is-not-a-callable
deg is not callable
07-pandas_essential.md (38 additions & 38 deletions)
@@ -41,15 +41,15 @@ with a text editor and look at the data layout.
The data within this file is organised much as you'd expect the data within a spreadsheet. The first row of the file contains the headers for each of the columns. The first column contains the name of the countries, while the remaining columns contain the GDP values for these countries for each year. Pandas has the `read_csv` function for reading structured data such as this, which makes reading the file easy:
Here we specify that the `country` column should be used as the index column (`index_col`).
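The lesson's own `read_csv` call is not shown in this diff. As a minimal sketch of the idea, the snippet below reads a small inline CSV and uses the `country` column as the index. The two sample rows and their values are made up for illustration, and `io.StringIO` stands in for the real file path.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the lesson's GDP file.
csv_text = (
    "country,gdpPercap_1952,gdpPercap_1957\n"
    "Albania,1600.0,1900.0\n"
    "Austria,6100.0,8800.0\n"
)

# index_col="country" makes the country names the row labels.
df = pd.read_csv(io.StringIO(csv_text), index_col="country")

print(df.index.tolist())
print(df.columns.tolist())
```

With a file on disk, the first argument would be the file path instead of the `StringIO` object.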
This creates a `DataFrame` object containing the dataset. This is similar to a numpy array, but has a number of significant differences. The first is that there are more ways to quickly understand a pandas dataframe. For example, the `info` function gives an overview of the data types and layout of the DataFrame:

```python
-data.info()
+df.info()
```

```output
@@ -74,10 +74,10 @@ dtypes: float64(12)
memory usage: 3.0+ KB
```

-You can also carry out quick analysis of the data using the `describe` function:
+You can also carry out quick analysis of the DataFrame using the `describe` function:

```python
-data.describe()
+df.describe()
```

```output
@@ -94,58 +94,58 @@ max 14734.232750 17909.489730 20431.092700 ...
## Accessing elements, rows, and columns

-The other major difference to numpy arrays is that we cannot directly access the array elements using numerical indices such as `data[0,0]`. It is possible to access columns of data using the column headers as indices (for example, `data['gdpPercap_1952']`), but this is not recommended. Instead you should use the `iloc` and `loc` methods.
+The other major difference to numpy arrays is that we cannot directly access the array elements using numerical indices such as `df[0,0]`. It is possible to access columns of data using the column headers as indices (for example, `df['gdpPercap_1952']`), but this is not recommended. Instead you should use the `iloc` and `loc` methods.

The `iloc` method enables us to access the DataFrame as we would a numpy array:

```python
-print(data.iloc[0,0])
+print(df.iloc[0,0])
```

while the `loc` method enables the same access using the index and column headers:
```python
-print(data.loc["Albania", "gdpPercap_1952"])
+print(df.loc["Albania", "gdpPercap_1952"])
```
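
To see the two access styles side by side, here is a small self-contained sketch. The DataFrame is built by hand with made-up values (it is not the lesson's dataset); only the `iloc` and `loc` calls mirror the text above.

```python
import pandas as pd

# Hypothetical two-country DataFrame with illustrative values.
df = pd.DataFrame(
    {"gdpPercap_1952": [1600.0, 6100.0], "gdpPercap_1957": [1900.0, 8800.0]},
    index=pd.Index(["Albania", "Austria"], name="country"),
)

# Position-based access: first row, first column.
by_position = df.iloc[0, 0]

# Label-based access: the same cell, addressed by index and column header.
by_label = df.loc["Albania", "gdpPercap_1952"]

print(by_position == by_label)
```

Both expressions address the same cell here because "Albania" is the first row and `gdpPercap_1952` is the first column.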
For both of these methods, we can leave out the column indexes, and these will all be returned for the specified index row:

```python
-print(data.loc["Albania"])
+print(df.loc["Albania"])
```

-This will not work for column headings (in the inverse of the `data['gdpPercap_1952']` method) however. While it is quick to type, we recommend trying to avoid using this method of slicing the DataFrame, in favour of the methods described below.
+This will not work for column headings (in the inverse of the `df['gdpPercap_1952']` method) however. While it is quick to type, we recommend trying to avoid using this method of slicing the DataFrame, in favour of the methods described below.

For both of these methods we can use the `:` character to select all elements in a row or column. For example, to get all information for Albania:

```python
-print(data.loc["Albania", :])
+print(df.loc["Albania", :])
```

or:

```python
-print(data.iloc[0, :])
+print(df.iloc[0, :])
```

The `:` character by itself is shorthand for all elements along that axis, but it can also be combined with index values or column headers to specify a slice of the DataFrame:
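
The lesson's slicing example is not included in this diff, so the following is only a sketch on a hand-built DataFrame with made-up values. One pandas behaviour worth noting: label slices with `loc` include the end label, whereas positional slices with `iloc` exclude the end position, as in ordinary Python slicing.

```python
import pandas as pd

# Hypothetical three-country DataFrame with illustrative values.
df = pd.DataFrame(
    {
        "gdpPercap_1952": [1600.0, 6100.0, 8300.0],
        "gdpPercap_1957": [1900.0, 8800.0, 9700.0],
        "gdpPercap_1962": [2300.0, 10700.0, 10900.0],
    },
    index=pd.Index(["Albania", "Austria", "Belgium"], name="country"),
)

# Label slice: both end labels are included.
label_slice = df.loc["Albania":"Austria", "gdpPercap_1952":"gdpPercap_1957"]

# Positional slice: the end position (2) is excluded.
position_slice = df.iloc[0:2, 0:2]

print(label_slice.equals(position_slice))
```

Both slices select the first two rows and the first two columns, so they hold the same data.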
-Pandas data arrays are based on numpy arrays, and retain some of the numpy tools, such as masked arrays. This enables us to apply selection criteria to the datasets, so that only the values that we require are shown. For example, the following selects all data where the GDP is above $10,000:
+Pandas Dataframes are based on numpy arrays, and retain some of the numpy tools, such as masked arrays. This enables us to apply selection criteria to the Dataframes, so that only the values that we require are shown. For example, the following selects all data where the GDP is above $10,000:
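
The selection code itself is outside this diff. Assuming it uses a boolean comparison such as `df > 10000` as the mask, a minimal sketch on a hand-built DataFrame (made-up values, hypothetical rows) looks like this:

```python
import pandas as pd

# Hypothetical DataFrame with illustrative values.
df = pd.DataFrame(
    {"gdpPercap_1952": [1600.0, 6100.0, 14700.0],
     "gdpPercap_1957": [1900.0, 8800.0, 17900.0]},
    index=pd.Index(["Albania", "Austria", "Switzerland"], name="country"),
)

# df > 10000 is a DataFrame of booleans; indexing with it keeps the values
# where the mask is True and replaces the rest with NaN.
mask = df > 10000
masked = df[mask]

print(masked)
```

Entries that fail the condition are not dropped; they appear as `NaN`, so the result keeps the original shape.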
Note that the x-tick labels have been taken directly from the index values of the transposed DataFrame (which were the original column labels). These don't really need to be more than the year of the GDP values, so we could change the column labels to reflect this.

-First we make a new copy of the dataframe (in case anything goes wrong):
+First we make a new copy of the DataFrame (in case anything goes wrong):

```python
-gdpPercap=data.copy(deep=True)
+df_gdpPercap=df.copy(deep=True)
```

-We have given this new dataframe a more appropriate name, replacing the information that will be removed from the column headers.
+We have given this new DataFrame a more appropriate name, replacing the information that will be removed from the column headers.

Now we will use the inbuilt `str.strip` method to clean up our column labels for the new