You are given a Pandas DataFrame called weather
with the following structure:
weather
+-------------+--------+
| Column Name | Type |
+-------------+--------+
| city | object |
| month | object |
| temperature | int |
+
Write a function to pivot the DataFrame such that each row represents temperatures for a specific month, and each city is a separate column. The order of the months in the output should be sorted alphabetically.
Example:
Input:
weather
+--------------+----------+-------------+
| city | month | temperature |
+--------------+----------+-------------+
| Jacksonville | January | 13 |
| Jacksonville | February | 23 |
| Jacksonville | March | 38 |
| ElPaso | January | 20 |
| ElPaso | February | 6 |
| ElPaso | March | 26 |
+
Expected Output:
output
+----------+--------+--------------+
| month | ElPaso | Jacksonville |
+----------+--------+--------------+
| February | 6 | 23 |
| January | 20 | 13 |
| March | 26 | 38 |
+
Constraints:
city
, month
, and temperature
.month
column will contain unique month names.city
column will contain city names.temperature
column will contain integer values.Explain the time and space complexity of your solution.
This problem focuses on reshaping a DataFrame to transform rows into columns based on specific values. We're given a table of weather data and need to pivot it so that each row represents temperatures for a specific month, and each city becomes a separate column.
A straightforward but less efficient approach involves iterating through the DataFrame and manually constructing the pivoted table. We can use dictionaries or lists to store the temperatures for each city and month, then combine them into the desired format.
Algorithm:
Code (Python):
import pandas as pd
def pivot_weather_naive(weather: pd.DataFrame) -> pd.DataFrame:
pivoted_data = {}
for index, row in weather.iterrows():
city = row['city']
month = row['month']
temperature = row['temperature']
if month not in pivoted_data:
pivoted_data[month] = {}
pivoted_data[month][city] = temperature
# Create a list of dictionaries for the DataFrame
data = []
for month in pivoted_data:
row = {'month': month}
row.update(pivoted_data[month])
data.append(row)
return pd.DataFrame(data).set_index('month').sort_index()
Big O Analysis:
pivot
functionThe Pandas library provides a built-in pivot
function that efficiently performs this transformation.
Algorithm:
pivot
function with month
as the index, city
as the columns, and temperature
as the values.Code (Python):
import pandas as pd
def pivot_weather(weather: pd.DataFrame) -> pd.DataFrame:
pivoted = weather.pivot(index='month', columns='city', values='temperature')
pivoted = pivoted.rename_axis(None, axis=1).reset_index()
pivoted = pivoted.set_index('month').sort_index()
return pivoted
Big O Analysis:
pivot
function is generally optimized for performance.pivot
function will insert NaN
values. Handle NaN
values by filling them with appropriate values (e.g., 0, the average temperature for that month, etc.) if necessary.pivot
function will raise an error. You may need to preprocess the data to handle duplicates (e.g., by taking the average temperature).pivot
function will return an empty DataFrame, handle this case if required.Let's consider a DataFrame
data = {
'city': ['Jacksonville', 'Jacksonville', 'ElPaso', 'ElPaso'],
'month': ['January', 'February', 'January', 'February'],
'temperature': [10, 20, 15, 25]
}
weather_df = pd.DataFrame(data)
print(pivot_weather(weather_df))
The code using pivot
will produce the output DataFrame:
ElPaso Jacksonville
month
February 25 20
January 15 10