Write a solution to create a DataFrame from a 2D list called student_data. This 2D list contains the IDs and ages of some students.
The DataFrame should have two columns, student_id and age, and be in the same order as the original 2D list.
The result format is in the following example.
Example 1:
Input: student_data: [[1, 15], [2, 11], [3, 11], [4, 20]]
Output:
+------------+-----+
| student_id | age |
+------------+-----+
| 1          | 15  |
| 2          | 11  |
| 3          | 11  |
| 4          | 20  |
+------------+-----+
Explanation: A DataFrame was created on top of student_data, with two columns named student_id and age.
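For reference, the problem statement above maps closely onto the pandas DataFrame constructor, which accepts a 2D list together with a columns argument. The following is a minimal sketch, assuming pandas is installed; the function name create_dataframe is chosen here for illustration and is not taken from the original.

```python
import pandas as pd


def create_dataframe(student_data):
    # Each inner list becomes one row; the columns argument names the
    # two positions. Row order follows the order of the input list.
    return pd.DataFrame(student_data, columns=["student_id", "age"])


student_data = [[1, 15], [2, 11], [3, 11], [4, 20]]
df = create_dataframe(student_data)
print(df)
```

Because the constructor consumes the rows in sequence, no extra sorting or re-indexing should be needed to keep the original order.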
When you get asked this question in a real-life environment, it will often be ambiguous (especially at FAANG). Make sure to ask these questions in that case:
The brute force way to create a DataFrame from a list is like manually building a table. We go row by row, filling each cell with the next value from the list, until the table is complete. If the list runs out early, or values are left over at the end, the requested shape does not fit the data.
Here's how the algorithm would work step-by-step:
def create_dataframe_brute_force(data, number_of_rows, number_of_columns):
    dataframe = []
    current_index = 0
    # Construct the dataframe row by row
    for row_index in range(number_of_rows):
        row = []
        for column_index in range(number_of_columns):
            # Place each data element into the next cell
            if current_index < len(data):
                row.append(data[current_index])
                current_index += 1
            else:
                # Raise instead of returning a string so callers cannot
                # mistake an error message for a result
                raise ValueError("Data insufficient for the specified shape")
        dataframe.append(row)
    # Ensure all data elements have been placed
    if current_index != len(data):
        raise ValueError("Shape does not fit the data")
    return dataframe
The goal is to efficiently organize data from a simple list into a structured table-like format, commonly called a DataFrame. We'll achieve this by systematically taking data from the list and arranging it into columns based on the headers provided.
Here's how the algorithm would work step-by-step:
def create_dataframe_from_list(data_list, column_names, desired_rows):
    dataframe = []
    number_of_columns = len(column_names)
    # Iterate through the desired number of rows.
    for row_index in range(desired_rows):
        row = {}
        # Assign data to columns based on the header structure.
        for column_index, column_name in enumerate(column_names):
            list_index = (row_index * number_of_columns) + column_index
            # Ensure we don't exceed the bounds of the input list.
            if list_index < len(data_list):
                row[column_name] = data_list[list_index]
            else:
                # Use 'empty' as a placeholder if data is missing.
                row[column_name] = 'empty'
        dataframe.append(row)
    # Returns a list of dictionaries representing the DataFrame.
    return dataframe
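The function above expects a flat list, whereas the student_data in the problem is already a 2D list. When the rows are already grouped, the same "arrange by header" idea can be sketched more briefly by pairing each row with the column names. The helper name rows_to_records is hypothetical and only for illustration.

```python
def rows_to_records(rows, column_names):
    """Pair each inner list with the column names, preserving row order."""
    # zip stops at the shorter of the two sequences, so this sketch
    # assumes every row has exactly len(column_names) entries.
    return [dict(zip(column_names, row)) for row in rows]


student_data = [[1, 15], [2, 11], [3, 11], [4, 20]]
records = rows_to_records(student_data, ["student_id", "age"])
print(records[0])  # {'student_id': 1, 'age': 15}
```

Like the function above, this produces a list of dictionaries, one per row, rather than a pandas DataFrame.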
Case | How to Handle |
---|---|
data is null or None | Return an empty DataFrame or raise an appropriate exception such as ValueError. |
columns is null or None | Return an empty DataFrame or raise an appropriate exception such as ValueError. |
data is an empty list | Return an empty DataFrame with the provided column names. |
columns is an empty list | If data is not empty, raise an exception since no column names are provided. |
Number of columns does not match the number of elements in each row of data | Raise an exception such as ValueError indicating a mismatch between data and columns. |
data contains rows of varying lengths | Raise an exception such as ValueError indicating inconsistent data row lengths. |
columns contains duplicate column names | Raise an exception or rename the duplicate columns with suffixes like _1, _2, etc. |
data contains non-primitive data types or mixed data types in a column | The DataFrame should handle different data types (strings, integers, floats, booleans), and type coercion should be explicit if necessary. |
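Several of the cases in the table can be checked up front, before any table is built. The sketch below is one possible reading of the table, not a definitive implementation: where the table offers a choice, it raises ValueError rather than returning an empty result, and it does not attempt type coercion. The function name validate_table_inputs is hypothetical.

```python
def validate_table_inputs(data, columns):
    """Raise ValueError if data and columns cannot form a table."""
    # Assumption: None inputs are treated as errors rather than
    # silently producing an empty table.
    if data is None or columns is None:
        raise ValueError("data and columns must not be None")
    if data and not columns:
        raise ValueError("no column names were provided for non-empty data")
    if len(set(columns)) != len(columns):
        raise ValueError("columns contains duplicate names")
    # Comparing every row against len(columns) covers both the
    # column-count mismatch and the varying-row-length cases.
    for position, row in enumerate(data):
        if len(row) != len(columns):
            raise ValueError(
                f"row {position} has {len(row)} values, "
                f"expected {len(columns)}"
            )


# Valid inputs pass silently, including an empty data list.
validate_table_inputs([[1, 15], [2, 11]], ["student_id", "age"])
validate_table_inputs([], ["student_id", "age"])
```

Running the checks before construction keeps the building step free of error handling.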