Activity 2: Python - CSV handling
Python is a versatile and widely-used programming language known for its simplicity and readability. It's a great choice for beginners and experienced developers alike. For this activity we will focus on CSV file manipulation using the pandas library. Pandas is a powerful library in Python for data manipulation and analysis. It provides a convenient and efficient way to handle CSV files and work with tabular data.
The CSV file is read by passing its filename, as a string, to the read_csv method, for example:
df = read_csv('4568_activity_shuffled.csv')
df = read_csv('4568_activity_deleted.csv')
<aside> 💡 The above method is provided in the template. Populate it with the correct logic. The auto-grader will also call this method.
</aside>
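As a minimal sketch of what read_csv could build on, pandas' own pd.read_csv accepts either a file path or a file-like object. The example reads from an in-memory string so that it runs without the activity files. The column names time, x and y follow the task descriptions, but the values are invented for illustration and the real files may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of an activity CSV file.
csv_text = "time,x,y\n2,0.5,1.5\n0,0.1,1.1\n1,0.3,1.3\n"

# pd.read_csv takes a path such as '4568_activity_shuffled.csv'
# or, as here, any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names taken from the header line
```

With a real file, the same call would take the filename string in place of the io.StringIO object.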
Tasks | Expected Output | Grade |
---|---|---|
1.1. Use the pandas library to read 4568_activity_shuffled.csv.<br>1.2. Pass this DataFrame into the return_first_n_rows method and return the first n rows of the DataFrame (n is an integer number of rows). | When a source DataFrame and a number of rows are passed to the method, the method returns a DataFrame containing that number of rows. | 10 |
2.1. Use the get_df_descriptions method to create a Python list and return the count, mean and standard deviation of the ‘x’ column of the DataFrame.<br>e.g. vals = [count,mean,std] | The method should return a Python list containing the 3 values in the defined order. | 20 |
3.1. Use the gen_sorted_dataframe method to return a DataFrame sorted by the ‘time’ column in ascending order. | The returned DataFrame is sorted by the time column. | 20 |
4.1. 4568_activity_deleted.csv contains rows where either the x or the y value is np.nan (considered an invalid entry).<br>4.2. Replace each occurrence of np.nan with the average of the previous and the next value in the same column.<br>e.g. if x[6] is np.nan, then x[6] = (x[5]+x[7])/2<br>4.3. Return a completed pandas DataFrame containing no np.nan values. | Some values (either x or y) are replaced by np.nan. These values must be replaced by the average of the previous and next row.<br>Return a DataFrame with the newly computed values. | 50 |
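The pandas calls behind these four tasks can be sketched on a toy DataFrame. This is one possible approach under stated assumptions, not the reference solution: the data is invented, the column names time, x and y are taken from the task text, and the Task 4 step assumes every np.nan is isolated (both neighbours are valid) and never sits in the first or last row.

```python
import numpy as np
import pandas as pd

# Toy data standing in for 4568_activity_shuffled.csv (values are invented).
df = pd.DataFrame({
    "time": [3, 1, 2, 0],
    "x": [4.0, 2.0, 3.0, 1.0],
    "y": [40.0, 20.0, 30.0, 10.0],
})

# Task 1: the first n rows (here n = 2).
first_two = df.head(2)

# Task 2: count, mean and standard deviation of the x column.
# Note: pandas' std() is the sample standard deviation (ddof=1) by default.
x = df["x"]
vals = [x.count(), x.mean(), x.std()]

# Task 3: sort by the time column in ascending order.
sorted_df = df.sort_values("time", ascending=True)

# Task 4: toy data standing in for 4568_activity_deleted.csv.
gappy = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, 4.0],
    "y": [10.0, 20.0, np.nan, 40.0],
})
restored = gappy.copy()
for col in ["x", "y"]:
    # Average of the previous and next value at every position.
    neighbour_mean = (restored[col].shift(1) + restored[col].shift(-1)) / 2
    # Only the NaN positions are overwritten; valid values stay untouched.
    restored[col] = restored[col].fillna(neighbour_mean)

print(first_two)
print(vals)
print(sorted_df)
print(restored)
```

sort_values keeps the original index labels. Whether the auto-grader expects the index to be reset afterwards is not stated, so treat any reset_index call as a choice to verify against the grading output.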
Use the following code template and populate the functions with your logic. The auto-grader will call these same functions.
import numpy as np
import pandas as pd


def read_csv(csv_filename: str) -> pd.DataFrame:
    """
    Input
    -----
    csv_filename: a string representing the name of the csv file

    Output
    ------
    A Pandas DataFrame created from the csv file
    """
    return


# Task 1: Return the first n rows of the shuffled_csv dataframe
def return_first_n_rows(src_df: pd.DataFrame, n: int) -> pd.DataFrame:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_shuffled.csv file
    n: integer representing the number of rows to return

    Output
    ------
    A Pandas DataFrame containing only the first n rows of src_df
    """
    return


# Task 2: Get the count, mean and std of the x column of shuffled_csv and return a python list containing these values
def get_df_descriptions(src_df: pd.DataFrame) -> list:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_shuffled.csv file

    Output
    ------
    vals: A python list containing the count, mean and std of the x column of src_df
    vals = [count,mean,std]
    """
    vals = []
    return vals


# Task 3: Return a pandas DataFrame sorted by the time column
def gen_sorted_dataframe(src_df: pd.DataFrame) -> pd.DataFrame:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_shuffled.csv file

    Output
    ------
    A Pandas DataFrame sorted by the time column
    """
    return


# Task 4: Return a pandas DataFrame with the rows with NaN values averaged with the previous and next rows
def gen_restored_dataframe(src_df: pd.DataFrame) -> pd.DataFrame:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_deleted.csv file

    Output
    ------
    A Pandas DataFrame with the rows with NaN values averaged
    """
    return


if __name__ == "__main__":
    # Main function for the activity
    # Use this block to write the code for the activity and to test your functions
    pass
<aside> 💡 Do NOT change any of the function names or arguments. The auto-grader will import your code and run the functions you define with the same name. Changing the names will cause the auto-grader to fail, resulting in a 0 for the task.
</aside>
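When testing locally, a few property checks on the returned DataFrames can catch mistakes before submission. The helpers below are a hypothetical sketch: looks_sorted_by_time and has_no_nan are names invented here, they are not part of the template, and they do not reproduce the auto-grader's own checks. The toy frames stand in for the outputs of your functions.

```python
import numpy as np
import pandas as pd


def looks_sorted_by_time(df: pd.DataFrame) -> bool:
    """True if the 'time' column never decreases from one row to the next."""
    return bool(df["time"].is_monotonic_increasing)


def has_no_nan(df: pd.DataFrame) -> bool:
    """True if no cell in the DataFrame is NaN."""
    return not df.isna().any().any()


# Toy frames standing in for the outputs of your own functions.
sorted_like = pd.DataFrame({"time": [0, 1, 2], "x": [0.1, 0.2, 0.3]})
unsorted_like = pd.DataFrame({"time": [2, 0, 1], "x": [0.3, 0.1, 0.2]})
gappy = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

print(looks_sorted_by_time(sorted_like))    # True
print(looks_sorted_by_time(unsorted_like))  # False
print(has_no_nan(gappy))                    # False
```

Checks like these could be called from the `if __name__ == "__main__":` block, which leaves the required function names and arguments unchanged.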