
Activity 2: Python - CSV handling

Intro

Python is a versatile, widely used programming language known for its simplicity and readability, which makes it a good choice for beginners and experienced developers alike. This activity focuses on CSV file manipulation with pandas, a Python library for data manipulation and analysis that provides a convenient and efficient way to read CSV files and work with tabular data.

Prerequisites

Task

Each CSV file is read by passing its filename (a string) to the read_csv method:

df = read_csv('4568_activity_shuffled.csv')
df = read_csv('4568_activity_deleted.csv')

<aside> 💡 The above method is provided in the template. Populate it with the correct logic. This method will also be used by the auto-grader.

</aside>
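As a rough illustration of how pandas turns CSV text into a DataFrame, the sketch below parses a small in-memory CSV instead of the activity files. The column names (time, x, y) and the values are assumptions made up for the example; the real files may differ.

```python
import io

import pandas as pd

# Toy CSV content; the column names and values are invented for illustration.
csv_text = "time,x,y\n2,0.5,1.5\n0,0.1,1.1\n1,0.3,1.3\n"

# pd.read_csv accepts a file path or any file-like object,
# so a StringIO buffer stands in for a file on disk here.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 3)
print(list(df.columns))  # ['time', 'x', 'y']
```

Passing a filename string instead of the buffer works the same way.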

| Tasks | Expected Output | Grade |
| --- | --- | --- |
| 1.1. Use the pandas library to read 4568_activity_shuffled.csv.<br>1.2. Pass this DataFrame to the return_first_n_rows method and return the first n rows of the DataFrame (n is an integer number of rows). | When a source DataFrame and a number of rows are passed to the method, it returns a DataFrame containing that number of rows. | 10 |
| 2.1. Use the get_df_descriptions method to build a Python list containing the count, mean and standard deviation of the 'x' column of the DataFrame.<br>e.g. vals = [count, mean, std] | The method should return a Python list containing the 3 values in the defined order. | 20 |
| 3.1. Use the gen_sorted_dataframe method to return a DataFrame sorted by the 'time' column in ascending order. | The returned DataFrame is sorted by the time column. | 20 |
| 4.1. 4568_activity_deleted.csv contains rows where either the x or the y value is np.nan (considered an invalid entry).<br>4.2. Replace each occurrence of np.nan with the average of the previous and next values in the same column.<br>e.g. if x[6] is np.nan then x[6] = (x[5]+x[7])/2<br>4.3. Return a completed pandas DataFrame containing no np.nan values. | Some values (either x or y) are replaced by np.nan. These values must be replaced by the average of the previous and next row. Return a DataFrame with the newly computed values. | 50 |
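The operations behind Tasks 1 to 3 can be tried out on a toy DataFrame before touching the activity files. The sketch below is only an illustration under assumed data: the values are invented, and the column names (time, x, y) are taken from the task descriptions.

```python
import pandas as pd

# Toy data standing in for the shuffled CSV; the values are invented.
df = pd.DataFrame({
    "time": [2, 0, 3, 1],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# First n rows, in the order they appear in the DataFrame (n = 2 here).
first_two = df.head(2)

# Count, mean and standard deviation of one column.
# Series.std() defaults to the sample standard deviation (ddof=1),
# which is the same figure DataFrame.describe() reports.
x = df["x"]
vals = [x.count(), x.mean(), x.std()]

# Rows sorted by the time column in ascending order.
# sort_values returns a new DataFrame and keeps the original index labels.
by_time = df.sort_values(by="time", ascending=True)

print(vals)
print(by_time["time"].tolist())  # [0, 1, 2, 3]
```

Whether the sorted result should also have its index renumbered is not stated in the task; if it is needed, `reset_index(drop=True)` is one way to do it.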

Code Template

Use the following code template and populate the functions with your logic. The auto-grader will use the same functions.

import numpy as np
import pandas as pd

def read_csv(csv_filename: str) -> pd.DataFrame:
    """
    Input
    -----
    csv_filename: a string representing the name of the csv file

    Output
    ------
    A Pandas DataFrame created from the csv file
    """
    return

# Task 1: Return the first n rows of the shuffled_csv dataframe
def return_first_n_rows(src_df: pd.DataFrame, n: int) -> pd.DataFrame:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_shuffled.csv file
    n: integer representing the number of rows to return

    Output
    ------
    A Pandas DataFrame containing only the first n rows of src_df
    """
    return

# Task 2: Get the count, mean and std of the x column of shuffled_csv and return a python list containing these values
def get_df_descriptions(src_df: pd.DataFrame) -> list:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_shuffled.csv file

    Output
    ------
    vals: A python list containing the count, mean and std of the x column of src_df
    vals = [count,mean,std]
    """
    vals = []

    return vals

# Task 3: Return a pandas DataFrame sorted by the time column
def gen_sorted_dataframe(src_df: pd.DataFrame) -> pd.DataFrame:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_shuffled.csv file

    Output
    ------
    A Pandas DataFrame sorted by the time column
    """
    return

# Task 4: Return a pandas DataFrame in which each NaN value is replaced by the average of the previous and next values in the same column
def gen_restored_dataframe(src_df: pd.DataFrame) -> pd.DataFrame:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_deleted.csv file

    Output
    ------
    A Pandas DataFrame with each NaN value replaced by the average of the previous and next values in the same column
    """
    return

if __name__ == "__main__":
    # Main function for the activity
    # Use this block to write the code for the activity and to test your functions
    pass

<aside> 💡 Do NOT change any of the function names or arguments. The auto-grader will import your code and run the functions you define with the same name. Changing the names will cause the auto-grader to fail, resulting in a 0 for the task.

</aside>
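The neighbour-averaging rule from Task 4 can be sketched on toy data as follows. This is one possible approach, not necessarily the one the grader expects. It assumes every NaN is isolated, with valid values directly before and after it in the same column. The values are invented for the example.

```python
import numpy as np
import pandas as pd

# Toy data with one isolated NaN in each column; the values are invented.
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0, 5.0],
    "y": [10.0, np.nan, 30.0, 40.0, 50.0],
})

restored = df.copy()  # leave the source DataFrame untouched
for col in ["x", "y"]:
    # shift(1) lines up each row with its previous value,
    # shift(-1) with its next value; their mean is the candidate fill.
    neighbour_mean = (restored[col].shift(1) + restored[col].shift(-1)) / 2
    # fillna only touches the NaN positions; valid entries are kept as-is.
    restored[col] = restored[col].fillna(neighbour_mean)

print(restored["x"].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(restored["y"].tolist())  # [10.0, 20.0, 30.0, 40.0, 50.0]
```

Note that `value == np.nan` is always False, so missing entries are detected with `isna()` or handled through `fillna()` as above. With this approach a NaN in the first or last row, or two NaNs in a row, would stay NaN; the task description does not say how such cases should be treated.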