Activity 2: Python - CSV handling

Intro

Python is a versatile, widely used programming language known for its simplicity and readability, which makes it a good choice for beginners and experienced developers alike. This activity focuses on CSV file manipulation with pandas, a Python library for data manipulation and analysis that provides a convenient and efficient way to read CSV files and work with tabular data.

Prerequisites

https://python.land/python-tutorial
https://docs.python.org/3/library/csv.html (Python's built-in CSV module; we will not be using it in this activity)
https://www.w3schools.com/python/pandas/pandas_intro.asp
https://pythonbasics.org/read-csv-with-pandas/
https://numpy.org/doc/stable/user/quickstart.html
https://numpy.org/doc/stable/reference/constants.html#numpy.nan ("Task 4" uses this constant from NumPy)
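
Because Task 4 depends on `np.nan`, it helps to know that NaN never compares equal to anything, including itself, so a test such as `value == np.nan` is always False. The short sketch below uses made-up values (not taken from the activity files) to show this, along with the detection helpers `Series.isna` and `np.isnan` that are typically used instead.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only; not taken from the activity files.
values = pd.Series([1.0, np.nan, 3.0])

# NaN is a float that never compares equal to anything, itself included.
nan_equals_itself = (np.nan == np.nan)

# Element-wise detection: True where the entry is missing.
missing_mask = values.isna()

# Scalar check on a single entry, selected by position.
second_is_missing = bool(np.isnan(values.iloc[1]))

print(nan_equals_itself)
print(missing_mask.tolist())
print(second_is_missing)
```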

Task

Each CSV file is read by passing its filename (a string) to the `read_csv` method:

df = read_csv('4568_activity_shuffled.csv')
df = read_csv('4568_activity_deleted.csv')

<aside> 💡 The above method is provided in the template. Populate it with the correct logic. This method will also be used by the auto-grader.

</aside>
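
As a rough illustration of how pandas loads CSV data, the sketch below parses a small in-memory CSV with `pd.read_csv` so that it runs without the activity files. The CSV text is made up, and the column names `time`, `x` and `y` are an assumption based on the task descriptions; the real files may differ.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a file on disk; the real activity files
# may have different columns and values.
csv_text = "time,x,y\n0.0,1.0,2.0\n0.5,1.5,\n1.0,2.0,4.0\n"

# pd.read_csv accepts a path string or any file-like object.
demo_df = pd.read_csv(io.StringIO(csv_text))

print(demo_df.shape)                 # (rows, columns)
print(demo_df.columns.tolist())      # header names
print(demo_df["y"].isna().sum())     # empty fields are parsed as NaN
```

With a file on disk, the same call takes the filename directly, for example `pd.read_csv('4568_activity_shuffled.csv')`.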

| Tasks | Expected Output | Grade |
| --- | --- | --- |
| 1.1. Use the pandas library to read `4568_activity_shuffled.csv`.<br>1.2. Pass this DataFrame into the `return_first_n_rows` method and return the first n rows of the DataFrame (n is an integer number of rows). | When a source DataFrame and a number of rows are passed to the method, it returns a DataFrame containing that number of rows. | 10 |
| 2.1. Use the `get_df_descriptions` method to return a Python list containing the count, mean and standard deviation of the 'x' column of the DataFrame, e.g. `vals = [count,mean,std]`. | The method should return a Python list containing the 3 values in the defined order. | 20 |
| 3.1. Use the `gen_sorted_dataframe` method to return a DataFrame sorted by the 'time' column in ascending order. | The returned DataFrame is sorted by the time column. | 20 |
| 4.1. `4568_activity_deleted.csv` contains rows where either the x or the y value is `np.nan` (considered an invalid entry).<br>4.2. Replace each occurrence of `np.nan` with the average of the previous and next values in the same column, e.g. if `x[6]` is `np.nan` then `x[6] = (x[5]+x[7])/2`.<br>4.3. Return a completed pandas DataFrame containing no `np.nan` values. | Some values (either x or y) have been replaced by `np.nan`. These values must be replaced by the average of the previous and next rows.<br>Return a DataFrame with the newly computed values. | 50 |
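
The pandas operations behind the four tasks can be sketched on toy data as follows. This is a minimal illustration, not the graded solution: the data is made up, the column names `time`, `x` and `y` are assumed from the task descriptions, and the NaN step assumes each missing value is isolated and not in the first or last row. Whether the grader expects the sample standard deviation (the pandas default, `ddof=1`) is also an assumption.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; column names are assumed from the tasks.
toy = pd.DataFrame({
    "time": [3.0, 1.0, 2.0, 0.0],
    "x": [4.0, 2.0, 3.0, 1.0],
    "y": [8.0, 4.0, 6.0, 2.0],
})

# Task 1 idea: DataFrame.head(n) returns the first n rows.
first_two = toy.head(2)

# Task 2 idea: per-column statistics (std is the sample std, ddof=1).
stats = [toy["x"].count(), toy["x"].mean(), toy["x"].std()]

# Task 3 idea: sort_values returns a new DataFrame ordered by a column.
by_time = toy.sort_values(by="time", ascending=True)

# Task 4 idea: average the previous and next value wherever an entry is NaN.
gappy = pd.Series([1.0, np.nan, 3.0, 4.0, np.nan, 8.0])
neighbour_mean = (gappy.shift(1) + gappy.shift(-1)) / 2
filled = gappy.fillna(neighbour_mean)

print(first_two)
print(stats)
print(by_time)
print(filled.tolist())
```

For isolated interior gaps like these, `Series.interpolate()` with its default linear method produces the same values. Consecutive NaN values, or a NaN in the first or last row, would need separate handling.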

Code Template

Use the following code template and populate the functions with your logic. The auto-grader will call these same functions.

import numpy as np
import pandas as pd

def read_csv(csv_filename: str) -> pd.DataFrame:
    """
    Input
    -----
    csv_filename: a string representing the name of the csv file

    Output
    ------
    A Pandas DataFrame created from the csv file
    """
    return

# Task 1: Return the first n rows of the shuffled_csv dataframe
def return_first_n_rows(src_df: pd.DataFrame, n: int) -> pd.DataFrame:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_shuffled.csv file
    n: integer representing the number of rows to return

    Output
    ------
    A Pandas DataFrame containing only the first n rows of src_df
    """
    return

# Task 2: Get the count, mean and std of the x column of shuffled_csv and return a python list containing these values
def get_df_descriptions(src_df: pd.DataFrame) -> list:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_shuffled.csv file

    Output
    ------
    vals: A python list containing the count, mean and std of the x column of src_df
    vals = [count,mean,std]
    """
    vals = []

    return vals

# Task 3: Return a pandas DataFrame sorted by the time column
def gen_sorted_dataframe(src_df: pd.DataFrame) -> pd.DataFrame:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_shuffled.csv file

    Output
    ------
    A Pandas DataFrame sorted by the time column
    """
    return

# Task 4: Return a pandas DataFrame in which each NaN value is replaced by the average of the previous and next values in the same column
def gen_restored_dataframe(src_df: pd.DataFrame) -> pd.DataFrame:
    """
    Input
    -----
    src_df: a Pandas DataFrame created from the 4568_activity_deleted.csv file

    Output
    ------
    A Pandas DataFrame in which each NaN value has been replaced by the average of its previous and next values
    """
    return

if __name__ == "__main__":
    # Main function for the activity
    # Use this block to write the code for the activity and to test your functions
    pass

<aside> 💡 Do NOT change any of the function names or arguments. The auto-grader will import your code and run the functions you define with the same name. Changing the names will cause the auto-grader to fail, resulting in a 0 for the task.

</aside>