Exercise Solutions: DataFrames in Julia

This section contains solutions to the end-of-section exercises for the DataFrames in Julia chapter of this book.

2.1 DataFrames.jl

Exercise 2.1.1: Construct a DataFrame that encodes the following data table:

Name      Year       Midterm Grade
Jen       Sophomore  93
Neil      Junior     88
Lea       Junior     86
Richard   Sophomore  75
Penelope  Sophomore  78

Hint: Remember that you can include underscores in symbols using the :a_b notation, or you can create a symbol from a string using Symbol("A String").
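The two notations from the hint can be compared directly. The following is a minimal sketch that needs no packages; the variable names are introduced here purely for illustration.

```julia
# Two ways to build a column-name symbol (no packages required).
sym_literal = :Midterm_Grade                # literal syntax; underscores are allowed
sym_from_string = Symbol("Midterm Grade")   # built from a string; spaces are allowed

# The literal form and the string form agree when the text is the same.
println(sym_literal == Symbol("Midterm_Grade"))

# A symbol converts back to the string it was built from.
println(String(sym_from_string))
```

The literal form is usually the more convenient of the two for keyword-style column names, which is why the solution below uses Midterm_Grade.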

# SOLUTION:
df_1_1_1 = DataFrame(
    Name = ["Jen", "Neil", "Lea", "Richard", "Penelope"],
    Year = ["Sophomore", "Junior", "Junior", "Sophomore", "Sophomore"],
    Midterm_Grade = [93, 88, 86, 75, 78]
)

5 rows × 3 columns

   Name      Year       Midterm_Grade
   String    String     Int64
1  Jen       Sophomore  93
2  Neil      Junior     88
3  Lea       Junior     86
4  Richard   Sophomore  75
5  Penelope  Sophomore  78

Exercise 2.1.2: Create a DataFrameRow object that contains Lea’s midterm information.

# SOLUTION:
df_1_1_1[3,:]

DataFrameRow

1 rows × 3 columns

   Name    Year    Midterm_Grade
   String  String  Int64
3  Lea     Junior  86

Exercise 2.1.3: Using the function Statistics.mean, compute the mean of the midterm scores. Do not hardcode the answer, but use a function which accepts an array.

# SOLUTION:
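# Statistics is a standard library; load it to make mean available.
using Statistics

# Access the column by property name and pass the resulting vector to mean.
mean(df_1_1_1.Midterm_Grade)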
84.0
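
As a quick sanity check on the result above, the same value can be reproduced from the raw scores without a DataFrame. This sketch assumes only the Statistics standard library; the variable name scores is introduced here for illustration.

```julia
using Statistics  # standard library

# Midterm scores copied from the exercise table (hypothetical variable name).
scores = [93, 88, 86, 75, 78]

# mean accepts any array of numbers; it equals the sum divided by the length.
println(mean(scores))
println(sum(scores) / length(scores))
```

Both lines print the same value, since the scores sum to 420 over 5 students.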

2.2 Categorical Arrays

Exercise 2.2.1: Create a categorical array arr from the Year column of the midterm grades data set from the previous set of exercises.

Name      Year       Midterm Grade
Jen       Sophomore  93
Neil      Junior     88
Lea       Junior     86
Richard   Sophomore  75
Penelope  Sophomore  78

# SOLUTION:
# Access the Year column by property name and wrap it in a CategoricalArray.
arr = CategoricalArray(df_1_1_1.Year)
5-element CategoricalArray{String,1,UInt32}:
 "Sophomore"
 "Junior"   
 "Junior"   
 "Sophomore"
 "Sophomore"

Exercise 2.2.2: Add two Freshman individuals and three Senior individuals to the array (in that order). Recall that you can append arrays using the append! function.

# SOLUTION:
append!(arr, CategoricalArray(["Freshman", "Freshman", "Senior", "Senior", "Senior"]))
10-element CategoricalArray{String,1,UInt32}:
 "Sophomore"
 "Junior"   
 "Junior"   
 "Sophomore"
 "Sophomore"
 "Freshman" 
 "Freshman" 
 "Senior"   
 "Senior"   
 "Senior"   

Exercise 2.2.3: Order the levels of the array so that Freshman < Sophomore < Junior < Senior.

# SOLUTION:
levels!(arr, ["Freshman", "Sophomore", "Junior", "Senior"])
levels(arr)
4-element Array{String,1}:
 "Freshman" 
 "Sophomore"
 "Junior"   
 "Senior"   

Exercise 2.2.4: Test your ordering using arr[3] > arr[1] == arr[4] > arr[6]. (This should return true.)

# SOLUTION:
ordered!(arr, true)
arr[3] > arr[1] == arr[4] > arr[6]
true
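
To see why the chained comparison holds, each link can be evaluated on its own. The sketch below rebuilds the ten-element array from scratch so that it runs independently; it assumes the CategoricalArrays package is installed, and the names years and arr_check are introduced here for illustration.

```julia
using CategoricalArrays  # third-party package; assumed to be installed

# The ten class years after the append step in Exercise 2.2.2.
years = ["Sophomore", "Junior", "Junior", "Sophomore", "Sophomore",
         "Freshman", "Freshman", "Senior", "Senior", "Senior"]
arr_check = CategoricalArray(years)

# Fix the level order, then mark the array as ordered so that < and > are defined.
levels!(arr_check, ["Freshman", "Sophomore", "Junior", "Senior"])
ordered!(arr_check, true)

# Each link of the chain, evaluated separately.
println(arr_check[3] > arr_check[1])    # Junior > Sophomore
println(arr_check[1] == arr_check[4])   # Sophomore == Sophomore
println(arr_check[4] > arr_check[6])    # Sophomore > Freshman
```

The call to ordered! matters here: setting the level order alone is not expected to be enough, because order comparisons on categorical values are only supported once the array is marked as ordered.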

2.3 Importing and Exporting Data

Exercise 2.3.1: Write code to open the following text file as a DataFrame. Assume the filename is data.txt.

Col_1|Col_2|Col_3
1|2|3
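4|5|6

# SOLUTION:
# Requires: using CSV, DataFrames
# Parse the file with a pipe delimiter and materialize it as a DataFrame.
DataFrame(CSV.File("data.txt"; delim="|"))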
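
The solution can be tried end to end without preparing a file by hand. The following is a minimal sketch, assuming the CSV and DataFrames packages are installed; it writes the sample data to a temporary directory first, and the names path and df_pipe are introduced here for illustration.

```julia
using CSV, DataFrames  # third-party packages; assumed to be installed

# Write the sample file to a temporary directory so the example is self-contained.
path = joinpath(mktempdir(), "data.txt")
write(path, "Col_1|Col_2|Col_3\n1|2|3\n4|5|6\n")

# Parse with an explicit pipe delimiter and materialize the result as a DataFrame.
df_pipe = DataFrame(CSV.File(path; delim='|'))

println(size(df_pipe))     # (number of rows, number of columns)
println(df_pipe.Col_2)     # the second column as a vector
```

Passing the delimiter explicitly avoids relying on automatic delimiter detection, which keeps the behavior predictable for small files like this one.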

Exercise 2.3.2: How would you adapt the load_json function if the data in your JSON file is oriented as a dictionary mapping column names to arrays of values? For example:

{ 
    "Col_1" : [1, 4],
    "Col_2" : [2, 5],
    "Col_3" : [3, 6]
}

# SOLUTION:
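function load_json(path)
    # The parsed file maps each column name to its array of values.
    dictarr = JSON.parsefile(path)
    df_dict = Dict()
    for col in keys(dictarr)
        # Copy each named column into the dictionary passed to DataFrame.
        df_dict[col] = dictarr[col]
    end
    return DataFrame(df_dict)
end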
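
The same idea can be checked on an in-memory string rather than a file. This is a sketch under stated assumptions, not a replacement for the solution: it assumes the JSON and DataFrames packages are installed and that JSON.parse returns a dictionary-like object of column names to arrays. The names json_text, cols, and df_cols are introduced here for illustration.

```julia
using JSON, DataFrames  # third-party packages; assumed to be installed

# Column-oriented JSON, matching the layout shown in the exercise.
json_text = """{"Col_1": [1, 4], "Col_2": [2, 5], "Col_3": [3, 6]}"""

# Assumed to yield a dictionary-like mapping of column name to array of values.
cols = JSON.parse(json_text)

# Use Symbol keys for the column names, and broadcast identity over each
# array so that the column element type is narrowed from Any to a concrete type.
df_cols = DataFrame(Dict(Symbol(name) => identity.(vals) for (name, vals) in cols))

println(size(df_cols))   # (number of rows, number of columns)
println(df_cols.Col_3)   # the third column as a vector
```

Narrowing the element types is optional; the solution above leaves each column as parsed, which works but typically stores the values in untyped arrays.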

2.4 Missing Data