13 Video Dashboard Functions

13.1 Wrangling Functions

13.1.1 wrangle_video

13.1.1.1 Main Documentation

Generates cleaned video data as a csv within a specified course
directory

Description:

     This function will automatically read files named
     'generalized_video_heat.csv' and 'generalized_video_axis.csv' from
     the specified course directory and output a csv named
     'wrangled_video_heat.csv' in the same directory.

Usage:

     wrangle_video(input_course, testing = FALSE)
     
Arguments:

input_course: String of short name of course directory

 testing: For developer use only. Boolean used to indicate to use
          testing data.

Value:

     No value returned

Examples:

     wrangle_video(input_course = 'psyc1')
     

13.1.1.2 Additional Notes:

  • In order for this function to execute properly, there must be two files in the course directory named “generalized_video_heat.csv” and “generalized_video_axis.csv”. These files are obtained from Google BigQuery. Typically, these files are automatically obtained through the “populate_courses.py” script within the “exec” directory.
  • input_course corresponds to the “short name” within the “.config.json” file
  • The following are descriptions of the columns within the output csv file:
    • video_id: Video ID hash string
    • video_name: Name of the video
    • username: Username of the learner
    • min_into_video: Minute into video of the segment that the learner has watched
    • count: Number of times the learner has watched the segment
    • mode: Whether the learner is auditing the course or is a verified student
    • certified: Whether or not the student has been certified
    • gender: Gender of the learner
    • activity_level: Length of time that the student has spent on the course
    • max_stop_position: The mode time at which video_stop events occur. The mode is used instead of the maximum because some videos have video_stop events that occur at incorrect times such as 3 days.
    • course_order: Order in which the video appears in the course
    • index_chapter: Index of the chapter in which the video appears
    • chapter: Name of the chapter
  • Each video segment is 20 seconds in length. This can be adjusted by changing the global constant SEGMENT_SIZE in the video_wrangling.R file.
  • For a segment to be counted as “viewed”, the user must watch the segment for at least 1 second before carrying out another event such as video_pause, video_seek, page_close, etc. This 1-second threshold can be adjusted via the global constant MIN_DURATION in the video_wrangling.R file.
  • The maximum video length is set to 1 hour. Any segments past 1 hour are ignored (truncated). This can be adjusted by changing the global constant MAX_DURATION in the video_wrangling.R file.
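
The way the three constants interact can be illustrated with a short sketch. The helper `segments_viewed()` below is hypothetical (it is not part of the package), and it assumes watch intervals are given in seconds into the video; the actual logic in video_wrangling.R may differ.

```r
# Values taken from the notes above; the real constants live in video_wrangling.R.
SEGMENT_SIZE <- 20    # seconds per segment
MIN_DURATION <- 1     # seconds watched before a segment counts as viewed
MAX_DURATION <- 3600  # seconds; anything past 1 hour is ignored

# Hypothetical helper: 1-based indices of the segments counted as viewed
# for one watch interval from `start` to `end` (seconds into the video).
segments_viewed <- function(start, end) {
  end <- min(end, MAX_DURATION)  # truncate at the maximum length
  if (is.na(start) || is.na(end) || end <= start) return(integer(0))
  n_segments <- ceiling(MAX_DURATION / SEGMENT_SIZE)
  seg_start <- (seq_len(n_segments) - 1) * SEGMENT_SIZE
  seg_end <- seg_start + SEGMENT_SIZE
  # Seconds of overlap between the watch interval and each segment
  overlap <- pmin(end, seg_end) - pmax(start, seg_start)
  which(overlap >= MIN_DURATION)
}

segments_viewed(15, 41)    # segments 1, 2 and 3
segments_viewed(15, 40.5)  # segments 1 and 2 only
```

In the second call only 0.5 seconds fall inside the third segment, which is below MIN_DURATION, so that segment is not counted.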

13.1.2 obtain_raw_video_data

13.1.2.1 Main Documentation

Reads raw uncleaned .csv into a dataframe

Description:

     Reads the raw generalized_video_heat.csv obtained through rbq.py
     into a dataframe.

Usage:

     obtain_raw_video_data(input_course, testing = FALSE)
     
Arguments:

input_course: Name of course directory (ex. psyc1, spd1, marketing,
          etc)

 testing: For developer use only. Boolean used to indicate to use
          testing data.

Value:

     'data': Dataframe containing raw student track log information

Examples:

     obtain_raw_video_data(input_course = 'psyc1')
     

13.1.3 obtain_video_axis_data

13.1.3.1 Main Documentation

Reads video_axis.csv file

Description:

     Reads the video_axis csv obtained through rbq.py into a dataframe.
     For documentation on how to use rbq.py, please see
     www.temporaryreferencelink.com

Usage:

     obtain_video_axis_data(input_course, testing = FALSE)
     
Arguments:

input_course: Name of course directory (ex. psyc1, spd1, marketing,
          etc)

 testing: For developer use only. Boolean used to indicate to use
          testing data.

Value:

     'video_axis': Dataframe containing video course structure
     information

Examples:

     obtain_video_axis_data(input_course = 'psyc1')
     

13.1.4 write_wrangled_video_data

13.1.4.1 Main Documentation

Outputs cleaned data as csv

Description:

     Writes cleaned data as a csv into the correct course directory

Usage:

     write_wrangled_video_data(input_course, cleaned_data, testing = FALSE)
     
Arguments:

input_course: Name of course directory (ex. psyc1, spd1, marketing,
          etc)

cleaned_data: Dataframe containing cleaned data. This cleaned data is
          typically obtained through 'make_tidy_segments()'

 testing: For developer use only. Boolean used to indicate to use
          testing data.

Value:

     No return value

Examples:

     write_wrangled_video_data(input_course = 'psyc1', cleaned_data=start_end_df)
     

13.1.5 prepare_video_data

13.1.5.1 Main Documentation

Converts columns into proper variable types and adds additional columns
with video information

Description:

     Additional columns added:

     - 'max_stop_times': proxy for video length

     - 'course_order': occurrence of video within course structure

     - 'index_chapter': occurrence of chapter within course structure

     - 'chapter_name': name of chapter

Usage:

     prepare_video_data(video_data, video_axis)
     
Arguments:

video_axis: A dataframe containing course structure information.
          Contains columns course_order, index_chapter, chapter_name

video_data: Raw input dataframe to be transformed. Typically obtained
          through 'obtain_raw_video_data()'

Value:

     'prepared_data': The prepared data with converted variable types
     and extra columns

Examples:

     prepare_video_data(video_data = data, video_axis = video_axis)
     

13.1.6 get_start_end_df

13.1.6.1 Main Documentation

Obtains start and end times for video events

Description:

     Parses dataframe and adds columns 'start' and 'end' showing the
     start and end time that a user watched a video

Usage:

     get_start_end_df(data)
     
Arguments:

    data: Dataframe containing tracklog data of students. This is
          obtained typically through 'prepare_video_data()'

Value:

     'start_end_df': Original dataframe with 'start' and 'end' columns

Examples:

     get_start_end_df(data = data)
     

13.1.7 get_watched_segments

13.1.7.1 Main Documentation

Returns original dataframe with segment columns

Description:

     Returns original dataframe with segment columns. Segment columns
     are 0 if the segment is not located within the start and end
     values and 1 otherwise.

Usage:

     get_watched_segments(data)
     
Arguments:

    data: Dataframe containing start and end columns. This dataframe is
          typically obtained through 'get_start_end_df()'

Value:

     'data': Original input dataframe with new segment columns

Examples:

     get_watched_segments(data = start_end_df)
     

13.1.8 make_tidy_segments

13.1.8.1 Main Documentation

Returns tidy (more usable) version of input dataframe

Description:

     Returns a tidy, more usable, version of the input dataframe.
     Segment information is converted into a single column using
     'gather()'

Usage:

     make_tidy_segments(data)
     
Arguments:

    data: Dataframe containing segment information. This dataframe is
          typically obtained through 'get_watched_segments()'

Value:

     'data': Tidy version of input dataframe.

Examples:

     make_tidy_segments(data = start_end_df)
     

13.1.9 check_integrity

13.1.9.1 Main Documentation

Checks to make sure start and end data passes sanity checks

Description:

     Returns a boolean indicating whether the start and end data make
     sense. This checks for NA values, end times that are past the
     maximum length of the video, and extremely long or short watch
     durations. The thresholds for watch durations can be adjusted in
     the global constants 'MIN_DURATION' and 'MAX_DURATION'

Usage:

     check_integrity(start, end, max_stop_position)
     
Arguments:

   start: Time into video that the user has started watching the video

     end: Time into the video that the user has stopped watching the
          video

max_stop_position: Length of the video being watched

Value:

     'integrity': Boolean of whether or not the data passes integrity
     checks

Examples:

     check_integrity(start, end, max_stop_position)
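
The checks listed in the description can be sketched as follows. This is a hypothetical re-implementation for illustration only: the function name `check_integrity_sketch()` and the constant values (taken from the wrangle_video notes) are assumptions, and the package source may order or define the checks differently.

```r
MIN_DURATION <- 1     # seconds (assumed, from the wrangle_video notes)
MAX_DURATION <- 3600  # seconds (assumed, from the wrangle_video notes)

# Hypothetical sketch: TRUE only if every sanity check passes.
check_integrity_sketch <- function(start, end, max_stop_position) {
  # Any missing value fails the check
  if (anyNA(c(start, end, max_stop_position))) return(FALSE)
  # The end time must not lie past the length of the video
  if (end > max_stop_position) return(FALSE)
  # The watch duration must fall inside the allowed range
  duration <- end - start
  duration >= MIN_DURATION && duration <= MAX_DURATION
}

check_integrity_sketch(10, 70, 300)    # plausible watch interval
check_integrity_sketch(10, 400, 300)   # end is past the video length
check_integrity_sketch(10, 10.5, 300)  # watched for less than MIN_DURATION
```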
     

13.1.10 get_end_time

13.1.10.1 Main Documentation

Calculates video end time for non-video events using time stamps

Description:

     Calculates video end time for non-video events using time stamps

Usage:

     get_end_time(start, time, time_ahead, latest_speed)
     
Arguments:

   start: Time into video that the user has started watching the video

    time: Time stamp of when the user started watching the video

time_ahead: Time stamp of next event following the play event

latest_speed: The speed at which the user was watching the video

Value:

     'end': Time into video that the user has stopped watching

Examples:

     get_end_time(start, time, time_ahead, latest_speed)
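
One plausible reading of the arguments is that the end position equals the start position plus the wall-clock time elapsed until the next event, scaled by the playback speed. The sketch below implements that reading; the name `get_end_time_sketch()` is hypothetical, and the assumptions that time stamps are POSIXct values and that positions are in seconds are not confirmed by the source.

```r
# Hypothetical sketch: end position = start + elapsed wall-clock seconds * speed.
get_end_time_sketch <- function(start, time, time_ahead, latest_speed) {
  elapsed <- as.numeric(difftime(time_ahead, time, units = "secs"))
  start + elapsed * latest_speed
}

played_at  <- as.POSIXct("2017-01-01 12:00:00", tz = "UTC")
next_event <- as.POSIXct("2017-01-01 12:00:30", tz = "UTC")

# 30 seconds of wall-clock time at 1.5x speed, starting 60 seconds in
end <- get_end_time_sketch(start = 60, time = played_at,
                           time_ahead = next_event, latest_speed = 1.5)
end  # 105
```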
     

13.1.11 get_mode

13.1.11.1 Main Documentation

Obtain most common value from list

Description:

     Obtain most common value from list

Usage:

     get_mode(x)
     
Arguments:

       x: List containing integer values

Value:

     'mode': The most common value within the list

Examples:

     get_mode(x=c(0,1,2,2,2,3))
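
Base R has no built-in function for the statistical mode, so a helper like this is usually written by hand. A minimal sketch, using the hypothetical name `get_mode_sketch()`; the tie-breaking rule shown here (first value seen wins) is an assumption and may not match the package.

```r
# Hypothetical sketch: most common value in x.
get_mode_sketch <- function(x) {
  values <- unique(x)
  counts <- tabulate(match(x, values))  # occurrences of each unique value
  values[which.max(counts)]             # ties resolve to the value seen first
}

get_mode_sketch(c(0, 1, 2, 2, 2, 3))  # 2
```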
     

13.2 Server Functions

13.2.1 get_aggregated_df

13.2.1.1 Main Documentation

Aggregates dataframe by video and segment

Description:

     Aggregates input dataframe by video (video_id) and segment
     (min_into_video). Additionally, adds columns:

     - 'unique_views'/'`Students`' (number of learners who started the
     video),

     - 'watch_rate'/'`Views per Student`' (number of students who have
     watched the segment divided by unique_views),

     - 'avg_watch_rate' (average of watch_rate per video)

     - 'high_low' ('High Watch Rate', 'Low Watch Rate', or 'Normal')

     - 'up_until' (1 if the average learner had watched up until the
     particular min_into_video, 0 if they had not)

Usage:

     get_aggregated_df(filt_segs, top_selection)
     
Arguments:

filt_segs: Dataframe containing students that have been filtered by
          selected demographics. Typically obtained via
          'filter_demographics()'

top_selection: Value of the number of top segments to highlight.

Value:

     'aggregate_segment_df': Aggregated dataframe with additional
     columns

Examples:

     get_aggregated_df(filt_segs, 25)
     

13.2.1.2 Additional Notes:

  • This function will read the filtered data frame version of the output csv file from wrangle_video. As an example, this function can be used in the following way:

    tidy_segment_df <- read_csv("path/to/course/wrangled_video_heat.csv")
    filt_segs <- filter_demographics(tidy_segment_df)
    aggregated_df <- get_aggregated_df(filt_segs, 10)
  • The high_low segment classification is based on a linear model (fitted with lm) that uses the following features:
    • course_order: Index of the video arranged by course structure
    • min_into_video: How far into the video the segment is
  • The up_until variable is obtained from the maximum time at which a video_stop event occurred. As a consequence, if many students skip to the end of the video without watching anything in between, this statistic may be misleading. There are plans to change this in the future.
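
The residual-based high_low classification can be sketched on toy data. Everything below is illustrative: the data are made up, and the cutoff of 2 standard deviations is an assumption, since the notes do not state which threshold the package applies to the residuals.

```r
# Toy aggregated data: one row per (video, segment); all values are made up.
set.seed(1)
agg <- expand.grid(course_order = 1:5, min_into_video = seq(0, 5, by = 1/3))
agg$watch_rate <- 1.2 - 0.05 * agg$course_order - 0.08 * agg$min_into_video +
  rnorm(nrow(agg), sd = 0.05)
# Plant one unusually high and one unusually low segment
agg$watch_rate[c(10, 40)] <- agg$watch_rate[c(10, 40)] + c(0.6, -0.6)

# Linear model on the two features named in the notes
fit <- lm(watch_rate ~ course_order + min_into_video, data = agg)
z <- as.numeric(scale(residuals(fit)))  # standardized residuals

# Assumed cutoff of 2 standard deviations; the package may use another rule.
agg$high_low <- ifelse(z > 2, "High Watch Rate",
                ifelse(z < -2, "Low Watch Rate", "Normal"))
table(agg$high_low)
```

Segments whose watch rate sits well above or below what their position in the course and in the video would predict end up labelled high or low; all other segments stay 'Normal'.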

13.2.2 get_ch_markers

13.2.2.1 Main Documentation

Obtains locations of chapter lines to be placed on visualizations

Description:

     Obtains locations of chapter lines to be placed on visualizations

Usage:

     get_ch_markers(filt_segs)
     
Arguments:

filt_segs: Dataframe containing students that have been filtered by
          selected demographics. Typically obtained via
          'filter_demographics()'

Value:

     'ch_markers': List of values of where to place chapter lines on
     visualizations

Examples:

     get_ch_markers(filt_segs)
     

13.2.3 get_video_lengths

13.2.3.1 Main Documentation

Obtains dataframe with length of videos

Description:

     Obtains dataframe with length of videos

Usage:

     get_video_lengths(filt_segs)
     
Arguments:

filt_segs: Dataframe containing students that have been filtered by
          selected demographics. Typically obtained via
          'filter_demographics()'

Value:

     'vid_lengths': Dataframe with the video lengths associated with
     each video ID.

Examples:

     get_video_lengths(filt_segs)
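
Since max_stop_position serves as a proxy for video length, a table of lengths per video can be derived by keeping one row per video ID. A minimal base R sketch on made-up data; whether the package computes 'vid_lengths' this way is an assumption.

```r
# Toy rows in the shape of the filtered segment data (values are made up).
filt_segs <- data.frame(
  video_id = c("a1", "a1", "a1", "b2", "b2"),
  min_into_video = c(0, 1/3, 2/3, 0, 1/3),
  max_stop_position = c(55, 55, 55, 38, 38)
)

# One row per video with its length proxy
vid_lengths <- unique(filt_segs[, c("video_id", "max_stop_position")])
rownames(vid_lengths) <- NULL
vid_lengths
```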
     

13.2.4 get_summary_table

13.2.4.1 Main Documentation

Obtains locations of chapter lines to be placed on visualizations

Description:

     Obtains locations of chapter lines to be placed on visualizations

Usage:

     get_ch_markers(filt_segs)
     
Arguments:

filt_segs: Dataframe containing students that have been filtered by
          selected demographics. Typically obtained via
          'filter_demographics()'

Value:

     'ch_markers': List of values of where to place chapter lines on
     visualizations

Examples:

     get_ch_markers(filt_segs)
     

13.2.5 get_video_comparison_plot

13.2.5.1 Main Documentation

Obtains heatmap plot comparing videos against each other

Description:

     Obtains heatmap plot comparing videos against each other

Usage:

     get_video_comparison_plot(filtered_segments, module, filtered_ch_markers)
     
Arguments:

filtered_segments: Dataframe of segments and corresponding watch counts
          filtered by demographics

  module: String of module (chapter) name to display

filtered_ch_markers: List of values containing locations of where to
          put chapter markers

Value:

     'g': ggplot heatmap object

Examples:

     get_video_comparison_plot(filtered_segments, module, filtered_ch_markers)
     

13.2.6 get_segment_comparison_plot

13.2.6.1 Main Documentation

Obtains heatmap plot comparing segments against each other

Description:

     Obtains heatmap plot comparing segments against each other

Usage:

     get_segment_comparison_plot(filtered_segments, module, filtered_ch_markers)
     
Arguments:

filtered_segments: Dataframe of segments and corresponding watch counts
          filtered by demographics

  module: String of module (chapter) name to display

filtered_ch_markers: List of values containing locations of where to
          put chapter markers

Value:

     'g': ggplot heatmap object

Examples:

     get_segment_comparison_plot(filtered_segments, module, filtered_ch_markers)
     

13.2.7 get_top_hotspots_plot

13.2.7.1 Main Documentation

Obtains heatmap with segments of highest watch rate highlighted

Description:

     Obtains heatmap with segments of highest watch rate highlighted

Usage:

     get_top_hotspots_plot(filtered_segments, module, filtered_ch_markers)
     
Arguments:

filtered_segments: Dataframe of segments and corresponding watch counts
          filtered by demographics

  module: String of module (chapter) name to display

filtered_ch_markers: List of values containing locations of where to
          put chapter markers

Value:

     'g': ggplot heatmap object

Examples:

     get_top_hotspots_plot(filtered_segments, module, filtered_ch_markers)
     

13.2.7.2 Additional Notes:

  • This function is no longer used; the plot was discarded after usability testing.

13.2.8 get_high_low_plot

13.2.8.1 Main Documentation

Obtains heatmap plot highlighting which segments have abnormally high
or low watch rates

Description:

     Obtains heatmap plot highlighting which segments have abnormally
     high or low watch rates

Usage:

     get_high_low_plot(filtered_segments, module, filtered_ch_markers)
     
Arguments:

filtered_segments: Dataframe of segments and corresponding watch counts
          filtered by demographics

  module: String of module (chapter) name to display

filtered_ch_markers: List of values containing locations of where to
          put chapter markers

Value:

     'g': ggplot heatmap object

Examples:

     get_high_low_plot(filtered_segments, module, filtered_ch_markers)
     

13.2.8.2 Additional Notes:

  • This function returns a plot where segments with abnormally high and low watch rates are highlighted.
  • “High” and “low” watch rates are determined by the residuals from a linear model obtained via lm.
  • Please see source code and documentation for get_aggregated_df for more details.

13.2.9 get_up_until_plot

13.2.9.1 Main Documentation

Obtains heatmap plot highlighting which segment has been watched up
until on average

Description:

     Obtains heatmap plot highlighting which segment has been watched
     up until on average

Usage:

     get_up_until_plot(filtered_segments, module, filtered_ch_markers)
     
Arguments:

filtered_segments: Dataframe of segments and corresponding watch counts
          filtered by demographics

  module: String of module (chapter) name to display

filtered_ch_markers: List of values containing locations of where to
          put chapter markers

Value:

     'g': ggplot heatmap object

Examples:

     get_up_until_plot(filtered_segments, module, filtered_ch_markers)
     

13.2.9.2 Additional Notes:

  • This function returns a plot in which segments are highlighted up until the average maximum stop time per student.
  • It should be noted that this diagram may be misleading. Please see documentation for get_aggregated_df for more details.

13.2.10 get_rank

13.2.10.1 Main Documentation

Returns the ranking of a vector x

Description:

     Returns the ranking of a vector x

Usage:

     get_rank(x)
     
Arguments:

       x: A vector of numeric values

Value:

     'g': The ranking of the values within x

Examples:

     get_rank(c(10, 20, 20, 22, 5))
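
The documentation does not state the sort direction or how ties are handled. For orientation, base R's `rank()` covers the common choices; the two calls below show how they differ on the example vector. Which of these (if either) matches get_rank is an assumption.

```r
x <- c(10, 20, 20, 22, 5)

# Ascending rank, ties receive the average of their positions
asc <- rank(x)                         # 2.0 3.5 3.5 5.0 1.0

# Descending rank (largest value is rank 1), ties share the best rank
desc <- rank(-x, ties.method = "min")  # 4 2 2 1 5
```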
     

13.2.10.2 Additional Notes:

  • This function returns a data frame in which the duration watched per minute of video is calculated.
  • This is calculated as (average time spent on the video, in minutes, by all learners who have started the video)/(length of the video in minutes).
  • It should be noted that the average time spent on a video is calculated as the number of times each segment has been watched multiplied by the segment length. As a result, if users consistently watch only 3 seconds of a 20-second segment, this number may be artificially inflated, because any student who watches more than 1 second of a segment adds a “view”/“count” for that segment. This 1-second threshold can be adjusted by changing the global constant MIN_DURATION in the video_wrangling.R file.
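
The calculation described above can be worked through on a small example. The data are made up, and treating max_stop_position (in seconds) as the video length is an assumption for illustration.

```r
SEGMENT_SIZE <- 20  # seconds per segment, as stated in the wrangle_video notes

# Toy rows in the shape of wrangled_video_heat.csv (values are made up).
segs <- data.frame(
  video_id = "a1",
  username = c("u1", "u1", "u1", "u2", "u2"),
  min_into_video = c(0, 1/3, 2/3, 0, 1/3),
  count = c(1, 1, 2, 1, 1),
  max_stop_position = 120  # seconds, used here as the video length
)

watched_min <- sum(segs$count) * SEGMENT_SIZE / 60  # total minutes watched
learners    <- length(unique(segs$username))        # learners who started
video_min   <- unique(segs$max_stop_position) / 60  # video length in minutes

avg_time_spent <- watched_min / learners            # 1 minute per learner
duration_per_minute <- avg_time_spent / video_min   # 0.5
```

Each counted view contributes a full SEGMENT_SIZE seconds regardless of how much of the segment was actually watched, which is the source of the inflation described in the last note.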