RMIT finds two directions for the Text Analytics Pipeline (TAP) tool

RMIT has been exploring the use of the Text Analytics Pipeline (TAP) in two main directions.

Firstly, there is strong appetite to use TAP among support staff from the Study and Learning Centre, which provides study and learning advice to students. TAP offers formative feedback that students can interact with, complementing the feedback they obtain from tutors.

Secondly, TAP has stimulated research into understanding student feedback obtained through university surveys. RMIT Marketing will also be using TAP as an engine for understanding data obtained through focus groups. This team works with ethnographic research and translates it into insights, providing a useful case study for TAP.

Building a bespoke language model for student feedback

In the course of investigating the use of Natural Language Processing and machine learning tools to better extract and navigate our student comment data, it became apparent that many tools have shortcomings when used on this type of text. Such writing has a very specific style, vocabulary and context, which can hinder the effectiveness of generic tools and pre-trained machine learning models. Given our access to a large corpus of student feedback comments, we decided to turn the problem around and see if we could use our data to build new tools.

One of the foundational issues in Natural Language Processing is how to represent words and groupings of words in a format to which mathematical algorithms can be applied. One such representation, developed at Google, is the Word2Vec model, which represents words as “high-dimensional vectors”. That is, each word is represented by a set of numbers (usually a couple of hundred), and these numbers define how the word fits in with other words in the language. This representation is learned entirely from the words and their placement relative to other terms in the corpus of text used to train the model. It is a purely data-driven, machine-learning approach, with no built-in knowledge of or rules about the language itself.
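
To make this concrete, the short sketch below loads a pretrained embedding and inspects the vector for a single word. It uses the gensim library and one of its bundled downloads; both are illustrative choices for this sketch, not the tooling described in this article.

```python
import gensim.downloader

# Load a small pretrained embedding via gensim's downloader.
# "glove-wiki-gigaword-100" is an illustrative stand-in: 100 dimensions,
# trained on Wikipedia and Gigaword news text.
vectors = gensim.downloader.load("glove-wiki-gigaword-100")

vec = vectors["lecture"]  # a plain numpy array of coordinates
print(vec.shape)          # (100,) -- one number per dimension
print(vec[:5])            # the first few coordinates of "lecture"
```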

To train our own Word2Vec model, we used 250,000 student comments from the past 4 years of subject-level student feedback survey data. As a generic comparison model, we used a popular open-source Word2Vec model available through the Python Natural Language Toolkit (NLTK) package. This model was trained on a 100-billion-word corpus of Google News stories. While this corpus is far larger than our bespoke model's, it is also less domain-specific, and we demonstrate the effect of this domain specificity in the examples below.
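
The article does not publish its training code, so the sketch below is only an assumption about how such a bespoke model might be built. It uses the gensim library (rather than NLTK) and a hypothetical load_student_comments() helper. A phrase-detection step is included because the tables below contain underscore-joined multi-word terms such as future_job and real_world, which is the convention gensim's Phrases produces.

```python
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser
from gensim.utils import simple_preprocess

# comments: an iterable of raw feedback strings; the loader is hypothetical.
comments = load_student_comments()

# Tokenise and lowercase each comment.
sentences = [simple_preprocess(text) for text in comments]

# Join frequent word pairs with underscores, e.g. "real world" -> "real_world".
# The thresholds here are illustrative, not the values used in the project.
bigrams = Phraser(Phrases(sentences, min_count=20, threshold=10.0))
phrased = [bigrams[sent] for sent in sentences]

# Train the embedding. vector_size=200 matches the article's
# "couple of hundred" dimensions; the other parameters are guesses.
model = Word2Vec(phrased, vector_size=200, window=5, min_count=10, workers=4)
model.save("student_feedback_w2v.model")
```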

The mathematical representation of words in Word2Vec makes it straightforward to compute a measure of the similarity between words (or terms), and we use some examples of this computed similarity to compare the two models.
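
Assuming the gensim model from the previous sketch, the similarity queries behind the tables might look as follows. Passing two positive terms to most_similar ranks the vocabulary by cosine similarity to both at once, which appears to be how the paired query terms in the table captions were handled.

```python
from gensim.models import Word2Vec

model = Word2Vec.load("student_feedback_w2v.model")

# Cosine similarity between two individual terms.
print(model.wv.similarity("lecture", "tutorial"))

# The 20 terms closest to "occupation" and "real" jointly, as in Table 1.
for rank, (term, score) in enumerate(
        model.wv.most_similar(positive=["occupation", "real"], topn=20),
        start=1):
    print(rank, term, round(score, 3))
```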

Table 1: Comparison of the most similar terms computed from the Word2Vec models for the query terms “occupation” and “real”. This example clearly demonstrates an advantage of a domain-specific model. In a university setting, the correct context for “occupation” is a reference to a job or work. Our bespoke model correctly picks up this context. The more generic model, however, relates the term to military occupation, with the most similar words being imperialism, tyranny, war and the like.

Our Model: trained on 250,000 comments from the Student Feedback Survey.
Comparison Model: trained on 100 billion words from Google News.

Rank  Our Model             Comparison Model
----  -------------------   ----------------
1     future_job            occupation
2     work_environment      war
3     work_place            oppression
4     project_manager       imperialism
5     legal_practitioner    subjugation
6     workplaces            genuine
7     may_face              tyranny
8     working_environment   imperialist
9     real_situation        actual
10    global_business       oppressors
11    real_world            colonialism
12    care_women            colonialist
13    intended_profession   liberation
14    customers             dispossession
15    real_project          disengagement
16    managers              postwar
17    insight_real_world    Zionism
18    planners              profoundest
19    humanity              militarism
20    professional_career   invasion

Table 2: In the case of less ambiguous terms, it can still be argued that the bespoke model is superior to the larger but more generic one. Here we search for terms similar to “lecture” and “authentic”. While both models give good in-context results, the purpose-trained model returns some more detailed and specific terms (e.g. real-world_cases, class_debates, story_telling) that make sense in the context of student feedback.

Our Model: trained on 250,000 comments from the Student Feedback Survey.
Comparison Model: trained on 100 billion words from Google News.

Rank  Our Model                      Comparison Model
----  ----------------------------   ----------------
1     interaction_class              lectures
2     real-world_cases               authentically
3     class_debates                  lecture
4     practice_theory                lectures
5     story_telling                  presentation
6     easy_remember                  contemporary
7     economic_models                colloquium
8     us_think_critically            authenticity
9     pragmatic                      seminar
10    class_discussion               informative
11    lecture_tutorial               presentations
12    indigenous_perspective         symposium
13    relaxed_learning_environment   enlightening
14    throughout_lecture             seminar
15    active_learning                oration
16    robust                         travelogue
17    inviting                       sermon
18    worldly                        intellectuality
19    real-world_experience          storyteller
20    lectures_bit_boring            timeless

A new tool for navigating feedback from student surveys

One of the three priorities for the HETA project is to progress technology and analytics for extracting information and insight from the student comments obtained through feedback surveys. In general, the “quantitative and categorical data” and the “free text comments” from student surveys are analysed independently: the former usually in terms of aggregate statistics, the latter simply read by interested parties. A joint analysis that merges the quantitative (scale-of-one-to-five type) responses with the richer, more specific information in the associated open-ended text responses allows a more powerful and nuanced extraction of information.
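
A minimal sketch of that joining step, assuming the two response types arrive in separate tables keyed by a shared response identifier; the file and column names here are hypothetical, not the project's actual schema.

```python
import pandas as pd

# Hypothetical per-response extracts from the survey platform.
scores = pd.read_csv("survey_scores.csv")      # response_id, subject, year, q1_score, ...
comments = pd.read_csv("survey_comments.csv")  # response_id, question, comment_text

# Attach each respondent's 1-to-5 scale answers to their free-text
# comments so the two can be analysed together rather than separately.
joined = comments.merge(scores, on="response_id", how="left")
joined.to_csv("joined_responses.csv", index=False)
```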

As a first step toward this more integrated approach, we have developed an online navigation tool for comments from the UTS Student Feedback Survey. This tool joins together, for the first time at UTS, the quantitative and free-text responses, and provides an assortment of filtering and navigation utilities. It brings together the last 4 years of student feedback on individual subjects, containing a total of over a quarter of a million written student comments.

Figure: Screenshot of part of the filtered and sorted output from the Student Feedback Comment Navigator. It shows the student comments for the standard free-text survey questions juxtaposed with their quantitative responses. The results are listed and enumerated in order and can be scrolled through, and the comments can be sorted by the score given to any of the quantitative questions.
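
The navigator's core filter-and-sort behaviour can be sketched in a few lines of pandas; the column and subject names continue the hypothetical schema from the joining sketch above, and the real tool's internals may differ.

```python
import pandas as pd

# The merged table produced in the earlier joining sketch (hypothetical file).
joined = pd.read_csv("joined_responses.csv")

# Filter to one subject, then order comments by the score the same
# respondent gave to a chosen quantitative question, lowest first.
view = (
    joined[joined["subject"] == "Example Subject 101"]  # hypothetical subject name
    .sort_values("q1_score")
    .loc[:, ["q1_score", "question", "comment_text"]]
)
print(view.head(20).to_string(index=False))
```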