Co-designing automated feedback on reflective writing with the teacher

Building on a previous co-design session, Ming Liu (writing analytics research fellow) and Simon Buckingham Shum (project lead) recently ran a follow-up session with Cherie Lucas (Discipline of Pharmacy, Graduate School of Health). The task was to design the first version of the Feedback Tab in AcaWriter, for reflective writing.

The AcaWriter screen looked like this:

Zooming in, the feedback looked like this (click to enlarge), with sentences annotated using icons and font:

(Learn more about the underlying model of textual features, and a study to evaluate initial student reactions to it.)

PhD work by Shibani Antonette has added a new Feedback Tab for other genres of student writing in Law (essays) and  Accounting (business analyses), while Sophie Abel has designed feedback for PhD students’ on their research abstracts. As you can see from those examples, in addition to the Analytical Report Tab which annotates sentences in the student’s text, the Feedback Tab gives explicit summaries about the meaning of the highlighting, and suggesting what the student might do to improve their draft. Here’s an example from Law:

So, this is what we needed to do for reflective writing. The task was to define a set of rules, which will trigger feedback advice to students given the presence or absence of particular features. The 2 hour design session was set up as shown below, with a Google Doc template on the left screen, and AcaWriter on the right:

We switched attention continually between these as we worked through the different feature permutations that might be significant, which is what the  template scaffolded:

For a given feature (col.1), we considered what should be said to the student if it appeared (col.2) or was missing (col.3). You can also see that more complex patterns emerged:

Presence of one feature but absence of another:

(triangle without square) While it appears that you’ve reported on how you would change/prepare for the future, you don’t seem to have described your thoughts, feelings and/or reactions to an incident, or learning task.

(triangle without preceding circle) While it appears that you’ve reported on how you would change/prepare for the future, you don’t seem to have reported first on what you found challenging. Perhaps you’ve reflected only on the positive aspects in your report?

Repeated feature in successive sentences:

(double circles) Well done, it appears that you may have expanded the detail on the challenge you faced.

(double triangles) Well done, it appears that you have expanded the detail on how you would change/prepare for the future.

Location-specific features:

(triangle in para1) It appears that you have reflected on this very early on. Please ensure that you recap this in your conclusion about the outcomes of your reflection.

Note the qualified tone of the feedback: it appears that you have… you don’t seem to have… Writing is so complex that the machine will undoubtedly get things wrong (something we’ve quantified – as one measure of quality). However, as we’ve argued elsewhere, it may be that imperfect analytics have specific uses for scaffolding higher order competencies in students.

After 2 hours, we had a completed template, which we could hand over to our developer to be implemented. The Feedback Tab is no longer empty…

Header on all feedback:

Encouraging feedback when features are present:

Cautionary feedback when features are absent:

To summarise, co-design means giving voice and influence to the relevant stakeholders in the design process. Too often, it feels to academics and teachers as though they’re doing all the adjusting to educational technology products, rather than being able to shape them. Since we have complete control over our writing analytics infrastructure (and so can you, since we’ve released it open source), academics can shape the functionality and user experience of the tool in profound ways.

Ultimately, we need to build infrastructure that educators and students trust, and there are many ways to tackle this, co-design being just one.

How do students respond to this automated feedback? Trials are now being planned… We’ll let you know!…

Writing analytics: online training in rhetorical parsing

As part of building training resources to upskill the teams in different kinds of text analytics, UTS:CIC’s Honorary Associate Dr Ágnes Sándor (Naver Labs Europe) has been running approx monthly online sessions. In these, Ágnes introduces the  concept matching model, and its rule-based implementation in the open source Athanor server. This powers AcaWriter’s ability to detect academic rhetorical moves, which in combination with other text analysis services in TAP, is being tuned and evaluated with student writing of different sorts.

Check out the replays and presentation and exercise slides.

How can writing analytics researchers rapidly co-design feedback with educators?

This week we successfully piloted a new collaborative design methodology, designed to enable rapid design iterations through conversation between educators and text analytics teams. Two of the HETA project team, Andrew Gibson (technical lead) and Simon Buckingham Shum (project lead), worked with two UTS academics who use reflective writing as part of their teaching, Cherie Lucas (Discipline of Pharmacy, Grad. School of Health) and Adrian Kelly (Faculty of Engineering & IT).

The AcaWriter tool has a parser for the genre of reflective writing, which identifies a range of features that have been established as hallmarks of good writing (see right panel). The goal was to improve the performance of the first category shown, which depends on detection of emotion and affect. This was poorly calibrated in an early version, which meant that it was generating far too many ‘false positives’, that is, highlighting words that were not genuine cases, leading to visual noise and undermining the tool’s trustworthiness. An extract is shown below, where it is clear that many of the red underlined words are spurious:

This is due to the fact that dimensional data in the lexicon was indexed against a model of arousal, valence and dominance (Warriner,Kuperman & Brysbaert, 2013) whose values had not been calibrated for this context. At present this feature has been removed from the user interface, but it needed to be restored in an acceptable form.

The academics brought samples of student writing to use as test cases, and the writing analytics team used a Jupyter notebook (get the code) to rapidly prototype different thresholds for the emotion and affect module in TAP.

We spent a morning working through the examples, examining the words that were identified as we varied the thresholds on different dimensions. The results are now being implemented in AcaWriter, for review by the educators.

The key insights gained from a morning’s work cover both process and product:

The academics could work with the Jupyter notebook output. As a prototyping development tool it is far from pretty for non-technical users:

But with Andrew explaining what they were seeing, the whole group got into a ‘rhythm’ of loading a new document, testing thresholds for arousal, valence and dominance, reviewing the words, and agreeing on what seemed to be the optimum values. Over time, it became clear that only one dimension needed to be manipulated to make useful differences.

Rapid design iterations. As a writing analytics team, we were excited that the process we had envisaged did indeed enable rapid code iterations through focused conversations. This wold have taken many emails or less focused meetings, if the parser was not evolving ‘in the moment’.

Pedagogy and text analytics interwoven. We were also excited to see how conversation between the educators about how AcaWriter was being used, or might be used, might interweave around the coding. For instance, as it became clearer how the parser’s performance might impact the student experience, there was common agreement on the importance of erring on the side of caution, setting the thresholds higher at the acceptable price of increasing the false negatives). It also emerged that there were too many false positives for words not also embedded in sentences that were making one of the three rhetorical moves attended to in the tool (signified by the icons), thus providing a filtering rule that could be codified in the user interface.

The educators valued being involved, and seeing ‘under the hood’. The educators greatly appreciated being involved in this process, feeling that they now understood in much greater depth how the tool worked. They could see that their input was directly shaping the engine powering the vehicle: to pursue the metaphor, having been inducted into the world of the engineers, their understanding now went beyond the normal dashboard offered to drivers, as did their influence over the vehicle’s performance.

The Jupyter notebook evolved to support this process. As the workshop proceeded, we made a range of small changes to the way the notebook displayed its output, to assist the specific task at hand of reviewing the values assigned to words in the text. More ideas were generated which can now be implemented to enhance the experience. For example, it became apparent during the process that it might be helpful if analytics from different texts (Good reflection vs Poor reflection) could be compared.

The ergonomics of facilitating participatory design. The physical setup of a space can enhance or obstruct the intended process.

We started out as shown on the left, but could not see or access all the materials, so switched to the second setup. It proved effective to have the notebook up on the wall display as the primary working screen, and AcaWriter on a second display in order to see its output for a sample text. In the future I’d have this on a second, larger display, positioned under the main display to facilitate switching and gesturing.

It really helped to provide paper copies of the texts, plus highlighter pens for the educators to annotate each piece as they familiarised themselves with it, and for comparison with the two machine outputs. These shots show the kind of interaction that this setup afforded.

To conclude, this pilot has provided us with convincing evidence that this methodology worked as a practical way to tune a writing feedback tool in close dialogue with the educators who will use it in their practice. We broke for lunch, and had hoped to implement the changes in AcaWriter in the afternoon, for review by end of the day. We didn’t quite get there on this occasion, but will be reviewing the new version with the academics soon.

No doubt this process can be improved further, so we welcome your comments, and encourage you to adopt/adapt this to your own contexts.