DiscourseDB - Bridging discourse across platforms

As part of a broader effort to provide tools that enable research and practice in collaborative and discussion-based learning, DiscourseDB is an NSF-funded data infrastructure project designed to bridge data sources from the multiple platforms that host those learning experiences. Our vision is to provide a common data model that accommodates data from diverse sources including, but not limited to, chat, threaded discussions, blogs, Twitter, wikis, and text messaging.

We will make available analytics components related to constructs including role taking, help exchange, collaborative knowledge construction, showing openness, taking an authoritative stance, attitudes, confusion, alliance, and opposition. Applying such metrics across datasets from multiple platforms makes it possible to investigate research questions about the mediating and moderating effects of these process and state measures on information transfer, learning, and attrition, building on the earlier research of our team.

Current Capabilities

We have a few publicly available datasets:

OpenFL: online discussion of bugs and features in OpenFL, a set of related open source software projects.
Crito: The Crito dialogues by Plato, in English and Ancient Greek (from Project Gutenberg).

Other datasets are available to researchers by request, subject to IRB approval.

These datasets can be viewed in the Data browser. Researchers can create their own annotations on this data using an integrated installation of the Brat annotation tool, and apply machine learning techniques to generalize these labels using LightSide.

Researchers can also query DiscourseDB datasets as part of Learnsphere’s Tigris workflows, apply its growing infrastructure of analyses to discourse data, and perform combined analyses with other data products under the Learnsphere umbrella.

Next Steps

This fall (Nov-Dec 2018) our priorities are:

* Importing from a standardized but flexible CSV format, useful for one-time imports without having to write Java code
* Batch imports for faster importing of large data
* Import of live data streams for analysis of live, evolving discourse

Research and Development Team

Carolyn P. Rose, Carnegie Mellon University
Chris Bogart, Carnegie Mellon University
Oliver Ferschke, Carnegie Mellon University

Documentation

How to annotate data
How to generalize annotations
DiscourseDB Wiki

Funding

The National Science Foundation

View on GitHub