Friday, September 20, 2013

National Science Foundation funds research that puts engineering design processes under a big data "microscope"

The National Science Foundation has awarded us $1.5 million to advance big data research on engineering design. In collaboration with Professors Şenay Purzer and Robin Adams at Purdue University, we will conduct a large-scale study involving over 3,000 students in Indiana and Massachusetts in the next five years.

This research will be based on our Energy3D CAD software that can automatically collect large process data behind the scenes while students are working on their designs. Fine-grained CAD logs possess all four characteristics of big data defined by IBM:
  1. High volume: Students can generate a large amount of process data in a complex open-ended engineering design project that involves many building blocks and variables; 
  2. High velocity: The data can be collected, processed, and visualized in real time to provide students and teachers with rapid feedback; 
  3. High variety: The data encompass any type of information provided by a rich CAD system such as all learner actions, events, components, properties, parameters, simulation data, and analysis results; 
  4. High veracity: The data must be accurate and comprehensive to ensure fair and trustworthy assessments of student performance.
These big data provide a powerful "microscope" that can reveal direct, measurable evidence of learning with extremely high resolution and at a statistically significant scale. Automation will make this research approach highly cost-effective and scalable. Automatic process analytics will also pave the road for building adaptive and predictive software systems for teaching and learning engineering design. Such systems, if successful, could become useful assistants to K-12 science teachers.

Why is big data needed in educational research and assessment? Because we all want students to learn more deeply and deep learning generates big data.

In the context of K-12 science education, engineering design is a complex cognitive process in which students learn and apply science concepts to solve open-ended problems with constraints to meet specified criteria. The complexity, open-endedness, and length of an engineering design process often create a large quantity of learner data that makes learning difficult to discern using traditional assessment methods. Engineering design assessment thus requires big data analytics that can track and analyze student learning trajectories over a significant period of time.
Deep learning generates big data.

This differs from research that does not require sophisticated computation to understand the data. For example, in typical pre/post-tests using multiple-choice assessment, the selection data of individual students are directly used as performance indices -- there is basically no depth in these self-evident data. I call this kind of data usage "data picking" -- analyzing them is just like picking up apples already fallen to the ground (as opposed to data mining that requires some computational efforts).

Process data, on the other hand, contain a lot of details that may be opaque to researchers at first glance. In the raw form, they often appear to be stochastic. But any seasoned teacher can tell you that they are able to judge learning by carefully watching how students solve problems. So here is the challenge: How can computer-based assessment accomplish what experienced teachers (human intelligence plus disciplinary knowledge plus some patience) can do based on observation data? This is the thesis of computational process analytics, an emerging subject that we are spearheading to transform educational research and assessment using computation. Thanks to NSF, we are now able to advance this subject.

No comments: