## Sunday, December 22, 2013

### Visual analytics for uncovering complex learning: Part I

All educational research and assessment are based on inference from evidence. Evidence is constructed from learner data. The quality of this construction is, therefore, fundamentally important. Many educational measurements have relied on eliciting, analyzing, and interpreting students' constructed responses to assessment questions. New types of data may engender new opportunities for improving the validity and reliability of educational measurements. In this series of articles, I will show how graph theory can be applied to educational research.

The process of inquiry-based learning with an interactive computer model can be imagined as a trajectory of exploring in the problem space spanned by the user interface of the model. Students use various widgets to control different variables, observe the corresponding emergent behaviors, take some data, and then reason with the data to draw a conclusion. This sounds obvious. But exactly how do we capture, visualize, and analyze this process?

From the point of view of computational science, the learning space is enormous: If we have 10 controls in the user interface and each control has five inputs, there are potentially 5^10 (nearly 10 million) different ways of interacting with the model, assuming that the user interacts with each control once and only once and ignoring the order of those interactions. To tackle a problem of this magnitude, we need some mathematics. Graph theory is one of the tools that we are building into our process analytics. The publication of Leonhard Euler's Seven Bridges of Königsberg in 1736 is commonly regarded as the birth of graph theory.

 Figure 1: A learning graph made of two subgraphs representing two ideas.
In graph theory, a graph is a collection of vertices connected by edges: G = (V, E). When applied to learning, a vertex represents an indicator, loggable by software, that may be related to a certain competency of a student. An edge represents the transition from one indicator to another. We call a graph that represents a learning process a learning graph.

A learning graph is always a digraph G = (V, A) -- namely, it always has directed edges, or arrows -- because of the temporal nature of learning. Most likely, it is a multigraph with multiple directed edges between one or more pairs of vertices (sometimes called a multidigraph), because the student often needs multiple transitions between indicators to learn their connections. A learning graph often has loops, edges that connect a vertex back to itself, because the student may perform multiple actions related to an indicator consecutively before making a transition. Figure 1 shows a learning graph that includes two sets of indicators, one for each idea.
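
As a minimal sketch of how such a graph might be assembled from a log, the Python snippet below turns a time-ordered list of indicators into a multidigraph with loops. The indicator names (`a1`, `a2`, `b1`, `b2`), the log itself, and the function `build_learning_graph` are hypothetical and only serve as an illustration; they are not taken from our software.

```python
from collections import Counter


def build_learning_graph(indicator_log):
    """Return a Counter mapping (source, target) arrows to their multiplicities.

    Each pair of consecutive log entries forms one arrow. Repeating the same
    indicator twice in a row produces a loop such as ("a1", "a1").
    """
    return Counter(zip(indicator_log, indicator_log[1:]))


# Hypothetical log: a1 and a2 belong to idea A; b1 and b2 belong to idea B.
log = ["a1", "a1", "a2", "a1", "a2", "b1", "b2", "b1", "b1"]
arrows = build_learning_graph(log)

for (u, v), count in sorted(arrows.items()):
    kind = "loop" if u == v else "arrow"
    print(f"{u} -> {v}: multiplicity {count} ({kind})")
```

Storing the arrows as a counter keeps both the direction and the multiplicity of every transition, which is all the later measures need.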

 Figure 2. The adjacency matrix of the graph in Figure 1.
The size of a learning graph is defined as the number of its arrows, denoted by |A(G)|. The size approximately represents the number of actions the student takes during learning. The multiplicity of an arrow is the number of arrows that share the same pair of vertices. The multiplicity of a graph is the maximum multiplicity of its arrows; it identifies the most frequent transition between two indicators in a learning process. The degree dG(v) of a vertex v in a graph G is the number of edges incident to v, with loops counted twice. A vertex of degree 0 is an isolated vertex. A vertex of degree 1 is a leaf. The degree of a vertex reflects the number of times the action related to the corresponding indicator is performed. The maximum degree Δ(G) of a graph G is the largest degree over all vertices; the minimum degree δ(G) is the smallest.
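
These definitions can be sketched in a few lines of Python. The graph below is hypothetical, and the sketch assumes that multiplicity is counted per ordered pair of vertices (so a1 -> a2 and a2 -> a1 are different arrows); the function names are illustrative only.

```python
from collections import Counter

# Hypothetical learning graph: arrows (source, target) with their multiplicities.
arrows = Counter({
    ("a1", "a1"): 1,  # loop
    ("a1", "a2"): 2,
    ("a2", "a1"): 1,
    ("a2", "b1"): 1,
    ("b1", "b2"): 1,
    ("b2", "b1"): 1,
    ("b1", "b1"): 1,  # loop
})


def size(arrows):
    """|A(G)|: total number of arrows, counting parallel arrows separately."""
    return sum(arrows.values())


def multiplicity(arrows):
    """Largest number of parallel arrows over all ordered vertex pairs."""
    return max(arrows.values(), default=0)


def degrees(arrows):
    """Degree of each vertex; every arrow adds 1 to each endpoint, so a loop adds 2."""
    deg = Counter()
    for (u, v), count in arrows.items():
        deg[u] += count
        deg[v] += count
    return deg


deg = degrees(arrows)
print("size:", size(arrows))
print("multiplicity:", multiplicity(arrows))
print("max degree:", max(deg.values()), "min degree:", min(deg.values()))
```

A quick sanity check is that the degrees always sum to twice the size, because every arrow has two endpoints.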

The distance dG(u, v) between two vertices u and v in a graph G is the length of a shortest path between them. When u and v are identical, their distance is 0. When u and v are unreachable from each other, their distance is defined to be infinity ∞. The distance between two indicators may reveal how the related constructs are connected in the learning process.
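
A breadth-first search is one standard way to compute this distance. The sketch below assumes that paths must follow the direction of the arrows, so the distance from u to v can differ from the distance from v to u; the graph and the function name `distance` are hypothetical.

```python
import math
from collections import deque


def distance(successors, u, v):
    """Length of a shortest directed path from u to v, or math.inf if v is unreachable."""
    if u == v:
        return 0
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt == v:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return math.inf


# Hypothetical learning graph given as a successor list for each indicator.
successors = {
    "a1": ["a1", "a2"],
    "a2": ["a1", "b1"],
    "b1": ["b1", "b2"],
    "b2": ["b1"],
}

print(distance(successors, "a1", "b2"))  # a1 -> a2 -> b1 -> b2
print(distance(successors, "b2", "a1"))  # no arrow leads back to idea A
```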

 Figure 3. A more crosscutting learning trajectory between two ideas.
Two vertices u and v are called adjacent if an edge exists between them, denoted by u ~ v. The square adjacency matrix records which vertices of a graph are adjacent to which other vertices. Figure 2 is the adjacency matrix of the graph in Figure 1; its trace (the sum of all the diagonal elements) equals the number of loops in the graph. Given the adjacency matrix, we can apply spectral graph theory to study the properties of a graph in relation to the characteristic polynomial, eigenvalues, and eigenvectors of the matrix (because a learning graph is a digraph, its adjacency matrix is generally not symmetric, so the eigenvalues are often complex numbers). For example, the eigenvalues of the adjacency matrix may be used to reduce the dimensionality of the dataset and reveal clusters.
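
The following sketch uses NumPy to build the adjacency matrix of a small, hypothetical learning graph, read the number of loops off the trace, and compute the spectrum. The vertex names and arrow counts are made up for illustration and do not reproduce Figure 2.

```python
import numpy as np

# Hypothetical learning graph: arrows (source, target) with their multiplicities.
vertices = ["a1", "a2", "b1", "b2"]
arrows = {
    ("a1", "a1"): 1,
    ("a1", "a2"): 2,
    ("a2", "a1"): 1,
    ("a2", "b1"): 1,
    ("b1", "b2"): 1,
    ("b2", "b1"): 1,
    ("b1", "b1"): 1,
}

# Entry A[i, j] holds the number of arrows from vertex i to vertex j.
index = {v: i for i, v in enumerate(vertices)}
A = np.zeros((len(vertices), len(vertices)), dtype=int)
for (u, v), count in arrows.items():
    A[index[u], index[v]] = count

num_loops = int(np.trace(A))          # diagonal entries count the loops
is_symmetric = bool(np.array_equal(A, A.T))
eigenvalues = np.linalg.eigvals(A)    # may be complex for a non-symmetric matrix

print(A)
print("loops:", num_loops, "symmetric:", is_symmetric)
print("eigenvalues:", eigenvalues)
```

`np.linalg.eigvals` handles general (non-symmetric) square matrices, which is the case that matters for a digraph.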

 Figure 4. The adjacency matrix of the graph in Figure 3.
How might learning graphs be useful for analyzing student learning? Figure 3 gives an example that shows a different behavior of exploration between two ideas (such as heat and temperature or pressure and temperature). In this hypothetical case, the student has more transitions between two subgraphs that represent the two ideas and their indicator domains. This pattern can potentially result in better understanding of the connections between the ideas. The adjacency matrix shown in Figure 4 has different block structures than that shown in Figure 2: The blocks A-B and B-A are much sparser in Figure 2 than in Figure 4. The spectra of these two matrices may be quite different and could be used to characterize the knowledge integration process that fosters the linkage between the two ideas.
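
One simple way to quantify this block structure, sketched below, is to sum the arrows in each block of the adjacency matrix and report the fraction that cross between the two ideas. The two graphs, the mapping `idea_of`, and the measure `crosscutting_fraction` are hypothetical illustrations, not the data behind Figures 1-4 or a measure defined in this post.

```python
def block_counts(arrows, idea_of):
    """Sum arrow multiplicities for each (source idea, target idea) block."""
    counts = {}
    for (u, v), count in arrows.items():
        key = (idea_of[u], idea_of[v])
        counts[key] = counts.get(key, 0) + count
    return counts


def crosscutting_fraction(arrows, idea_of):
    """Fraction of all arrows that connect indicators of different ideas."""
    counts = block_counts(arrows, idea_of)
    total = sum(counts.values())
    cross = sum(c for (src, dst), c in counts.items() if src != dst)
    return cross / total if total else 0.0


idea_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

# Hypothetical trajectory that explores idea A, then moves on to idea B once.
separated = {("a1", "a2"): 3, ("a2", "a1"): 2, ("a2", "b1"): 1,
             ("b1", "b2"): 3, ("b2", "b1"): 2}

# Hypothetical trajectory that keeps switching between the two ideas.
crosscutting = {("a1", "a2"): 1, ("a1", "b1"): 2, ("b1", "a2"): 2,
                ("a2", "b2"): 2, ("b2", "a1"): 2, ("b1", "b2"): 1,
                ("b2", "b1"): 1}

print(block_counts(separated, idea_of))
print(block_counts(crosscutting, idea_of))
print(crosscutting_fraction(separated, idea_of),
      crosscutting_fraction(crosscutting, idea_of))
```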

Go to Part II.

## Saturday, December 14, 2013

### Season's greetings from Energy2D

I have been so swamped in fundraising these days that I haven't been able to update this blog for more than two months. Since it is that time of the year again, I thought I should just share a holiday video made by Matthew d'Alessio, a professor at California State University Northridge, using our signature software Energy2D.

The simulator currently attracts more than 5,000 unique visitors each month, a number that probably represents a sizable portion of engineering students studying the subject of heat transfer on the planet. Over the past year, I have received a lot of encouraging emails from Energy2D's worldwide users. Some of them even compared it with well-known engineering programs. Franco Landriscina at the University of Trieste has written Energy2D into his recent Springer book "Simulation and Learning: A Model-Centered Approach."

I am truly grateful for these positive reactions. I want to say "Thank You" for all your nice words. There is nothing more rewarding than hearing from you on this fascinating subject of fluid dynamics and heat transfer. Rest assured that the development of this program will resume irrespective of its funding. In 2014, I hope to come up with a better radiation solver, which I have been thinking about for quite a long time. It turns out that simulating radiation is much more difficult than simulating convection!

Here is a tutorial video in Spanish made by Gabriel Concha.

## Wednesday, October 9, 2013

### Molecular modelers won Nobel Prize in Chemistry

Martin Karplus, Michael Levitt, and Arieh Warshel won the 2013 Nobel Prize in Chemistry today "for the development of multiscale models for complex chemical systems."

The Royal Swedish Academy of Sciences said the three scientists' research in the 1970s has helped scientists develop programs that unveil chemical processes. "The work of Karplus, Levitt and Warshel is ground-breaking in that they managed to make Newton's classical physics work side-by-side with the fundamentally different quantum physics," the academy said. "Previously, chemists had to choose to use either/or." Together with a few earlier Nobel Prizes in quantum chemistry, this award consecrates the field of computational chemistry.

Incidentally, Martin Karplus was the thesis adviser at Harvard of my postdoc co-adviser, Georgios Archontis. Georgios is one of the earlier contributors to CHARMM, a widely used package for computational chemistry. CHARMM was the computational tool that I used when working with Georgios almost 15 years ago. In collaboration with Martin, Georgios and I were studying glycogen phosphorylase inhibitors based on a free energy perturbation analysis using CHARMM. In another project with Spyros Skourtis, I wrote a multi-scale simulation program that couples molecular dynamics and quantum dynamics to study electron transfer in proteins and DNA molecules (i.e., it uses Newton's equations of motion to predict the trajectories of atoms, constructs the Hamiltonian time series, and solves the time-dependent Schrödinger equation using the Hamiltonian series as the input).

We are thrilled by this news because many of the computational kernels of our Molecular Workbench software were actually inspired by CHARMM. The Molecular Workbench also advocates a multiscale philosophy and pedagogical approach, but for linking concepts at different scales with simulations in order to help students connect the dots and build more unified pictures of science (see the image above).

We are glad to be part of the "Karplus genealogy tree," as Georgios put it when replying to my congratulatory email. We hope that through our grassroots work in education, the power of molecular simulation from the top of the scientific research pyramid will enlighten millions of students and ignite their interest and curiosity in science.

## Saturday, October 5, 2013

### Computational process analytics: Compute-intensive educational research and assessment

 Trajectories of building movement (good)
Computational process analytics (CPA) differs from traditional research and assessment methods in that it is not only data-intensive, but also compute-intensive. A unique feature of CPA is that it automatically analyzes the performance of student artifacts (including all the intermediate products) using the same set of science-based computational engines that students used to solve problems. The computational engines encompass every single detail in the artifacts and their complex interactions that are highly relevant to the nature of the problems students solved. They also recreate the scenarios and contexts of student learning (e.g., the calculated results in such a post-processing analysis are exactly the same as those presented as feedback to students while they were solving the problems). As such, the computational engines provide holistic, high-fidelity assessments of students' work that no human evaluator can ever beat -- while no one can track the numerous variables students might have created in long and deep learning processes within a short evaluation time, a computer program can easily do the job. Utilizing disciplinarily intelligent computational engines to do performance assessment was a major breakthrough in CPA, as this approach really has the potential to revolutionize computer-based assessment.

To give an example, this weekend I am busy running all the analysis jobs on my computer to process 1 GB of data logged by our Energy3D CAD software. I am trying to reconstruct and visualize the learning and design trajectories of all the students, projected onto many different axes and planes of the state space. Doing that takes an estimated 30-40 hours of CPU time on my Lenovo X230 tablet, which is a pretty fast machine. Each step loads up a sequence of artifacts, runs a solar simulation for each artifact, and analyzes the results (since I have automated the entire process, this is actually not as bad as it sounds). Our assumption is that the time evolution of the performance of these artifacts approximately reflects the time evolution of the performance of their designers. We should be able to tell how well a student was learning by examining whether the performance of her artifacts shows a systematic trend of improvement or is just random. This is far better than performance assessment based on just looking at students' final products.
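
As an illustration of what telling a systematic trend from randomness could look like, here is a minimal sketch of a Kendall-style trend score computed over a sequence of artifact performance values. This is an assumed stand-in, not the analysis our Process Analyzer performs, and the two performance series are hypothetical (higher is taken to mean better).

```python
from itertools import combinations


def trend_score(values):
    """Kendall-style trend score in [-1, 1].

    +1 means every later value is higher than every earlier one,
    -1 means the opposite, and values near 0 indicate no consistent trend.
    """
    pairs = list(combinations(values, 2))  # (earlier, later) in time order
    if not pairs:
        return 0.0
    concordant = sum(1 for earlier, later in pairs if later > earlier)
    discordant = sum(1 for earlier, later in pairs if later < earlier)
    return (concordant - discordant) / len(pairs)


# Hypothetical performance of successive design artifacts from two students.
improving = [10, 12, 11, 15, 18, 21]
erratic = [10, 18, 9, 17, 11, 16]

print("improving:", trend_score(improving))
print("erratic:", trend_score(erratic))
```

A rank-based score like this ignores the size of each change and only asks whether later artifacts tend to outperform earlier ones, which makes it less sensitive to a single outlier simulation result.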

After all the intermediate performance data are retrieved through post-processing the artifacts, we can then analyze them using our Process Analyzer -- a visual mining tool being developed to show the analysis results in various visualizations (it is our hope that the Process Analyzer will eventually become a powerful assessment assistant to teachers as it would free teachers from having to deal with an enormous amount of raw data or complicated data mining algorithms). For example, the two images in this post show that one student went through a lot of optimization in her design and the other did not (there is no trajectory in the second image).

## Friday, September 20, 2013

### National Science Foundation funds research that puts engineering design processes under a big data "microscope"

The National Science Foundation has awarded us \$1.5 million to advance big data research on engineering design. In collaboration with Professors Şenay Purzer and Robin Adams at Purdue University, we will conduct a large-scale study involving over 3,000 students in Indiana and Massachusetts in the next five years.

This research will be based on our Energy3D CAD software that can automatically collect large process data behind the scenes while students are working on their designs. Fine-grained CAD logs possess all four characteristics of big data defined by IBM:
1. High volume: Students can generate a large amount of process data in a complex open-ended engineering design project that involves many building blocks and variables;
2. High velocity: The data can be collected, processed, and visualized in real time to provide students and teachers with rapid feedback;
3. High variety: The data encompass any type of information provided by a rich CAD system such as all learner actions, events, components, properties, parameters, simulation data, and analysis results;
4. High veracity: The data must be accurate and comprehensive to ensure fair and trustworthy assessments of student performance.
These big data provide a powerful "microscope" that can reveal direct, measurable evidence of learning with extremely high resolution and at a statistically significant scale. Automation will make this research approach highly cost-effective and scalable. Automatic process analytics will also pave the road for building adaptive and predictive software systems for teaching and learning engineering design. Such systems, if successful, could become useful assistants to K-12 science teachers.

Why is big data needed in educational research and assessment? Because we all want students to learn more deeply and deep learning generates big data.

In the context of K-12 science education, engineering design is a complex cognitive process in which students learn and apply science concepts to solve open-ended problems with constraints to meet specified criteria. The complexity, open-endedness, and length of an engineering design process often create a large quantity of learner data that makes learning difficult to discern using traditional assessment methods. Engineering design assessment thus requires big data analytics that can track and analyze student learning trajectories over a significant period of time.
 Deep learning generates big data.

This differs from research that does not require sophisticated computation to understand the data. For example, in typical pre/post-tests using multiple-choice assessment, the selection data of individual students are directly used as performance indices -- there is basically no depth in these self-evident data. I call this kind of data usage "data picking" -- analyzing them is just like picking up apples already fallen to the ground (as opposed to data mining that requires some computational efforts).

Process data, on the other hand, contain a lot of details that may be opaque to researchers at first glance. In the raw form, they often appear to be stochastic. But any seasoned teacher can tell you that they are able to judge learning by carefully watching how students solve problems. So here is the challenge: How can computer-based assessment accomplish what experienced teachers (human intelligence plus disciplinary knowledge plus some patience) can do based on observation data? This is the thesis of computational process analytics, an emerging subject that we are spearheading to transform educational research and assessment using computation. Thanks to NSF, we are now able to advance this subject.

## Sunday, September 15, 2013

### Measuring the effects of an intervention using computational process analytics

"At its core, scientific inquiry is the same in all fields. Scientific research, whether in education, physics, anthropology, molecular biology, or economics, is a continual process of rigorous reasoning supported by a dynamic interplay among methods, theories, and findings. It builds understanding in the form of models or theories that can be tested."  —— Scientific Research in Education, National Research Council, 2002
 Actions caused by the intervention
Computational process analytics (CPA) is a research method that we are developing in the spirit of the above quote from the National Research Council report. It is a whole class of data mining methods for quantitatively studying the learning dynamics in complex scientific inquiry or engineering design projects that are digitally implemented. CPA views performance assessment as detecting signals from the noisy background often present in large learner datasets due to many uncontrollable and unpredictable factors in classrooms. It borrows many computational techniques from engineering fields such as signal processing and pattern recognition. Some of these analytics can be considered as the computational counterparts of traditional assessment methods based on student articulation, classroom observation, or video analysis.

 Actions unaffected by the intervention
Computational process analytics has wide applications in education assessments. High-quality assessments of deep learning hold a critical key to improving learning and teaching. Their strategic importance has been highlighted in President Obama’s remarks in March 2009: “I am calling on our nation’s Governors and state education chiefs to develop standards and assessments that don’t simply measure whether students can fill in a bubble on a test, but whether they possess 21st century skills like problem-solving and critical thinking, entrepreneurship, and creativity.” However, the kinds of assessments the President wished for often require careful human scoring that is far more expensive to administer than multiple-choice tests. Computer-based assessments, which rely on the learning software to automatically collect and sift learner data through unobtrusive logging, are viewed as a promising solution to assessing increasingly prevalent digital learning.

While there has been a lot of work on computer-based assessments for STEM education, one foundational question has rarely been explored: How sensitive can the logged learner data be to instruction?

 Actions caused by the intervention.
According to the assessment guru Popham, there are two main categories of evidence for determining the instructional sensitivity of an assessment tool: judgmental evidence and empirical evidence. Computer logs provide empirical evidence based on user data recording: the logs themselves provide empirical data for assessment, and their differentials before and after instruction provide empirical data for evaluating the instructional sensitivity. Like any other assessment tool, computer logs must be instructionally sensitive if they are to provide reliable data sources for gauging student learning under intervention.

 Actions unaffected by the intervention.
Earlier studies have used CAD logs to capture the designer’s operational knowledge and reasoning processes. Those studies were not designed to understand the learning dynamics occurring within a CAD system and, therefore, did not need to assess students’ acquisition and application of knowledge and skills through CAD activities. In contrast, we are studying the instructional sensitivity of CAD logs, which describes how students react to interventions with CAD actions. Although interventions can be either carried out by humans (such as teacher instruction or group discussion) or generated by the computer (such as adaptive feedback or intelligent tutoring), we have focused on human interventions in this phase of our research. Studying the instructional sensitivity to human interventions will inform the development of effective computer-generated interventions for teaching engineering design in the future (which is another reason, besides cost effectiveness, why research on automatic assessment using learning software logs is so promising).

 Distribution of intervention effect across 65 students.
The study of instructional effects on design behavior and performance is particularly important when viewed from the perspective of teaching science through engineering design, a practice now mandated by the newly established Next Generation Science Standards of the United States. A problem commonly observed in K-12 engineering projects, however, is that students often reduce engineering design challenges to construction or craft activities that may not truly involve the application of science. This suggests that other driving forces acting on learners, such as hunches and desires for how the design artifacts should look, may overwhelm the effects of instructions on how to use science in design work. Hence, research on the sensitivity of design behavior to science instruction requires careful analyses using innovative data analytics such as CPA to detect the changes, however slight they might be. The insights obtained from studying this instructional sensitivity may result in actionable knowledge for developing effective instructions that can reproduce or amplify those changes.

Our preliminary CPA results have shown that CAD logs created using our Energy3D CAD tool are instructionally sensitive. The first four figures embedded in this post show two pairs of opposite cases, with one type of action sensitive to an instruction that occurred outside the CAD tool and the other not. This is because the instruction was related to one type of action and had nothing to do with the other type. The last figure shows the distribution of instructional sensitivity across 65 students. In this figure, a larger number means higher instructional sensitivity, and a number close to one means that the instruction has no effect. From the graph, you can see that the three types of actions that are not related to the instruction fluctuate around one, whereas the fourth type of action is strongly sensitive to the instruction.
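
To make the idea of a sensitivity number that sits near one when nothing changes more concrete, here is a minimal sketch that compares how often an action occurs in equal time windows after and before an intervention. This ratio is an assumed, simplified measure for illustration only; the timestamps, the window length, and the function `sensitivity_ratio` are hypothetical and are not the exact analysis behind the figures.

```python
def sensitivity_ratio(timestamps, t_intervention, window):
    """Ratio of action counts in equal windows after and before the intervention.

    A value near 1 suggests the action rate was unaffected; a value well
    above 1 suggests the intervention triggered more of this action.
    """
    before = sum(1 for t in timestamps
                 if t_intervention - window <= t < t_intervention)
    after = sum(1 for t in timestamps
                if t_intervention <= t < t_intervention + window)
    if before == 0:
        # No baseline activity: report "no change" only if nothing happened afterwards.
        return float("inf") if after else 1.0
    return after / before


# Hypothetical action timestamps (minutes); the intervention happens at minute 20.
related_actions = [2, 9, 21, 22, 24, 27, 29]
unrelated_actions = [3, 8, 14, 18, 23, 27, 33, 38]

print(sensitivity_ratio(related_actions, 20, 20))
print(sensitivity_ratio(unrelated_actions, 20, 20))
```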

These results demonstrate that software logs can not only record what students do with the software but also capture the effects of what happens outside the software.

## Wednesday, August 28, 2013

### Modeling the hydrophobic effect of a polymer

There are many concepts in biochemistry that are not as simple as they appear to be. These are things that tend to confuse you if you mull over them. Over the years, I have found osmosis to be such a thing. Another is hydrophobicity. (As a physicist, I love these puzzles!)

 Figure 1: More "polar" solvent on the right.
In our NSF-funded Constructive Chemistry project with Bowling Green State University, Prof. Andrew Torelli and I have identified the hydrophobic effect as one of the concepts that may benefit the most from a constructionist approach, which requires students to think more deeply because they must construct a sequence of simulations that explain the origin of this elusive effect. Most students can tell you that hydrophobicity is "water-hating" because their textbooks simply say so. But this layman's term is not accurate and may foster the misconception that some kind of repulsive force exists between a solute molecule and the solvent molecules that makes them "hate" each other. An explanation of the hydrophobic effect involves quite a few fundamental concepts, such as intermolecular potential and entropy, that are cornerstones of chemistry. We would like to see if students can develop a deeper and more coherent understanding when challenged to use these concepts to create an explanatory simulation using our Molecular Workbench software.

Andrew and I spent a couple of weeks doing research and designing simulations to figure out how to make such a complex modeling challenge realistic for his biochemistry students to do. This blog post summarizes our initial findings.

 Figure 2. The radii of gyration of the two polymers.
First we decided that we would like to set this challenge on the stage of protein folding. There are few problems in biochemistry that are more fundamental than protein folding, so this would be a good brain teaser that could stimulate student interest. But protein folding is such a complex problem that we would like to start with a simple 2D polymer made of identical monomers. This polymer is just a chain of Lennard-Jones particles linked by elastic bonds. The repulsive core of the Lennard-Jones potential models the excluded volume of each monomer, and the elastic bonds link the monomers together as a chain. There is no force that maintains the angles of the chain, so the particles can rotate freely. This model is very rough, but it is already an order of magnitude better than the ideal chain, which treats a polymer as a random walk and neglects any kind of interaction among monomers.
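
The energy of such a bead-spring chain can be sketched as follows. The sketch uses the standard 12-6 Lennard-Jones form and a harmonic bond, and it assumes that directly bonded neighbors interact only through the bond; that exclusion, the parameter values, and the function names are illustrative assumptions rather than a description of how Molecular Workbench computes forces.

```python
import math


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy: 4*epsilon*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def harmonic_bond(r, k, r0):
    """Elastic bond energy: 0.5*k*(r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def chain_energy(positions, epsilon, sigma, k, r0):
    """Total energy of a 2D bead chain.

    Neighboring beads are joined by harmonic bonds; all other pairs interact
    through the Lennard-Jones potential. There is no angle term, so the
    chain is free to rotate at every bead.
    """
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if j == i + 1:
                energy += harmonic_bond(r, k, r0)
            else:
                energy += lennard_jones(r, epsilon, sigma)
    return energy


# Hypothetical three-bead chain bent into a right angle (arbitrary units).
beads = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(chain_energy(beads, epsilon=1.0, sigma=1.0, k=10.0, r0=1.0))
```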

 Figure 3. Identical solvents (weakly polar).
Next we need a solvent model. For simplicity, each solvent molecule is represented by a Lennard-Jones particle. Again, this is a very rough model for water as solvent as it neglects the angular dependence of hydrogen bonds among water molecules. A better 2D model for water is the Mercedes-Benz model, so called because its three-arm model for hydrogen bonding resembles the Mercedes-Benz logo. We will probably include this hydrogen bonding model in our simulation engine in the future, but for now, the angular effect may be secondary for the purpose of this modeling project.

The polymer and solvent molecules interact with each other through a Lennard-Jones potential, just as they do among themselves. Now, the question is: Are the interactions we have in hand sufficient to model the hydrophobic effect? In other words, can the nature of hydrophobicity be explained by using this simple picture of interactions? Would Occam's razor be good in this case? I feel that this is a crucial key to our Constructive Chemistry project: If a knowledge system can be reduced to only a handful of rules students can learn, master, and apply in a short time without being too frustrated, the chance of succeeding in guiding them towards learning through construction-based inquiry and discovery would be much higher. Think about all those successful products out there: LEGO, Minecraft, Algodoo, and so on. Many of them share a striking similarity: They are based on a set of simple building blocks and rules that even young children can quickly learn and use to construct meaningful objects. Yet, from this simplicity arise extremely complex systems and phenomena. We want to learn from their tremendous successes and invent the overdue equivalents for chemistry and biology. The Constructive Chemistry project should pave the road for that vision.
 Figure 4. Identical solvents (strongly polar).

Back to modeling the hydrophobic effect: Does our simple-minded model work? To answer this question, we must be able to investigate the effect of each factor. To do so, we set up two compartments separated by a barrier in the middle. Then we put a 24-bead polymer chain into one of them and copy it to the other. To keep the polymers from moving to the edges or corners of the simulation box (if they stay near the edges, they are not fully solvated), we pin their centers down using an elastic constraint. Next we put different types of solvent particles into the two compartments. We also use some scripts to keep the temperatures on both sides identical all the time and export the radii of gyration of the two polymers to a graph. The radius of gyration of a polymer approximately describes its dimension.
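
For reference, the radius of gyration of a chain of equal-mass beads is the root-mean-square distance of the beads from their center of mass. The sketch below computes it for two hypothetical 24-bead conformations, a fully stretched chain and an idealized hairpin; the coordinates are made up for illustration and are not taken from the simulations.

```python
import math


def radius_of_gyration(positions):
    """Root-mean-square distance of equal-mass beads from their center of mass."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in positions) / n
    return math.sqrt(mean_sq)


# Hypothetical 24-bead conformations with unit spacing (arbitrary length units).
stretched = [(float(i), 0.0) for i in range(24)]
hairpin = ([(float(i), 0.0) for i in range(12)]
           + [(float(11 - i), 1.0) for i in range(12)])

print("stretched:", radius_of_gyration(stretched))
print("hairpin:", radius_of_gyration(hairpin))
```

A folded conformation packs the same beads closer to the center of mass, so its radius of gyration is smaller than that of an extended one.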

By keeping everything else but one factor identical in the two compartments, we can investigate exactly what is responsible for the hydrophobic effect for the polymers (or its relative importance). Our hypothesis at this point is that the hydrophobic effect would be more pronounced if the solvent-solvent interaction is stronger. To test this, we set the Lennard-Jones attraction between solvent B (right) particles to be three times stronger than that between solvent A particles, while keeping everything else such as mass and size exactly the same. Figure 1 shows a series of snapshots taken from a nanosecond-long simulation (this model has 550 particles in total, but it runs quickly on my Lenovo X230 tablet). The results show that the polymer on the right folds into a hairpin-like conformation with its two freely moving terminals pointing outwards from the solvent, suggesting that it attempts to leave the solvent (but cannot because it is pinned down). This conformation and location last for a long time (in fact, most of the simulated nanosecond). In comparison, the polymer on the left has no stable conformation or location -- it is randomly stretched in the solvent most of the time and does not prefer any specific location. I think this is evidence for the hydrophobic effect in two senses: 1) The polymer attempts to separate from the solvent; and 2) the polymer curls up to make room for more contacts among the solvent particles (this is related to the so-called hydrophobic collapse in the study of protein folding). The second can be further visualized by comparing the radii of gyration (Figure 2), which consistently differ by 2-3 angstroms.

Note that we did not introduce any special interaction between the polymers and the solvent particles of either type. The interaction between the polymer and a solvent particle is exactly the same in both compartments. The only difference is the solvent-solvent interaction. The difference in the simulation results for the two polymers arises entirely because it is energetically more favorable for the solvent particles in the right compartment to stay closer to one another. After numerous collisions (this is sometimes called entropy-driven), the hairpin conformation emerges as the winner for the polymer on the right.
 Figure 5: Higher temperatures.

To make sure that there is no mistake, we ran another simulation in which the two solvents were set to be identically weakly polar. Figure 3 shows that neither polymer formed a stable conformation in a nanosecond-long simulation. Neither polymer curled up.

Next we set the two solvents to be identically strongly polar. Figure 4 shows that both polymers ended up in a hairpin conformation in a nanosecond-long simulation.

Another test is to raise the temperature but keep the solvent-solvent interaction in the right compartment three times stronger than that in the left compartment. Can the polymer on the right keep its hairpin conformation when heated? Negative, as shown in Figure 5. This actually is related to denaturation, a process in which a protein loses its stable conformation due to heat (or other external stimuli).

These simulations suggest that our simple-minded model might be able to explain the hydrophobic effect and allow students to explore a variety of variables and concepts that are of fundamental importance in biochemistry. Our next steps are to transfer the modeling work we have done to something students can also do. To accomplish this goal, we will have to figure out how to scaffold the modeling steps to provide some guidance.

## Wednesday, August 14, 2013

### Some thoughts and variations of the Gas Frame (a natural user interface for learning gas laws)

A natural user interface (NUI) is a user interface that is based on natural elements or natural actions. Interacting with computer software through a NUI simulates everyday experiences (such as swiping a finger across a touch screen to move a photo on display or just "asking" a computer to do something through voice commands). Because of this resemblance, a NUI is intuitive to use and requires little or no time to learn. NUIs such as touch screens and speech recognition have become commonplace on new computers.

As the sensing capability of computers becomes more powerful and versatile, new types of NUI emerge. The last three years have witnessed the birth and growth of sophisticated 3D motion sensors such as Microsoft Kinect and Leap Motion. These infrared-based sensors are capable of detecting the user's body language within a physical space near a computer with varied degrees of resolution. The rest is how to use the data to create meaningful interactions between the user and a certain piece of computer software.

Think about how STEM education can benefit from this wave of technological innovations. Being scientists, we are especially interested in how these capabilities can be leveraged to improve learning experiences in science education. Thirty years of development, mostly funded by federal agencies such as the National Science Foundation, have produced a wealth of virtual laboratories (aka computational models or simulations) that are currently being used by millions of students. These virtual labs, however, are often criticized for not being physically relevant and not providing hands-on experiences commonly viewed as necessary in practicing science. We now have an opportunity to partially remedy these problems by connecting virtual labs to physical realities through NUIs.

What would a future NUI for a science simulation look like? For example, if you teach physical sciences, you may have seen many versions of gas simulations that allow students to interact with them through some kind of graphical user interface (GUI). What would a NUI for interacting with a gas simulation look like? How would that transform learning? Our Gas Frame provides an example of implementation that may give you something concrete to think about.

 Figure 1: The Gas Frame (the default configuration).
In the default implementation (Figure 1), the Gas Frame uses three different kinds of "props" as the natural elements to control three independent variables related to a gas: a warm or cold object to heat or cool the gas, a spring to exert force on a piston that contains the gas, and a syringe to add or remove gas molecules. I call these objects "props" because, as in film making, they mostly serve as close simulations of the real things without necessarily performing the real functions (you don't want a prop gun to shoot real bullets, do you?).

The motions of the gas molecules are simulated using a molecular dynamics method and visualized on the computer screen. The volume of the gas is calculated in real time using the molecular dynamics method based on the three physical inputs. In addition to the physical controls through the three props, a set of virtual controls is available on the screen for students to interact with the simulation, for example by viewing the trajectory or the kinetic energy of a molecule. These virtual controls support interactions that are impossible in reality (no, we cannot see the trajectory of a single molecule in the air).

The three props can control the gas simulation because a temperature sensor, a force sensor, and a gas pressure sensor are used to detect student interactions with them, respectively. The data from the sensors are then translated into inputs to the gas simulation, creating a virtual response to a real action (e.g., molecules are added or subtracted when the student pushes or pulls a syringe) and a molecular interpretation of the action (e.g., molecules run faster or slower when temperature increases or decreases).
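As a rough illustration of how three sensor readings could determine the gas volume, here is a sketch that uses the ideal gas law as a stand-in. This is an assumption for illustration only: the Gas Frame itself computes the volume with molecular dynamics, and the function name, piston area, and readings below are hypothetical.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def gas_volume(n_molecules, temperature_k, force_n, piston_area_m2):
    """Ideal-gas volume (m^3) for a gas held by a piston under a given force."""
    pressure = force_n / piston_area_m2  # Pa
    return n_molecules * K_B * temperature_k / pressure


# Hypothetical readings: syringe -> molecule count, temperature sensor -> T, force sensor -> F
AREA = 1.0e-4  # hypothetical piston area in m^2
v_base = gas_volume(1.0e20, 300.0, 10.0, AREA)
v_hot = gas_volume(1.0e20, 600.0, 10.0, AREA)     # warm object applied
v_pushed = gas_volume(1.0e20, 300.0, 20.0, AREA)  # spring compressed harder
```

In this approximation, doubling the temperature doubles the volume and doubling the force halves it, which is the qualitative behavior students would see the virtual piston follow.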

As in almost all NUIs, the sensors and the data they collect are hidden from students, meaning that students do not need to know that there are sensors involved in their interactions with the gas simulation and they do not need to see the raw data. This is unlike many other activities in which sensors play a central role in inquiry and must be explicitly explained to students (and the data they collect must be visually presented to students, too). There are definitely advantages to using sensors as inquiry tools to teach students how to collect and analyze data. Sometimes we even go the extra mile and ask students to use a computer model to make sense of the data (like the simulation fitting idea I blogged about before). But that is not what the National Science Foundation funded innovators like us to do.

The NUIs for science simulations that we have developed in our NSF project all use sensors that have been widely used in schools, such as those from Vernier Software and Technology. This makes it possible for teachers to reuse existing sensors to run these NUI apps. This decision to build our NUI technology on existing probeware is essential for our NUI apps to run in a large number of classrooms in the future.

 Figure 2: Variation I.
Considering that not all schools have all the types of sensors needed to run the basic version of the Gas Frame app, we have also developed a number of variations that use only one type of sensor in each app.

Figure 2 shows a variation that uses two temperature sensors, each connected to the temperature of the virtual gas in a compartment. The two compartments are separated by a movable piston in the middle. Increasing or decreasing the temperature of the gas in the left or right compartment by heating or cooling the thermal contacts to which the sensors are attached will cause the virtual piston to move accordingly, allowing students to explore the relationships among pressure, temperature, and volume through two thermal interactions in the real world.

 Figure 3: Variation II.
Figure 3 shows another variation that uses two gas pressure sensors, each connected to the number of molecules of the virtual gas in a compartment through an attached syringe. As in Variation I, the two compartments are separated by a movable piston in the middle. Pushing or pulling the real syringes will cause molecules to be added to or removed from the virtual compartments, allowing students to explore the relationships among number of molecules, pressure, and volume through two tactile interactions.
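Both variations come down to the same question: where does the piston settle once the pressures on its two sides balance? A minimal sketch, assuming ideal-gas behavior in each compartment (the apps themselves run molecular dynamics), is given below. The function name and the numbers are hypothetical.

```python
def piston_position(n_left, t_left, n_right, t_right):
    """Fraction of the total volume taken by the left compartment at pressure balance.

    Setting N_left * T_left / V_left equal to N_right * T_right / V_right,
    with V_left + V_right fixed, gives V_left / V_total = N_left * T_left / (sum of both).
    """
    left = n_left * t_left
    right = n_right * t_right
    return left / (left + right)


# Variation I: heat the left thermal contact (temperatures in kelvin)
balanced = piston_position(100, 300.0, 100, 300.0)
left_heated = piston_position(100, 400.0, 100, 300.0)

# Variation II: push the left syringe to add molecules
left_filled = piston_position(150, 300.0, 100, 300.0)
```

In this approximation, heating one side or adding molecules to it pushes the piston toward the other side, which is the behavior students would explore with the two sensors.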

If you don't have that many sensors, don't worry -- both variations will still work if only one sensor is available.

I hear you asking: All this sounds fun, but so what? Will students learn more from it? If not, why bother to go through the extra trouble, compared with using an existing GUI version that needs nothing but a computer? I have to confess that I cannot answer this question at this moment. But in the next blog post, I will try to explain our plan for figuring this out.

## Wednesday, July 31, 2013

### Fair assessment for engineering design?

 The student's design #1
In our June study on engineering design in a high school, one student's designs caught my eye. The design challenge required students to use Energy3D to design a cluster of buildings in a city block that takes solar radiation into consideration, but this particular student came up with two neat designs.

 The student's design #2
The student didn't pay much attention to the solar design part, but both designs are, I would say, hmm, beautiful. I have to admit that I am not an architect and I am judging this mostly based on my appreciation of the mathematical beauty (see Design #1) expressed in these designs. But even so, I feel that this is something worth writing about, because -- considering that the student did not know anything about Energy3D before -- it is amazing to see how quickly he mastered the tool and came up with pretty sophisticated designs that look pleasant to my picky eyes. Where did his talent come from? I wish I had a chance to ask him.

The interesting part of the story is that when I showed these designs to a colleague, she actually had a different opinion about them (compared with other designs that I think are not great). This reflects how subjective and unreliable performance assessment based on product analysis can sometimes become. While I cannot assert that my assessment is more justified, I can imagine how much effort and thought this student put into these extremely well-conceived and polished designs (look how perfectly symmetric they are). This cannot possibly be the result of some random actions. A negative assessment might not do justice to this student's designs.

This is why I had to invent process analytics, an assessment technique that aims to provide a more comprehensive, more trustworthy evaluation based on students' entire design processes, not just on the final look of the products and the evaluator's personal taste.

## Sunday, July 28, 2013

### SimBuilding funded by the National Science Foundation

 A thermal bridge simulation in SimBuilding
Building science is, to a large extent, a “black box” to many students, as it involves many invisible physical processes such as thermal radiation, heat transfer, air flow, and moisture transport that are hard to imagine. But students must learn how these processes occur and interact within a building in order to understand how design, construction, operation, and maintenance affect them and, therefore, the wellbeing of the entire building. These processes form a “science envelope” that is much more difficult to understand than the shape of the building envelope alone. With 3D graphics that can visualize these invisible processes in a virtual building, simulation games provide a promising key to open the black box. They offer a highly interactive learning environment in which STEM content and pedagogy can be embedded in the gameplay, game scores can be aligned to educational objectives to provide formative assessments, and students can be enticed to devote more time and explore more ramifications than didactic instruction. A significant advantage is that students can freely experiment with a virtual building to learn a concept before exploring it in a real building with all the consequences and costs that may entail.

A new grant (\$900K) from the National Science Foundation will allow us to develop a simulation game engine called SimBuilding based on computational building simulation. The application of advanced building simulation technologies to developing training simulation games will be an original contribution of this project. Although building simulation has become an important tool in the industry and can be very helpful in understanding how a building works, it has never been used to build simulation games before. SimBuilding will unveil this untapped instructional power. Furthermore, this game engine will be written in JavaScript and WebGL, allowing it to run on most computing devices.

Amanda Evans, Director of Center of Excellence for Green Building and Energy Efficiency at Santa Fe Community College in New Mexico, will be our collaborator on this grant.

## Sunday, June 30, 2013

### First research paper using the Molecular Workbench submitted to arXiv

 Credit: M. Rendi, A.S. Suprijadi, & S. Viridi
Researchers from Institut Teknologi Bandung, Indonesia recently submitted a paper "Modeling Of Blood Vessel Constriction In 2-D Case Using Molecular Dynamics Method" to arXiv (an open e-print repository), in which they claimed: "Blood vessel constriction is simulated with particle-based method using a molecular dynamics authoring software known as Molecular Workbench. Blood flow and vessel wall, the only components considered in constructing a blood vessel, are all represented in particle form with interaction potentials: Lennard-Jones potential, push-pull spring potential, and bending spring potential. Influence of medium or blood plasma is accommodated in plasma viscosity through Stokes drag force. It has been observed that pressure p is increased as constriction c is increased. Leakage of blood vessel starts at 80 % constriction, which shows existence of maximum pressure that can be overcome by vessel wall."

This blog article is not meant to endorse their paper but to use this example to illustrate the point that a piece of simulation software that was originally intended to be an educational tool can turn out to be useful to scientists as well. If you are a teacher, don't you want your students to have such a tool that sets no boundary on what they can do? The science education community has published numerous papers about how to teach students to think and act like scientists, but much less has been done to actually empower them with tools they can realistically use.

## Sunday, June 23, 2013

### Solar urban design and data mining in the classroom

 Image usage permitted by students.
 Image usage permitted by students.
In the past two weeks, seventy ninth-graders in three physics classes at Arlington High School (MA) each used our Energy3D CAD software to solve the Solar Urban Design Challenge (which I blogged about earlier). I observed them for three days. I had no prior experience with these students, but according to their teacher, they were exceptionally engaged. Most students hadn't "run out of steam" even after 4-5 days of continuous work on the design project. As anyone who works in schools knows, it is hard to keep students interested in serious science projects for that long, especially near the end of a semester. These students seemed to have enjoyed this learning experience. This is a good sign that we must have done something right. I suppose the colorful 3D solar visualization provides some eye candy to keep them curious for a while.

 Image usage permitted by students.
CAD tools are probably nothing new in classrooms these days, at least not at Arlington High School, which has used SketchUp and AutoCAD for years. What is cool about our CAD tool is that all these students' actions were recorded behind the scenes -- every two seconds! That is to say, the computer was "watching" every move of every student. This may sound a little concerning if you have heard in the news about a secret government program called PRISM that is probably "watching" me write this blog article right now. But rest assured that we are using this data mining technology in a good way. Our mission is not to spy on students but to figure out how to help them learn science and engineering in a more fruitful way. This is probably as important as national security -- if not more so -- if we are to maintain our global leadership in science and technology.

## Saturday, June 1, 2013

### Solar urban design using Energy3D: Part IV

In Parts I, II, and III, we mainly explored the possible layouts of buildings in the city block and their solar energy outputs in different seasons. In those cases, the solar radiation on a new construction is mostly affected by other new constructions and existing buildings in the neighborhood. We haven't explored the effect of the shape of a building. The shape of a building is what makes architecture matter, but it also has solar implications. In this blog post, we will explore these implications.

 Figure 1: Compare solar heating of three different shapes in two seasons.
Let's start with a square-shaped tall building and make two variations. The first one is a U-shaped building and the second is a T-shaped one. In both variations, the base areas and the heights are identical to those of the original square-shaped building. Let's save these buildings into separate files and not put them into the city block yet. We just want to study the solar performance of each individual building before we put them in a city.

The U-shaped building has a larger surface area than the square-shaped and the T-shaped ones (which have an identical surface area). Having a larger surface means that the building can potentially receive more solar radiation. But the two wings of the U-shaped building also obstruct sunlight. So does the U-shaped building get more or less energy? It would have been very difficult to tell without running some solar simulations, which tell us that this particular U-shaped building gets more solar energy than the square-shaped one both in the winter and in the summer.

In comparison, the T-shaped building gets the least amount of solar energy in both seasons. This is not surprising because its surface is not larger than the square-shaped one but its shape obstructs sunlight to its western part in the morning and to its eastern part in the afternoon, resulting in a reduction of solar heating.

## Wednesday, May 22, 2013

### Solar urban design using Energy3D: Part III

 Figure 1
In Parts I and II, we discussed how solar simulations in Energy3D can be used to decide where to erect a new building in a city block surrounded by existing buildings. Now, what about putting multiple buildings in the block? The optimization problem becomes more complex because students will have to deal with more variables while searching for an optimal solution.
 Figure 2
Suppose students have to decide the locations of two new constructions A and B that have identical shapes. Now they have six options to lay out the two new constructions. Figure 1 shows the results of the solar simulations for all these six layouts in the winter. Placing the buildings in the northeast and northwest parts (the first in the first row of Figure 1) seems to be the best solution for receiving solar heating in the winter. This is not surprising because this layout creates large south-facing areas for both buildings that will get a lot of solar energy in the winter, and they are not shadowed very much by the surrounding buildings.
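The count of six can be checked by enumeration. The sketch below assumes that the block is divided into the same four candidate spots used in Part II (northwest, northeast, southwest, southeast) and that swapping the two identical buildings does not create a new layout.

```python
from itertools import combinations

# Assumed candidate spots, taken from the four placements explored in Part II
LOCATIONS = ["northwest", "northeast", "southwest", "southeast"]

# Buildings A and B are identical, so order does not matter: choose 2 spots out of 4
layouts = list(combinations(LOCATIONS, 2))

for first, second in layouts:
    print(f"{first} + {second}")
```

With more buildings or more candidate spots the number of layouts grows quickly, which is the sense in which the optimization problem becomes more complex.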

Now switch the season to summer. Figure 2 shows the results of the solar simulations for all these six layouts in July. Placing the buildings in the southeast and southwest parts (the first in the second row of Figure 2) seems to be the best solution for avoiding solar heating in the summer.

To make a trade-off between winter heating and summer cooling, it seems the southeast and southwest locations are the optimal solution: In the winter the solar heating on the two buildings is the second best (which is not much lower than the highest) and in the summer the solar heating on them is the lowest (which is much lower than the contender).

## Saturday, May 18, 2013

### Solar urban design using Energy3D: Part II

 Figure 1
The sun is lower in the winter and higher in the summer. How does the sun path affect the solar radiation on the city block in our urban design challenge? Is solar heating different in different seasons? Let's find out using Energy3D's solar simulator. Energy3D has a nice feature that allows us to look at the 3D view exactly from the top. This kind of reduces the 3D problem to a 2D one once you complete your 3D construction and want to do some solar analysis. The 2D view is clearer and the drag-and-drop of buildings is easier.

 Figure 2
First, we added a rectangular building to the city block and moved it to four different places -- northwest, northeast, southeast, and southwest -- in the city block and set the month to be January and the location to be Boston, MA (which is close to where we are). Not surprisingly, the solar radiation on the building is the lowest at the southeast location (Figure 1). This is because to the southeast of the block, there are three tall buildings that shadow the southeast part of the block -- you can see in the heat map that the southeast part is deep blue. At the southwest location, the building receives the highest solar energy. The northwest location comes second with a slightly smaller number.

 Figure 3
Next we set the month to be July and repeated the solar simulation. This time, the solar heating on the building at all locations increases (Figure 2). However, the location that receives the lowest solar heating, surprisingly, is not southeast but southwest! The location that receives the highest solar heating is northwest. The reason could be that there is a tall building next to the southwest location that provides a lot of shadow (Figure 3). This shadowing effect seems to be more significant than the shadowing effect from the three tall buildings around the southeast corner.
 Figure 4
The conclusion is that the building of this particular shape receives the highest solar energy in the winter and the lowest in the summer at the southwest spot.

Now, what about the orientation of the building? Let's rotate the building 90 degrees and redo the solar analysis in January (Figure 4). The results show that the building receives higher solar energy at all locations. This is because the building has a larger south-facing side in this orientation than in the previous one. The southeast location remains the coldest spot, but the difference between southwest and northwest is less.

## Friday, May 17, 2013

### Solar urban design using Energy3D: Part I

 Figure 1
In sustainable architecture, passive solar design refers to searching for optimal strategies to maximize solar heating on a building in the winter and minimize solar heating in the summer in order to reduce heating and cooling costs of the building. A passive solar design challenge is a typical optimization problem that requires many steps of engineering design to solve, such as testing ideas, analyzing data, considering constraints, and making trade-offs.

 Figure 2
For urban design, site layout has a big impact on passive solar heating in buildings as neighboring tall buildings can block low winter sun. Energy3D’s solar simulator can compute, visualize, and analyze solar radiation in obstructed situations commonly encountered in dense urban areas.

The solar urban design project we have developed challenges students to use Energy3D to construct a square city block surrounded by a number of existing buildings of different heights, with the goals of maximizing solar access for new constructions and minimizing obstruction of sunlight to existing buildings. The existing buildings, which cannot be modified by students, serve as constraints for the design challenge. This design challenge is an authentic engineering problem as it requires students to consider solar radiation as it varies over seasons and apply these math and science concepts to solve open-ended problems using a supporting analytic tool. This distinguishes it from common computer drafting activities in which students draw structures whose functions cannot or will not be verified or tested.

 Figure 3
Energy3D can generate solar radiation heat maps on the walls of buildings and the ground (Figures 1 and 2). These heat maps show the cumulative heat of solar radiation on a surface over a certain period (a day or a month). They are calculated by summing up the solar energy projected onto each unit area of the surface while the sun moves cyclically in its path at the given location. The total solar heating result (in kWh), summing from all the unit areas of all the walls, is shown on top of each building. This number will go up and down as students move or reshape the building. This calculated result is more accurate than shadow and shading, which only reflects instantaneous solar heating at a particular moment.
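To give a feel for what "summing up the solar energy while the sun moves in its path" involves, here is a simplified sketch for a single unobstructed surface. It is not Energy3D's algorithm: it assumes a constant direct-beam intensity of 1 kW/m² whenever the sun is above the horizon, ignores the atmosphere, diffuse light, and shadows from other buildings, and uses Cooper's textbook approximation for the solar declination. All function names are hypothetical.

```python
import math


def solar_declination(day_of_year):
    """Cooper's approximation of the solar declination, in radians."""
    return math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)


def sun_direction(latitude_deg, day_of_year, solar_hour):
    """Unit vector (east, north, up) pointing from the ground toward the sun."""
    lat = math.radians(latitude_deg)
    dec = solar_declination(day_of_year)
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))  # negative in the morning
    east = -math.cos(dec) * math.sin(hour_angle)
    north = math.cos(lat) * math.sin(dec) - math.sin(lat) * math.cos(dec) * math.cos(hour_angle)
    up = math.sin(lat) * math.sin(dec) + math.cos(lat) * math.cos(dec) * math.cos(hour_angle)
    return east, north, up


def daily_energy(normal, latitude_deg, day_of_year, beam_kw_per_m2=1.0, step_minutes=5):
    """Cumulative direct-beam energy (kWh per m^2) on a surface over one day."""
    dt_hours = step_minutes / 60.0
    total = 0.0
    for i in range(int(24 * 60 / step_minutes)):
        hour = (i + 0.5) * dt_hours  # midpoint of each time step
        sun = sun_direction(latitude_deg, day_of_year, hour)
        if sun[2] <= 0.0:
            continue  # sun below the horizon
        cos_incidence = sum(s * n for s, n in zip(sun, normal))
        total += beam_kw_per_m2 * max(0.0, cos_incidence) * dt_hours
    return total


BOSTON_LAT = 42.36  # approximate latitude of Boston, MA
SOUTH_WALL, NORTH_WALL, GROUND = (0.0, -1.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

south_jan = daily_energy(SOUTH_WALL, BOSTON_LAT, 15)
south_jul = daily_energy(SOUTH_WALL, BOSTON_LAT, 196)
north_jan = daily_energy(NORTH_WALL, BOSTON_LAT, 15)
ground_jan = daily_energy(GROUND, BOSTON_LAT, 15)
ground_jul = daily_energy(GROUND, BOSTON_LAT, 196)
```

Even this crude model reproduces the qualitative pattern that passive solar design relies on: a south-facing wall collects more direct sunlight on a January day than on a July day, while the ground does the opposite. A real tool would additionally test, at every time step, whether neighboring buildings block the sun.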

The horizontal radiation heat map can be used to identify the hot and cold areas of the empty city block. With this heat map, students can find out where the new constructions should be in order to have maximal solar heating in the winter. Once they put in a new building, they can move the building around within the construction site to see how much solar energy the building will gain at each spot. As an example, Figure 3 shows that a rectangular high-rise building will receive the highest amount of solar radiation in January if it is placed at the southwestern part of the square and it will receive the lowest amount of solar radiation if it is placed at the southeastern part.

Such an analytic tool provides data for students to make their design decisions, creating plenty of opportunities for inquiry in design processes.