Category: astro 101

#eqjp, a teachable moment

In my current assignment through the Carl Wieman Science Education Initiative in Physics and Astronomy at UBC, I’m working closely with a senior astronomy professor to help him better teach his general-education “Astro 101” course. It’s a mixture of providing resources, mentoring, helping him clarify what he wants the students to learn, and coaxing (sometimes dragging – he’s a great sport!) his teaching to a learner-centered approach.

Today was supposed to be the first class in the last, big section of the course, comparative planetology. That is, the characteristics of the planets and other bodies in our Solar System and, more importantly, what their similarities and differences tell us about the formation of the Solar System some 4.5 billion years ago. Traditionally, one follows the textbook’s lead. Chapter 10: Mercury. Chapter 11: Venus,… Chapter 15: Saturn,… Chapter 20: Other Crap, Chapter 21: [finally!] Formation. And by this time, nobody remembers Mercury or Venus, or gives a damn. I’m glad to say we long ago scrapped that approach and instead focus on gathering and analyzing the evidence that points to a single formation event. Our learning goal states that a student will be able to

deduce from patterns and properties of the planets, moons, asteroids and other bodies that the Solar System had a single formation event.

Where was I? Oh, right, teachable moment.

Last night (March 10), there was a massive earthquake in Japan. Magnitude 8.9, one of the biggest earthquakes recorded. The ensuing tsunami(s) devastated parts of Japan. I pay attention to these things, perhaps more than others, because my home, Vancouver, is on the list of places expecting The Big One. And we can be hit by tsunamis caused by earthquakes around the ring of fire. Thankfully, the west coast of Canada and the U.S. were spared this time.

It occurred to me, on the bus ride to work this morning, we could use last night’s earthquake in class today. Seismic activity tells us about the structure and evolution of the Earth. Similar signs of earthquakes and volcanoes on other planets, or lack thereof, tell us about their structure and evolution. Not seeing volcanoes on a planet is just as telling as seeing them. Using the earthquake to introduce this last arc in the course would set the tone for the next month of classes: we don’t care about the exact surface temperature on Mercury or the exact density of Neptune. We care about patterns in the physical properties of the planets. And we care about how we find, collate and reconcile those patterns.

Shortly after this “A-ha!” moment, my brain countered with, “Is this a teachable moment? Or are you exploiting the earthquake because you can’t think of an interesting way to teach comparative planetology?”

So I tweeted…

…and, as usual, was overwhelmed by the quick and intelligent response of the great tweeps who follow me. Thanks @TanyaCNoel, @penmachine, @snowandscience, @cpm5280, @derekbruff, @erinleeryan, @cosmos4u. The overwhelming advice was take advantage of the teachable moment:

Good idea. Understanding is always helpful.
teachable moment. everyone’s talking about it anyway…
Definitely a teachable moment

I’m also thankful to @ptruchon for putting words to something that bothered me:

Tough one…Do some of them have family in Japan? If so, are they ok?

So, I went for it. And by went for it, I mean I decided to convince the prof to use the earthquake in today’s class. I proposed he could run the “Earth’s Changing Surface” lecture-tutorial but he decided against it. Instead, he used the earthquake to segue from “here are the 3 or 4 key patterns that support a single formation event” to “how do we know all that, anyway?” Through open questions like, “What does the earthquake tell us about the structure of the Earth?” and “What does this picture [of Mars’ Olympus Mons] tell you about this planet?” he led a nice discussion with the 170-or-so students in class today. Many students, men and women, from the front and the back of the lecture hall, participated.

A very successful class, in my opinion, one that demonstrated to me and himself and the students, how “agile” this prof is getting. I was proud that we were able to adapt our presentation so quickly and help the students learn about something they care about.

P.S. A special hat-tip to @cpm5280 who reminded me that this earthquake was predicted, yes predicted, by the Super Moon wingnuts. I gave the prof a quick summary, just in case. And sure enough, at the end of class, a gaggle of students came down and asked him if he knew anything about the Moon being super-close on March 19. He hit them with a few key scientific facts (in particular, that because gravity follows an inverse-square law, the tiny decrease in distance won’t do very much) and told them that the whole earthquake-prediction thing was “a load of crap.” He used their language and they, like, totally got it.

Going over the exam

How often have you heard your fellow instructors lament,

I don’t know why I bother with comments on the exams or even handing them back – students don’t go over their exams to see what they got right and wrong, they just look at the mark and move on.

If you often say or think this, you might want to ask yourself, What’s their motivation for going over the exam, besides “It will help me learn…”? But that’s the topic for another post.

In the introductory gen-ed astronomy class I’m working on, we gave a midterm exam last week. We dutifully marked it, which was simple because the midterm exam was multiple-choice, answered on Scantron cards. And calculated the average. And fixed the scoring on a couple of questions where the question stem was ambiguous (“When you say ‘summer in the southern hemisphere,’ do you mean June or do you mean when it gets hot?”). And we moved on.

Hey, wait a minute! Isn’t that just what the students do — check the mark and move on?

Since I have the data, every student’s answer to every question, via the Scantron and already in Excel, I decided to “go over the exam” to try to learn from it.

(Psst: I just finished wringing some graphs out of Excel and I wanted to start writing this post before I got distracted by, er, life so I haven’t done the analysis yet. I can’t wait to see what I write below!)

Besides the average (23.1/35 questions or 66%) and standard deviation (5.3/35 or 15%), I created a histogram of the students’ choices for each question. Here is a selection of questions which, as you’ll see further below, are widespread on the good-to-bad scale.

Question 9: You photograph a region of the night sky in March, in September, and again the following March. The two March photographs look the same but the September photo shows 3 stars in different locations. Of these three stars, the one whose position shifts the most must be

A) farthest away
B) closest
C) receding from Earth most rapidly
D) approaching Earth most rapidly
E) the brightest one

Students' choices for Question 9. The correct answer is B.

Question 16: What is the shape of the shadow of the Earth, as seen projected onto the Moon, during a lunar eclipse?

A) always a full circle
B) part of a circle
C) a straight line
D) an ellipse
E) a lunar eclipse does not involve the shadow of the Earth

Students' choices for Question 16. The correct answer is B.

Question 25: On the vernal equinox, compare the number of daytime hours in 3 cities, one at the north pole, one at 45 degrees north latitude and one at the equator.

A) 0, 12, 24
B) 12, 18, 24
C) 12, 12, 12
D) 0, 12, 18
E) 18, 18, 18

Students' answers to Question 25. The correct answer is C.

How much can you learn from these histograms? Quite a bit. Question 9 is too easy and we should use our precious time to better evaluate the students’ knowledge. The “straight line” choice on Question 16 should be replaced with a better distractor – no one “fell for” that one. I’m a bit alarmed that 5% of the students think that the Earth’s shadow has nothing to do with eclipses but then again, that’s only 1 in 20 (actually, 11 in 204 students – aren’t data great!). We’re used to seeing these histograms because in class, we have frequent think-pair-share episodes using i>clickers and use the students’ votes to decide how to proceed. If these were first-vote distributions in a clicker question, we wouldn’t do Question 9 again but we’d definitely get them to pair and share for Question 16 and maybe even Question 25. As I’ve written elsewhere, a 70% “success rate” can mean only about 60% of the students chose the correct answer for the right reasons.
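If your answers aren't already sitting in Excel, the same per-question tally is only a few lines of Python. This is just a minimal sketch: the function name `choice_percentages` and the ten sample answers are made up for illustration and are not the real midterm data.

```python
from collections import Counter

def choice_percentages(answers, choices="ABCDE"):
    """Return the percentage of students who picked each choice on one question."""
    counts = Counter(answers)
    total = len(answers)
    # Choices nobody picked still show up, with 0%
    return {c: 100.0 * counts.get(c, 0) / total for c in choices}

# Hypothetical responses from 10 students to a single question (not real data)
sample_answers = ["B", "B", "A", "B", "D", "B", "E", "B", "B", "A"]
percentages = choice_percentages(sample_answers)

# Crude text histogram: one '#' per 5 percentage points
for choice, pct in percentages.items():
    print(f"{choice}: {pct:5.1f}% " + "#" * round(pct / 5))
```

A choice that sits at 0%, like the “straight line” option on Question 16, is the cue to go looking for a better distractor.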

I decided to turn it up a notch by following some advice I got from Ed Prather at the Center for Astronomy Education. He and his colleagues analyze multiple-choice questions using the point-biserial correlation coefficient. I’ll admit it – I’m not a statistics guru, so I had to look that one up. Wikipedia helped a bit, as did this article and Bardar et al. (2006). Normally, a correlation coefficient tells you how strongly two variables are related. A favourite around Vancouver is the correlation between property crime and distance to the nearest Skytrain station (with all the correlation-causation arguments that go with it.) With point-biserial correlation, you can look for a relationship between students’ test scores and their success on a particular question (this is the “dichotomous variable” with only two values, 0 (wrong) and 1 (right).) It allows you to speculate on things like,

  • (for high correlation) “If they got this question, they probably did well on the entire exam.” In other words, that one question could be a litmus test for the entire test.
  • (for low correlation) “Anyone could have got this question right, regardless of whether they did well or poorly on the rest of the exam.” Maybe we should drop that question since it does nothing to discriminate or resolve the student’s level of understanding.

I cranked up my Excel worksheet to compute the coefficient, usually called ρpb or ρpbis:

$$\rho_{pb} = \frac{\mu_+ - \mu_x}{\sigma_x}\sqrt{\frac{p}{q}}$$

where μ+ is the average test score for all students who got this particular question correct, μx is the average test score for all students, σx is the standard deviation of all test scores, p is the fraction of students who got this question right and q=(1-p) is the fraction who got it wrong. You compute this coefficient for every question on the test. The key step in my Excel worksheet, after giving each student a 0 or 1 for each question they answered, was the AVERAGEIF function: for each question I computed

=AVERAGEIF(B$3:B$206,"=1",$AL3:$AL206)

where, for example, Column B holds the 0 and 1 scores for Question 1 and Column AL holds the exam marks. This function takes the average of the exam scores only for those students (rows) who have got a “1” on Question 1. At last then, the point-biserial correlation coefficients for each of the 35 questions on the midterm, sorted from lowest to highest:

Point-biserial correlation coefficient for the 35 multiple-choice questions in our astronomy midterm, sorted from lowest to highest. (Red) limits of very weak to strong (according to the APEX dissertations article) and also the (green) "desirable" range of Bardar et al. are shown.
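For anyone who would rather skip the spreadsheet, here is a rough Python sketch of the same calculation, using the standard definition of the coefficient in terms of μ+, μx, σx, p and q. It makes a few assumptions that are mine and not part of the worksheet: the function name `point_biserial`, the tiny made-up `scores` table (6 students, 3 questions), and the use of the population standard deviation for σx (the choice that makes the result agree with an ordinary Pearson correlation).

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item_scores, total_scores):
    """Point-biserial coefficient for one question.

    item_scores:  0 (wrong) or 1 (right) for each student
    total_scores: each student's exam total, in the same order
    """
    p = mean(item_scores)  # fraction of students who got this question right
    if p == 0 or p == 1:
        raise ValueError("undefined when everyone or no one got the question right")
    q = 1 - p
    # Average exam score of only those students who got this question right
    # (the job AVERAGEIF does in the spreadsheet)
    mu_plus = mean([t for s, t in zip(item_scores, total_scores) if s == 1])
    mu_x = mean(total_scores)
    sigma_x = pstdev(total_scores)  # assumption: population standard deviation
    return (mu_plus - mu_x) / sigma_x * sqrt(p / q)

# Hypothetical 0/1 scores: one row per student, one column per question
scores = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
totals = [sum(row) for row in scores]
coefficients = {
    f"Q{j + 1}": point_biserial([row[j] for row in scores], totals)
    for j in range(len(scores[0]))
}

# Sorted from lowest to highest, like the plot
for name, r in sorted(coefficients.items(), key=lambda kv: kv[1]):
    print(f"{name}: {r:.2f}")
```

Note that each student's total here includes the question being tested, just as the exam mark does in the worksheet.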

First of all, ooo shiny! I can’t stand the default graphics settings of Excel (and PowerPoint) but with some adjustments, you can produce a reasonable plot. Not that this one is perfect, but it’s not bad. Gotta work on the labels and a better way to represent the bands of “desirable”, “weak”, etc.

Back to going over the exam, how did the questions I included above fare? Question 9 has a weak, not desirable coefficient, just 0.21. That suggests anyone could get this question right (or equivalently, no one could get this question right). It does nothing to discriminate or distinguish high-performing students from low-performing students. Question 16, with ρpb = 0.37, is in the desirable range – just hard enough to begin to separate the high- and low-performing students. Question 25 is one of the best on the exam, I think.

In case you’re wondering, Question 6 (with the second highest ρpb ) is a rather ugly calculation. It discriminated between high- and low-performing students but personally, I wouldn’t include it – doesn’t match the more conceptual learning goals IMHO.

I was pretty happy with this analysis (and my not-such-a-novice-anymore skills in Excel and statistics.) I should have stopped there. But like a good scientist making sure every observation is consistent with the theory, I looked at Question 26, the one with the highest point-biserial correlation coefficient. I was shocked, alarmed even. The most discriminating question on the test was this?

Question 26: What is the phase of the Moon shown in this image?

A) waning crescent
B) waxing crescent
C) waning gibbous
D) waxing gibbous
E) third quarter

It’s waning gibbous, by the way, and 73% of the students knew it. That’s a lame, Bloom’s taxonomy Level 1, memorization question. Damn. To which my wise and mentoring colleague asked, “Well, what was the exam really testing, anyway?”

Alright, perhaps I didn’t get the result I wanted. But that’s not the point of science. Of this exercise. I definitely learned a lot by “going over the exam”, about validating questions, Excel, statistics and WordPress. And perhaps made it easier for the next person, shoulders of giants and all that…

Clicker votes when students guess

I’m working with a veteran gen-ed astronomy (#astro101) instructor to make his classroom more learner-centered. We’re working hard on effective clicker implementation. The benefit of using clickers for think-pair-share (TPS) questions is the instructor can use the students’ votes to guide the instruction.

i>clicker receiver and clicker (sorry, can't find credits for this pic.)

If everyone gets a question right, just confirm the answer and move on – don’t waste valuable class time re-teaching something everyone already knows! Conversely, if the students have no clue what the answer is and simply guess, you’d expect 20% for each choice A-E, 25% each if there are 4 choices, and so on. If that’s how they vote, either there’s something wrong with the question (a critical typo, perhaps) or the students haven’t learned the concept yet. Teach it again BUT NOT JUST LOUDER. Teach it again in a different way.

The “sweet spot” is when there’s a nice split between 2 or more choices. The students have thought hard enough to formulate and pick the choice they feel is correct, which means they’re prepared to interact with their peers. In cases like this, we ask them to “turn to your neighbours and convince them you’re right.” Then you sit back and let them teach themselves. Ahhh.

(Well, actually, you shouldn’t sit back. You should wander around the room and eavesdrop – you’re going to hear some great ideas you can use for choices on the final exam!)

The hard part for instructors is knowing when to move on or when to get the students to discuss the question. Is 90% correct enough? Yes, probably. What about 80%? What about 60%?

In today’s astronomy class, the instructor asked the students a TPS question and the distribution of votes was A 0, B 0, C 67%, D 20%, E 13%. The instructor wasn’t overjoyed, but 67%? That means 2/3 of the students got it, right?

Wrong. Some knew the answer. And the rest guest. Er, guessed.

I did a little thought experiment with the instructor afterwards. “Suppose only half the students knew the answer and the rest just guessed. What vote distribution would you get?”

“Er, 50% then 10% for each choice, so a 60 and 10’s.”

“Great,” I said. “Suppose 2 of the 5 choices were obviously wrong. Then what?”

He thought for about 2 seconds. “67-17-17.” Our numbers from today. “Oh.”

That’s right, when there are only 3 valid choices and only half the students know the answer, you still get about 67% success. And you might be tempted to move on even though half the students don’t know what you’re talking about!

That got me thinking – suppose fraction f of the students know the correct answer and the rest guess. What do the clicker vote distributions look like? I cast a spell with Excel (I’ve finally reached novice Excel spellcaster) and found these results:

Distribution of votes when fraction f of students know the correct answer is A and the rest of the students make a random guess. Each set of 5 bars show the votes for A, B, C, D, E.

(Quick limit test that us math-types do: when no one knows and f=0.0, the votes are 20% for each choice. And when everyone knows, it’s 100-0-0-0-0. Got it.)

For example, when the peak vote is 60%, only 50% of the students actually know the answer. And it gets worse when there are fewer choices (or equivalently, when you can eliminate some of the 5 choices because they’re obviously wrong.) Here are the distributions when there are 4 choices and 3 choices:

Distribution of votes when fraction f of students know the correct answer is A and the rest of the students make a random guess. Each set of 4 bars show the votes for A, B, C, D.
Distribution of votes when fraction f of students know the correct answer is A and the rest of the students make a random guess. Each set of 3 bars show the votes for A, B, C.

This last chart shows our 67-17-17 vote distribution corresponding to only 50% of the students knowing the right answer.
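The model behind these charts is simple enough to sketch in a few lines of Python, in case Excel isn't your thing. It assumes exactly what the thought experiment assumes: fraction f of the students know the answer, and everyone else guesses uniformly among the n plausible choices (so the guessers also land on the right answer 1/n of the time). The function name `vote_distribution` is my own invention for this sketch.

```python
def vote_distribution(f, n_choices=5):
    """Expected vote shares when fraction f know the answer and the rest guess.

    The first entry is the correct choice; the rest are the distractors.
    Assumes guessers pick uniformly at random among all n_choices.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n_choices < 2:
        raise ValueError("need at least 2 choices")
    guess_share = (1.0 - f) / n_choices  # each choice's share of the guessers
    return [f + guess_share] + [guess_share] * (n_choices - 1)

# Half the class knows the answer, with 5, 4 and 3 plausible choices
for n in (5, 4, 3):
    shares = vote_distribution(0.5, n)
    print(n, "choices:", ", ".join(f"{100 * s:.1f}%" for s in shares))
```

The limit tests still hold: f = 0 gives equal shares for every choice, and f = 1 puts every vote on the correct answer.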

This isn’t ground-breaking research. I bet many clicker users have done this, too. Or at least, worked out a few special cases.

The moral of the story, though: the fraction of students who choose the correct answer is always higher than the fraction of students who know the correct answer. Don’t move on to the next topic unless you get a very strong peak.
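You can also run the guessing model backwards: if the peak vote is p and there are n plausible choices, then p = f + (1 − f)/n, which rearranges to f = (n·p − 1)/(n − 1). Here is a small Python sketch of that inversion. The function name `estimated_fraction_knowing` is made up for illustration, and the estimate is only as good as the assumption that the students who don't know the answer guess uniformly.

```python
def estimated_fraction_knowing(peak_share, n_choices=5):
    """Estimate the fraction of students who actually know the answer.

    peak_share: fraction of votes for the correct choice (0.67 for 67%)
    n_choices:  number of choices students would plausibly guess among
    Assumes everyone who doesn't know the answer guesses uniformly.
    """
    if n_choices < 2:
        raise ValueError("need at least 2 choices")
    f = (n_choices * peak_share - 1.0) / (n_choices - 1.0)
    # A peak below pure-guessing level would give a negative f; clamp to [0, 1]
    return min(1.0, max(0.0, f))

# The 67% vote from class, if only 3 of the 5 choices were plausible
print(f"{estimated_fraction_knowing(0.67, 3):.2f}")
# A 60% peak with all 5 choices in play
print(f"{estimated_fraction_knowing(0.60, 5):.2f}")
```

Under this model the gap between votes and knowledge only closes when the peak reaches 100%.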

What’s your threshold for moving on or doubling-back with a pair-share?
