Monday, October 6, 2014

Impressions on Judging a Hackathon

A Berkeley CS undergraduate I know who is passionate about hacking asked me this summer if I would be a judge for this weekend's Cal Hacks hackathon—being on a panel that would select the "top 3 overall" projects from among the top 12 finalists identified in a first round of judging.

I spend a lot of time in Soda Hall and you can't go a month without seeing flyers for some new hackathon or programming contest up on the student bulletin boards.  Never having experienced one, I agreed to do this, and I'm certainly glad I did.

The overall winning hack was extremely impressive: a team of six students reverse-engineered the radio protocol for the remote control that drives a quadcopter drone, programmed an Arduino with a radio board to be the controller, connected that to a Myo armband so you could control the copter just by arm gestures, then went further and used EEG-style electrodes to allow it to be controlled by your thoughts.  Even though this has been done before, the fact that six students did it in a weekend is remarkable, especially (to me) the hacker ethic embodied in their attitude that they shouldn't be limited to controlling the drone using the manufacturer-provided remote control.  (I summarize a few of the other top-12 projects at the end of this post.)

Here is the "CS professor's take" on this event.  TL;DR: to my faculty colleagues, I'd say if you are asked to be involved in one of these, you should do so at least once; many of us who ended up as CS professors started as passionate hackers, and it's important to understand what forms and manifestations that urge takes today and how we might be able to support them.  (It is definitely alive and well.)

This event is huge.  Over two dozen companies provided financial support, and not just swag and hardware for hacking on, but travel expenses to/from Berkeley for accepted participants (there were over 1000 people from all over the place, mostly the US & Canada).  The event was held in the various facilities of Cal Memorial Stadium.  Nearly all day on Saturday, both companies and student groups offered one-hour "crash courses" in using particular tools or technologies (JavaScript, d3.js, specific site APIs or hardware product SDKs, etc.).  Most teams who built a hack around a particular hardware product got to keep the product, whether they placed in the final ranking or not.  Teams who did make the finals also got additional prizes: swag, hardware, and especially…

Recruiting.  More than one company offered as a "prize" a paid field trip to shadow an engineer at the company for a week, or a mock (or real!) interview with the company.  A few high school students were even present, which should be a recruiting opportunity for Berkeley as well!

The students are amazingly creative.  True, some of the projects amounted to little more than just another social network, but there were many creative ideas, both software-only and hardware+software systems.  One team created a custom iPhone soft-keyboard that lets the user type in phonetic sounds using a Roman alphabet and converts them to the appropriate symbols for Arabic, Simplified Chinese, Sanskrit, and a couple of other languages. Parcelly is a "peer-to-peer parcel delivery system" based on the Uber model (and using Uber's API to schedule deliveries and pickups).  Myo Fighter connected Myo armbands, worn on both arms and ankles, to an emulator for the once-famous Street Fighter arcade game, so you can play Street Fighter by actually doing the body motions, a little like Wii on steroids. Scribble is an Arduino-based servoing pantograph that slaves one pen to another wielded by a human drawing on a 2D surface.  Splash is a p2p messaging system for use at protests and other crowded events where data coverage or Wi-Fi are unavailable: it sends PGP-encrypted messages point-to-point by building and managing a Bluetooth route network over the devices of people who have the app.  Fashion asks you to "rate" different types of clothing based on various attributes, and then recommends items you might like.

At the same time, I wish I could've advised some of the teams before they started hacking, because they sometimes tended to reinvent some wheels out of naïveté.  Splash's routing algorithm was prone to the same attacks as unprotected wide-area IP routing (false route advertisements, etc.), but the students were unaware of this work and reimplemented naïve versions of those algorithms.  The recommendation algorithm for Fashion was a set of simplistic heuristics, even though well-known algorithms for collaborative filtering are available in library form.  None of this is to detract from the students' effort and work, but I couldn't help but wonder if those of us who know the field a little better could add some value on the front end by pointing students to at least the existence of technologies and algorithms they could use.
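For a sense of how little code a basic collaborative filter needs, here is a user-based sketch in Python: it scores the items a user hasn't rated using other users' ratings, weighted by how similar their rating histories are (cosine similarity over co-rated items). The ratings data and function names are hypothetical, and this is only an illustration; library implementations often add refinements such as mean-centered ratings or matrix factorization.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ana":   {"jacket": 5, "boots": 4, "scarf": 1},
    "ben":   {"jacket": 4, "boots": 5, "hat": 4},
    "chloe": {"jacket": 1, "scarf": 5, "hat": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=3):
    """Rank unseen items by the similarity-weighted average of others' ratings."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:  # 0 means no co-rated items, so this user tells us nothing
            continue
        for item, r in their.items():
            if item in ratings[user]:
                continue  # only recommend items the user hasn't rated yet
            scores[item] = scores.get(item, 0.0) + sim * r
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:k]

print(recommend("ana", ratings))  # → ['hat']
```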

On the popular yet annoying game show Who Wants To Be a Millionaire, contestants are allowed a limited number of calls to one of their "lifeline" colleagues during game play to ask them about the answer to a question.  I wonder if future hackathons might allow hackers the equivalent of one or two "lifeline" calls to a panel of industry/research/faculty experts in various areas in order to get just-in-time advice on tackling a problem in specific areas of their hack (maybe there are existing algorithms or even libraries they don't know about, etc.)  The only problem would be getting these "lifeline" folks, most of whom probably have lives and families, to be awake from midnight to 8am on a Friday and Saturday...

Lastly, I would've liked to see their code.  Judging was based on technical difficulty, creativity, social impact, and similar factors.  I wouldn't expect the code to be beautiful or anything, but I would like these students to be aware that while hackathons like this certainly develop their ability to learn about a new tool/technology/API/whatever and then rapidly apply that knowledge in prototyping something, that is a different process from actually producing an artifact that will be used in production and must be maintained.  (And you can often tell a lot about someone's internal design process from looking at their code.)  There is absolutely a place for hackathons, and I'm all for them, but I also want students to learn to distinguish "hacking mode" from "careful design and implementation mode".  Maybe they already do, and my concern is for naught.

It was cool that Joel Spolsky (cofounder of StackOverflow and ) gave 10-minute opening remarks, talking about how debugging (vs. writing the code) takes up 2/3 of your time in hackathons and 98% of your time as a professional programmer.  He described Two Things Real Programmers (vs. wannabes) Do: divide-and-conquer debugging to narrow the bug down to a small piece of code, and using the "scientific method" to formulate a hypothesis about what's going wrong in that piece of code and then writing tests/harnesses to check that hypothesis.  "People who get flame-broiled on StackOverflow usually forget to follow one or both of these steps": they simply post their entire program or an entire long, incomprehensible stack trace without having put any apparent effort into narrowing down what is going on.  (I'd like him to give this talk in CS169!)
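To make the divide-and-conquer step concrete, here is a minimal Python sketch that binary-searches an ordered list of changes for the one that introduced a failure (the same idea behind `git bisect`). The function name `first_failing`, the list of changes, and the `passes` predicate, which stands in for the test harness checking your hypothesis, are all hypothetical. The sketch assumes that once the bug is introduced, every longer prefix of changes also fails.

```python
def first_failing(changes, passes):
    """Return the index of the change that first makes `passes` return False.

    Assumes passes(changes[:0]) is True, passes(changes) is False, and that
    failures are monotonic: once a prefix fails, every longer prefix fails.
    """
    lo, hi = 0, len(changes)  # invariant: prefix of length lo passes, length hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(changes[:mid]):
            lo = mid  # bug comes later
        else:
            hi = mid  # bug is in the first mid changes
    return hi - 1  # the change at this index turns a passing prefix into a failing one

changes = [f"c{i}" for i in range(10)]  # hypothetical ordered edits
print(first_failing(changes, lambda prefix: "c6" not in prefix))  # → 6
```

Each probe halves the suspect range, so ten changes need at most four runs of the harness.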

Either way, these are a thing, and judging by the many hundreds of participants and their level of energy during the judging presentations after being awake for the better part of 36 hours, students love them.  (And companies clearly love them too, as you may have guessed.)  We should be figuring out how to maximize the value of these experiences for students' ongoing CS education.  That's a subject for a faculty lunch discussion, which I'll report on here after I initiate it.

A few other projects I saw demos of, not mentioned above: Multipass is 2-factor authentication for teams.  ("We could have modified OAuth, but we preferred to write our own 3rd-party authentication protocol in Erlang.")  Control lets you program gesture controls based not only on body accelerometers but on GPS position and orientation, so the same gesture can be context-tuned to different physical locations/environments.  Oahu is a BOINC-style "donate cycles to help solve big problems" system that runs entirely in-browser: just having your browser open and the extension enabled lets you participate.  Myo Man combines Myo controllers and an Oculus Rift VR headset to let you fly around in a virtual world like Iron Man by making the same gestures he makes (think of Superman when he's flying).

Monday, August 11, 2014

Conversation pitfalls in peer instruction

Dave Patterson and I have long used Eric Mazur's brand of Peer Instruction in CS169 lectures:

  • We display a multiple-choice question on the screen, intended to test students' understanding of the last 5-10 minutes of lecture material;

  • Everyone votes on their preferred answer individually (some people use clickers; we use colored index cards);

  • Each student turns to a neighbor and discusses the answer for about a minute;

  • Everyone votes again.

Students we surveyed have said they enjoy this activity; we find that it keeps students more engaged during lecture (and forces us to organize our material better); and there is a ton of literature documenting real studies showing that it improves retention and comprehension.

But what are students discussing when they talk to each other?  In a 2010 paper, James & Willoughby qualitatively analyzed a sample of 361 conversations held among 147 students discussing 45 such questions in introductory physics and astronomy.

They found two fascinating things.  One is that instructors' "idealized" imagination of what mistakes students might make (as embodied by the "distractor" incorrect answers) and what kinds of conversations they'll have are often woefully wrong.  The other is the discovery of several "nonstandard conversation" types, that is, conversations in which students are not in general discussing and discarding wrong answers to converge on the right answer:

  1. Unanticipated student ideas about prerequisite knowledge (12.5% of conversations): students may share incorrect prerequisite knowledge unanticipated by the instructor (including misunderstanding of very basic material), apply prerequisite knowledge in not quite the right way, or naively try to match "keywords" in the question with those heard in lecture to determine which prerequisite knowledge to apply.

  2. Guessing: Students may use cues such as keywords to select a clicker response, or may simply defer to a student they think is more knowledgeable.  So the statistical feedback provided by clickers is not necessarily representative of student comprehension.

  3. Pitfalls (37.7% of conversations): Some discussions do not surface statements about the specific reasons an answer is correct. Three variants are: no connection to the question stem ("I think it's (C), do you agree?"  "Yeah, makes sense to me"), especially when everyone agrees that the (wrong) answer is self-evident (30%); passive deference to another student (5%); and inability to converse because all discussants lack the knowledge to even attempt the question, as stated in their transcripts (2%).  Deference was more pronounced in "high stakes" settings where getting the right answer as a result of peer instruction counted relatively more towards the students' grade.

The pitfalls occur in roughly the same proportions for recall questions as for higher-level cognition questions.

We're working on some research to allow peer learning to happen in online settings such as MOOCs.  A nice side effect of moving such discussions online is that not only can we instrument them more closely to get a better sense of when these pitfalls happen, but we can even move students among (virtual) groups if it turns out that certain elements of student demographics or previous performance are predictors of how they'll behave in a given group type.


Wednesday, August 6, 2014

Learning from examples: how to do it right

This survey of the learning-from-worked-examples literature highlights some best practices for using worked examples as a learning aid.

Learning from examples is most effective in stages 1 and 2 of the four-stage ACT-R cognitive framework:

  1. learners solve problems by analogy

  2. learners develop abstract declarative “rules” to guide problem solving (some generalization from step 1)

  3. learners no longer need to consciously invoke the “rules script” to solve problems

  4. learners have practiced many types of problems, so can instantly “retrieve a solution template”

Throughout the survey, “A is more effective than B” is generally established by pre/post testing that measures transfer in controlled experiments. In some cases a hypothesis is proposed to explain the result in terms of one or another theoretical cognitive framework; in other cases no interpretation of the result is offered.

A key finding is that students who engage in “self explanation” [Chi et al., many many cites], in which a learner pauses while inspecting an example to construct the omitted rationale for a particular step, outperform those who don’t.  Here are several ways to stimulate this behavior (*) along with other best practices for creating and using worked examples:

  1. * Identify subgoals within the task.

  2. * Several partially-worked examples of varying complexity and illustrating various strategies/approaches, with enough “missing” to stimulate some self-explanation, are more effective than fewer but more-thoroughly-worked examples.

  3. * Don’t mix formats in one example, e.g., use either a labeled diagram showing some concepts or a textual explanation of those concepts, but not both: the “split attention” cost actually retards learning.

  4. Don’t assign an “explainer” role to stimulate self-explanation: it actually hinders learning, possibly because of increased stress and reduced intrinsic motivation for the learners.

  5. Visuals accompanied or immediately followed by aural comments are more effective than either visuals or comments alone.

  6. Alternate worked examples with practice problems, rather than showing N examples followed by N problems.

  7. Novices tend to overfocus on problem context rather than underlying conceptual structure; to compensate, use the same context/background for a set of different problem types.

Wednesday, July 30, 2014

Writing good multiple choice questions

Lots of us use MCQs in homeworks, exams, quizzes, etc., and there's a wealth of info on writing good ones.  Beyond the obvious things like "make all the distractors plausible", here are some tips I've distilled from various sources, which are listed at the end according to the numbered citations.

Review of assessment terminology

  • Bloom's taxonomy orders cognition levels from "lower" to "higher": knowledge (recall/memory), comprehension, application, analysis, synthesis, evaluation.  The challenge is to test the higher-level skills using multiple choice questions.

  • Reliability: the extent to which a learner's answer to a question reflects her true knowledge. Guessing and slipping can thwart it.

  • Discrimination: how well a question separates learners who really understand it from those who don't.

  • Difficulty: the median level of mastery above which students are likely to get the question right.  Difficulty and discrimination are two of the parameters that can be measured using item response theory.

  • Transfer: the extent to which successful performance on an assessment will allow valid generalizations about achievements to be made.
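Difficulty and discrimination correspond to the b and a parameters of the two-parameter logistic (2PL) model, one common item response theory model (others include the one-parameter Rasch model and a three-parameter model that adds a guessing term). In the 2PL model, difficulty is the ability level at which a learner has a 50% chance of answering correctly, and discrimination is how steeply that chance rises around it. A minimal Python sketch with a hypothetical item:

```python
import math

def p_correct(theta, a, b):
    """2PL item response function.

    theta: learner ability; a: discrimination (slope);
    b: difficulty (the ability at which P(correct) = 0.5).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item with difficulty b = 0.5, at low vs. high discrimination
for a in (0.5, 2.0):
    probs = [round(p_correct(theta, a, 0.5), 2) for theta in (-1.0, 0.5, 2.0)]
    print(f"a={a}: {probs}")
# → a=0.5: [0.32, 0.5, 0.68]
# → a=2.0: [0.05, 0.5, 0.95]
```

Both versions are equally difficult, but the high-discrimination item separates weaker from stronger learners far more sharply.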

Checklist: the stem (base of question)

  • Write a stem that is specific to the question: this immediately focuses the question on a specific learning outcome.
    BEFORE: Which of the following statements is true?  [various statements about unit tests]
    AFTER: Which characteristic is most commonly observed in unit tests?  [rephrase choices to focus on characteristics of unit tests]

  • Don't put "fill-in blanks" in the stem—it increases student cognitive load without testing their cognition any better.
    BEFORE:  Mocks and ____ allow you to isolate behaviors in unit tests.
    AFTER: Besides mocks, what other mechanism allows you to isolate behaviors in unit tests?

Checklist: answer & distractors

  • Every answer should form a grammatically correct sentence when appended to the stem (pronoun agreement, etc.)

  • Keep items similar in length, complexity, formality, tone, etc., and avoid re-using exact wording from lecture/textbook/notes.  Otherwise student may pick the "most textbook-like" answer, the "most nuanced" answer, the longest answer, etc.

  • Either truly randomize the order of the answers, or use a deterministic rule such as alphabetical order.

  • Avoid questions where students could get right answer for wrong reason (even if not guessing): 

    • "all of the above" (students who can identify >1 correct answer can choose it even if they don't understand why all answers are correct)

    • "none of the above" (better, but may be chosen by a student who holds a misconception about why it is true)

    • true/false questions (you won't know if they understand why it's true or false, plus students can guess)

    • negative questions (unless learning outcome specifically requires it, i.e., being able to indicate a non-example of something; otherwise students may be able to identify an incorrect answer without knowing the correct answer).

  • Avoid complex combinations of items ("(a) and (b) only", "all of (a),(b),(c)" etc): a sophisticated test-taker can use partial knowledge to guess the correct answer. (Also, students hate this kind of question since they may get no partial credit for knowing part of the answer.)
    Possible alternative: "Select all that apply" of N choices, and get 1/N of credit for correctly determining whether each choice is checked or not.
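The 1/N scoring rule for "select all that apply" items takes only a few lines of Python. The function name and the sample item are hypothetical:

```python
def select_all_score(choices, key, selected):
    """Partial credit for a "select all that apply" item: 1/N of the credit
    for each of the N choices whose checked/unchecked state matches the key."""
    matches = sum((c in key) == (c in selected) for c in choices)
    return matches / len(choices)

choices = ["a", "b", "c", "d"]
# Key is {a, c}; the student checked a and b, so a and d match, b and c don't
print(select_all_score(choices, key={"a", "c"}, selected={"a", "b"}))  # → 0.5
```

One consequence worth weighing before adopting this rule: a student who checks nothing still earns credit for every choice that should be left unchecked (0.5 for this key).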

How many choices per question?  TL;DR:  Three.

The metric of interest is the total number of choices on the exam, i.e., 30 questions of 4 choices creates a comparable cognitive load to 40 questions of 3 choices, so the trade-off is really one of longer tests vs. more choices per question.

A meta-analysis of 80 years of MCQ research [6] reveals both theoretical and empirical evidence for 3 total choices per question:

  • Theoretical: Frederic Lord, one of the architects of Item Response Theory, showed statistically that longer tests with fewer choices per question "increases exam efficiency for high-level examinees, and decreases it for low-level examinees".  Tversky later showed that three choices per question maximizes the information obtained per time unit regarding students' ability.

  • Empirical: You’d think more distractors would thwart guessing, but on existing standardized and high-quality career tests, only 16% of 4-option items had 4 effective choices (i.e., all plausible enough to be chosen a nontrivial fraction of the time) and only 5% of 5-option items had 5 functional choices.

  • Caveat: the meta-analyses assume that exactly one of the choices per question is correct, and that the learner gets a single attempt to answer each question.
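The intuition behind "three options" can be reproduced with a back-of-the-envelope model. This is not Lord's or Tversky's actual derivation; it simply assumes that a k-choice item yields at most log2(k) bits about the learner and that answering it takes time proportional to k, which matches the fixed-total-choices framing above (30 × 4 = 40 × 3 = 120 choices). The function name is hypothetical.

```python
import math

def info_per_unit_time(k):
    """Upper bound on bits per question (log2 k), divided by time per
    question, assuming reading time grows linearly with the number of choices."""
    return math.log2(k) / k

for k in range(2, 6):
    print(k, round(info_per_unit_time(k), 3))
# → 2 0.5
# → 3 0.528
# → 4 0.5
# → 5 0.464
```

Under these assumptions log(k)/k peaks at k = e ≈ 2.72, so three is the best whole number of choices, with two and four tied just behind it.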

Types of questions that test higher levels of cognition

  • Memory + Application: instead of "Ricardo's Principle of Comparative Advantage states that…" (memory), you can ask "Which of the following is an example of applying Ricardo's Principle of Comparative Advantage?" and give N scenarios, exactly one of which illustrates applying the principle.

  • Premise-Consequence: If X happens, then which of the following will happen?

  • Analogy: X is to Y as W is to which of the following?

  • Case study: a background paragraph serves as the setting for a series of questions that require the student to analyze the scenario from various angles.

  • Incomplete scenario: show a diagram, taxonomy, architecture, etc. similar but not identical to what's been seen in lecture/readings.  Ask students to fill in blanks, or ask questions about what makes it different from the version seen in lecture.

  • Evaluation: present both a question and a proposed answer, e.g., a set of design constraints and a proposed design.  Provide a rubric according to which students must indicate whether the proposed answer is correct, complete, etc.

  • Inference/higher-level reasoning: present a scenario, then ask which of n statements can reasonably be said to follow from the scenario.

Students' rules of thumb for guessing on multiple-choice tests from []  (and ways to thwart them)

  1. Pick longest or most scientific-sounding answer (make all choices comparably long and use comparable prose)

  2. Pick 'b' or 'c' (randomize or use deterministic order)

  3. Avoid choices containing 'always' or 'never' (don't use those words)

  4. If two choices are opposites, one of them is probably the answer (include 2 choices that are opposites and are both distractors)

  5. Pick keywords/phrases that were related to this topic (include keywords/phrases in distractors)

  6. True/False questions are more often true than false, since instructors tend to emphasize true things. (use both forms of a question, or avoid T/F questions)
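To defeat rule 2, the position of the correct answer should carry no information. One way to do that while keeping grading simple is to shuffle each item per exam version with a seeded random number generator and record where the key landed. This Python sketch uses the standard library's `random.Random`; the function name and the sample choices are hypothetical.

```python
import random

def shuffled_version(choices, answer_index, seed):
    """Return (shuffled choices, new index of the correct answer).

    Seeding with, e.g., a per-student version number keeps each version
    reproducible, so the grading key can be regenerated later.
    """
    rng = random.Random(seed)
    order = list(range(len(choices)))
    rng.shuffle(order)  # order[j] = original index of the choice now at position j
    return [choices[i] for i in order], order.index(answer_index)

choices = ["stub", "mock", "fixture", "spy"]  # hypothetical item; "mock" is correct
for version in range(3):
    print(version, shuffled_version(choices, answer_index=1, seed=version))
```

Across many versions the correct answer lands in each position about equally often, so "pick 'b' or 'c'" stops paying off.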

Useful open-source tools to help prepare & grade multiple-choice exams

  • RuQL is an open-source tool I made that lets you write questions and create tests in a variety of formats (printed, HTML, edX interactive quiz, etc.).  Some command-line skillz required.

  • AutoQCM lets you generate printable answer-bubble sheets that can be scanned on a high-speed scanner and graded using open source software.  RuQL can generate answer sheets and grading keys for AutoQCM.


  1. , Dr. Timothy Bothell, BYU Faculty Center

  2. , U of Oregon Teaching Effectiveness Program

  3. , Cynthia J. Brame, Assistant Director, Vanderbilt University Center for Teaching

  4. Using multiple-choice questions effectively in information technology education, Karyn Woodford and Peter Bancroft, Queensland U. of Tech.

  5. , Ben Clay, Kansas Curriculum Center.

  6. Three Options Are Optimal for Multiple-Choice Items: A Meta-Analysis of 80 Years of Research, Michael C. Rodriguez, University of Minnesota. Educational Measurement: Issues and Practice, Summer 2005 issue.


Friday, July 25, 2014


In an effort to widen distribution for ESaaS, I've been trying to move the "expanded distribution" channel from CreateSpace to IngramSpark.

[Since I now know more about book distribution than I ever expected to know: CreateSpace is an author-facing print-on-demand (POD) service that was acquired by Amazon.  When someone orders your POD print book on Amazon, CreateSpace fulfills it.  They also have an "expanded distribution" channel for reaching bookstores, retailers, etc., but apparently bookstores don't like to deal with them for a variety of reasons.]

In contrast, Ingram is one of the two biggest book distributors, i.e., one of two big companies that bookstores call when they place an order for a book on behalf of a customer.  Historically Ingram only dealt with publishers, but they recently launched IngramSpark, an independent-author-facing service that allows your books to be POD'd and distributed through their network.  A colleague of ours who is a publishing industry veteran suggested this would be a much better channel for that expanded distribution, since (a) bookstores are already used to dealing with them and (b) bookstores hate Amazon and there is a guilt-by-association with CreateSpace, plus CreateSpace has a "no returns" policy that is incompatible with how many bookstores operate.

So I figured, simple.  Turn off "expanded distribution" on CreateSpace, and list our book on IngramSpark instead.

Not so fast.  Apparently only one distributor at a time can handle your book, and I was told it would take 6 to 8 weeks from the time I disabled "expanded distribution" on CreateSpace for the ISBN to be "released" allowing me to list that same ISBN on IngramSpark.  Note that "released" presumably means "moving it from one electronic catalog to another."  I thought the phrase "6 to 8 weeks" went out with mail-order products in the 80s.  Hmmm.

But the real eye-opener has been customer service.

Whenever I've had a problem or question using CreateSpace, I log in to my author account, click the "Call me now" button, and within 2 minutes I am talking to a live well-informed human being (not a droid) and my question always gets answered without so much as an escalation.

Whenever I've had a question using IngramSpark, I have to "fill out a support request".  It can "take up to 48 hours for us to respond, although we can often respond much quicker."  Only the first part of that sentence turns out to be true: 48 hours is about right.  And the responses aren't always helpful, so I always have a followup question…which takes another 48 hours.

Today I noticed there is another help option "Request a call."  Since that had worked well on CreateSpace, I tried that option.   They will call me..."within up to 48 hours".  Great.  So instead of checking my email I have to keep my phone around.

Customer service for Ingram says: "We'll call you when we call you.  Deal with it."  Customer service for CreateSpace says: "You are a customer.  How quickly can we speak to you to resolve your problem?"  IngramSpark is owned by a bricks-and-mortar book distributor: historically, authors are beholden to them (and to the publishing industry generally) to get their books out.  CreateSpace is owned by the most successful e-tailer: historically, they are beholden to their customers to stay in business.  Draw your own conclusions.

Thursday, May 22, 2014

The transitional form

Every new medium needs a transitional form exemplar—something that demonstrates the new technique or technology, but just replicates the previous techniques and doesn't use any of the unique affordances of the new medium.

The first movie was essentially the result of pointing a stationary motion picture camera at a stage where live actors performed a play.

The Lear-Siegler ADM-3, the first CRT terminal (predecessor to the ADM-3A), was no smarter than its Teletype printer forebears: there was no cursor control and only uppercase characters.  (Its predecessor was arguably Don Lancaster's TV Typewriter, construction plans for which are described in his TV Typewriter Cookbook.)

Show Boat, while acknowledged as the first true musical where the book (story) rather than the songs was the most important element, nonetheless borrowed heavily from the vaudeville tradition, with many "production numbers" unrelated to the story to show off the chorus and dancers.

That's where we are with MOOCs in 2014.  With a few exceptions, we have yet to take advantage of the new medium's affordances.

But without the ADM-3 we wouldn't have had graphics terminals, without the movie camera we wouldn't have Pixar, and without Show Boat we wouldn't have had West Side Story.  What will we have in a few years that we wouldn't have had without "MOOCs 1.0"?

Wednesday, May 21, 2014

Dealing with exploding demand for CS

UW CS chair Ed Lazowska makes the case nicely that interest in CS is booming and is not likely to be just a flash in the pan, and he asks how we are going to meet this demand and deal with larger course sizes.

In terms of offering a high-quality course experience to ever-larger numbers of students, my view (doubtless shared by others) is that we cannot just do the same thing we've been doing but more of it; we need to find new ways to do things.  Neither can we plop the students in front of a MOOC with minimal instructor guidance, though clearly those technologies have a role to play.

In the interest of trying to figure some of this out, I tried collecting some thoughts about this based on the Berkeley experience.  I presented these thoughts as a position statement at the recent Dagstuhl seminar about MOOCs, but I'm also on an internal committee in Berkeley CS charged with figuring out how to expand access to CS curriculum while keeping high quality, so these thoughts are percolating there as well.  Comments welcome.

Separating Scalable from Non-Scalable Elements to Refactor Residential Course Delivery

In Fall 2013, enrollment in UC Berkeley’s introductory CS course exceeded 1,000 for the first time, reflecting a trend of exploding demand for CS at top CS departments.

Most residential courses are offered in a “one size fits all” model: a single lecture, a single set of assignments or labs that everyone does, multiple recitation sections that generally cover the same material as one another, a lab session where everyone works on the same lab with TA’s on hand to help, and perhaps small-group tutoring sessions or office hours as the only place where students with differing needs get more customized attention.  The assumption is that this combination of elements serves the “mass of the distribution” of students, stragglers can squeak by with the additional support of office hours or tutoring, and superstars can use their spare time to do undergraduate research.

Yet as enrollments grow, we observe that (a) not all aspects of offering the course scale equally well in terms of instructor resources, and (b) the “outliers” (superstars and stragglers) become more and more pronounced, with the former representing underutilized talent and the latter an inordinate drain on instructional staff time.

I argue that MOOC technology, in the format of a SPOC (Small Private Online Course), can provide the leverage needed to refactor course resources to improve how we serve these exploding populations by separating the “scalable” from “non-scalable” elements of the course and by carefully thinking about the role MOOCs can play in each part.

Things that scale well

Certain MOOCs have demonstrated that some elements of a course scale inexpensively and well:

  • Sophisticated automatic grading, such as used in many CS MOOCs, allows nontrivial assignments to be automatically graded nearly instantly (vs. handed back a week later) and with finer-grained feedback than TA’s or readers could provide.

  • In some domains, automatic problem generation builds on autograding to support mastery learning.  Both autograding and problem generation can take advantage of inexpensive public cloud computing rather than requiring extensive on-campus infrastructure.

  • Free video distribution (YouTube) makes lectures cheap to distribute.

  • Well-structured Q&A forums such as StackOverflow or Piazza allow students to help each other, with occasional intervention from teaching staff.
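The combination of automatic problem generation, instant autograding, and mastery learning described above can be sketched as a small loop. Everything here is hypothetical and deliberately tiny: the problem type (binary-to-decimal conversion), the feedback wording, and the "three correct in a row" mastery threshold are illustrative choices, not features of any particular MOOC platform.

```python
import random

def generate_problem(rng):
    """Hypothetical generator: convert a random 6-bit binary number to decimal."""
    n = rng.randrange(64)
    return f"{n:06b}", n  # (prompt, expected answer)

def autograde(prompt, expected, submitted):
    """Grade instantly and explain the mistake rather than just marking it wrong."""
    if submitted == expected:
        return True, f"{prompt}: correct."
    return False, (f"{prompt} is {expected}, not {submitted}; "
                   "recheck the place value of each bit.")

def mastery_session(answer_fn, streak_needed=3, max_problems=20, seed=0):
    """Serve freshly generated problems until the learner answers
    `streak_needed` in a row correctly, or stop after `max_problems`."""
    rng = random.Random(seed)
    streak, feedback_log = 0, []
    for attempt in range(1, max_problems + 1):
        prompt, expected = generate_problem(rng)
        ok, feedback = autograde(prompt, expected, answer_fn(prompt))
        feedback_log.append(feedback)  # what the learner would see
        streak = streak + 1 if ok else 0
        if streak >= streak_needed:
            return attempt, feedback_log  # problems needed to show mastery
    return None, feedback_log  # not mastered yet: flag for human follow-up

attempts, log = mastery_session(lambda prompt: int(prompt, 2))
print(attempts)  # → 3 for a learner who always answers correctly
```

Because every problem is freshly generated, a learner can keep practicing without exhausting a fixed question bank, and the `None` result is a natural trigger for routing the learner to a human.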

Things that scale poorly, and how to address them

More obvious variance across student cohorts.  Especially for courses with “broad but shallow” prerequisites, students’ levels of preparation may vary, and during the course, different cohorts of students may need help with different topics.   Mitigation: “Just-in-time” flexible deployment of teaching staff.

High-end outliers (superstars): Superstar students are often an underused resource.  Moreover, an instructor whose efforts are all expended on simply managing a large course is unable to identify these superstars (and sometimes they’re not obvious) and cultivate them further (invite as research assistants, etc.)

Mitigation: Identify ways to formally recognize and train these students to be effective in helping their peers.

Low-end outliers (stragglers): Stragglers can take a disproportionate amount of staff time, leading to an Amdahl’s Law-like effect limiting course scaling.  Mitigation: combine JIT deployment of teaching staff with SPOC resources that enable mastery learning.
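To see why stragglers behave like the serial fraction in Amdahl's Law, treat per-student staff effort as the "workload": if technology multiplies the productivity of only part of that effort, the unscalable remainder caps the overall gain at 1 / (1 - scalable fraction). A minimal Python sketch; the function name and the 80% figure are made up for illustration.

```python
def max_leverage(scalable_fraction, leverage):
    """Amdahl's Law applied to staff time per student.

    scalable_fraction: share of staff effort that technology can amplify
    leverage: how many times more productive that share becomes
    The remaining (1 - scalable_fraction), e.g. one-on-one help for
    stragglers, is assumed not to scale at all.
    """
    return 1.0 / ((1.0 - scalable_fraction) + scalable_fraction / leverage)

print(round(max_leverage(0.8, 10), 2))   # → 3.57
print(round(max_leverage(0.8, 1e9), 2))  # → 5.0, the 1 / (1 - 0.8) ceiling
```

With 80% of effort scalable, even an unlimited improvement in the scalable part cannot get past a 5x gain, which is why the mitigation targets the straggler share directly.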

Learning activities that are interaction-intensive: Especially in engineering courses, most real synthesis learning occurs in design projects, but these are grading-intensive and interaction-intensive.  Mitigation: Refactor the course into multiple courses, each of which concentrates either on “high scale” or “high touch” but not both, and resource the courses differently.

The main argument is that we must examine both creating new staff roles and amplifying the productivity and leverage of those roles using MOOC technology, thereby increasing overall teaching productivity.

Rethinking Teaching Staff: New Roles & Flexible Deployment

New teaching roles are already being created, but ad hoc/post hoc.  We need to formalize these roles, resource them, and train them.  One possible factoring of roles (with some overlap among them, e.g. some community stewards may also be contributors) might be as follows:

  1. Authors/Creators make an initial set of editorial decisions that result in a narrative through a body of material, analogous to the role of textbook authors.  We have argued that the combination of SPOCs and e-books is a promising formula for packaging such content.

  2. Core Contributors create additional material such as assignments, assessments, tutorials and other scaffolding  within the author-provided framework, which may be used and adapted by many instructors downstream.   They might, for example, participate in direct teaching during the school year and spend the summer doing course development or analyzing the previous semester’s learning outcomes data.  The SPOC delivery model and the software supporting it make it more convenient than ever to use SPOCs for “curricular technology transfer.”

  3. Community Stewards become experts on the materials and help other instructors (including TA’s and other teaching staff) work effectively with the much larger range of materials available in a SPOC (compared to traditional textbooks).

  4. Course Managers keep courses running smoothly by keeping tabs on student cohorts to understand who’s having difficulty where, marshalling and deploying instructional staff to respond to those needs, responding to escalations from instructional staff, and so on.  The course manager must have domain expertise comparable to a very strong student, and may also be responsible for handling violations of academic integrity in the course.

  5. Discussion Leaders facilitate small-group discussions (analogous to today’s recitation sections) using a combination of their own and provided materials.

  6. Tutors work with small groups of students on specific material with which they need help.

  7. Students/learners also help each other in person (e.g. “guerrilla lab sections”) and virtually (e.g. discussion forums, hangouts).

These new positions will have to be recognized and trained.

Recognition:  Residential campuses currently recognize relatively few “official” teaching roles, such as lecturer, TA/head TA, lab TA/lab assistant, and reader/grader.  New roles should be recognized with a combination of academic credit and stipends.  For example, graduate TA’s receive a stipend and tuition waiver, but are also required to complete certain teaching activities as part of their PhD preparation.  At some schools, undergraduates can also be either regular or lab TA’s.  At Berkeley, a third mechanism allows undergraduates to receive credit for “Teaching in EECS” even if they can only commit 2-4 hours a week (vs. the 10-hour minimum for regular TA’s).

Training (teaching skills): Many campuses’ current orientations and courses for teaching assistants focus on training “full” TA’s who will teach sections, create materials, grade assessments, conduct review sessions, and more.  These courses therefore cover more than is necessary for some of the other roles.  For example, “dealing with disruptive students in class” is not a topic that (e.g.) tutors would need much experience with.  Some basic training for less-than-full-TA’s might cover:

  • how to help students stuck on problems without giving away the answer

  • how to “coach” students to effectively use techniques such as pair programming or peer grading, both to evaluate each other’s work and to learn from the process of doing so.

Training (Orientation to the material): Both  and the NSF CS10K project aim to train high school teachers to deliver computer science courses.  Not only will the courses themselves be delivered as SPOCs, but they are creating “teacher training SPOCs” that will be combined with live and remote interaction (Google Hangouts) to train instructors on the use of materials.

Example 1: UC Berkeley/edX CS169x, Software Engineering

This MOOC was developed based on a campus course whose enrollment had also been growing, and it features many of the elements that “scale well,” including rigorous but automatically graded programming assignments.  Over 100,000 MOOC students have attempted the course and over 10,000 have earned certificates over five offerings of it.  The course now has a facilitator who is a faculty member at another university who became excited about the material after taking the course as a MOOC student.  He marshals the volunteer community TA’s drawn from alumni of previous offerings, but is also a contributor who has created his own materials.  He also stewards a community of classroom instructors using the material in their classrooms in a SPOC model.  A SPOC is a Small Private Online Course usually based on MOOC materials, but with heavy involvement of the instructor “on the ground” in customizing and facilitating the course.  I have argued elsewhere for the potential of this model and reported on successful initial trials using Berkeley MOOC materials in a SPOC setting at half a dozen universities, finding that different instructors use different subsets of the resources, some add their own, and some either don’t use our videos  or use them to increase their own understanding of the material before presenting to their own students.  In the meantime, the course’s original authors continue to improve the foundational materials and textbook, relying on the “network” of support to efficiently disseminate those changes and create new materials around them.

Example 2: UC Berkeley CS61A, Great Ideas in Software Development

This campus-based course, which has no corresponding MOOC as of this writing, is a rigorous introduction to the main paradigms of programming: procedural abstraction, data abstraction, functional, and logic.  It is based on a transliteration into Python of Abelson & Sussman’s renowned Structure & Interpretation of Computer Programs.  For the record-breaking 1100-student offering of the course in Fall 2013, the lecturer would huddle with TA’s on a weekly basis to understand which topics students were having trouble with, and would then deploy a subset of teaching staff to create and run “guerrilla sections” specifically covering difficult topics.  Combined with making his lectures available online in advance of the live lecture, this meant that most students didn’t attend lecture and most students didn’t attend the same sections.  He also recruited star alumni from the previous semester to serve as tutors or lab assistants who committed only 2-4 hours per week in exchange for academic credit designated as “Teaching in EECS”, a teaching role that most CS courses have yet to exploit.  These helpers knew the material since they were alumni of the course, but received informal training on how to address common questions on homeworks/lab exercises without giving away the answers.


  1. The current “one size fits all” model of residential course delivery is a poor fit for exploding enrollments as well as for faculty productivity in an era of tightening budgets.

  2. Separating the scalable from the non-scalable parts of a course allows the two to be resourced separately.  The scalable parts can serve the mass of the distribution while the non-scalable parts can be resourced in a way more tailored to utilizing the outliers, both the superstars and the stragglers.

MOOC technology in the form of curated SPOCs, with appropriate new teaching roles supporting the course, can play a role in both the scalable and non-scalable elements.


These ideas come from conversations with the UC Berkeley Taskforce on Computer Science Curriculum (CS-TFOC), which includes David Culler, Dan Garcia, Björn Hartmann, David Wagner; John DeNero, Dave Patterson, and Andrew Huang (Berkeley); Mehran Sahami (Stanford); Saman Amarasinghe (MIT); Sam Joseph (Hawaii Pacific University, and lead facilitator for CS 169x on edX); and many others.

Thursday, May 8, 2014

Gallup poll: how your college experience affects fulfillment & well-being at work

This morning, at my semi-monthly meeting with Cathy Koshland, our Vice Provost for Teaching, Learning, Academic Facilities & Programs, she called my attention to the recently released Gallup/Purdue report on which factors in alums' college experiences predict whether they will be "engaged" at work: not just financially stable, but emotionally and intellectually connected to what they do in a way that makes it rewarding. Gallup’s previous extensive studies of workplace engagement have found that it correlates with important economic metrics such as productivity and employee healthcare costs, so even if you don’t care about the touchy-feely argument that “people should feel good about their work,” there’s good reason to care about engagement.

So if a major goal of going to college is to “get a good job”—leaving aside whether that is the proper goal for college—and if "good" means something more than just financially remunerative, it's worth asking whether colleges are succeeding in doing this.

The timing of this report is fortuitous: my involvement in the world of MOOCs has made clear that while there seems to be some bland agreement that "there's more to college than taking classes" (and hence a college degree is worth more than the courses it comprises, and therefore MOOCs by and large won't replace that experience), it's been slippery to articulate what that "more" is beyond using vague words like "socialization", "professional networking", "mentoring", and so on.

In contrast, this report identifies specific elements of the college experience that are predictive of "workplace engagement."  Which factors (perhaps surprisingly) do not influence it?  Public vs. private college, race or ethnicity, whether you’re first in your family to go to college, the “selectivity” of your school (i.e. whether it’s in the USN&WR “top 100”).

Which factors do influence it? The most important ones—factors that double the odds of whether you’ll be engaged at work—are as follows, together with the fraction of respondents who agreed with the statement:

  • Whether you had a professor who you believed cared about you as a person (27%)

  • Whether you had a professor who made you excited about learning (63%)

  • Whether you had a mentor who encouraged you to pursue your dreams (22%)

The report’s unclear on whether these have to be three different people, but it is striking that these are all elements that residential colleges are much better positioned to provide than MOOCs, yet most students don’t experience them (and only 14% agreed with all three). The above three criteria are clustered as “[My college] is passionate about my long-term success.” Interestingly, while selectivity of the college didn’t influence engagement, graduates of for-profit private colleges (e.g. University of Phoenix) were less likely to be engaged at work (29% of respondents) than those of not-for-profit public (38%) or private (40%) colleges.

The other general area that doubles the odds of good workplace engagement is “College prepared me well for life after college.” This is broken down into subquestions having to do with internships, extracurriculars, and multi-semester projects such as research or volunteering, but it’s worth asking (I think) about the social aspect of “preparation” as well. A few of us CS faculty recently discussed Harry Lewis's provocative book Excellence Without a Soul, which charges that colleges in fact are doing a terrible job at social preparation (among other duties): they are infantilizing students by sheltering them from learning from their own mistakes, rather than using mistakes as character-building teachable moments where personal growth can occur; they allow students to self-segregate by class or ethnicity or whatever; and they make themselves appealing to students by shallowly providing what students myopically say they want, thereby creating conditions in which the students expect to "blame the system" when something happens to them (personal conflicts, lower-than-expected grade, etc.) and as a result are no better socialized when they graduate than when they began.

The other interesting observation has to do with when you graduated. Well-being is measured in terms of 5 subcategories in each of which you are said to be Thriving, Struggling, or Suffering. When graduates are grouped by decade, graduates in the 50s and 60s reported “thriving” in all 5 categories at double the rate of graduates in the 70s-80s, and 3 to 7 times the rate of graduates since the 90s. The report states that this “highlights the important role that age plays in determining the relative influence of experiences on one’s well-being,” but an alternative hypothesis is that this reflects universities’ abdication of their moral authority and of their role in developing students’ character, as Lewis charges.

So the good news is universities can indeed offer important experiences beyond course-taking that have a profound effect on graduates’ well-being when they enter the workforce; the bad news is for the most part they’re not doing a great job of it. Lewis’s book continues this thought nicely by providing specific observations on why, and even linking the observations to other problems like academic integrity violations, grade-point-grubbing, and the rancorous debates about the role of college athletics.

Thursday, April 3, 2014

Emacs Word-to-TeX and TeX-to-Word macros

Grad students: tired of manually fixing up your professors' Word documents so you can incorporate them into your LaTeX papers?  Professors: tired of reading students' LaTeX drafts and wish you could just edit them in Word?

If you use Emacs, the below customizations can help.  Place them in your .emacs file somewhere, and run:

  • M-x texify-region to convert content pasted from Word into TeX-friendly content (It does the right thing for characters outside TeX's input set such as curly quotes, em- and en-dashes, and so on)

  • M-x wordify-region to go the other way

  • M-x empty-region to convert TeX paragraphs in the region (newline after each line, blank lines separate paragraphs) into Word-friendly text (one paragraph == one newline)

  • M-x copy-region-as-empty to do the above nondestructively, i.e. leaving your original TeX markup intact but copying a Word-friendly version to the kill ring (which doubles as the clipboard on non-broken PC & Mac implementations of Emacs)

  • M-x word-outline-to-latex  to convert a numbered headings outline (e.g. pasted from Word's outline mode view) into \section and \subsection hierarchy for LaTeX use

Have fun.

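;;; turn the region into something suitable for pasting into Word or other
;;; non-ascii word processors

(defun change-many (change-list)
  "Apply each (REGEXP REPLACEMENT) pair in CHANGE-LIST, in order, to the
accessible portion of the current buffer."
  (dolist (subst change-list)
    (goto-char (point-min))
    (while (re-search-forward (car subst) nil t)
      ;; FIXEDCASE = t: never re-capitalize a replacement to match the
      ;; case of the text it replaces
      (replace-match (cadr subst) t nil))))
(provide 'change-many)

(defun wordify-region (beg end)
  "Nondestructively convert region of TeX and text-filled source for
pasting into MS Wurd, and leave converted region on kill ring"
  (interactive "r")
  (let ((buf (get-buffer-create "*wordify-temp*")))
    ;; copy-to-buffer copies from the *current* buffer (erasing BUF first),
    ;; so it must run before switching to BUF
    (copy-to-buffer buf beg end)
    (with-current-buffer buf
      (change-many '(("\n[ \t]*\n[ \t\n]*" "~@@~") ; blank line(s) = paragraph break
                     ("\n" " ")                   ; join lines within a paragraph
                     ("~@@~" "\n")                ; one newline per paragraph
                     ("``" "“")
                     ("''" "”")
                     ("[ \t]+" " ")))             ; collapse runs of spaces/tabs
      (copy-region-as-kill (point-min) (point-max)))))

(defun texify-region (beg end)
  "Destructively convert region of pasted-in Wurd text to be TeX-friendly."
  (interactive "r")
  (save-excursion
    (save-restriction                   ; undo the narrowing when done
      (narrow-to-region beg end)
      (change-many '(("\\([^\\]\\)\\$" "\\1\\\\$")  ; $ -> \$
                     ("^%" "~@@~")                  ; leave line-initial % alone
                     ("\\([^\\]\\)%" "\\1\\\\%")    ; % -> \%
                     ("~@@~" "%")
                     ("’" "'")
                     ("‘" "`")
                     ("“" "``")
                     ("”" "''")
                     ("…" "\\\\ldots{}")
                     ("\\.\\.\\." "\\\\ldots{}")
                     ("\"\\([^\"]+\\)\"" "``\\1''") ; "x" -> ``x''
                     ("—" "---")
                     ("–" "--")
                     ("½" "$1/2$")
                     ("¼" "$1/4$")))
      (fill-individual-paragraphs (point-min) (point-max)))))

;; ("\([^\\]\)%" "\\1\%")

(defun empty-region (beg end &optional nlines)
  "Convert filled paragraphs to unfilled paragraphs in region. With prefix arg,
insert that many blank lines between paragraphs (default 0)."
  (interactive "r\nP")
  (let ((sep (make-string (1+ (if nlines (prefix-numeric-value nlines) 0))
                          ?\n)))
    (replace-all-in-region beg end
                           `(("\n[ \t]*\n[ \t\n]*" "@@@@") ; paragraph breaks
                             ("[ \t]*\n[ \t]*" " ")        ; join wrapped lines
                             ("@@@@" ,sep)))))

(defun copy-region-as-empty (beg end)
  "Convert region to empty paragraphs and place it on the kill ring without
deleting it."
  (interactive "r")
  (let ((text (buffer-substring beg end)))
    ;; work on a copy so the original TeX markup is left intact
    (with-temp-buffer
      (insert text)
      (empty-region (point-min) (point-max) 0)
      (copy-region-as-kill (point-min) (point-max)))))

(defun word-outline-to-latex (beg end)
  "Convert multilevel (numbered) Outline text pasted from Word into section,
subsection, etc. structure of LaTeX."
  (interactive "r")
  ;; Assumes heading lines look like "1 Title", "1.2 Title" or "1.2.3 Title"
  ;; (optionally with a period after the number); each pattern needs
  ;; whitespace right after its number, so the levels never overlap.
  (replace-all-in-region
   beg end
   '(("^[ \t]*[0-9]+\\.?[ \t]+\\(.*?\\)[ \t]*$" "\\\\section{\\1}")
     ("^[ \t]*[0-9]+\\.[0-9]+\\.?[ \t]+\\(.*?\\)[ \t]*$" "\\\\subsection{\\1}")
     ("^[ \t]*[0-9]+\\.[0-9]+\\.[0-9]+\\.?[ \t]+\\(.*?\\)[ \t]*$"
      "\\\\subsubsection{\\1}")
     ("^\\\\" "\n\\\\"))))              ; blank line before each heading

(defun replace-all-in-region (beg end lst)
  "Apply each (REGEXP REPLACEMENT) pair in LST, in order and case-sensitively,
to the text between BEG and END."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (let ((case-fold-search nil))     ; match case exactly
        (change-many lst)))))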

Saturday, March 22, 2014

A day in Otavalo, Ecuador

Greetings from Ecuador!

We arrived last night in Quito's ultramodern, less-than-a-year-old airport conveniently located nowhere near Quito.  But that was OK since our first night's hotel reservation was in Otavalo, so we could catch the Saturday market there.

Our hotel contact had kindly offered to arrange a ride from the airport, but it didn't show up, so we just took a cab, which at $60 is still the most expensive thing we've done here.

The airport is connected to Quito and Otavalo by a beautiful new smooth highway on which you can travel as fast as the slowest truck in front of you, which often was less than 15 mph.  So it took about two hours to get to Otavalo, and when we found our hotel it was locked down tight.  Repeated ringing of the doorbell was to no avail (despite the fact I had called ahead from the airport saying we were on our way), so I called back and after ringing for about a minute we finally got let in.  

We learned in the morning that our building was over 150 years old, and it definitely has the kind of charm that you know wouldn't survive a quake.  The rooms were modest but super clean, the beds were fine, and in the morning it only took about 20 minutes to get the hot water going (apparently the young man keeping house had forgotten to turn on something or other).  In the meantime, though, we were kindly offered coffee by two little old ladies in the kitchen who I assume were part of the housekeeping staff; they appeared to be brewing it in a sock (Tonia editorial: it's NOT a sock, just a fabric brewing filter used around these parts).

Hotel Riviera Sucre, Otavalo

Our main things to do in Otavalo were to see the Saturday market - both the main one and the animal market - and, time permitting, the Condor Park, a raptor-rehab-and-education bird park on a hill overlooking Otavalo.  The animal market only runs til 9am, so we splurged a dollar to take a cab over to make it in time.  This is where people come to buy and sell primarily food animals.  I was prepared for it to be much worse than it actually was, although there were definitely some chickens being handled pretty rough.  Still, most of these animals probably have it a lot better than factory-farmed animals; the only difference is we don't see those.

The main market is so large it basically takes over most of the city center, and every category of thing is for sale: produce, street food, ticky-tacky, handwoven ponchos and other garments, hats (we each got one), and it's easy to miss that some of the market stall buildings are elegant colonial-looking structures that have clearly been there a while.


The funny thing to get used to about the money here is that (a) they officially use the US dollar, although you often still get old Ecuadorian dollar coins as change, and (b) street food is basically free because it's so inexpensive - 15 cents for a fried corn dough thing, 18 cents for a sweet roll, 25 cents for another kind of sweet roll, 25 cents for three bananas - you can find more money than that lying in the gutter on Mission Street.  And cabs cost about a dollar to go anywhere in the city (it's a small place, but still, a dollar!).  So we snacked our way through the market, and for lunch we split a plate of mote (basically wet popcorn), llapingachos (potato pancakes), and pig meat pulled straight from a pig that was spit-roasted whole, so we got to look at his face while eating him.  Apparently his aborted last meal was a tomato, since that's what was stuffed in his mouth. (The plate of pig+stuff was $3.)

The afternoon activity was the Condor Park, a Dutch-owned park/reserve on a hill overlooking Otavalo that had numerous raptors including the splendid Andean condors.  They had a great bird show with free flying birds who go out and roam over the city (the park's amphitheater is on a bluff that has a tremendous view of the valley) and Tonia got to hold this very small falcon whose name I currently can't remember.



Following the locals' example, we walked the ~2.5 miles back down and enjoyed spectacular views of Otavalo and the surrounding volcanoes, plus the occasional owner-operated roadside stand serving roasted cuy...



...and finally using the lovely stairway down to Otavalo as we approached the city edge...



We finished the day by taking a bus to Quito, from which I write this paragraph.  The Lonely Planet Ecuador guide uses the words "comfortable" and "efficient" to describe Ecuador's intercity bus system.  Last week I was in Germany, and Rick Steves' Germany guide uses the same words to describe the Deutsche Bahn.  Both are true, but for very differently calibrated values of comfort and efficiency.  On the other hand, the fare system in Ecuador seems considerably simpler than Germany's: it costs a dollar per hour of travel.  (No extra charge for the loud Ecuadorian ballad-pop playing over the bus's PA system.)  What I like most about the bus is the smoothnew qa fof he wew8fhf n9ri ride.

The hotel we stayed at in Quito was disappointing due to its location, but the manager (who was very nice and tried to be helpful) suggested if we wanted to walk to dinner there were a couple of "typical" Ecuadorian restaurants just down the street.  The low-end neighborhood restaurants are comedores: out front are two or three big pots with whatever tonight's dinner is; someone explains the menu, and if you like it, you get a plate of it, school cafeteria style, and then sit at one of several large communal tables.  Apparently this evening's selection was testicle soup and sizzling wok of viscera, so we opted instead for what appeared to be a local fast food joint and had a pork chop and fries.  While standing in line there, I asked the local behind me (in Spanish) "What dishes do you like here?"  He replied "None.  I'm here because my friends wanted to eat here."  Oh well.  The pork chop and fries were good.

The only remaining challenge was sleep, something the glowing reviews on did not mention, which is unfortunate since that's how I chose the place.  Sleeping was made difficult because the LED streetlights were inches away from our window and set to Perma Noon brightness, plus occasionally a dilapidated school bus blasting cumbia and full of drunk partiers would drive by, and at around 3am someone turned up their radio full volume for no reason, plus drunks were wandering around.  And because of the old building construction, you could hear every step anyone took anywhere in the hotel.  Even with good ear plugs, it was one of the worst nights of travel sleep I've had in many years.  It was disconcerting, too, that the room doors were the type of doors found on cargo containers that can be padlocked from either side.  (My review will say "Keep looking".)

A proposal for the International Shower Rating Scale

  1. A bathable river, or a bucket of cold water and a rag

  2. There is a trickle of cool water from a spout or faucet located about four feet off the ground.  If you stoop under it and writhe, you can clean many parts of your body.

  3. Like #2, but the spout is high enough you don't need to stoop; instead you find yourself staring up into it in despair.

  4. A bucket of hot water and a rag or sponge

  5. As #3, but sometimes the water is hot, if you pay for the heater to be turned on.

  6. There is good water pressure as well as both hot and cold water, but the temperature is erratic, so you must do the Shower Dance to avoid getting scalded or frozen.

  7. Good water pressure and water that stays at the set temperature for long enough to finish your shower.


-0.5 if a badly designed or leaky enclosure causes use of the shower to flood the bathroom.

-0.5 if the drain stops up so that you spend most of the shower standing in your own filth.

Thursday, March 6, 2014

A few high-order bits from Learning@Scale

I tried to gather some notes from the excellent presentations at Learning@Scale, a new conference publishing scholarly research on large-scale online learning.  Co-chairs were Marti Hearst and I from UCB, and Micki Chi, who directs the Learning Sciences Institute at Arizona State University (which has a long track record of innovating in online and hybrid education).

Many researchers presented great ideas and insights—based on analyzing actual data—about how learners use MOOCs, how they interact with the material, and how we might make improvements.

Here are a few highlights; complete information is available on the conference website:

Philip Guo (MIT, now going to U. Rochester as faculty) talked about understanding how learners in different demographics navigate MOOCs, examining ~40M events over several edX courses and segmenting by country and (self-reported) age, and tried to draw some design recommendations from the results:

  • Most learners (>1/2) jump backwards in the course at some point, usually from an assignment to a previous lecture => opportunistic learners => rethink linear structure of course

  • Learners have specific subgoals that fit poorly with "pass/fail" of overall certificate: they care about specific skills, and beyond that, just try to get minimum points to pass.  => get away from single "pass/fail" and move towards something like individual skill badges?

In another talk, Guo described the properties of engaging and affordable video segments:

  • Preproduction to plan for ~6 min segments results in more engaging videos than when professor records "straight through" and expects postproduction to decide segmenting.

  • Talking head in videos is more engaging than slides-only (as measured by video drop-out rate over the length of a video).

  • Informal shots can beat expensive studio production!  Dropoff is WORSE for expensive 3-camera/studio setup.  (Different instructors/courses, but shows that expensive studio doesn't trump other things.)

  • Khan-style ("tablet drawing") tutorials beat "slides + code" tutorials.  => Use hand-drawn motion, which promotes extemporaneous/casual speaking (vs "rehearsed script") which in turn "humanizes" the presentation and makes it feel more informal/1-on-1.

SUMMARY RECOMMENDATIONS: short <6 min videos; pre-plan for short segments; talking head contributes to personal 1-on-1 feel; Khan-style informal drawing + extemporaneous beats slides + tightly scripted presentation.

Juho Kim talked about analyzing Video Drop-outs—people who don't watch all the way to the end of a video segment:

  • Tutorial videos have more drop-outs than lecture videos, but also show more "local peaks" of dense interaction events, especially around "step boundaries" in step-by-step tutorials and video "transitions" (e.g., talking head => draw on screen) in lectures.

  • Re-watching videos exhibits more "local peaks" of interaction events than first-time watching.  => Learners coming back to specific points in video, vs watching linearly.

Jonathan Huang from Stanford compared Superposters (MOOC students who disproportionately participate in forums) to non-superposters: superposters tend to be older, take more courses, are 3x more likely to also be superposters in other courses, perform better (~1 stdev) in course (controlling for those who watched >90% lectures), although the margin is highly course-dependent.  And they don't "squelch" non-superposters—ratio of superposter to non-superposter responses doesn't change significantly with number of superposters.

Berkeley was well represented with two full papers and several short/work-in-progress papers.  Derrick Coetzee described how the incorporation of chatrooms into MOOCs did not result in improved learning outcomes or increased sense of community, though it did seem to engage students who don't post in the forums, and didn't hurt any learning outcomes.  This was one of several interesting examples of doing a live A/B test (“between-subjects experiment”) in a MOOC.  Kristin Stephens reported results of surveying over 90 MOOC instructors at various schools to understand what sources of information they value in understanding what's going on in their courses, and how they might want those information sources visualized.  A special-topics course taught in Fall 2013 by Profs. John Canny and Armando Fox yielded several work-in-progress papers on adaptive learning, automatic evaluation of students' coding style in Python, best practices for affordably producing MOOC video, and more.  (Drafts of all these papers are linked from the MOOCLab Recent Publications page, and the archival versions will soon be available in the ACM Digital Library.)

Eliana Feasley of Khan Academy gave a hands-on tutorial on using their open-source Python-based tools to do item response analysis of MOOC data.

More summary notes coming soon.

Book summary: A Thread Across the Ocean

A Thread Across the Ocean: The Heroic Story of the Transatlantic Cable by John Steele Gordon The most wonderful thing about the writ...