Wednesday, July 30, 2014

Writing good multiple choice questions

Lots of us use MCQs in homework assignments, exams, quizzes, etc., and there's a wealth of info on writing good ones.  Beyond the obvious things like "make all the distractors plausible", here are some tips I've distilled from various sources; the numbered citations refer to the list at the end.

Review of assessment terminology

  • Bloom's taxonomy orders cognition levels from "lower" to "higher": knowledge (recall/memory), comprehension, application, analysis, synthesis, evaluation.  The challenge is to test the higher-level skills using multiple choice questions.

  • Reliability: the extent to which a learner's answer to a question reflects her true knowledge.  Guessing (lucky correct answers) and slipping (careless wrong answers) can thwart it.

  • Discrimination: how well a question separates learners who really understand it from those who don't.

  • Difficulty: the median level of mastery above which students are likely to get the question right.  Difficulty and discrimination are two of the parameters that can be measured using item response theory.

  • Transfer: the extent to which successful performance on an assessment will allow valid generalizations about achievements to be made.
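
The interplay of ability, difficulty, and discrimination can be made concrete with the two-parameter logistic (2PL) model from item response theory, mentioned above.  Here is a minimal sketch; the function name and the example parameter values are illustrative, not from any particular tool:

```python
import math

def p_correct(theta, a, b):
    """2PL IRT model: probability that a learner with ability `theta`
    answers correctly an item with discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A highly discriminating item (a=2.0) separates learners on either side
# of its difficulty (b=0.0) much more sharply than a weak one would:
print(round(p_correct(0.5, 2.0, 0.0), 2))   # ability just above difficulty
print(round(p_correct(-0.5, 2.0, 0.0), 2))  # ability just below difficulty
```

Note that when ability equals difficulty (theta == b), the model predicts a 50% chance of success regardless of discrimination; discrimination controls how steeply that probability changes as ability moves away from the difficulty level.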

Checklist: the stem (base of question)

  • Write a stem that is specific to the question: this immediately focuses the question on a specific learning outcome.
    BEFORE: Which of the following statements is true?  [various statements about unit tests]
    AFTER: Which characteristic is most commonly observed in unit tests?  [rephrase choices to focus on characteristics of unit tests]

  • Don't put fill-in-the-blank items in the stem—it increases students' cognitive load without testing their cognition any better.
    BEFORE:  Mocks and ____ allow you to isolate behaviors in unit tests.
    AFTER: Besides mocks, what other mechanism allows you to isolate behaviors in unit tests?

Checklist: answer & distractors

  • Every answer should form a grammatically correct sentence when appended to the stem (pronoun agreement, etc.).

  • Keep items similar in length, complexity, formality, tone, etc., and avoid re-using exact wording from lecture/textbook/notes.  Otherwise students may pick the "most textbook-like" answer, the "most nuanced" answer, the longest answer, etc.

  • Either truly randomize the order of the answers, or use a deterministic rule such as alphabetical order.

  • Avoid questions where students could get the right answer for the wrong reason (even if not guessing):

    • "all of the above" (students who can identify more than one correct answer can choose it even if they don't understand why all the answers are correct)

    • "none of the above" (better, but may still be chosen by a student who holds a misconception about why it's true)

    • true/false questions (you won't know if they understand why it's true or false, and they can guess)

    • negative questions (unless the learning outcome specifically requires it, i.e., being able to identify a non-example of something; otherwise students may be able to identify an incorrect answer without knowing the correct answer).

  • Avoid complex combinations of items ("(a) and (b) only", "all of (a),(b),(c)", etc.): a sophisticated test-taker can use partial knowledge to guess the correct answer. (Also, students hate this kind of question, since they may get no partial credit for knowing part of the answer.)
    Possible alternative: a "select all that apply" question with N choices, awarding 1/N credit for correctly determining whether each choice should be checked.
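
The per-choice partial-credit rule just described can be sketched as follows (the function name and data format are illustrative, not from any particular grading tool):

```python
def select_all_score(choices, key, marked):
    """Score a 'select all that apply' item: 1/N credit for each of the
    N choices the student correctly classifies as checked or unchecked.
    `choices` is the full list of choice labels, `key` is the set of
    correct choices, and `marked` is the set the student checked."""
    correct = sum(1 for c in choices if (c in key) == (c in marked))
    return correct / len(choices)

# Key is {a, b}; student checks {a, b, c}: right on a, b, and d
# (left unchecked), wrong only on c, so 3 of 4 choices earn credit.
print(select_all_score(['a', 'b', 'c', 'd'], {'a', 'b'}, {'a', 'b', 'c'}))  # 0.75
```

This scoring rule rewards partial knowledge, addressing the complaint about combination items above.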

How many choices per question?  TL;DR:  Three.

The metric of interest is the total number of choices on the exam: e.g., 30 questions with 4 choices each creates a cognitive load comparable to 40 questions with 3 choices each, so the trade-off is really one of longer tests vs. more choices per question.

A meta-analysis of 80 years of MCQ research [6] reveals both theoretical and empirical evidence for 3 choices per question:

  • Theoretical: Frederic Lord, one of the architects of Item Response Theory, showed statistically that a longer test with fewer choices per question "increases exam efficiency for high-level examinees, and decreases it for low-level examinees".  Tversky later showed that three choices per question maximizes the information obtained per unit of time regarding students' ability.

  • Empirical: You'd think more distractors would thwart guessing, but on existing standardized and high-quality career tests, only 16% of 4-option items had 4 effective choices (i.e., all plausible enough to be chosen a nontrivial fraction of the time), and only 5% of 5-option items had 5 effective choices.

  • Caveat: the meta-analysis assumes that exactly one choice per question is correct, and that the learner gets a single attempt at each question.
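
The "total choices" bookkeeping and the guessing trade-off above can be checked with a bit of arithmetic (under the same single-correct-answer, single-attempt assumptions as the caveat):

```python
def expected_guess_correct(n_questions, n_choices):
    """Expected number of questions answered correctly by blind guessing,
    assuming exactly one correct choice per question and one attempt."""
    return n_questions / n_choices

# Two exams with the same 120-choice budget, split differently:
assert 30 * 4 == 40 * 3 == 120

print(expected_guess_correct(40, 3))  # ~13.3 of 40 correct by luck (33%)
print(expected_guess_correct(30, 4))  # 7.5 of 30 correct by luck (25%)
```

So the 3-choice format does concede a higher per-question guess rate; the meta-analysis's argument is that the extra questions a 3-choice format buys (at equal cognitive load) more than compensate in measurement efficiency.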

Types of questions that test higher levels of cognition

  • Memory + Application: instead of "Ricardo's Principle of Comparative Advantage states that…" (memory), you can ask "Which of the following is an example of applying Ricardo's Principle of Comparative Advantage?" and give N scenarios, exactly one of which illustrates applying the principle.

  • Premise-Consequence: If X happens, then which of the following will happen?

  • Analogy: X is to Y as W is to which of the following?

  • Case study: a background paragraph serves as the setting for a series of questions that require the student to analyze the scenario from various angles.

  • Incomplete scenario: show a diagram, taxonomy, architecture, etc. similar but not identical to what's been seen in lecture/readings.  Ask students to fill in blanks, or ask questions about what makes it different from the version seen in lecture.

  • Evaluation: present both a question and a proposed answer, eg a set of design constraints and a proposed design.  Provide a rubric according to which students must indicate whether the proposed answer is correct, complete, etc.

  • Inference/higher-level reasoning: present a scenario, then ask which of n statements can reasonably be said to follow from the scenario.

Students' rules of thumb for guessing on multiple-choice tests, from [] (and ways to thwart them)

  1. Pick longest or most scientific-sounding answer (make all choices comparably long and use comparable prose)

  2. Pick 'b' or 'c' (randomize or use deterministic order)

  3. Avoid choices containing 'always' or 'never' (don't use those words)

  4. If two choices are opposites, one of them is probably the answer (include 2 choices that are opposites and are both distractors)

  5. Pick keywords/phrases that were related to this topic (include keywords/phrases in distractors)

  6. True/False questions are more often true than false, since instructors tend to emphasize true things. (use both forms of a question, or avoid T/F questions)
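
Two of the patterns above (the longest answer wins; answers cluster in 'b' or 'c') are mechanical enough to audit automatically in a question bank.  Here is a rough sketch; the function name, the dict format, and the 1.5× length threshold are all illustrative assumptions, not part of any particular tool:

```python
from collections import Counter

def audit_bank(questions):
    """Flag two patterns test-wise students exploit: a correct answer
    conspicuously longer than its distractors, and correct answers
    clustering in one position.  Each question is a dict with 'choices'
    (list of strings) and 'answer' (index of the correct choice)."""
    long_answer_flags = []
    for i, q in enumerate(questions):
        lengths = [len(c) for c in q['choices']]
        ans_len = lengths[q['answer']]
        others_avg = (sum(lengths) - ans_len) / (len(lengths) - 1)
        # Flag if the correct answer is the longest AND 1.5x the average
        # distractor length (an arbitrary illustrative threshold):
        if ans_len == max(lengths) and ans_len > 1.5 * others_avg:
            long_answer_flags.append(i)
    position_counts = Counter(q['answer'] for q in questions)
    return long_answer_flags, position_counts

bank = [
    {'choices': ['yes', 'a very long and scientific-sounding option', 'no'],
     'answer': 1},
    {'choices': ['red', 'blue', 'green'], 'answer': 1},
]
flags, positions = audit_bank(bank)
print(flags)      # [0] -- first question's correct answer is suspiciously long
print(positions)  # both correct answers sit in position 'b' (index 1)
```

A report like this makes it easy to rebalance answer lengths and positions before the exam goes out.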

Useful open-source tools to help prepare & grade multiple-choice exams

  • RuQL is an open-source tool I made that lets you write questions and create tests in a variety of formats (printed, HTML, edX interactive quiz, etc.).  Some command-line skillz required.

  • AutoQCM lets you generate printable answer-bubble sheets that can be scanned on a high-speed scanner and graded using open source software.  RuQL can generate answer sheets and grading keys for AutoQCM.


  1. Dr. Timothy Bothell, BYU Faculty Center

  2. U of Oregon Teaching Effectiveness Program

  3. Cynthia J. Brame, Assistant Director, Vanderbilt University Center for Teaching

  4. Using multiple-choice questions effectively in information technology education, Karyn Woodford and Peter Bancroft, Queensland U. of Tech.

  5. Ben Clay, Kansas Curriculum Center

  6. Three Options Are Optimal for Multiple-Choice Items: A Meta-Analysis of 80 Years of Research, Michael C. Rodriguez, University of Minnesota. Educational Measurement: Issues and Practice, Summer 2005.


Friday, July 25, 2014


In an effort to widen distribution for ESaaS, I've been trying to move the "expanded distribution" channel from CreateSpace to IngramSpark.

[Since I now know more about book distribution than I ever expected to know: CreateSpace is an author-facing print-on-demand (POD) service that was acquired by Amazon.  When someone orders your POD print book on Amazon, CreateSpace fulfills it.  They also have an "expanded distribution" channel for reaching bookstores, retailers, etc., but apparently bookstores don't like to deal with them for a variety of reasons.]

In contrast, Ingram is one of the two biggest book distributors, i.e., one of two big companies that bookstores call when they place an order for a book on behalf of a customer.  Historically Ingram only dealt with publishers, but they recently launched IngramSpark, an independent-author-facing service that allows your books to be POD'd and distributed through their network.  A colleague of ours who is a publishing industry veteran suggested this would be a much better channel for that expanded distribution, since (a) bookstores are already used to dealing with them and (b) bookstores hate Amazon and there is a guilt-by-association with CreateSpace, plus CreateSpace has a "no returns" policy that is incompatible with how many bookstores operate.

So I figured, simple.  Turn off "expanded distribution" on CreateSpace, and list our book on IngramSpark instead.

Not so fast.  Apparently only one distributor at a time can handle your book, and I was told it would take 6 to 8 weeks from the time I disabled "expanded distribution" on CreateSpace for the ISBN to be "released" allowing me to list that same ISBN on IngramSpark.  Note that "released" presumably means "moving it from one electronic catalog to another."  I thought the phrase "6 to 8 weeks" went out with mail-order products in the 80s.  Hmmm.

But the real eye-opener has been customer service.

Whenever I've had a problem or question using CreateSpace, I log in to my author account, click the "Call me now" button, and within 2 minutes I am talking to a live, well-informed human being (not a droid), and my question always gets answered without so much as an escalation.

Whenever I've had a question using IngramSpark, I have to "fill out a support request".  It can "take up to 48 hours for us to respond, although we can often respond much quicker."  Only the first part of that sentence turns out to be true: 48 hours is about right.  And the responses aren't always helpful, so I always have a followup question…which takes another 48 hours.

Today I noticed there is another help option, "Request a call."  Since that had worked well on CreateSpace, I tried it.  They will call me..."within up to 48 hours".  Great.  So instead of checking my email I have to keep my phone around.

Customer service for Ingram says: "We'll call you when we call you.  Deal with it."  Customer service for CreateSpace says: "You are a customer.  How quickly can we speak to you to resolve your problem?"  IngramSpark is owned by a bricks-and-mortar book distributor: historically, authors are beholden to them (and to the publishing industry generally) to get their books out.  CreateSpace is owned by the most successful e-tailer: historically, they are beholden to their customers to stay in business.  Draw your own conclusions.
