Tuesday, February 21, 2012

Apprehensive, but inspired by Jennifer Widom’s blog. (And no, the book isn’t free.)

So our online SaaS class launched yesterday, with 62k students and counting.

Not having done this before, of course I’m apprehensive.  Will people “get” the material the way we explain it?  Will the book be useful (to those who are buying it)?  Will our autograders (which go far beyond the multiple-choice autograders used in previous courses on Coursera) scale?  Will the material appeal to most of the people taking the course, whose educational profile is pretty different from that of the Berkeley undergraduates for whom the course was originally designed?

The good news is that the course at Berkeley is going quite well, even though it contains lots of material that is new since any previous offering, and even though we are dry-running some of the technologies that will be used for the online version.

And I’m also inspired by Prof. Jennifer Widom’s blog post “from 100 students to 100,000” about her recent experience teaching her Intro to Databases course at this scale.  I found myself mentally saying “+1” to a lot of her statements, such as “Creating these [multiple-choice but nontrivial] exams, at just the right level, turned out to be one of the most challenging tasks of the entire endeavor”; her observation that one consequence of “having 60,000 students is the need for absolute perfection: not one tiny flaw or ambiguity goes unnoticed”; and the emails from students who were “unabashedly, genuinely, deeply appreciative” (her emphasis).

We’ve received a few nice emails like that too, although like Prof. Widom, we’ve also (already!) got a handful of complainers.  So far, because there isn’t much actual content to complain about yet (the course just launched yesterday), most of those have been either about the fact that the book is not readily available in their country (to which I’m sympathetic) or the fact that it isn’t free (to which I’m not).  One person was wondering why we aren’t paying them to give us feedback on this early version.  (I guess this person doesn’t post reviews on Amazon or Yelp either, unless those companies have a payola system I don’t know about.)

The problem of the book not being available in some places is vexing.  Most of these complaints have come from students in the Middle East.  I hope they realize that we don’t control where Amazon does business and that we are actively looking at options for wider distribution, though I don’t know that we will solve this problem in time for the current offering of the class.  But we really do want to make the book available to as many people as possible.

That said, some people are apparently multiplying 60,000 by $10 (the price of the ebook) and assuming we (the authors) are cackling to ourselves while sleeping on a big pile of money, or expressing some level of indignation that we’re not giving the book away.  The facts are more modest: fewer than 5% of enrolled students have bought the book; we don’t receive anywhere near 100% of the price of each copy sold; we haven’t seen a penny of revenue yet (it takes over 60 days to actually get paid, and the book wasn’t available until mid-January); and creating the book has cost roughly $20,000 of our own money so far (not Berkeley’s money) plus thousands of hours of our own time (on top of our regular duties at Berkeley, so it comes out of our weekends and vacations).  That’s not counting the extra time (also our own) to adapt the course materials, develop autograders that (we hope) will provide meaningful feedback on programming assignments, and so on: work that we wouldn’t have done for the on-campus version and that was undertaken specifically to do the best job we could with the online version.  It’s great that ebooks make the cost of distribution nearly zero, but that doesn’t mean the cost of designing and creating the content is also zero.  (Just ask my spouse if you don’t believe me!)

So that’s why the book isn’t free, and relax, we are not doing this in order to quit our day jobs.  Indeed, one might conclude that we actually like our day jobs quite a bit if we are willing to do all the extra work (for no extra compensation) of repurposing the course to reach 60,000 students within the constraints of Coursera’s infrastructure, despite the fact that there is an active thread on the course forums about “how can I get a copy for free.”

So, to those of you who’ve expressed gratitude and well-wishes, we thank you deeply, and remind you that attitudes like yours are one of the reasons we LIKE teaching and were foolhardy enough to try this project.  We really and truly hope you will get something positive out of the course and that you’ll be motivated to give us constructive criticism on how to improve it, and when the inevitable infrastructure issues do occur, we hope you will be patient as we try to work them out.  We’re trying all kinds of stuff that even other courses on Coursera haven’t tried yet, especially where autograding is concerned.

And to those of you who believe we are doing this as a secret plot to cash out early, or who believe it is your right to get the book for free for whatever reason, sorry to disappoint you but I’m afraid we are just not as cynical as you.  We hope you get something out of the course anyway, and respectfully ask that you respect our work and our effort.

(Personally, like most people I believe that eventually these courses will have to charge some kind of tuition or find an underwriting model, since the expenses are nontrivial: we’ve used our connections with Amazon, Google, Microsoft and GitHub, among others, to secure donations of free products and services to support the class, but probably not everyone can do that.  My hunch is that if direct tuition were involved, even if it was only $10, a lot of these complaints would go away.  I spend a lot of volunteer time helping to run a small theater, and one thing we’ve learned is that if you give product away, some people conclude that it has no value, and they are the ones who tend to complain the most loudly.  The ones who pay usually say “I can’t believe you don’t charge more.”  It’ll be interesting to see how this plays out for online courses.)

Saturday, February 18, 2012

Adventures in self-publishing with LaTeX and Ruby

I’ve had a few comments and questions from people who have seen both the print edition and ebook edition of ELLS about what toolchains we used to do all this, and in general what our experience has been with self-publishing.  This post addresses both, in that order.

One of these days I’ll extract the toolchain from the content and post it somewhere, but in the meantime here is a brief description.  (UPDATE: I’ve done this.)

Technology: LaTeX

Like most EECS academics, I use LaTeX.  I like the fact that it separates logical structure from visual and physical formatting, has extensive support for cross-referencing, and all the rest.  Very early on I knew this was the way to go, since our goal was to have a single set of source materials from which we could automatically generate multiple very different formats.

The LaTeX macros I came up with are nothing unusual for advanced LaTeX users; essentially, each type of book element (sidebar, chapter teaser, elaboration, fallacy/pitfall, etc.) gets its own macro.  In the PDF version, the macros expand into sometimes-nontrivial low-level TeX formatting.  Those of you who’ve written complex macros know that doing so is a minefield of \protect, \expandafter, \relax and all kinds of other incantations to make up for the fact that TeX kind of looks like a programming language, but really isn’t one.  The PDF output is produced by fairly conventional (if complex) macros and the usual latex-bibtex-latex-makeindex-latex flow that you’re already used to.

Technology: tex4ht (aka htlatex)

The two most popular ebook formats right now are .mobi (used by Amazon Kindle) and .epub (used by most non-Amazon ebooks); both are based on markup, with .mobi using a lobotomized-and-then-extended version of HTML 3.2 and .epub being essentially XHTML.  The ebook editions are produced by running the same input through tex4ht, which bolts a back-end capable of emitting XML or HTML onto the TeX engine.

Tex4ht is remarkably difficult to learn to configure and use: there are a lot of moving parts, documentation is sparse (the single best source is The LaTeX Web Companion, but I still need to scour TeX user groups and newsgroups when I run into problems), and information about how to expand different kinds of TeX constructs is split across different files.  The tex4ht macros are, for example, the place where sidebars get typeset as special divs with gray backgrounds, where fallacies and pitfalls get a small .gif icon inserted, and so on.  In other words, for the most part, the tex4ht macros expand into HTML or XML markup that does approximately the right thing in preparation for ebook conversion.

The output of tex4ht has to undergo considerable postprocessing.  In some cases it’s because ebook formats have additional proprietary tags that need to be added, such as <mbp:pagebreak/> to force a new page in the Mobi format.  In other cases it’s because the default behavior of tex4ht is to insert markup that makes something look ugly, and I wasn’t able to find a way to change the behavior or turn it off.  In yet other cases, it’s because bugs in tex4ht cause weird LaTeX escape sequences to “leak into” the HTML code.  In any case, there is a long-ish Ruby script that uses the Nokogiri XML parser (a Ruby front-end to libxml2) and lots of XPath-fu to perform very serious chainsaw surgery on the output of tex4ht.  The main reason to use tex4ht at all, despite all this, is that it can digest LaTeX and preserve things like cross-references, which is very valuable.
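To give a flavor of that postprocessing, here is a minimal sketch of two of the fixes described above: forcing a Mobi page break before each chapter and stripping leaked TeX control sequences. It deliberately swaps plain regular expressions in for the Nokogiri/XPath approach, so it is not the actual script. The `chapterHead` class name and the specific list of leaked sequences are assumptions for illustration.

```ruby
# Sketch only: regex-based cleanup standing in for Nokogiri/XPath surgery.
# Assumptions (hypothetical): tex4ht emits chapter headings as
# <h2 class="chapterHead">, and leaked TeX looks like \protect or \relax.

LEAKED_TEX   = /\\(?:protect|relax|expandafter)\b\s*/
CHAPTER_HEAD = /<h2 class="chapterHead"/

# Insert a Kindle page break before every chapter heading except the first.
def insert_page_breaks(html)
  seen_first = false
  html.gsub(CHAPTER_HEAD) do |match|
    if seen_first
      "<mbp:pagebreak/>" + match
    else
      seen_first = true
      match
    end
  end
end

# Remove TeX control sequences (and trailing whitespace) that slipped into the HTML.
def strip_leaked_tex(html)
  html.gsub(LEAKED_TEX, "")
end

def postprocess(html)
  insert_page_breaks(strip_leaked_tex(html))
end

sample = '<h2 class="chapterHead">One</h2><p>A \protect sidebar</p>' \
         '<h2 class="chapterHead">Two</h2>'
puts postprocess(sample)
```

A real parser is the safer choice once the fixes depend on document structure (e.g., "the first paragraph inside a sidebar div"), which is presumably why the real script uses XPath rather than regexes.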

Technology: Pictures, Screencasts, and Code

We prepared our figures using OmniGraffle, and the versions in the book are resolution-independent vector graphics encapsulated in PDF.  However, since ebooks essentially follow HTML-like conventions, the figures need to be converted to raster formats such as GIF or PNG for ebook inclusion.  They also need to be layer-flattened and color-quantized, have their alpha channels removed, etc., or they will choke the ebook converters.  Most of this is taken care of by good old Makefile technology and free tools such as ImageMagick’s convert program and RMagick, its Ruby binding.
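As a rough illustration of that conversion step, the sketch below builds one ImageMagick `convert` command per figure. The flags are real ImageMagick options, but the 150 dpi density, the 256-color palette, and the figure paths are hypothetical choices, not the book's actual settings. (ImageMagick relies on Ghostscript to read PDF input.)

```ruby
# Sketch: build ImageMagick `convert` command lines that rasterize vector
# PDF figures into flattened, palette-quantized PNGs for the ebook.
# The 150 dpi density and 256-color palette are illustrative guesses, and
# the figure paths below are hypothetical.

def ebook_figure_command(pdf_path, density: 150, colors: 256)
  raise ArgumentError, "expected a .pdf file: #{pdf_path}" unless pdf_path.end_with?(".pdf")

  png_path = pdf_path.delete_suffix(".pdf") + ".png"
  [
    "convert",
    "-density", density.to_s, # must precede the input: sets rasterization dpi
    pdf_path,
    "-background", "white",   # color that transparent areas flatten onto
    "-flatten",               # merge all layers into a single image
    "-alpha", "off",          # drop the alpha channel
    "-colors", colors.to_s,   # quantize to a limited palette
    png_path
  ]
end

%w[figs/mvc.pdf figs/rest.pdf].each do |pdf|
  cmd = ebook_figure_command(pdf)
  puts cmd.join(" ")
  # system(*cmd) would run it directly, without going through a shell
end
```

In a Makefile-driven flow, the same command would typically live in a pattern rule so that only figures whose PDF changed get regenerated.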

Each code example in the book is kept in its own tiny file, with LaTeX wrapper macros like \codefile used to include it.  Besides allowing code to be handled separately for PDF vs. ebook, this also enables another piece of automation: a Ruby script I wrote that uses Pastebin’s RESTful API to automagically keep the URLs for each code example in the book up-to-date as the examples change.  (In the ebook version, the URLs are actually live links to Pastebin.)  Pastebin has proven so useful for this that I now also post the “demo code” I use during lecture there, so that students can get access to the code examples after lecture.  Similar automation keeps the in-book descriptions of and links to the Vimeo screencasts up-to-date.
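One plausible way to structure that URL-syncing script is sketched below: hash each snippet file and re-upload only those whose contents changed since the last run. The caching scheme and method names are hypothetical, not the author's. The `pastebin_upload` helper uses Pastebin's paste endpoint and its `api_dev_key`/`api_option`/`api_paste_code` parameters, but the demo swaps in a fake uploader so it runs offline.

```ruby
require "digest"
require "net/http"
require "uri"

# Hypothetical sketch: re-upload only the code snippets whose contents
# changed since the last run, keeping a path -> {digest, url} cache that a
# LaTeX macro such as \codefile could read to print each snippet's URL.

PASTEBIN_ENDPOINT = URI("https://pastebin.com/api/api_post.php")

# Posts one snippet through Pastebin's paste API (a developer key is
# required); on success the response body is the URL of the new paste.
def pastebin_upload(code, dev_key)
  response = Net::HTTP.post_form(PASTEBIN_ENDPOINT,
                                 "api_dev_key" => dev_key,
                                 "api_option" => "paste",
                                 "api_paste_code" => code)
  response.body.strip
end

# files:    { "path" => "source text" }
# cache:    { "path" => { digest: "...", url: "..." } } from the previous run
# uploader: any callable that takes source text and returns a URL
def sync_snippets(files, cache, uploader)
  files.each_with_object({}) do |(path, code), updated|
    digest = Digest::SHA256.hexdigest(code)
    previous = cache[path]
    updated[path] =
      if previous && previous[:digest] == digest
        previous                                     # unchanged: keep old URL
      else
        { digest: digest, url: uploader.call(code) } # new or edited: re-upload
      end
  end
end

# Offline demo with a stand-in uploader instead of the real API.
uploads = 0
fake_uploader = ->(_code) { uploads += 1; "https://example.com/paste/#{uploads}" }

files = { "ch2/hello.rb" => "puts 'hello'" }
cache = sync_snippets(files, {}, fake_uploader)    # first run uploads
cache = sync_snippets(files, cache, fake_uploader) # unchanged: no upload
files["ch2/hello.rb"] = "puts 'hello, world'"
cache = sync_snippets(files, cache, fake_uploader) # edited: re-upload
puts cache["ch2/hello.rb"][:url]
```

Passing the uploader in as a callable keeps the change-detection logic testable without network access; in real use it would be something like `->(code) { pastebin_upload(code, key) }`.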

Technology: Ebook Conversion

The final step in creating an ebook is performed by KindleGen, a Mobi converter that Amazon modified and released as a free download to encourage Kindle authors.  KindleGen takes a single ginormous HTML file, a collection of referenced assets (images mostly), and some other arcane files, and creates a .mobi file suitable for uploading to Amazon’s Kindle Store.  One of the other arcane files it needs is the .opf (Open Packaging Format) file, which specifies book metadata such as the authors, ISBN, and BISAC subject codes; the KindleGen documentation includes an example of this.  The more challenging one is the .ncx file, which is used for navigation between chapters and for providing a “page map” for the Kindle book.  I wrote another Ruby script that generates this file automatically by parsing the intermediate files LaTeX creates when it processes macros like \chapter and \section that generate TOC entries.
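The general idea of that .ncx generator can be sketched as follows: read the `\contentsline` entries LaTeX writes to the .toc file and emit nested `navPoint` elements with sequential `playOrder` values. This is an illustration, not the author's script. The regex skips titles containing braces, the `book.html#toc-N` anchor scheme is an assumed naming convention, the sample TOC entries are made up, and the `<head>` metadata a complete NCX requires is omitted.

```ruby
# Hypothetical sketch: build an NCX navMap from LaTeX's .toc file, nesting
# \section entries under their \chapter. Titles containing braces (such as
# \emph{...}) are skipped, the "book.html#toc-N" anchors are an assumed
# naming scheme, and the <head> metadata a real NCX needs is omitted.

TOC_LINE = /\A\\contentsline\s*\{(chapter|section)\}\{(?:\\numberline\s*\{([^}]*)\})?([^{}]*)\}/

# Returns [{ level: "chapter", label: "1 First Chapter" }, ...]
def parse_toc(toc_text)
  toc_text.each_line.filter_map do |line|
    m = TOC_LINE.match(line) or next
    { level: m[1], label: [m[2], m[3].strip].compact.join(" ") }
  end
end

def nav_point(entry, order, children = "")
  label = entry[:label].encode(xml: :text) # escape &, <, >
  %(<navPoint id="nav#{order}" playOrder="#{order}">) +
    %(<navLabel><text>#{label}</text></navLabel>) +
    %(<content src="book.html#toc-#{order}"/>#{children}</navPoint>\n)
end

def build_ncx(entries, title)
  chapters = []
  entries.each do |e|
    if e[:level] == "chapter" || chapters.empty?
      chapters << { entry: e, sections: [] }
    else
      chapters.last[:sections] << e
    end
  end
  order = 0 # playOrder follows reading order: chapter, then its sections
  points = chapters.map do |ch|
    ch_order = (order += 1)
    kids = ch[:sections].map { |s| nav_point(s, order += 1) }.join
    nav_point(ch[:entry], ch_order, kids)
  end
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
    <docTitle><text>#{title.encode(xml: :text)}</text></docTitle>
    <navMap>
    #{points.join}</navMap>
    </ncx>
  XML
end

toc = <<~'TOC'
  \contentsline {chapter}{\numberline {1}First Chapter}{3}{chapter.1}%
  \contentsline {section}{\numberline {1.1}A Section}{4}{section.1.1}%
  \contentsline {chapter}{\numberline {2}Second Chapter}{35}{chapter.2}%
TOC
puts build_ncx(parse_toc(toc), "Sample Book")
```

Reading the .toc file rather than the generated HTML means the navigation automatically matches LaTeX's own numbering, which is one reason to hook into LaTeX's intermediate files at all.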


The book’s icons and covers were designed by our talented alumnus Arthur Klepchukov, who’s been working with us on a contract basis to do this work and also single-handedly put together the iBooks edition.  For better or worse, though, the overall layout of the book elements in the print versions is my responsibility.  Dave and I heavily borrowed great ideas from his previous successful textbooks (though we didn’t actually copy the layout or typography, so no need to sue us, publishers!).


All of the above steps are performed automatically by a fairly hairy Makefile.  The only manual adjustment needed is to ensure that sidebars and other elements don’t overflow the gutters or margins of the print book, since LaTeX often can’t ensure this on its own.  The most troublesome edition so far has been the iBooks edition, since iBooks Author does not support automatic importing of ePub; we’re hoping future versions of iBooks Author fix this.

So that’s how we start with a set of LaTeX sources, code snippets, PDF vector images, and MP4 movies, and produce multiple versions of a book.

And budding authors, before you ask: writing in LaTeX is more like programming than writing.  This definitely isn’t a “user friendly” solution for authors unless they happen to be math or engineering geeks already very comfortable with LaTeX and the Unix/makefile way of doing things.  There’s no point-and-click here, unless you count “pointing” at the screen in rage while “clicking” the ice cubes in your bourbon because some trivial error caused LaTeX to spew 400 lines of incomprehensible error messages.  So people who offer formatting services for self-publishers will have some job security for a while.

Wednesday, February 15, 2012


The online version of our SaaS course officially starts this coming Monday, 20-Feb-2012, although we “soft launched” this week and are putting up some of the introductory videos now.

We’ve been busily debugging the homeworks, autograder and other technology at Berkeley, hoping it won’t melt down when lots more students take it.

But surprisingly, those aren’t the things that made my heart skip a few beats the last couple of weeks.

It turns out that when you have 60,000 people enrolled, if 0.1% of them experience a problem, you’ll immediately get 60 identical emails.  And these people are resourceful: they simply Googled the names of the instructors and sent email to our personal email accounts at Berkeley.  Here are some problems those people had in the last couple of weeks.

Fun event #1: Kindle ebook disappears from Amazon.com

About 2 weeks ago, the Kindle ebook mysteriously disappeared from the Kindle store, showing “currently unavailable” if you visited its page.

  • We received no email warning before this happened.

  • No further explanation was provided on the Author/Publisher dashboard as to why it occurred.  In fact, according to the dashboard, our ebook was “live” and available for sale.

  • There was no way to appeal except by sending email via the Amazon KDP “author support” form; we had done this in the past and it takes 3-4 days to get a response to such requests.  No phone number is provided for author support.  (In contrast, CreateSpace, the print-on-demand company handling the print version, which ironically is now owned by Amazon, has excellent telephone support.)

  • Desperate, we used our academic connections to the highest levels of Amazon to get this looked at.

  • We learned that there was a formatting issue with our ebook, and apparently when enough customers complain about that, the ebook is pulled.  Amazon’s system was supposed to have sent us an email notification of the problem to give us a chance to fix it, but due to a bug on their end, that email never got sent.

  • Once the escalation occurred, everything was resolved within a day; but if we hadn’t had higher-up contacts at Amazon, we would have been screwed.

Fun event #2: saasbook.info mysteriously shut down for “terms of service violations”

Last week the book’s website, hosted on Google Sites, was mysteriously shut off by Google “for terms of service violations”.  This was puzzling and alarming, since we had just announced to 57,000 students that they could start perusing the book, and this site was where they were directed to go to get it.  Plus, having the students see “This site has been taken down for TOS violations” made it sound like we were fronting pornography or running a link redirector or something equally questionable.

The scenario was eerily similar to the Amazon problem:

  • We received no email warning before this happened.

  • No further explanation was provided on our Google Sites dashboard as to why it occurred.

  • There was no way to appeal (except to click a single button that said “Appeal” with no other explanation).

  • Desperate, we used our academic connections to the highest levels of Google to get this looked at.

  • We learned that a human reviewer had mistakenly classified our site as spam (which was puzzling for any number of reasons).

  • Once the escalation occurred, everything was resolved within a day; but if we hadn’t had higher-up contacts at Google, we would have been screwed.

Fun event #3: courseware VM can’t be downloaded or uploaded

We had numerous complaints from people unable to download the courseware VM via our AppEngine front end, and our TAs were having trouble uploading the image file.  Apparently, there were two problems.  One is that Google’s Blobstore sporadically throws an error that doesn’t surface until the end of the upload, i.e., after spending 10 minutes uploading a 1.7GB file.  The error occurs inside one of the wrappers for the Blobstore API and doesn’t get logged or rescued, so even though we have tech support at Google for this, there’s no way to show them what error occurred.  We ultimately transmitted the VM image file to a colleague at Google who was able to upload it via Google’s intranet.  If we hadn’t had this higher-up contact at Google, we would have been screwed.  (See a pattern yet?)

The other problem seems to affect people whose Internet service is anemic.  Their downloads get throttled and take so long that their ISPs time out the TCP connection.  We’re going to recommend that these folks use a download manager; one of the users discovered by accident that deep-linking to the AppEngine app works just fine.
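Download managers help here mainly because they can resume an interrupted transfer instead of starting over. The sketch below shows that trick with Ruby's standard Net::HTTP: if part of the file is already on disk, request only the remaining bytes with an HTTP `Range` header and append them. It assumes the server honors range requests (replying 206 Partial Content); the method names are hypothetical, and it does not check whether the file changed on the server between attempts.

```ruby
require "net/http"
require "uri"

# Sketch of the resume trick a download manager uses: if part of the file is
# already on disk, request only the remaining bytes with an HTTP Range header
# and append them. Assumes the server supports Range requests.

# Returns e.g. "bytes=1048576-" for a 1 MiB partial file, or nil if none exists.
def range_header_for(path)
  size = File.exist?(path) ? File.size(path) : 0
  size.positive? ? "bytes=#{size}-" : nil
end

def resume_download(url, path)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  range = range_header_for(path)
  request["Range"] = range if range

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request) do |response|
      next if response.code == "416" # usually means the file is already complete

      mode = case response
             when Net::HTTPPartialContent then "ab" # 206: append the rest
             when Net::HTTPSuccess then "wb"        # 200: Range ignored, restart
             else raise "download failed with HTTP #{response.code}"
             end
      File.open(path, mode) do |file|
        response.read_body { |chunk| file.write(chunk) } # stream, don't buffer
      end
    end
  end
end
```

Calling `resume_download` again after each timeout picks up where the previous attempt stopped, so a slow connection eventually completes a large file like the 1.7GB VM image.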

We’re also going to torrent the file, and we’ve provided an alternative to downloading a VM image: an Amazon Machine Image (AMI) that can be used on EC2.  (A shout-out to Yarko Tymciurak in the Chicago area, who got us started on this!)  However, for students who don’t want to pay for EC2 usage, the free “micro” tier is just barely adequate for this work; the CPU pins right away when running tests or builds.

Fun event #4: Kindle ebook updating doesn’t quite work the way we were told

An early and influential factor in our decision to focus heavily on an ebook was that we were assured that if we made significant changes to the ebook, we’d be able to quickly push them out to students by notifying Kindle author support, which would then notify purchasers of the ebook that they had the option of re-downloading a newer version.

This is partly true.  We tried this process last week, since we did in fact fix a bunch of typos and formatting issues reported during the first 4 weeks of the Berkeley course.  It took 3 days for us to get a response from Kindle author support.  The response was that we should submit detailed information about what changes were made, and “within 2 weeks” they’d make a decision of how and whether to handle our request to push the update.  So much for instant update.  They seemed to imply, though, that there was a way they could enable students to request a re-download if we notified students that one was available, so we’re going to try to go that route.  I hope this works, since we expect at least one more rev of the ebook by the end of the online course and then another before the planned summer offering of the online course.

Fun event #5: our Kindle book price is apparently just a suggested price

We had various ebook purchasers complain that although our own Web site says the ebook costs US$9.99, Amazon was displaying a higher price for them, as much as 27% higher in some cases.

It turns out that Amazon can, at their discretion, charge more for your ebook in certain territories “where their operating costs are higher”.  So it was with chagrin that we read these emails, given that we had agonized over the price a fair amount and ultimately decided to keep it below $10.

It also turns out that Amazon can, at their discretion, charge less for your ebook if they’re doing it to price-match a competitor or for other promotional purposes.

But things are rolling along anyway

But the news isn’t all bad.  Kindle ebooks are outselling print books by more than 4 to 1.  We’ve even gotten complaints that people in Indonesia can’t buy the ebook (which is unfortunate, but it’s humbling that we have followers that far away).

We can’t tell how many people are buying both the print book and ebook, since we haven’t been able to work out bundling (though we are working on it).  We also can’t tell how many additional people, if any, have downloaded the free Kindle sample (which roughly corresponds to the first chapter of the book, and therefore the first week of class) and may be considering buying the full version later.

We have a very nice iBooks version about to come out with interactive self-assessment questions and screencasts built right in, and we will probably do a Nook/ePub version for people with non-Kindle ebook devices (though that’s a lower priority).

Overall, it’s been a major learning experience so far trying to reach this many people.  Next week comes the real test…

Thursday, February 2, 2012

Dry-running homeworks & quizzes for saas-class.org

When we signed up to offer the free saas-class.org, we decided (wisely, in retrospect) to pipeline it to start a few weeks after the on-campus course.

The rationale was that we’d have a chance to field-test the homeworks and quizzes on real students, debug the questions and answers, and fix them up in time for the online class.

We just got through grading the first programming homework using the autograder.  Tellingly, students made a number of (understandable) errors that we hadn’t thought of, so we had to change our spec files to give partial credit for cases where some of these unforeseen errors were made.  (The autograder runs a bunch of specs on each submission, possibly with different weights toward the overall score, and reports the overall score and which specs failed.)
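The weighted scoring described in that parenthetical can be sketched in a few lines. This is a hypothetical illustration, not the actual autograder: the `SpecResult` struct, the spec names, the weights, and the 100-point scale are all made up.

```ruby
# Hypothetical sketch of weighted spec scoring: each spec carries a weight,
# the score is the weighted fraction of passing specs scaled to max_points,
# and the names of failing specs are reported back to the student.

SpecResult = Struct.new(:name, :weight, :passed)

def grade(results, max_points: 100)
  total  = results.sum(&:weight)
  earned = results.select(&:passed).sum(&:weight)
  score  = total.zero? ? 0.0 : max_points * earned.to_f / total
  { score: score.round(1), failed: results.reject(&:passed).map(&:name) }
end

results = [
  SpecResult.new("returns sorted list", 3, true),
  SpecResult.new("handles empty input", 1, false),
  SpecResult.new("ignores case (partial credit)", 1, true)
]
p grade(results)
```

Here 4 of the 5 weight units pass, so the score is 80.0 and only "handles empty input" is reported as failed. Adding a low-weight spec for a common but forgivable mistake, as in the last entry, is one simple way to award partial credit.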

We also found, not surprisingly, quiz questions that we thought were unambiguous but actually needed fixing.

And of course, we commonly make minor errors in lecture slides that are corrected after lecture, or add clarifications to lecture slides based on questions received in class.

So the bottom line is the online students will benefit from having had the lectures, homeworks and quizzes pre-tested by a talented (and patient!) group of students on campus.  The students of CS 169 Spring 2012 say “you’re welcome.”

Book summary: A Thread Across the Ocean

A Thread Across the Ocean: The Heroic Story of the Transatlantic Cable by John Steele Gordon The most wonderful thing about the writ...