{"id":176,"date":"2009-07-28T21:32:35","date_gmt":"2009-07-29T04:32:35","guid":{"rendered":"http:\/\/35.225.155.113\/blog\/index.php\/2009\/07\/28\/mission_critical\/"},"modified":"2019-10-13T13:10:32","modified_gmt":"2019-10-13T20:10:32","slug":"mission-critical","status":"publish","type":"post","link":"https:\/\/www.weinstein.org\/blog\/index.php\/2009\/07\/mission-critical.html","title":{"rendered":"Mission Critical"},"content":{"rendered":"<p>Over the past week plenty of commentary<br \/>\nhas been made, <a href=\"http:\/\/pdw.weinstein.org\/2009\/07\/make-no-little-plans.html\">including my own<\/a>, in relation to the 40th<br \/>\nanniversary of the historic flight of <a href=\"http:\/\/www.google.com\/#hl=en&amp;q=Apollo+11&amp;aq=f&amp;oq=&amp;aqi=g10&amp;fp=jDavF0J_Rew\">Apollo 11<\/a>. One comment on<br \/>\nTwitter by <a href=\"http:\/\/twitter.com\/alexr\">alexr<\/a>,<br \/>\nhowever, caught my attention: &#8220;SW<br \/>\nengineers should take this moment to consider if they&#8217;d trust their<br \/>\ncode to have gotten the LM onto the lunar surface safely.&#8221;<\/p>\n<p>Indeed, a sobering thought at first<br \/>\nglance. In this day and age it is quite common to run into a program<br \/>\nwritten by one or more software engineers that seems unstable or<br \/>\nerror-prone. What if all those engineers had to deal with the rigors<br \/>\nof getting their code &#8220;flight ready&#8221; with billions of dollars<br \/>\nand at least two men&#8217;s lives at risk?<\/p>\n<p>But on second thought I think this<br \/>\ncomment does more harm than good. 
It seems to imply either<br \/>\nthat the challenges of writing critically important, life-and-death<br \/>\nsoftware occur only once in a blue moon<a class=\"sdfootnoteanc\" name=\"sdfootnote1anc\" href=\"#sdfootnote1sym\"><sup>1<\/sup><\/a><br \/>\nor that the good old days of elite superman programmers who wrote<br \/>\nerror-free programs are long since gone, replaced by thousands of<br \/>\nmediocre programmers writing millions of lines of bug-infested code.<\/p>\n<p>Instances of life-or-death situations<br \/>\nbeing directed by computers (hardware and software) might not be an<br \/>\neveryday occurrence for a programmer, or even a once-in-a-career<br \/>\noccurrence. But they do still occur. I recall the professor who<br \/>\ntaught my <a href=\"http:\/\/en.wikipedia.org\/wiki\/Assembly_language\">Assembly Language<\/a> class in college mentioning his work on a<br \/>\nproject for Motorola on a car <a href=\"http:\/\/en.wikipedia.org\/wiki\/Fuel-injection\">fuel-injection<\/a> system. The engine had<br \/>\na habit of shutting down when exposed to electrical<br \/>\ninterference.<\/p>\n<p>Just imagine driving down a highway at<br \/>\n55 mph only to have your car shut down while passing by some<br \/>\nhigh-tension power lines&#8230; Now consider the added complexity of<br \/>\ntoday&#8217;s <a href=\"http:\/\/en.wikipedia.org\/wiki\/Hybrid_car#Design_considerations\">hybrid engines<\/a>.<\/p>\n<p>And while not every programming<br \/>\nchallenge is &#8220;life-and-death&#8221;, plenty of software in today&#8217;s world is &#8220;mission critical&#8221;, with millions, if not<br \/>\nbillions, of dollars at stake.<a class=\"sdfootnoteanc\" name=\"sdfootnote2anc\" href=\"#sdfootnote2sym\"><sup>2<\/sup><\/a><\/p>\n<p>In any case the coding of the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Lunar_module\">Lunar Module<\/a>&#8217;s<br \/>\nsoftware was hardly error-free. 
In fact, the Apollo 11<br \/>\nmoon landing saw two specific alarms from the Eagle&#8217;s<br \/>\n<a href=\"http:\/\/en.wikipedia.org\/wiki\/Apollo_Guidance_Computer\">Guidance Computer<\/a> during the critical descent to the moon&#8217;s surface.<br \/>\n<br \/>\nAt 102:38:30 Neil Armstrong <a href=\"http:\/\/history.nasa.gov\/alsj\/a11\/a11.landing.html\">calls out<\/a> a program alarm, &#8220;1202&#8221;. Ten seconds later,<br \/>\nArmstrong is asking for feedback from Houston on the error<br \/>\ncode.<a class=\"sdfootnoteanc\" name=\"sdfootnote3anc\" href=\"#sdfootnote3sym\"><sup>3<\/sup><\/a><br \/>\nHouston gives the astronauts a &#8220;go&#8221; to continue their descent.<br \/>\nBut less than 5 minutes later, with 2000 feet separating the LM from<br \/>\nthe surface, the ship&#8217;s computer issues a &#8220;1201&#8221;.<\/p>\n<div style=\"background-color: rgb(205, 205, 205);\">\n<blockquote>\n<p style=\"margin-bottom: 0in;\">\n<p style=\"margin-bottom: 0in;\"><b>102:42:13<\/b> <i>Armstrong<\/i> (on-board): Okay. 3000 at 70.&nbsp;<\/p>\n<p style=\"margin-bottom: 0in;\">\n<b>102:42:17<\/b> <i>Aldrin<\/i>: Roger. Understand. Go for landing. 3000 feet.<\/p>\n<p style=\"margin-bottom: 0in;\">\n<p style=\"margin-bottom: 0in;\"><b>102:42:19<\/b> <i>Duke<\/i>: Copy.\n<\/p>\n<p style=\"margin-bottom: 0in;\">\n<p style=\"margin-bottom: 0in;\"><b>102:42:19<\/b> <i>Aldrin<\/i>: Program Alarm. (Pause) 1201\n<\/p>\n<p style=\"margin-bottom: 0in;\">\n<p style=\"margin-bottom: 0in;\"><b>102:42:24<\/b> <i>Armstrong<\/i>: 1201. (Pause) (On-board) Okay, 2000 at 50.\n<\/p>\n<p style=\"margin-bottom: 0in;\">\n<p style=\"margin-bottom: 0in;\"><b>102:42:25<\/b> <i>Duke<\/i>: Roger. 1201 alarm. (Pause) We&#8217;re Go. Same type. 
We&#8217;re Go.<\/p>\n<p style=\"margin-bottom: 0in;\">\n<\/blockquote>\n<\/div>\n<p><center><i>Second round of system issues.<\/i><\/center><\/p>\n<p>What were the <a href=\"http:\/\/history.nasa.gov\/alsj\/a11\/a11.1201-pa.html\">1201 and 1202 type errors<\/a>?<br \/>\nOnly that the Apollo Guidance Computer was indicating that it was<br \/>\noverloaded with data inputs, couldn&#8217;t keep up and was resetting<br \/>\nitself.<\/p>\n<p>Yep, that&#8217;s right, the guidance<br \/>\ncomputer for the LM rebooted, at least twice, during one of the most critical phases of the mission because<br \/>\nit ran out of memory.<\/p>\n<p>The problem? An error in one of the<br \/>\ncrew&#8217;s checklists had them turn on the rendezvous radar during the<br \/>\nlanding phase. Of course the LM crew was hardly trying to rendezvous<br \/>\nwith the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Command_module\">Command Module<\/a> during their descent, but the repeated calls<br \/>\nto the computer to process imaginary rendezvous radar data filled up<br \/>\nthe limited writable computer memory<a class=\"sdfootnoteanc\" name=\"sdfootnote4anc\" href=\"#sdfootnote4sym\"><sup>4<\/sup><\/a><br \/>\nthe on-board system had, causing the system to repeatedly restart.<\/p>\n<p>Now I suppose somebody will argue that<br \/>\nthe computer was hardly to blame. Turning on the rendezvous radar<br \/>\nwas a user-generated error (or a documentation error), not a<br \/>\ncomputer programmer error. Moreover, the program was designed on purpose to<br \/>\nreset itself if it got overloaded.<a class=\"sdfootnoteanc\" name=\"sdfootnote5anc\" href=\"#sdfootnote5sym\"><sup>5<\/sup><\/a><\/p>\n<p>But that&#8217;s just it. 
No programmer, no<br \/>\nmatter how good, can take into account every possible error or<br \/>\nmisuse, whether created by the programmer or the user.<a class=\"sdfootnoteanc\" name=\"sdfootnote6anc\" href=\"#sdfootnote6sym\"><sup>6<\/sup><\/a> Would you have<br \/>\nconsidered at first that an overhead power line might scramble your<br \/>\ncar&#8217;s fuel injection system?<\/p>\n<p>This is where the concept<br \/>\nof <a href=\"http:\/\/en.wikipedia.org\/wiki\/Fault-tolerant\">fault-tolerant<\/a> programming comes into play. The idea is pretty<br \/>\nbasic: enable the system to continue operating properly in the event<br \/>\nof an error. Just as the Apollo spacecraft (and <a href=\"http:\/\/en.wikipedia.org\/wiki\/Saturn_v\">Saturn V<\/a><br \/>\nlauncher) had mechanical backups to keep the physical system running<br \/>\nin case of failure, the guidance program (like properly designed<br \/>\nprograms of today) managed the error and kept it from causing<br \/>\ncatastrophic results.<\/p>\n<p>Thus the statement should not be: engineers, consider whether you&#8217;d trust your code to get the LM to<br \/>\nand from the moon safely. 
Instead it is: do you consider your<br \/>\nsoftware fault-tolerant enough to get one to the moon and back<br \/>\nsafely?<\/p>\n<hr>\n<p>An interesting side note: there is a community programming effort that has created a software <a href=\"http:\/\/en.wikipedia.org\/wiki\/Emulator\">emulator<\/a> of the Apollo hardware and software, <a href=\"http:\/\/googlecode.blogspot.com\/2009\/07\/apollo-11-missions-40th-anniversary-one.html\" id=\"nx88\" title=\"virtualagc\">Virtual AGC and AGS<\/a>.<\/p>\n<hr>\n<div id=\"sdfootnote1\">\n<p class=\"sdfootnote\"><a class=\"sdfootnotesym\" name=\"sdfootnote1sym\" href=\"#sdfootnote1anc\">1<\/a> Pardon<br \/>\nthe pun.<\/p>\n<\/div>\n<div id=\"sdfootnote2\">\n<p style=\"margin-bottom: 0in;\"><a class=\"sdfootnotesym\" name=\"sdfootnote2sym\" href=\"#sdfootnote2anc\">2<\/a> And by indirect implication the lives of the employees, customers, stockholders,<br \/>\net al.<\/p>\n<p style=\"margin-bottom: 0in;\">\n<\/div>\n<div id=\"sdfootnote3\">\n<p class=\"sdfootnote\"><a class=\"sdfootnotesym\" name=\"sdfootnote3sym\" href=\"#sdfootnote3anc\">3<\/a> Sounds<br \/>\neerily familiar to any modern-day computer user: the computer reports some<br \/>\ncryptic error code and the next step is to go searching for<br \/>\nadditional information on what&#8217;s gone wrong.<\/p>\n<\/div>\n<div id=\"sdfootnote4\">\n<p class=\"sdfootnote\"><a class=\"sdfootnotesym\" name=\"sdfootnote4sym\" href=\"#sdfootnote4anc\">4<\/a> Nowadays<br \/>\nwe classify memory as Read-Only Memory (ROM) or Random Access<br \/>\nMemory (RAM) and talk about <a href=\"http:\/\/en.wikipedia.org\/wiki\/Gigobyte\">Gigabytes<\/a> (10<sup>9<\/sup> bytes) of RAM for a laptop (or even a<br \/>\nsmartphone). The Apollo Guidance Computer? 
About 36K 16-bit words (roughly 72 <a href=\"http:\/\/en.wikipedia.org\/wiki\/Kilobyte\">Kilobytes<\/a>, 10<sup>3<\/sup> bytes each) of ROM<br \/>\nand only 2K words (roughly 4 Kilobytes) of writable RAM.<\/p>\n<\/div>\n<div id=\"sdfootnote5\">\n<p class=\"sdfootnote\"><a class=\"sdfootnotesym\" name=\"sdfootnote5sym\" href=\"#sdfootnote5anc\">5<\/a> The idea was to clear the fault and reestablish important tasks, i.e.<br \/>\nclear out the waiting calls for calculating unnecessary rendezvous<br \/>\ntelemetry and reestablish jobs for processing landing telemetry.<\/p>\n<\/div>\n<div id=\"sdfootnote6\">\n<p class=\"sdfootnote\"><a class=\"sdfootnotesym\" name=\"sdfootnote6sym\" href=\"#sdfootnote6anc\">6<\/a> And just in case you wish to insist that the programmers of yesteryear were supermen, well, it turns out one <a href=\"http:\/\/www.netjeff.com\/humor\/item.cgi?file=ApolloComputer\">uncorrected bug<\/a> could have crashed the LM by trying to fly the craft first &#8220;under&#8221; the surface, then back &#8220;over&#8221; the surface, and then &#8220;onto&#8221; the surface for a safe landing.<\/p>\n<p class=\"sdfootnote\">\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Over the past week plenty of commentary has been made, including my own, in relation to the 40th anniversary of the historic flight of Apollo 11. 
One comment on Twitter by alexr however caught my specific attention, &#8220;SW engineers should take this moment to consider if they&#8217;d trust their code to have gotten the LM [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[92,117,118],"tags":[202,201,196,199,200,203,78,190,7,43,106,146],"_links":{"self":[{"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/posts\/176"}],"collection":[{"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=176"}],"version-history":[{"count":3,"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/posts\/176\/revisions"}],"predecessor-version":[{"id":735,"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/posts\/176\/revisions\/735"}],"wp:attachment":[{"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=176"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=176"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.weinstein.org\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=176"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}