<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1913655139480042191</id><updated>2011-07-07T18:21:08.995-07:00</updated><category term='yahoo'/><category term='cloud computing'/><category term='hadoop'/><category term='google'/><title type='text'>Jeff on Cloud Computing</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>17</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-5013169694941192850</id><published>2010-09-17T15:20:00.000-07:00</published><updated>2010-09-17T19:07:17.107-07:00</updated><title type='text'></title><content type='html'>&lt;a href="http://mahout.apache.org/"&gt;Mahout&lt;/a&gt; committers Ted Dunning, Grant Ingersoll and I met with some of our Mahout user friends over dinner at 
Panera's in Millbrae last night. The study of Machine Learning for me has always been a sequence of little mysteries to solve and this evening proved to be no exception. Ted kicked off the conversation with a provocative statement that ML is really about different ways to extract [meaningful] models from large volumes of data and that classification, clustering, SVD (singular value decomposition) and recommendation are all really just different ways to skin the same cat. It seemed preposterous at first. He drew a box with lots of arrows going in on the left and just a few arrows coming out on the right to illustrate how each of these processes consumes volumes of data and produces much smaller and more concise models of it. He went on to say that each of these techniques is better than its brethren at extracting certain kinds of meaning and that real-world data will often require more than one of these techniques to be chained together to gain accurate insight (more meaningful models).&lt;br /&gt;&lt;br /&gt;We've been having some discussions on the dev@mahout.apache.org mailing list recently about how to unify our clustering and classification data structures in order to make them more "plug and play". I had done some refactoring of the clustering data structures in order to eliminate a lot of redundant code and unify their behaviors. Ted had introduced an AbstractVectorClassifier a couple of months ago as a way of unifying all the classification algorithms and was looking at one of its new subclasses, the VectorModelClassifier, in the clustering package. Where had it come from? After reviewing the code I recalled it as an experiment I'd done to see if I could integrate our new clustering models into the classification framework. I had not intended to commit it at the time and so I didn't recognize it at first but there it was: a classifier that could classify vectors based upon the model output of any of our clustering jobs. 
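&lt;br /&gt;&lt;br /&gt;To make that idea concrete, here is a minimal sketch in Python rather than Mahout's Java. It assumes each clustering model can be reduced to a centroid and that a vector is assigned to the nearest centroid by Euclidean distance. The names `distance`, `classify` and `centroids` are hypothetical illustrations and are not the VectorModelClassifier API:&lt;pre&gt;

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(vector, centroids):
    """Return the label of the centroid nearest to `vector`.

    `centroids` maps a hypothetical cluster label to its centroid and
    stands in for the model output of an earlier clustering job.
    """
    return min(centroids, key=lambda label: distance(vector, centroids[label]))


# Hypothetical models, invented only to exercise the function.
centroids = {"a": [0.0, 0.0], "b": [10.0, 10.0]}
label = classify([1.0, 2.0], centroids)
```

&lt;/pre&gt;Here the clustering step plays the role of training (it produces `centroids`) and `classify` is the step that assigns a new vector to one of those models.&lt;br /&gt;&lt;br /&gt;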
The beginnings of integration were at hand.&lt;br /&gt;&lt;br /&gt;All of our clustering jobs can perform a final job step which assigns each input vector to one or more of the models which the clustering has produced. Said differently, they can all &lt;span style="font-style: italic;"&gt;classify&lt;/span&gt; each input vector to one or more of the models. And when I think about the cluster-creation steps that our clustering algorithms all perform as &lt;span style="font-style: italic;"&gt;training&lt;/span&gt;, the unification becomes even clearer. Of course, Ted pointed out, clustering is really just unsupervised classification and classification is really just supervised clustering. I think I'm starting to get it! Both consume large volumes of raw data and produce, either supervised or not, a smaller set of models that characterize the data: its meaning.&lt;br /&gt;&lt;br /&gt;So what about SVD? Our SVD implementation uses Lanczos' algorithm to produce a set of eigenvectors and their associated eigenvalues from an input matrix. The eigenvectors and eigenvalues are typically much smaller than the original data and may be used in place of it for many computations. Hey, they're models too! The clustering of text documents, for example, typically involves a very high-dimensional, sparse term vector for each document in a corpus. If one tries to cluster these raw vectors one often confronts "the curse of dimensionality" and the clustering does not produce useful results. If, instead, one uses SVD to first reduce the dimensionality of the term vectors and then clusters that data, the results are often considerably improved. To summarize, SVD is a process which extracts a [meaningful] set of models (the eigenvectors and eigenvalues) from the data. Because it is unsupervised, might one think of it as a form of clustering? IDK. 
At least it is one of the Mahout services that can be chained together with clustering to produce more insightful results.&lt;br /&gt;&lt;br /&gt;Matrices are also used a lot by our recommender services to recommend items to users based upon some metrics of user preference for each item. These co-occurrence matrices are generally large and unwieldy. In user-based recommending, the goal is to recommend items to users based upon what items similar users found most interesting and the co-occurrence matrix has size equal to the number of users squared, often a huge matrix. In item-based recommending, the goal is to recommend based upon which items are similar to each other and the co-occurrence matrix has size equal to the number of items squared, usually smaller but still quite large. SVD can be used in both cases to reduce the dimensionality of the co-occurrence matrices. And so too can clustering services be used within a recommender engine to codify the similarity metrics used to make the recommendations. These services really do need to plug and play together.&lt;br /&gt;&lt;br /&gt;Ok, I'm having a bit of an epiphany here and this may not all be spot on. But the proposition that the parts of Mahout which I've always viewed as being unrelated are actually interdependent is starting to grow on me. It's kind of a grand unification theory which may well lead to further integration and other improvements in the Mahout service portfolio as it plays out. A few mysteries got solved last night and a few more got added to the list. 
An evening well spent.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-5013169694941192850?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/5013169694941192850/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=5013169694941192850' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/5013169694941192850'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/5013169694941192850'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2010/09/mahout-committers-ted-dunning-grant.html' title=''/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-3865711621097559626</id><published>2009-05-09T20:30:00.000-07:00</published><updated>2009-05-10T19:51:58.573-07:00</updated><title type='text'>A Pair of Cloud-related Talks by Me</title><content type='html'>I've been procrastinating but it's time to post an update. 
I've given a couple of talks recently and here are the titles, brief summaries and a little link love:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.sdforum.org/index.cfm?fuseaction=Calendar.eventDetail&amp;amp;eventID=13343"&gt;BI Over Petabytes: Meet Apache Mahout&lt;/a&gt; - I introduced the &lt;a href="http://lucene.apache.org/mahout/"&gt;Mahout&lt;/a&gt; project to the &lt;a href="http://www.sdforum.org/"&gt;SDForum&lt;/a&gt; Business Intelligence SIG meeting last month at SAP in Palo Alto. The talk was quite well attended and there was standing room only. After a brief overview of the project in general, I showed a comparison of the various Mahout clustering algorithms on a hypothetical astronomical dataset. &lt;a href="http://ororke.com/paul/blog/"&gt;Paul O'Rorke&lt;/a&gt;, one of the forum chairs, posted a nice blog entry on the talk &lt;a href="http://ororke.com/paul/blog/2009/04/21/apaches-mahout-project/#more-209"&gt;here&lt;/a&gt;. You can get a copy of the slides from &lt;a href="http://cwiki.apache.org/MAHOUT/bookstutorialstalks.data/SDForum.pdf"&gt;here&lt;/a&gt;. &lt;/li&gt;&lt;li&gt;&lt;a href="http://jaoo.com.au/sydney-2009/presentation/Net+Promoter%28TM%29+in+the+Cloud%3A+An+Experiment+on+the+Force.com%28TM%29"&gt;Net Promoter in the Cloud: An Experiment on the Force.com Platform&lt;/a&gt; - I had a nice opportunity to fly to Sydney to give this talk at the &lt;a href="http://jaoo.com.au/sydney-2009/"&gt;JAOO conference&lt;/a&gt; there. In it I described my experiences building the application that is discussed more in the postings below. The conference was organized into three concurrent tracks so the attendees had to choose between my talk and two others. 
The talk which won hands down was Patrick Linskey's "&lt;a style="font-style: italic;" href="http://blog.jaoo.dk/2009/05/08/jaoo-talk-patrick-linskey-makes-local-news-iphone-app-in-45-minutes/"&gt;How to build an iPhone application in 45 minutes&lt;/a&gt;" which, I must admit, I wanted to hear too. &lt;/li&gt;&lt;li&gt;&lt;a href="http://salesforce.com/"&gt;Salesforce.com&lt;/a&gt; was a sponsor of the show and I did get a chance to meet Clayton Brown, their local SE guru, who also gave an &lt;a href="http://jaoo.com.au/sydney-2009/presentation/Force.com"&gt;amazing example&lt;/a&gt; of building a Force.com enterprise application in 45 minutes. Not quite as much sizzle as an iPhone app, but IMHO a far more challenging problem.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Here's another link to Paul O'Rorke's blog where he describes another SDForum talk by Salesforce CTO Craig Weissman titled "&lt;a style="font-style: italic;" href="http://ororke.com/paul/blog/2009/03/25/the-data-architecture-of-forcecom/#more-174"&gt;The Data Architecture of Force.com&lt;/a&gt;".&lt;br /&gt;&lt;/li&gt;&lt;li&gt;One interesting coincidence: Attendees were asked to rate each talk on their way out the door by dropping red, green or yellow slips of paper in a voting container. While not precisely &lt;a href="http://www.satmetrix.com/satmetrix/netpromoter.php?page=6"&gt;Net Promoter&lt;/a&gt; procedure, they calculated a similar score by subtracting the red percentage from the green one. By this metric I had a +66% which left me feeling worthwhile in the end :).&lt;/li&gt;&lt;/ul&gt;Right now I'm in Brisbane and will give another talk to the &lt;a href="http://jaoo.com.au/brisbane-2009/"&gt;JAOO Conference&lt;/a&gt; here on Tuesday. 
The scheduling is different: This time I will be able to see Patrick's talk and, hopefully, get a larger share of the turnout too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-3865711621097559626?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/3865711621097559626/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=3865711621097559626' title='40 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/3865711621097559626'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/3865711621097559626'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2009/05/pair-of-cloud-related-talks-by-me.html' title='A Pair of Cloud-related Talks by Me'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>40</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-5195956100503487195</id><published>2009-03-10T19:14:00.000-07:00</published><updated>2009-03-10T21:41:07.009-07:00</updated><title type='text'>The Power of Naked Conversations</title><content type='html'>It's kind of exciting when you get a concrete indication that somebody - a real person - has actually read your blog! 
Exciting and maybe a little scary too, because you never really know what goes on "out there" in the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;blogosphere&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;So, consider how excited I was when I got two (2!) independent responses from &lt;a href="http://salesforce.com"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;Salesforce&lt;/span&gt;.com&lt;/a&gt; people to my last posting: one comment to the post directly by Jon &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;Mountjoy&lt;/span&gt;, the Developer Force Community Manager; and one from Jesse Lorenz, a Force.com Technical Evangelist. When I said: "I think &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;Salesforce&lt;/span&gt;.com employees could do a better job of monitoring their discussion boards", I had no idea they would find out about my discussion posting from my blog! And here I thought I was merely ranting into the ether; the great big dummy load in the Internet Sky that absorbs all inputs and returns nothing but the warm feeling in the pit of my stomach when I write.&lt;br /&gt;&lt;br /&gt;I had another "Twilight Zone" experience after my &lt;a href="http://jeffeastman.blogspot.com/2008/06/wom-road-warrior-war-stories.html"&gt;Road Warrior Stories&lt;/a&gt; posting last summer. Nothing immediate happened at first, but the next time I happened to fly on Continental Airlines I was mysteriously upgraded to first class! I had no miles, no status, but mysteriously I was in the front cabin. Go figure.&lt;br /&gt;&lt;br /&gt;It's almost like some companies proactively search the blogosphere, looking for user stories - unhappy ones - where they can intervene to turn a potential detractor into a promoter. It is really good business: a detractor is twice as likely to kill a sale as a neutral, and promoters give you an extra 50% boost. 
It usually does not require moving mountains to resolve their issue either: a couple of pointers into your documentation stack; a short email message; or a free upgrade. Small warm fuzzies from large, impersonal organizations have a huge impact.&lt;br /&gt;&lt;br /&gt;So, despite my poor experience in Cleveland last summer, I'm no longer a &lt;a href="http://contentinalairlines.com/"&gt;Continental&lt;/a&gt; detractor. And those two responses from Salesforce guys actually got me cooking again on my &lt;a href="http://netpromoter.com"&gt;NetPromoter&lt;/a&gt; experiment. I'm over the hump and rockin my way to a cool little application in the force.com universe. And, for some strange reason, there are more responses to the postings on their developer boards now than there were before. Kudos guys, may the &lt;a href="http://developer.force.com/"&gt;Force&lt;/a&gt; be with you!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-5195956100503487195?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/5195956100503487195/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=5195956100503487195' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/5195956100503487195'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/5195956100503487195'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2009/03/power-of-naked-conversations.html' title='The Power of Naked Conversations'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-6832991888226220042</id><published>2009-03-05T10:53:00.000-08:00</published><updated>2009-03-05T11:57:54.504-08:00</updated><title type='text'>Force.com Experiment</title><content type='html'>I've spent most of the last month continuing to explore the &lt;a href="http://developer.force.com/"&gt;Force.com&lt;/a&gt; developer platform with an experiment to implement some &lt;a href="http://netpromoter.com/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;NetPromoter&lt;/span&gt;&lt;/a&gt; capabilities. Building and updating my business object model was very straightforward using their web-based developer platform and Eclipse &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;plugin&lt;/span&gt;. Within a week I was able to implement some simple business processes using their &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;workflow&lt;/span&gt; engine that allowed me to notify my employees about detractor events, create tasks for them to do, plan and approve mitigating actions. I also found their &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;Visualforce&lt;/span&gt; web platform to be quite easy to use and their Apex Java-like scripting language to be powerful and succinct.&lt;br /&gt;&lt;br /&gt;It is, however, a huge system and learning its subtleties was rather slower than getting "Hello World" working. I started with the workbook tutorials that I got at the &lt;a href="http://cloudconnectevent.com/"&gt;Cloud Connect&lt;/a&gt; conference. As far as it goes, the tutorial really got me off to a good start. 
When I got out of its wading pool, however, I encountered the volumes of help documents in their online help facility and frustration began to set in. The CRM platform has so much capability I had to spend a lot of time understanding it before I could make more progress on my own experiment. The help documents only describe the simplest of examples, leaving a lot of my questions unanswered.&lt;br /&gt;&lt;br /&gt;I turned to their discussion boards and posted a few questions to their developer community. Perhaps I am still too much of a noob to be bothered with, or my questions did not make sense, but their community did not respond like the one I've experienced with &lt;a href="http://hadoop.apache.org/"&gt;Apache Hadoop&lt;/a&gt; for example. A majority of the questions posed by developers just go unanswered. I think Salesforce.com employees could do a better job of monitoring their discussion boards so that developers in my state of learning can get across the gap between their nice toy tutorials and developing a real system.&lt;br /&gt;&lt;br /&gt;While I'm still a little frustrated, I have not given up. From a cloud computing perspective, the force.com platform is at the highest tier of the Infrastructure-as-service, Platform-as-service and Application-as-service pyramid. This means there is maximum functionality to leverage but also maximum vendor lock-in to use their application. Apex, while Java-like, is not Java and porting my application to another platform (e.g. open source) does not look feasible. Visualforce, a taglib-style web toolkit, looks like lock-in too. This is a big bullet to bite.&lt;br /&gt;&lt;br /&gt;Their CRM offering, however, is well accepted world-wide and the ability for me to develop an application that can leverage their 55k+ customers' CRM artifacts and workflows is very attractive. Once I get my application working in their environment, it is fully scalable, localizable and web-service enabled. 
This means I can concentrate almost completely upon the features of my own product and leave all the infrastructure headaches to force.com. They even have an &lt;a href="http://www.salesforce.com/appexchange/"&gt;AppExchange&lt;/a&gt; to help me market and distribute my application. I gotta learn more about this stuff.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-6832991888226220042?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/6832991888226220042/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=6832991888226220042' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6832991888226220042'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6832991888226220042'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2009/03/forcecom-experiment.html' title='Force.com Experiment'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-7701794040355777064</id><published>2009-01-23T12:35:00.000-08:00</published><updated>2009-01-23T13:22:44.103-08:00</updated><title type='text'>Cloud Connect Conference - Thursday</title><content type='html'>I wanted to demonstrate my application running on a real hadoop cluster on EC2, so I woke up early on Thursday to bring up a 
3-node cluster using the excellent deployment scripts provided by the hadoop-18 distribution.&lt;br /&gt;&lt;br /&gt;At the conference I was preoccupied in Java jar file hell trying to build a deployable version of my demo and did not pay close attention to the speakers. By noon I had finally gotten over that roadblock and had a jar file that would run the entire application on hadoop. After I showed David my application, he challenged me to integrate it with the Google Maps API so I also missed most of the unconference sessions that preceded the demo session attempting that. I was able to get one zip code to show in a browser on a map but a more complete solution eluded me. And so it goes with me, often getting sucked into building things when I should be listening to and interacting with others.&lt;br /&gt;&lt;br /&gt;At the demonstration session I gave a brief talk titled "&lt;span style="font-style: italic;"&gt;Using Hadoop to invert Force.com data - or - How to drive a thumbtack with a pile driver&lt;/span&gt;". The program used Axis to extract some account data tuples from the force.com demonstration site. It then used 48 mappers and a single reducer to invert these tuples using much the same map/reduce algorithm as Google and Yahoo! use to invert the Internet for page rank data. My demo worked, was well received and I won a nice iTouch for my labors. I thought the conference was useful and informative and I made a couple new friends in the process. 
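&lt;br /&gt;&lt;br /&gt;For readers who want to see the shape of that inversion, here is a rough single-process sketch in Python. It is not the Hadoop job from the demo: it assumes the tuples are (account name, zip code) pairs, and the function names and sample records are hypothetical. The map step re-keys each tuple by zip code, the shuffle groups values by key, and the reduce step collects the accounts for each zip code:&lt;pre&gt;

```python
from collections import defaultdict


def map_phase(record):
    """Emit (zip_code, account_name) so the zip code becomes the key."""
    account_name, zip_code = record
    yield zip_code, account_name


def reduce_phase(zip_code, account_names):
    """Collect every account that shares one zip code."""
    return zip_code, sorted(account_names)


def invert(records):
    """Run map, shuffle (group by key) and reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical sample tuples, not real force.com data.
records = [("Acme", "94030"), ("Globex", "94043"), ("Initech", "94030")]
index = invert(records)
```

&lt;/pre&gt;On a real cluster the framework would do the grouping between mappers and the reducer, so only the two small phase functions would carry over.&lt;br /&gt;&lt;br /&gt;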
I'd recommend it to others with a Net Promoter Score of 9.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-7701794040355777064?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/7701794040355777064/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=7701794040355777064' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/7701794040355777064'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/7701794040355777064'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2009/01/cloud-computing-conference-thursday.html' title='Cloud Connect Conference - Thursday'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-2035833288761228157</id><published>2009-01-23T11:35:00.000-08:00</published><updated>2009-01-23T13:05:14.897-08:00</updated><title type='text'>Cloud Connect Conference - Wednesday</title><content type='html'>The Wednesday session began with opening remarks by David Berlind followed by a panel discussion moderated by Stephen O'Grady of RedMonk with panelists: Sam Charrington of Appistry, Alistair Croll of Bitcurrent and Bob Sutor of IBM.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;ASPs -&gt; SaaS -&gt; cloud computing evolution has been around for over ten years 
now&lt;/li&gt;&lt;li&gt;PaaS is a more recent addition that offers the most open platform for hosting custom and proprietary applications&lt;/li&gt;&lt;li&gt;Standards, interoperability, portability and collaboration offer ways to avoid vendor lock-in&lt;/li&gt;&lt;li&gt;Companies should experiment with internal and external cloud technologies to gain perspective&lt;/li&gt;&lt;li&gt;Challenges in administration, governance, control and ownership of derivative works remain&lt;/li&gt;&lt;/ul&gt;Some questions from the audience:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Acquisitions create myriad application integration issues. How does the cloud help? Coexistence, interoperation and migration offer a range of approaches that are really independent of the cloud. The cloud offers the ability to mash up applications in ways that were not possible before.&lt;/li&gt;&lt;li&gt;Larry Ellison and Richard Stallman have been vocal critics of cloud computing. What's their beef? Some vendors thrive on lock-in and others advocate viral open software. The cloud is already here, it is thriving and it will assimilate everything.&lt;/li&gt;&lt;li&gt;Where is the cloud in terms of crossing the chasm? Email and web hosting are already on the other side, with SaaS vendors hot on their heels. Companies are cautiously entering the market but most are still on the early adopter side. Multiple layers of services from bare boxes to enterprise solutions offer many ways for companies to cross as they can benefit from the cloud's economies of scale.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The panel was followed by nine brief technology "Solution Provider Speed Geeking" pitches and demonstrations that were given in the exhibit hall. We formed up in small groups and rotated between presentations on the various vendor products to the sound of Dave's loudspeaker siren. These were then followed by more in-depth sessions by the vendors after lunch. 
I attended the following sessions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Google App Engine - takes care of automatically scaling my web applications written on top of their Python deployment framework. They support all the tools needed to build new dynamic applications involving search, maps, earth, blogs and visualization.&lt;/li&gt;&lt;li&gt;Force.com Platform - an extended Java application framework that integrates with the Salesforce.com CRM artifacts. It has a great set of developer tools and rich new applications can be constructed and deployed easily.&lt;/li&gt;&lt;li&gt;Amazon EC2 - has released a new administration console that is a huge improvement over its predecessors.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Amazon Mechanical Turk - has a huge pool of "artificial artificial intelligence" workers that can be put to work on a fee-for-task basis, doing simple to complicated tasks for a sliding compensation scale from pennies to hundreds of dollars.&lt;/li&gt;&lt;li&gt;Google APIs - offer JavaScript libraries for integrating their server side applications into your web applications. Simple yet powerful to use.&lt;/li&gt;&lt;/ul&gt;Dave threw down the gauntlet to developers by offering some prizes to volunteers who would use some of these technologies to build a demonstration application for the following day. I volunteered and spent some time with a guy from force.com exploring their quickstart.java package to use web services to access some account data to munch with hadoop on EC2.&lt;br /&gt;&lt;br /&gt;It took only a few minutes to customize their quickstart application to obtain and invert some account_name and zip_code tuples in memory. I left the conference and by 11pm had a working Hadoop application that would perform the same inversion on terabytes of similar data using a supercomputer. 
Ironically, both programs were almost the same size!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-2035833288761228157?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/2035833288761228157/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=2035833288761228157' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/2035833288761228157'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/2035833288761228157'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2009/01/cloud-computing-conference-wednesday.html' title='Cloud Connect Conference - Wednesday'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-8876841942060308579</id><published>2009-01-23T08:45:00.000-08:00</published><updated>2009-01-23T13:04:27.693-08:00</updated><title type='text'>Cloud Connect Conference - Tuesday</title><content type='html'>I just got back from the &lt;a href="http://cloudconnectevent.com/"&gt;Cloud Connect Conference&lt;/a&gt; at the &lt;a href="http://www.computerhistory.org/"&gt;Computer History Museum&lt;/a&gt; in Mountain View. The conference was partly an unconference that was sponsored by Google, Amazon,  Salesforce and others. 
David Berlind ran an energetic show that was product and technology focused and very hands-on.&lt;br /&gt;&lt;br /&gt;The first session on Tuesday evening brought three short customer "elevator pitch" presentations from Peter Coffee of Salesforce.com, Adam Selipsky of Amazon Web Services and Rajen Sheth of Google to a group of four IT executives: Tim Crawford from Stanford University, Carolyn Lawson of California PUC, Ronald Smith of Cadence Design Systems and Robert Loolley of Utah Technical Services.&lt;br /&gt;&lt;br /&gt;The three vendors pitched different cloud computing products but there was a fair amount of overlap in many of their messages: "The benefits of cloud computing are clear, so why delay?"&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Adam presented the AWS platform-as-a-service offerings that he equated to the development of the electric power grid in the US. "We make electricity so you don't have to." I have a little experience with EC2 and S3 and would recommend them. I've been running a web server on EC2 for some months and a 5-node Hadoop cloud more recently.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Rajen presented their code.google.com/apis which consist of a collection of client-side JavaScript libraries that work in concert with server-side Python services. I don't do either language very well but got some hands-on experience later in the program. This would appeal to developers building calendar, map, search and earth related web applications.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Peter talked about desktops burdened with too much state and IT departments benefitting from improved productivity, scalability and governance provided by the force.com platform. It consists of a set of developer tools and web services that open up the innards of the salesforce.com CRM to facilitate integration of custom business applications. 
It is written in a Java dialect with SQL integration that really makes it easy to construct new applications.&lt;/li&gt;&lt;/ul&gt;The four potential customers asked a number of questions on the following that were fielded by the presenters:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Interactive Applications - Lag is a big impediment to hosting truly interactive applications remotely in the cloud&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Migration into the Cloud - Custom applications often must be rewritten to move into cloud deployment. Email and public website hosting were offered as no-brainer cloud services already in full production. Customers can leverage the innovation scale of cloud providers to gain business advantage.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Migration between Cloud vendors - Vendor lock-in is an issue since some of the platforms rely upon proprietary languages and all proprietary software frameworks discourage migration. Open source and standards were offered as mitigating lock-in but premature standards only help the established early providers. &lt;/li&gt;&lt;li&gt;Security - A general uneasiness with allowing private data to be hosted in the cloud was expressed. Vendors responded that their large investments in state-of-the-art security lent economies of scale to the quest for data security.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Privacy - Once private data is cloud hosted it needs strict access controls to ensure its integrity. 
Vendors pointed out that lots of corporate data is lost every year to laptop theft and loss of USB keys and that the cloud offers better governance.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Legal Uncertainties - The cloud is so new that many legal issues about data ownership and rights to disclosure are untested in the courts.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-8876841942060308579?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/8876841942060308579/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=8876841942060308579' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/8876841942060308579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/8876841942060308579'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2009/01/cloud-computing-conference-tuesday.html' title='Cloud Connect Conference - Tuesday'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-4271286752331589576</id><published>2008-08-21T17:08:00.000-07:00</published><updated>2008-08-25T07:35:12.654-07:00</updated><title type='text'>Why I am a Northwest Airlines Promoter</title><content type='html'>Hi Honey,&lt;br /&gt;&lt;br /&gt;I thought you might be 
interested in this email chain from NWA. I think I will also post it to my blog as it is an example of the excellent customer experience they have delivered. I do not know if this is related, or merely policy due to my new status in their elite flier program, but when I went to check in for my flight home they had upgraded me to first class without my taking any action whatsoever. Now, that made my day.&lt;br /&gt;&lt;br /&gt;I'm a solid NWA promoter (&lt;span style="font-family:Verdana;font-size:85%;color:blue;"&gt;&lt;span style="font-size: 10pt; font-family: Verdana; color: blue;"&gt;&lt;a href="http://netpromoter.com/" target="_blank"&gt;netpromoter.com&lt;/a&gt;)&lt;/span&gt;&lt;/span&gt;, of course. See you tomorrow,&lt;br /&gt;Jeff&lt;br /&gt;&lt;br /&gt;&lt;div class="gmail_quote"&gt;---------- Forwarded message ----------&lt;br /&gt;From: &lt;b class="gmail_sendername"&gt;Northwest Airlines&lt;/b&gt; &lt;span dir="ltr"&gt;&lt;northwest.airlines@nwa.com&gt;&lt;/northwest.airlines@nwa.com&gt;&lt;/span&gt;&lt;br /&gt;Date: Wed, Aug 20, 2008 at 5:41 AM&lt;br /&gt;Subject: Re: Apology   (KMM17979656V36086L0KM)&lt;br /&gt;To: Jeff Eastman &lt;jdogsailing@gmail.com&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Dear Mr. Eastman,&lt;br /&gt;&lt;br /&gt;RE: Case Number 6186336&lt;br /&gt;&lt;br /&gt;You are welcome.&lt;br /&gt;&lt;br /&gt;Again, we apologize for the flight irregularity and look forward to&lt;br /&gt;serving you on a future Northwest flight.&lt;br /&gt;&lt;br /&gt;Sincerely,&lt;br /&gt;&lt;br /&gt;Sarah Sanders&lt;br /&gt;Customer Care&lt;br /&gt;Northwest/KLM Airlines&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Original Message Follows:&lt;br /&gt;-------------------------&lt;br /&gt;&lt;br /&gt;Hi Cassie,&lt;br /&gt;&lt;br /&gt;Thank-you for your cordial letter acknowledging the technical problem you had on August 6th. 
As I understood from your ground staff, the delay was caused by the boarding ramp damaging the door of the aircraft due to an operator error. Your airport staff was able to book me on an alternative flight (in 1st class) and I got home only two hours late. I do not imagine you have this sort of problem very often and your proactive handling of the incident leaves me a strong Northwest promoter in spite of these incidents. Your overall record is still excellent and I appreciate the extra points you have offered.&lt;br /&gt;&lt;br /&gt;Thanks,&lt;br /&gt;Jeff Eastman&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;On Tue, Aug 19, 2008 at 9:28 AM, Northwest Airlines &lt; &lt;a href="mailto:Northwest.Airlines@nwa.com"&gt;Northwest.Airlines@nwa.com&lt;/a&gt;&gt; wrote:&lt;br /&gt;&lt;br /&gt;&gt; Dear Mr. Eastman,&lt;br /&gt;&gt;&lt;br /&gt;&gt; RE: Case Number 7143682&lt;br /&gt;&gt;&lt;br /&gt;&gt; On behalf of all the employees at Northwest Airlines, I would like to&lt;br /&gt;&gt; extend a sincere apology for the flight irregularity you experienced on&lt;br /&gt;&gt; Flight 5858 on August 6.  Travelers expect us to provide dependable and&lt;br /&gt;&gt; reliable service and we failed on this occasion.&lt;br /&gt;&gt;&lt;br /&gt;&gt; Furthermore, we regret to learn you experienced a previous interruption&lt;br /&gt;&gt; on Flight 235 on April 25.&lt;br /&gt;&gt;&lt;br /&gt;&gt; As a gesture of apology and in recognition of your Silver Elite status,&lt;br /&gt;&gt; I have added 4000 WorldPerks bonus miles.  Please allow 3-5 business&lt;br /&gt;&gt; days for the miles to appear in your account ****3265.&lt;br /&gt;&gt;&lt;br /&gt;&gt; My colleagues and I pledge to you that we are dedicated to providing&lt;br /&gt;&gt; good service.  
Unfortunately, a reality in this industry is that there&lt;br /&gt;&gt; will be times when we are forced to delay, cancel, or divert flights.&lt;br /&gt;&gt; Thank you for your support and for flying Northwest.&lt;br /&gt;&gt;&lt;br /&gt;&gt; Sincerely,&lt;br /&gt;&gt;&lt;br /&gt;&gt; Cassie Steidler&lt;br /&gt;&gt; Manager, Customer Care&lt;br /&gt;&gt; Northwest/KLM Airlines&lt;br /&gt;&gt;&lt;br /&gt;&gt;&lt;br /&gt;&lt;br /&gt;&lt;/jdogsailing@gmail.com&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-4271286752331589576?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/4271286752331589576/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=4271286752331589576' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/4271286752331589576'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/4271286752331589576'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/08/why-i-am-northwest-airlines-promoter.html' title='Why I am a Northwest Airlines Promoter'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-6999617927600610042</id><published>2008-06-10T18:31:00.000-07:00</published><updated>2008-06-12T08:28:52.892-07:00</updated><title 
type='text'>WOM: Road Warrior War Stories</title><content type='html'>I had the occasion to fly via &lt;a href="http://www.continental.com/web/en-US/default.aspx"&gt;Continental Airlines&lt;/a&gt; to Erie, PA yesterday and have three war stories to recount. The first is a huge missed opportunity that leaves me a Continental detractor. The other two are positive experiences.&lt;br /&gt;&lt;br /&gt;My flight out of SFO was delayed for mysterious "air traffic control delays", but only for 15 minutes and we departed under clear skies. When we arrived over Cleveland it was evident from the turbulence that there was "weather" in the area and, sure enough, we landed almost 45 minutes late. I had a close connection on another Continental jet to Erie and - when I checked with the gate attendant upon my arrival - was told they had not yet closed the door and if I hurried I could make my connection. Needless to say, I hurried, but when I arrived the door had been sealed anyway. No amount of cajoling the gate attendants could open it up, notwithstanding the fact that the airport itself was locked down because the local thunderstorm was now dumping buckets of rain on everything. I went to the customer service desk and was told - basically - "tough shit, you missed your flight and it was not our problem so we don't have to help you". I got a coupon with an 800 number to call if I actually wanted to overnight in some motel nearby on my nickel at a "discount". Flying just ain't what it used to be. You already knew that. Other airlines have actually held the doors for a minute or two so late connection passengers could make it. Not &lt;a href="http://www.continental.com/web/en-US/default.aspx"&gt;Continental&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Needless to say, I did not wish to take that option, so I took the tedious bus trip to the rental car terminal about five miles away to get a car to drive the last 100 miles. 
Unfortunately, all of the rental cars at all of the agencies had been booked already and there were none to be had. The guy at the &lt;a href="http://www.enterprise.com/car_rental/home.do"&gt;Enterprise&lt;/a&gt; desk made a helpful phone call and determined that I could take a Greyhound bus to Erie if only I could get way downtown in Cleveland to the bus terminal and wait for up to an hour. At 10pm that was not a really attractive option but there were not any good alternatives at this point so finally I went back to the &lt;a href="https://www.hertz.com/"&gt;Hertz&lt;/a&gt; desk where I am a gold member.  There, a wonderful woman who empathized with my plight said her supervisor was looking for more cars and was soon able to rent one to me. A two-hour drive later I was checking into the Sheraton Hotel on the waterfront in Erie.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://www.blogger.com/www.Sheraton.com/Erie"&gt;Sheraton Bayfront Hotel&lt;/a&gt; in Erie is a new hotel and is not even in 411 directory service yet. It has lovely rooms that are well-appointed and comfortable. It is a most agreeable destination at midnight on a stormy northeast night on Lake Erie and it has a nice little restaurant.  This evening I had dinner there and had one of the nicer presentations of a salmon Caesar salad that I have ever experienced. In Erie, Pennsylvania, go figure. I will be back to the hotel and to the restaurant for another meal.&lt;br /&gt;&lt;br /&gt;I guess the epiphany for me in all of this is that the whole &lt;a href="http://netpromoter.com/"&gt;Net Promoter&lt;/a&gt; question is most relevant when we relate our personal war stories to our friends via word of mouth. If we have good experiences with a brand we tend to say nice things to our friends about it. When we have some bad experiences, we tend to say bad things. When we have neutral experiences we say nothing at all. And blogging allows everybody who can use Google or Yahoo! to become my virtual friends. 
Maybe I'm just venting but it feels good.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-6999617927600610042?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/6999617927600610042/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=6999617927600610042' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6999617927600610042'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6999617927600610042'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/06/wom-road-warrior-war-stories.html' title='WOM: Road Warrior War Stories'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-6506563100771440126</id><published>2008-06-05T16:39:00.000-07:00</published><updated>2008-06-05T16:59:24.123-07:00</updated><title type='text'>Windward Portlets in the Clouds</title><content type='html'>I just finished uploading the latest image of &lt;a href="http://liferay.com"&gt;Liferay 5.0.2&lt;/a&gt; to our new prototype website that contains two new portlets that I just completed in addition to the hundred-plus already in Liferay. 
&lt;span style="font-style: italic;"&gt;Touchpoint&lt;/span&gt; portlets allow me to build and host community web sites with embedded Net Promoter data capture at various touch points with its users. When community members answer one of the Touchpoint questions (on a scale of 0-10 of course) their score and comments are saved in the database. The comments are also posted automatically to one of three message boards, depending upon their classification as a promoter, detractor or neutral respondent.&lt;br /&gt;&lt;br /&gt;The other new portlet is a &lt;span style="font-style: italic;"&gt;Touchpoint Admin&lt;/span&gt; portlet that lets community administrators create new questions and monitor their results using tabular and charting formats.  Of course, this is a baby step into the &lt;a href="http://netpromoter.com"&gt;Net Promoter &lt;/a&gt;world. Any reasonably large community would generate a huge volume of messages and so my next project will be to work on a text analytics portlet to allow these comments to be filtered and organized in meaningful ways. This will draw me back into &lt;a href="http://lucene.apache.org/mahout"&gt;Mahout&lt;/a&gt;, where I'm exposed to some pretty heavy hitters in this field.&lt;br /&gt;&lt;br /&gt;Of course, my site is also &lt;a href="http://hadoop.apache.org"&gt;Hadoop&lt;/a&gt;-enabled. I have not yet figured out how to utilize cloud computing clusters for this task but I'm working on it. 
Maybe I'll build a portlet to administer my cloud first.&lt;br /&gt;&lt;br /&gt;Oh, I'm not quite ready to go live with the new site, so stay tuned to Windward's &lt;a href="http://windwardsolutions.com"&gt;current site&lt;/a&gt; for news.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-6506563100771440126?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/6506563100771440126/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=6506563100771440126' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6506563100771440126'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6506563100771440126'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/06/windward-portlets-in-clouds.html' title='Windward Portlets in the Clouds'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-5623414806686782173</id><published>2008-05-20T19:40:00.000-07:00</published><updated>2008-05-20T20:11:35.159-07:00</updated><title type='text'>Windward in the Clouds</title><content type='html'>Amazon is on the vanguard of the new cloud computing marketplace and, while I've been EC2-aware for months, until recently I've not actually gotten my hands on it. 
That changed about a week ago when I decided to see if I could bring up a &lt;a href="http://hadoop.apache.org"&gt;Hadoop&lt;/a&gt; cluster. My longer term goal is to run some scalability tests of the &lt;a href="http://lucene.apache.org/mahout"&gt;Mahout&lt;/a&gt; clustering code and since I lost my little cluster when I left CollabNet I need a replacement.&lt;br /&gt;&lt;br /&gt;The economics are really super: ten cents an hour for a box in one of their datacenters and fifteen cents per gigabyte per month for storage. A rather large run of an hour on twenty boxes costs $2, plus the storage costs. I figured I can afford that, so why not see what it takes?&lt;br /&gt;&lt;br /&gt;The process was pretty simple. After signing up with &lt;a href="http://aws.amazon.com"&gt;Amazon Web Services&lt;/a&gt; and downloading their toolkit, I followed their excellent getting started tutorial and pretty soon had a Fedora 8 box running under my control. Getting Hadoop installed required a bit more work as the box comes with nothing but Linux on it. A couple of 'yum' installs later the Java environment was running and Hadoop was installed. I brought up a single node Hadoop cluster and then decided to wait for Hadoop 0.17 to release as it has some DNS optimizations that make running on EC2 simpler.&lt;br /&gt;&lt;br /&gt;Since I had a little experience, and since Windward is in dire need of website rebranding improvements I decided to bring up a copy of the &lt;a href="http://liferay.com/"&gt;Liferay Enterprise Portal&lt;/a&gt; to see what that would be like. That required running their installation as well as installing MySQL. There were some script problems with the Liferay 5.0.1 RC script, but in all the process has been moderately easy.&lt;br /&gt;&lt;br /&gt;Today I spent most of the day customizing the site with some graphics and an initial page layout. 
I still need to finish the rebranding and finally point my existing website URL to it, but &lt;a href="http://ec2-67-202-60-102.compute-1.amazonaws.com:8080"&gt;Windward&lt;/a&gt; is now living in the clouds. Woo hoo.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-5623414806686782173?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/5623414806686782173/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=5623414806686782173' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/5623414806686782173'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/5623414806686782173'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/05/windward-in-clouds.html' title='Windward in the Clouds'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-6337478650605949381</id><published>2008-04-17T18:43:00.000-07:00</published><updated>2008-04-20T18:25:10.090-07:00</updated><title type='text'>I Encounter Scary Math</title><content type='html'>My first programming experiences on the &lt;a href="http://lucene.apache.org/mahout"&gt;Mahout&lt;/a&gt; project were pretty straightforward. I was working on a problem that seemed to need some clustering - grouping data that is "similar". 
I listened to a couple of excellent Google Tutorials that explained how to use Map/Reduce to implement &lt;a href="http://www.youtube.com/watch?v=1ZDybXl212Q"&gt;Canopy Clustering&lt;/a&gt;. This was a small leap from my previous experiences and I was able to implement it in a few days. In the process, I learned about &lt;a href="http://www2.chass.ncsu.edu/garson/PA765/cluster.htm"&gt;k-Means Clustering&lt;/a&gt; since canopy clusters are often used as input to k-means. Now, the math was starting to slow me down but the algorithm was still pretty simple and so I was able to make good progress.&lt;br /&gt;&lt;br /&gt;Then came mean shift clustering. A student had posted an email expressing interest in the algorithm and in a later correspondence included a reference to a &lt;a href="http://www.caip.rutgers.edu/riul/research/papers/pdf/mnshft.pdf"&gt;comprehensive paper&lt;/a&gt; on it. The math was scary, completely opaque to me, and filled with new statistical terms. I must have read that paper a dozen times before it dawned upon me what it was doing. It's not so much that I could not interpret the notation, I just have a hard time mapping it into the real world so that I can visualize its intent. One night I had a vision of hydrogen atoms floating in interstellar space, weakly attracted to each other and slowly forming into clumps - gas clusters - that ultimately became stars. Somehow, that vision morphed into vast clouds of canopy clusters moving and merging together and an implementation was born. I still cannot prove it is correct, but it worked on a test dataset and it felt right so I &lt;a href="http://cwiki.apache.org/MAHOUT/mean-shift.html"&gt;committed it&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Then, &lt;a href="http://tdunning.blogspot.com/"&gt;Ted Dunning&lt;/a&gt;, an active contributor to the Hadoop and Mahout mailing lists, introduced me to &lt;a href="http://www.cs.berkeley.edu/%7Ejordan/papers/jordan-valencia.pdf"&gt;Dirichlet Process Clustering&lt;/a&gt;. 
Unlike the other clustering algorithms which assign each data point to a single, "best" cluster, Dirichlet allows each point to be assigned to multiple clusters - each with an associated probability. This is much more realistic, but it makes the math really, really complicated and I'm still struggling to map the notation onto reality. Each time I read all the papers it gets a little bit clearer, but I'm hoping for another vision. So far, the best I can do right now is a variation of the &lt;a href="http://en.wikipedia.org/wiki/Chinese_restaurant_process"&gt;Chinese Restaurant Process&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Imagine a very large Chinese restaurant, with (infinitely) many tables. Each table can seat (infinitely) many patrons but only serves a single set of dishes to all of them. The first patron to sit at an empty table orders exactly what she likes from the menu for that table. When a new patron enters the restaurant, he surveys all of the tables. Each will have some items he likes and some that he does not. By comparing his likes and dislikes with the menu on each table, we can calculate the probability that he will sit at each table as well as the probability that he will choose an empty table. If the tables represent the clusters and the patrons represent the data points then some clusters will be more likely than others to contain the point. Of course, the probabilities must all add up to 1. 
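&lt;br /&gt;&lt;br /&gt;One way to sketch that seating calculation in Python is shown here. It is only an illustration under stated assumptions and not the Mahout implementation: the function name, the alpha parameter (how tempting an empty table is) and the per-table affinity scores (how well each menu matches the newcomer's tastes) are all hypothetical.

```python
def seating_probabilities(table_counts, affinities, alpha=1.0, new_table_affinity=1.0):
    """Return the probability of sitting at each occupied table,
    followed by one final entry for choosing an empty table."""
    # Weight of an occupied table (assumption): number of patrons already
    # seated there times how well its menu matches the newcomer.
    weights = [count * affinity for count, affinity in zip(table_counts, affinities)]
    # Weight of an empty table is governed by the hypothetical alpha parameter.
    weights.append(alpha * new_table_affinity)
    # Normalize so that the probabilities add up to 1.
    total = sum(weights)
    return [weight / total for weight in weights]


# Two occupied tables with 3 and 1 patrons; the newcomer likes the second menu more.
probs = seating_probabilities(table_counts=[3, 1], affinities=[0.5, 1.0], alpha=1.0)
print(probs)
```

The last entry of the returned list is the chance of starting a new table, and the entries always sum to 1.&lt;br /&gt;&lt;br /&gt;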
Maybe I can get Ted to comment on this posting and my dumbed-down version of DPC.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-6337478650605949381?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/6337478650605949381/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=6337478650605949381' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6337478650605949381'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6337478650605949381'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/04/i-encounter-scary-math.html' title='I Encounter Scary Math'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-6811495287811684349</id><published>2008-04-11T04:33:00.000-07:00</published><updated>2008-04-14T10:38:59.578-07:00</updated><title type='text'>Word of Mouth Marketing: An Expedia Experience</title><content type='html'>My wife &lt;a href="http://debeastman.blogspot.com/"&gt;Deborah&lt;/a&gt; is a marketing professional who has taught me a lot about the emerging power of word-of-mouth marketing. 
We have taken turns running our consulting and technology company &lt;a href="http://windwardsolutions.com/"&gt;Windward Solutions&lt;/a&gt; while the other has a day job. Since April it has been my turn to be 'Jeff on a jet'.&lt;br /&gt;&lt;br /&gt;I booked my first trip through &lt;a href="http://expedia.com/"&gt;Expedia.com.&lt;/a&gt; It was a package deal that included airfare and a hotel booking with &lt;a href="http://embassysuites.com/"&gt;Embassy Suites&lt;/a&gt;. I arrived at my destination and found that the hotel had no record of my reservation. Fortunately, they had a room and so I checked in. When it came time to check out, however, I was faced with a second bill to pay for the same reservation. This was an order fulfillment problem between Expedia and Embassy Suites, not my problem.&lt;br /&gt;&lt;br /&gt;I tried to speak with a person at Expedia about this double payment problem, but unfortunately American Airlines had canceled thousands of flights and the wait was interminable. I sent an email to the Expedia support desk explaining the situation and providing the details. I was told that they could not handle my complaint and that I needed to wait forever in their phone queue to be helped. I'm not inclined to do that, since this is not my problem, so I called my credit card company and contested the charge.&lt;br /&gt;&lt;br /&gt;By the way, they had charged two transactions: one for the airfare and another for the hotel. The hotel charge was $133 more than the actual charge on my bill - some package deal.&lt;br /&gt;&lt;br /&gt;What does this have to do with word-of-mouth marketing? Well, I know that there are many people who routinely search the blogosphere for what customers are saying about their experiences with corporations. There are also companies, such as &lt;a href="http://satmetrix.com/"&gt;Satmetrix&lt;/a&gt;, &lt;a href="http://biz360.com/"&gt;Biz360&lt;/a&gt; and others (help me out here Deb), that do this for a living. 
So, I'm sending this little anecdote off into the blogosphere in the fond hope that it will show up as a black mark on Expedia's record. In &lt;a href="http://netpromoter.com/"&gt;Net Promoter&lt;/a&gt; terms, I am now a detractor. Sorry Expedia.&lt;br /&gt;&lt;br /&gt;But I'm fair. If they can resolve this problem quickly and to my satisfaction then I will post that result too and perhaps consider them again when I travel on business. I'm not going to call them, however; I've given them my cell phone number so they can call me. I usually answer it right away and there is no annoying phone triage for my callers so they will get to a real person immediately.&lt;br /&gt;&lt;br /&gt;What does this have to do with cloud computing? What if I could boil the Internet and distill out for you what people are saying about your company and your products?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-6811495287811684349?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/6811495287811684349/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=6811495287811684349' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6811495287811684349'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/6811495287811684349'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/04/word-of-mouth-marketing-expedia.html' title='Word of Mouth Marketing: An Expedia Experience'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-2040773257783499014</id><published>2008-03-29T22:50:00.000-07:00</published><updated>2008-04-14T10:36:37.642-07:00</updated><title type='text'>How Big is a Petabyte, Anyway?</title><content type='html'>I woke up a bit early this morning wondering how to describe a petabyte. I can easily count to a hundred and do math in that range. In the thousands and above, I resort to scientific notation. A petabyte is 1,000,000,000,000,000 bytes, or 10^15 bytes. How do I convey that incredible size in a comprehensible way?&lt;br /&gt;&lt;br /&gt;Well, most of us nowadays have a gigabyte of memory in our laptops. A thousand laptops is a terabyte and a million laptops is a &lt;em&gt;petabyte&lt;/em&gt;. What if you could make a million laptops all work together on the same problem? What could they do? What would you want them to do?&lt;br /&gt;&lt;br /&gt;There are about two hundred billion stars in our Milky Way Galaxy, or 2x10^11 stars. Five thousand Milky Way galaxies would contain 10^15 stars, a &lt;em&gt;petastar&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;If you started typing and typed a petabyte of data it would show on the screen as a very long string. On the screen, how long would it be? Well, if you typed 5 characters per centimeter, then 10^15 characters would be 2x10^14 centimeters long, two billion kilometers.&lt;br /&gt;&lt;br /&gt;OK, that's still pretty incomprehensible. Let's see how long the beam of a flashlight takes to go from one end to the other. Light travels at 3x10^10 cm/sec, so it would take about 6,667 seconds to traverse the petabyte string, almost two hours.&lt;br /&gt;&lt;br /&gt;This is still hard to grasp. Consider instead getting on a jet and flying at 1000 km/hr over the string. 
The trip would take 2x10^6 hours, about 228 years. Better fly first class.&lt;br /&gt;&lt;br /&gt;Of course, you could not type a petabyte yourself in your lifetime, nor could you and all of your friends. But the Web is perhaps a tenth of a petabyte or so right now and is still growing really fast. Lots of people are typing at the same time and computers are helping them. With Hadoop on a good-sized cloud, you can run analytics over that dataset in reasonable time. What kind of questions do you want to ask?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-2040773257783499014?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/2040773257783499014/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=2040773257783499014' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/2040773257783499014'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/2040773257783499014'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/03/how-big-is-petabyte-anyway.html' title='How Big is a Petabyte, Anyway?'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' 
src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-7426502828041500139</id><published>2008-03-27T10:09:00.000-07:00</published><updated>2008-03-27T22:31:43.582-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='yahoo'/><category scheme='http://www.blogger.com/atom/ns#' term='cloud computing'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><category scheme='http://www.blogger.com/atom/ns#' term='hadoop'/><title type='text'>The First Hadoop Summit</title><content type='html'>On March 25&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;th&lt;/span&gt;, I attended the first &lt;a href="http://developer.yahoo.com/hadoop/summit/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;Hadoop&lt;/span&gt; Summit&lt;/a&gt;. When I got to the conference, I picked up my t-shirt and introduced myself to Ajay &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;Anand&lt;/span&gt;, the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;Hadoop&lt;/span&gt; product manager and conference organizer. What had started out as a small, local workshop in the minds of the organizers had mushroomed into an overnight sensation. The original venue had space for perhaps a hundred participants and was booked full within a day of the registration. After finding a bigger room at Yahoo! which was also immediately filled, they partnered with &lt;a href="http://aws.typepad.com/"&gt;Amazon Web Services&lt;/a&gt; to move the venue to the Network Meeting Center in Santa Clara, CA. By the time I arrived, that venue was filled to standing room only. I went into the auditorium and found a seat next to a gentleman who is head of Emerging Technology of a Korean company. 
He told me he has a 200-node cluster and is interested in new marketing applications that are now possible using this technology. There are lots of similar business opportunities awaiting leading-edge adopters of Hadoop.&lt;br /&gt;&lt;br /&gt;Ajay opened the conference and introduced Doug Cutting and Eric &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5"&gt;Baldeschwieler&lt;/span&gt; who gave a historical overview of the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_6"&gt;Hadoop&lt;/span&gt; evolution up to where it is today in production at Y! &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7"&gt;Hadoop&lt;/span&gt; began its life as a part of the &lt;a href="http://lucene.apache.org/nutch/"&gt;Apache &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8"&gt;Lucene&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_9"&gt;Nutch&lt;/span&gt; &lt;/a&gt;project, which needed a distributed file system to store the web pages returned by its crawlers. They were aware of the work being done at Google and wanted to exploit the &lt;a href="http://labs.google.com/papers/mapreduce.html"&gt;Map/Reduce &lt;/a&gt;paradigm to run computations over these very large &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_10"&gt;data sets&lt;/span&gt;. The project snowballed with the support of an active, worldwide, open source community abetted by Yahoo! investments and has recently become a top-level Apache project in its own right.&lt;br /&gt;&lt;br /&gt;Five speakers followed this introduction, each describing work being done on top of the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_11"&gt;Hadoop&lt;/span&gt; platform.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Chris &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_12"&gt;Olston&lt;/span&gt; (Y!) 
gave a nice introduction to &lt;a href="http://incubator.apache.org/pig/"&gt;Pig&lt;/a&gt;, which I have explored a bit and have found to be quite powerful. "Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_13"&gt;parallelization&lt;/span&gt;, which in turns enables them to handle very large data sets." &lt;/li&gt;&lt;li&gt;After Chris, Kevin Beyer (IBM) gave a talk on &lt;a href="http://www.jaql.org/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_14"&gt;JAQL&lt;/span&gt; &lt;/a&gt;which is a new, more &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_15"&gt;SQL&lt;/span&gt;-like, query language for processing &lt;a href="http://www.json.org/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_16"&gt;JSON&lt;/span&gt; &lt;/a&gt;data on top of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_17"&gt;Hadoop&lt;/span&gt;. &lt;/li&gt;&lt;li&gt;Michael &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_18"&gt;Isard&lt;/span&gt; (Microsoft Research) described &lt;a href="http://research.microsoft.com/research/sv/dryad/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_19"&gt;DryadLINQ&lt;/span&gt;&lt;/a&gt;, a highly parallel environment for developing computations on a cloud computing infrastructure. He showed that map/reduce computations can be phrased quite simply using their language. The reaction of several people I spoke with was, unfortunately, "too bad it is buried inside of Microsoft's platform". &lt;/li&gt;&lt;li&gt;Andy Konwinski (UC Berkeley) talked about the X-trace monitoring framework they had embedded inside of Hadoop, adding only about 500 lines of code. 
This seems potentially useful for understanding the actual behavior of M/R jobs, and they promise to clean it up and submit it as a patch. &lt;/li&gt;&lt;li&gt;Ben Reed (Y!) discussed &lt;a href="http://developer.yahoo.com/blogs/hadoop/2008/03/intro-to-zookeeper-video.html"&gt;Zookeeper&lt;/a&gt;, a hierarchical namespace directory service that can be used for coordinating and communicating between multiple user jobs on Hadoop. &lt;/li&gt;&lt;/ul&gt;After lunch Michael Stack (&lt;a href="http://www.powerset.com/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_20"&gt;Powerset&lt;/span&gt;&lt;/a&gt;) gave an introduction to &lt;a href="http://hadoop.apache.org/hbase/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_21"&gt;HBase&lt;/span&gt;&lt;/a&gt;, a scalable, robust, column-oriented database that is built upon the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_22"&gt;Hadoop&lt;/span&gt; distributed file system. The project is in its second year and is based upon &lt;a href="http://labs.google.com/papers/bigtable.html"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_23"&gt;BigTable&lt;/span&gt;&lt;/a&gt;, another Google technology. It stores very large tables which can be accessed by row primary key, column name and a timestamp. I've not yet experimented with &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_24"&gt;HBase&lt;/span&gt;, but will likely need to utilize it in my &lt;a href="http://lucene.apache.org/mahout/"&gt;Mahout &lt;/a&gt;work for storing and manipulating very large vectors and matrices. 
Afterwards, Brian &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_25"&gt;Duxbury&lt;/span&gt; (&lt;a href="http://www.rapleaf.com/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_26"&gt;Rapleaf&lt;/span&gt;&lt;/a&gt;) described how &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_27"&gt;HBase&lt;/span&gt; and &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_28"&gt;Hadoop&lt;/span&gt; are used to search the Web for information about people's reputations that can be gleaned from various online sources. How can I influence &lt;em&gt;that&lt;/em&gt; score?&lt;br /&gt;&lt;br /&gt;There were several additional talks that addressed application level work being done on top of the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_29"&gt;Hadoop&lt;/span&gt; platform:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_30"&gt;Jinesh&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_31"&gt;Varia&lt;/span&gt; (Amazon) talked about how they deploy &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_32"&gt;GrepTheWeb&lt;/span&gt; jobs on &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_33"&gt;Hadoop&lt;/span&gt; clusters that are materialized on &lt;a href="http://www.amazon.com/gp/browse.html?node=201590011"&gt;EC2 &lt;/a&gt;to run and then vanish when they are finished. 
This is an example of the kind of technology that is now available to anybody with a wallet and a good map/reduce algorithm that they want to use for generating business value.&lt;/li&gt;&lt;li&gt;Steve &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_34"&gt;Schlosser&lt;/span&gt; (Intel) and David O’&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_35"&gt;Hallaron&lt;/span&gt; (&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_36"&gt;CMU&lt;/span&gt;) talked about building ground models of Southern California, using &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_37"&gt;Hadoop&lt;/span&gt; in a novel processing application of seismic data.&lt;/li&gt;&lt;li&gt;Mike Haley (Autodesk) talked about how they are using classification algorithms and Hadoop to correlate the product catalogs of building parts suppliers into their graphical component library that is used for CAD. &lt;/li&gt;&lt;li&gt;Christian Kunz (Y!) described their recently-announced &lt;a href="http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html"&gt;production use of Hadoop&lt;/a&gt;. He showed some very big numbers and impressive improvements over their previous technology in terms of scale, reliability, manageability and speed. To generate their web search index, they routinely run 100k map and 10k reduce jobs over hundreds of terabytes of web data using a cluster with 10k cores and 20 petabytes of disk space. This illustrates what is now possible to do in production settings with Hadoop.&lt;/li&gt;&lt;li&gt;Jimmy Lin (&lt;a href="http://www.umiacs.umd.edu/~jimmylin/"&gt;University of MD&lt;/a&gt;) and Christophe Bisciglia (Google) talked about natural language processing work going on at UMD and other universities. 
I got a chance to shake Christophe’s hand during the happy hour and to thank him for revolutionizing my life (see My Hadoop Odyssey, below).&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;I had a fantastic opportunity to sit on the futures panel with leaders of the Hadoop community (Sameer Paranjpye, Sanjay Radia, Owen O’Malley (all Y!) and Chad Walters (Powerset)) to introduce the new &lt;a href="http://lucene.apache.org/mahout/"&gt;Mahout &lt;/a&gt;project while they presented the future directions of Hadoop and Hbase. The panel gave me an outstanding soapbox, generated a lot of interest in machine learning applications and several great opportunities for followup discussions with people from the greater Hadoop community.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-7426502828041500139?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/7426502828041500139/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=7426502828041500139' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/7426502828041500139'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/7426502828041500139'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/03/first-hadoop-summit.html' title='The First Hadoop Summit'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' 
src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-7199440680821308644</id><published>2008-03-26T18:32:00.000-07:00</published><updated>2008-03-27T10:04:36.429-07:00</updated><title type='text'>What is Mahout?</title><content type='html'>Around the end of January I saw an interesting post on the &lt;a href="http://hadoop.apache.org/core/mailing_lists.html#Developers"&gt;Hadoop users list &lt;/a&gt;announcing the creation of a new sub-project called &lt;a href="http://lucene.apache.org/mahout/"&gt;Mahout &lt;/a&gt;under the Apache Lucene project. I decided this would be a good place to continue my Hadoop odyssey.&lt;br /&gt;&lt;br /&gt;Using cloud computing technologies such as &lt;a href="http://www.amazon.com/gp/browse.html?node=201590011"&gt;EC2&lt;/a&gt;, &lt;a href="http://lucene.apache.org/"&gt;Lucene&lt;/a&gt;, &lt;a href="http://lucene.apache.org/nutch/"&gt;Nutch&lt;/a&gt;, &lt;a href="http://hadoop.apache.org/core/"&gt;Hadoop&lt;/a&gt;, &lt;a href="http://incubator.apache.org/pig/"&gt;Pig &lt;/a&gt;and &lt;a href="http://hadoop.apache.org/hbase/"&gt;HBase &lt;/a&gt;it is now possible for even small companies to perform analytics over the entire World Wide Web. The emerging challenge is now to develop improved analytics that can separate relevant information from spam, learn from previous experience and organize information in ever more meaningful ways.&lt;br /&gt;&lt;br /&gt;In recent years a rather large community of researchers has addressed the problem of extracting useful intelligence from the Web. Whether it is classifying documents into categories, clustering them to form groups that make sense to users or ranking them by relevancy given some query, these methods fall under the broad category of machine learning algorithms. 
Unfortunately, most of the available algorithms are either proprietary, under restrictive licenses or do not scale to massive amounts of information.&lt;br /&gt;&lt;br /&gt;The focus of the Mahout project is to develop commercially-friendly, scalable machine learning algorithms such as classification, clustering, regression and dimension reduction under the Apache brand and on top of Hadoop. Its initial areas of focus are to build out the ten machine learning libraries detailed in &lt;em&gt;&lt;a href="http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf"&gt;Map-Reduce for Machine Learning on Multicore&lt;/a&gt;&lt;/em&gt;, by Chu, Kim, Lin, Yu, Bradski, Ng &amp;amp; Olukotun of Stanford University. Though the project is only in its second month, we have an active and growing community with initial submissions in the areas of clustering, classification and matrix operations.&lt;br /&gt;&lt;br /&gt;The Mahout team chose this name for the project out of admiration and respect for the work of the Hadoop project, whose logo is that of an elephant. According to &lt;a href="http://en.wikipedia.org/wiki/Mahout"&gt;Wikipedia&lt;/a&gt;, “A mahout is a person who drives an elephant”. It goes on to say that the “Sanskrit language distinguishes three types [of mahouts]: Reghawan, who use &lt;em&gt;love&lt;/em&gt; to control their elephants, Yukthiman, who use &lt;em&gt;ingenuity&lt;/em&gt; to outsmart them and Balwan, who control elephants with &lt;em&gt;cruelty&lt;/em&gt;”. 
We intend to practice only in the first two categories and welcome individuals with similar values who would like to contribute to the project.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-7199440680821308644?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/7199440680821308644/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=7199440680821308644' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/7199440680821308644'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/7199440680821308644'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/03/what-is-mahout.html' title='What is Mahout?'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1913655139480042191.post-3455797752077144031</id><published>2008-03-26T18:28:00.000-07:00</published><updated>2008-03-27T09:44:45.441-07:00</updated><title type='text'>My Hadoop Odyssey</title><content type='html'>It all started on a lazy Sunday afternoon back in December. The new issue of Business Week had just arrived and it had an interesting-looking cover about a new technology called "cloud computing". 
The long-haired guy on the cover looked somewhat out of place for the usual &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;BW&lt;/span&gt; fare, but I had never heard of the technology and so it piqued my interest. Reading through the article, it turned out to be about massively parallel computing platforms and their emerging impact on the business world of web-scale information processing. The guy on the cover, &lt;a href="http://seattlepi.nwsource.com/business/304299_google20.html"&gt;Christophe &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;Bisciglia&lt;/span&gt;&lt;/a&gt;, was a young engineer at Google who had become their cloud computing guru and was working with some universities to establish curricula for teaching students about this new technology.&lt;br /&gt;&lt;br /&gt;As a necessary ingredient of their web search business, Google had developed a way to use massive arrays of general purpose computers as a single computational platform. They had racks and racks full of off-the-shelf PCs, each with its own memory and disk drive. Using proprietary technology based upon an old functional programming technique called &lt;a href="http://labs.google.com/papers/mapreduce.html"&gt;Map/Reduce &lt;/a&gt;they are able to store massive amounts of web search data redundantly on these computer arrays and run jobs over a database consisting of the entire world wide web. I'd always wondered how they did it.&lt;br /&gt;&lt;br /&gt;The article went on to mention &lt;a href="http://hadoop.apache.org/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;Hadoop&lt;/span&gt;&lt;/a&gt;, an open source version of this technology that was being developed by Yahoo!, IBM, Amazon and other companies to make this technology available under the Apache Software Foundation's flexible licensing terms. 
Though this effort competed with the Google work and was behind it on the learning curve, it provided an open platform to train young engineers to think in terms of these massively parallel computations.&lt;br /&gt;&lt;br /&gt;It also provided me with a window into this new technology. That evening, I downloaded a copy and started reading the &lt;a href="http://wiki.apache.org/hadoop/"&gt;documentation&lt;/a&gt;. By midnight, I had a 1-node &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;Hadoop&lt;/span&gt; cloud running on my laptop and was running some of the example jobs. The next day I went into my office at &lt;a href="http://collab.net/"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4"&gt;CollabNet&lt;/span&gt; &lt;/a&gt;and &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_5"&gt;commandeered&lt;/span&gt; a few of my colleagues' Linux boxes to build a 12-node cloud that had over a terabyte of storage and a dozen CPUs. Then I went looking for some data to munch with it.&lt;br /&gt;&lt;br /&gt;CollabNet's in the globally-distributed software development business, not the web search business, and so about the only large sources of data we had were the logs from our Apache web servers. I got ops to give me about 6 GB of logs and started writing a program to extract some usage information from them. In short order, I had my first map/reduce application tested using the very fine &lt;a href="http://www.alphaworks.ibm.com/tech/mapreducetools"&gt;Eclipse plugin &lt;/a&gt;provided by IBM. I ran it on a single-node cluster against 5 months of logs and the program took about 120 minutes to complete. Then I launched it on my 12-node cloud and it took only 12 minutes - almost linear scalability with cluster size. This really cemented my interest.&lt;br /&gt;&lt;br /&gt;There was one aspect of CollabNet's business that, I felt, might benefit from this technology. 
&lt;a href="http://cubit.open.collab.net/"&gt;CUBiT&lt;/a&gt;, a nascent product designed to manage pools of developer boxes, allows engineers to check a machine out from a pool, install a particular operating system profile on it, check out a particular version of our software, build it and use it for testing. Using the CUBiT user interface, I was able to see that we had literally hundreds of Linux boxes in use around the company. I was also able to see that most of them were only about 2-5% utilized, sitting there stirring and warming the air in their machine rooms most of the time just waiting for engineers to need them.&lt;br /&gt;&lt;br /&gt;We were sitting on top of a massive supercomputer and did not even realize it! How many of our customers had similar environments? Our customers included major Fortune 1000 corporations. Probably ours was one of the smallest latent clouds around. What if we bundled Hadoop into CUBiT? It would be totally opportunistic as a product feature but it would enable our customers to develop and run jobs that they never even dreamed were possible, right in their own laboratories, for free. 
"Buy CUBiT, get a free supercomputer" became my mantra.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1913655139480042191-3455797752077144031?l=jeffeastman.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jeffeastman.blogspot.com/feeds/3455797752077144031/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1913655139480042191&amp;postID=3455797752077144031' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/3455797752077144031'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1913655139480042191/posts/default/3455797752077144031'/><link rel='alternate' type='text/html' href='http://jeffeastman.blogspot.com/2008/03/yesterday-i-attended-first-hadoop.html' title='My Hadoop Odyssey'/><author><name>Jeff</name><uri>http://www.blogger.com/profile/06071361174103649709</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://bp2.blogger.com/_bjCfFqZoyrQ/R-rAdgO4RVI/AAAAAAAAAAU/wSeI9Ij6eDQ/S220/JeffSmall.jpg'/></author><thr:total>0</thr:total></entry></feed>
