A Brief History of "Coogle" (formerly known as "Google for Concord")

Coogle was a grass-roots effort by a relatively new engineer to aid in grappling with the millions of lines of code that he (like all engineers) has to deal with. For the sake of simplicity, let's just pretend that the engineer's name is "Joe" and that he started working at Concord on April Fool's Day, 2002 (both of these are in fact true).

Summer 2002 - CODEGREP

Joe's first attempt to navigate the source code was to produce a tool at home (using GNU C++) called codegrep. The tool remains in /home/tools/bin, but it takes 45 minutes to recursively grep for something in "the Vobs" (across a single branch of the source code). Codegrep was useful, but it was clear that we needed faster (and more comprehensive) searching across multiple branches of the code and all the documentation, specifications, project plans, etc.

Feb 2003 - SWISH

Having completed a search project a few years earlier using the public domain search engine software SWISH, Joe's next attempt was to implement a version of that here. Our code base (C++, Java, Shell, etc.) had never been "indexed" in this way before. SWISH simply could not handle the amount of content that we have here at Concord. "What would you say, sire... too many notes... ?"

This was especially true given Joe's personal ambition to get BAFS (tech support's "Big Ass File Server") indexed. BAFS holds log files and key customer interactions from all of our key accounts. It is an absolute gold mine for engineers on SWAT. The only prior access to BAFS was through UNC paths that were pasted into Remedy tickets.

Summer 2003 - GLIMPSE, SWISH++, ...

Fruitlessly, Joe tried lots of other public domain software to index our world. Without exception, each package ended up having problems, especially with BAFS (huge log files, MTF files, poller configs, ...).

June 2003 - GOOGLE APPLIANCE

Even though he had already coined the expression "Google for Concord", Joe did not yet realize that Google Corporation in California actually leases appliances (loaded with the real Google software). He soon got management approval for a loaner and brought the appliance in house. Before long, Joe had "the world" indexed:

- multiple branches of the software off the Vobs
- unbranched Vobs
- one-off directories on Janeway
- UNIX home directories (see OOPS below)
- /home/tools/bin
- other useful UNIX mount points

July 2003 - GOOGLE OOPS

In a rush to get feedback from other engineers, we published the internal "Google for Concord" URL on Friday, July 11. What Joe did not realize is that some people keep very private, sometimes confidential, sometimes embarrassing information in their UNIX home directories. Google merely "shined the light" on it all. Vicious rumors about "Google" spread like wildfire. Google was purportedly:

- penetrating MIS security
- spidering HR records
- displaying pornographic material

Basically, this was all greatly overblown, but it almost killed the entire project. In less than 24 hours, MIS had disabled the port (on Joe's wallplate) that connected the Google appliance to the net, thereby ending the experiment.

July 2003 - Joe and MIS

As a result of the July 11 misstep, Joe formed a joint committee with MIS and started working with Kathy Hickey and Brian Edwards on solving search problems at a higher level. They knew that the search capability on "planet" was far from acceptable and wanted help solving all the remaining search needs:

- www.concord.com (for authenticated customers)
- planet
- Primus (the tech support knowledge base)
- etc.

Joe immediately begged off these larger responsibilities and explained that he would only be able to solve Engineering's needs (if that).

In order to "rein in" what Google would (and would not) spider, Joe created the "spiderme" concept.
An engineer must now create a "spiderme" directory under their UNIX home directory and place in it the documents (or sym-links) they wish to have indexed.

August 2003 - GOOGLE on BAFS

Since Joe was on SWAT and battling obscure Oracle errors in the field on eHealth 5.5, he was anxious to get BAFS indexed. The loaner appliance was licensed for 300,000 URLs. We soon hit that limit by adding BAFS to the Google appliance collections, but even with only partial indexing of BAFS, Joe was able to research and solve (or help solve) some key problems:

Aug-2003 : ORA-00001 : unique constraint violations on NH_GROUP
Oct-2003 : NaN       : NaN showing up in reports across multiple releases
...      : ORA-xxxxx : the prevalence of many other Oracle errors (some even unreported) occurring in the field

The Google appliance had no trouble crunching through the BAFS content. In fact, it could index the entire "world" (6 code branches plus BAFS) in about 8 hours. The only real problems with the Google appliance were that:

- our loaner period had been extended multiple times
- the Google appliance maxed out at 300,000 URLs, and the next level of hardware was an entire rack of appliances that cost a lot of money
- when you clicked a link, it did not "highlight" your search expression in the resulting document (the same problem that Google in the cloud has when you don't have the Google highlighter installed)

November 2003 - VERITY

In the meantime, MIS had been talking to search vendors. Verity (a long-established authority on search engines) was gunning for the business. They had better authentication methods and established hooks into planet, Primus, Remedy, etc.

Unfortunately, out of the gate Verity suffered the same fate as the public domain packages (SWISH, GLIMPSE, etc.). Despite Verity's repeated reassurances, it seemed that the product just could not handle the amount of information we wanted to index. Verity worked hard. Joe worked hard. MIS worked hard. We worked through issue after issue, long into the nights.
By the end of November, Verity's PO hinged on Joe's decision as to whether the POC (proof of concept) was a success. What Joe saw in Verity was a piece of complex software with a million knobs, a legacy of very large search applications (much larger than ours), and a professional service organization to back up their product. Verity got the contract.

December 2003 - Bye Bye GOOGLE

With a commitment to "make Verity work", we unplugged the neat little Google appliance and shipped it back to Google.

January 2004 - VERITY Today

Today, Verity indexes nearly half a million documents at Concord. It still has a lot of trouble with BAFS, but all in all, we've got the system in a very functional, very useful state.

February 2004 - COOGLE Branded

Following a naming contest held across Engineering in January, "Coogle" was selected (by popular vote) as the new brand name. The complete list of names submitted (by engineers) for the contest is: naming.xls. The runners-up were: "joeker", "conoogle" and "query".

Some Interesting Links:

- Original Approach Document: 27-Jan-2003 Approach
- Functional Specification: 15-Jul-2003 Func Spec