A Brief History of "Coogle" (formerly known as "Google for Concord")
Coogle was a grass-roots effort by a relatively new engineer
to aid in grappling with the millions of lines of code that he (and all
engineers) have to deal with.
For the sake of simplicity, let's just pretend that the engineer's name is "Joe"
and that he started working at Concord on April Fools' Day, 2002
(both of these are in fact true).
Summer 2002 - CODEGREP
Joe's first attempt to navigate the source code was to produce a tool
at home (using GNU C++) called codegrep. This tool still remains
in /home/tools/bin but takes 45 minutes to recursively grep for something
in "the Vobs" (across a single branch of the source code).
Codegrep was useful, but it was clear that we needed faster (and more
comprehensive) searching across multiple branches of the code and all
the documentation, specification, project plans, etc.
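The kind of recursive search codegrep performed can be sketched roughly as
follows. This is only an illustration in Python, not Joe's GNU C++ tool: the
function name, the default list of file extensions, and the output format are
all assumptions made for the example.

```python
import os
import re
from typing import Iterator, Tuple


def codegrep(
    root: str,
    pattern: str,
    extensions: Tuple[str, ...] = (".cpp", ".h", ".java", ".sh"),  # assumed set
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each matching line under root."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue  # skip files that are not source code
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going on odd encodings
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it rather than abort


# Hypothetical usage, with a made-up path and search term:
#   for path, number, line in codegrep("/path/to/branch", r"NH_GROUP"):
#       print(f"{path}:{number}: {line}")
```

Every search re-reads every file, which is why a scan like this gets slower as
the tree grows and why an index built ahead of time was the next step.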
Feb 2003 - SWISH
Having completed a search project a few years ago using public domain
search engine software SWISH, Joe's next attempt was to implement
a version of that here.
Our code base (C++, Java, Shell, etc.) had never been "indexed"
in this way before.
SWISH simply could not handle the amount of content that we have
here at Concord. "What would you say, sire... too many notes... ?"
This was especially true given Joe's personal ambition to get BAFS
(tech support's "Big Ass File Server") indexed. BAFS holds log files
and key customer interactions from all of our key accounts. It is an
absolute gold mine for engineers on SWAT. The only prior access to
BAFS was through UNC paths that were pasted into Remedy tickets.
Summer 2003 - GLIMPSE, SWISH++, ...
Joe tried many other public domain packages to index our world, all
fruitlessly. Without exception, each package ended up having problems,
especially with BAFS (huge log files, MTF files, poller configs, ...).
June 2003 - GOOGLE APPLIANCE
Even though he had already coined the expression "Google for Concord",
Joe did not yet realize that Google Corporation in California
actually leased appliances (loaded with the real Google software).
He soon got management approval for a loaner and brought the appliance
in house.
Before long, Joe had "the world" indexed:
- multiple branches of the software off the Vobs
- unbranched Vobs
- one-off directories on Janeway
- UNIX home directories (see OOPS below)
- /home/tools/bin
- other useful UNIX mount points
July 2003 - GOOGLE OOPS
In a rush to get feedback from other Engineers, we published the
internal "google for concord" URL on Friday, July 11.
What Joe did not realize is that some people keep very private,
sometimes confidential, sometimes embarrassing information in
their UNIX home directories.
Google merely "shined the light" on it all.
Vicious rumors about "google" spread like wildfire. Google
was purportedly:
- penetrating MIS security
- spidering HR records
- displaying pornographic material
Basically, this was all greatly overblown, but it almost killed
the entire project.
In less than 24 hours, MIS had disabled the port (on Joe's wallplate)
where the Google appliance was plugged into the net, thereby
ending the experiment.
July 2003 - Joe and MIS
As a result of the July 11 misstep, Joe formed a joint committee
with MIS and started working with Kathy Hickey and Brian Edwards
on solving search problems at a higher level.
They knew that the search capability on "planet" was far from
acceptable and wanted help solving all the remaining search needs:
- www.concord.com (for authenticated customers)
- planet
- primus (the tech support knowledge base)
etc.
Joe immediately begged off these larger responsibilities and
explained that he would only be able to solve Engineering's
needs (if that).
In order to "rein in" what Google would (and would not) spider,
Joe created the "spiderme" concept. An engineer must now create
a "spiderme" directory under their UNIX home, and place in it
the documents (or symlinks) they wish to have indexed.
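The opt-in rule behind "spiderme" can be sketched as below. This is a minimal
Python illustration of the idea, not how the Google appliance was actually
configured: the function name and the layout of one directory per user under a
common home root are assumptions made for the example.

```python
import os
from typing import List


def collect_spiderme(home_root: str, dirname: str = "spiderme") -> List[str]:
    """Return the files that users have explicitly opted in for indexing.

    Assumes (hypothetically) that home_root holds one directory per user.
    Anything outside a user's opt-in directory is never visited.
    """
    selected = []
    for user in sorted(os.listdir(home_root)):
        optin = os.path.join(home_root, user, dirname)
        if not os.path.isdir(optin):
            continue  # this user has not opted in
        # followlinks=True descends into symlinked directories too; a
        # symlink loop would need extra guarding in a real crawler.
        for dirpath, _dirnames, filenames in os.walk(optin, followlinks=True):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # isfile() follows symlinks, so links to real files are
                # kept and dangling links are dropped.
                if os.path.isfile(path):
                    selected.append(path)
    return sorted(selected)
```

The design point is that the default flips from "index everything under home"
to "index nothing unless the owner placed it there".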
August 2003 - GOOGLE on BAFS
Since Joe was on SWAT and battling with obscure Oracle errors in the
field on eHealth 5.5, he was anxious to get BAFS indexed.
The loaner appliance was licensed for 300,000 URLs. We soon saturated
this by adding BAFS to the google appliance collections, but even
with the partial indexing of BAFS, Joe was able to research and solve
(or help solve) some key problems:
Aug-2003 : ORA-00001 : unique constraint violations on NH_GROUP
Oct-2003 : NaN       : NaN showing up in reports across multiple releases
...      : ORA-xxxxx : the prevalence of many other Oracle errors (some
                       even unreported) occurring in the field
The Google appliance had no trouble crunching through the BAFS content.
In fact it could index the entire "world" (6 code branches plus BAFS)
in about 8 hours.
The only real problems with the Google appliance were that:
- our loaner period had been extended multiple times
- the Google appliance maxed out at 300,000 URLs and the next level
of hardware was an entire rack of appliances that cost a lot
of money
- when you clicked a link, it did not "highlight" your search expression
in the resulting document (the same problem that google in the cloud
has when you don't have the google highlighter installed)
November 2003 - VERITY
In the meantime, MIS had been talking to search vendors. Verity (a
long established authority on search engines) was gunning for the
business. They had better authentication methods and established hooks
into planet, Primus, Remedy, etc.
Unfortunately, out of the gate Verity suffered the same fate as
the public domain packages (SWISH, GLIMPSE, etc.). Despite
Verity's repeated reassurances, it seemed like it just could not
handle the amount of information we wanted to index.
Verity worked hard. Joe worked hard. MIS worked hard. We worked
through issue after issue, long into the nights. By the end of
November, Verity's PO (purchase order) hung on Joe's decision as to
whether the POC (proof of concept) was a success.
What Joe saw in Verity was a piece of complex software with a
million knobs, a legacy of very large search applications (much
larger than ours) and a professional service organization to back up
their product. Verity got the contract.
December 2003 - Bye Bye GOOGLE
With a commitment to "make Verity work", we unplugged the neat
little Google appliance and shipped it back to Google.
January 2004 - VERITY Today
Today, Verity indexes nearly half a million documents at Concord.
It still has a lot of trouble with BAFS, but all in all, we've
got the system in a very functional, very useful state.
February 2004 - COOGLE Branded
Following a naming contest held across Engineering in January, the
winning name "Coogle" was selected (by popular vote) to be the new
brand name. The complete list of names submitted (by engineers)
for the contest is: naming.xls.
The runners-up were: "joeker", "conoogle" and "query".
Some Interesting Links:
Original Approach Document: 27-Jan-2003 Approach
Functional Specification: 15-Jul-2003 Func Spec