Summary of Sandy:
Sandy is a program that understands the English language from a mechanical perspective. It organizes the relationships between words and their definitions, focusing on the meaning behind the relationship of each definition to its word. Sandy is built on top of an English dictionary (specifically WordNet 2.1), incorporating and organizing the dictionary data to form a type of Ontology for each word and definition derivation. The current prototype contains a base of 36,554,566 unique Ontologies.
When a document (text, HTML, XML or anything exposing text) is presented to Sandy, Sandy identifies the well-formed thoughts in it. Sandy processes each well-formed thought by deconstructing it into its logical components, subject and predicate. If the thought can be simplified, e.g. a compound sentence, Sandy creates multiple simple sentences from it. Sandy then leverages the individual Ontologies to create relationships between the individual words within the thought. These relationships allow Sandy to apply pattern matching and probability to establish the specific definition for each word. For example, given the following statement: “At the start of the trial the judge showed the jury the photographs in a private chamber”.
First, the pattern is established. Starting Adjunct: At the start of the trial; Subject: the judge; Sentence verb: showed; Indirect object: the jury; Direct object: the photographs; Ending Adjunct: in a private chamber.
Second, the relational Ontology, pattern matching, and probability are used to establish context. The subject “the judge” is assigned the definition “a public official authorized to decide questions brought before a court of justice”, as opposed to “an authority who is able to estimate worth or quality”. Sandy applies the same logic across multiple thoughts or sentences as it parses text, and re-evaluates past assignments as new context is established. At the top level is the concept of a topic, which creates relationships across sentences, much like a “topic of conversation”.
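The idea of choosing a definition from context can be sketched in a few lines of Python. This is only an illustration, not Sandy's actual method: it swaps in a simple word-overlap score (in the spirit of the Lesk algorithm) for Sandy's pattern matching and probability. The two definitions of “judge” are the ones quoted above; the RELATED table is a hypothetical, hand-written stand-in for the relationships the Ontologies would supply, and all function names are assumptions.

```python
STOPWORDS = {"a", "an", "the", "of", "to", "in", "at", "or", "is", "who"}

# Candidate definitions for "judge", quoted from the example in the text.
SENSES = {
    "judge": [
        "a public official authorized to decide questions brought before a court of justice",
        "an authority who is able to estimate worth or quality",
    ]
}

# Hypothetical, hand-written related terms standing in for ontology links.
RELATED = {
    "trial": {"court", "law"},
    "jury": {"court", "verdict"},
}


def content_words(text):
    """Lower-case the text, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def pick_sense(word, sentence):
    """Return the definition sharing the most words with the expanded context."""
    context = content_words(sentence) - {word}
    # Expand the context with the related terms of each context word.
    for w in list(context):
        context |= RELATED.get(w, set())
    return max(SENSES[word], key=lambda gloss: len(content_words(gloss) & context))


sentence = ("At the start of the trial the judge showed the jury "
            "the photographs in a private chamber")
print(pick_sense("judge", sentence))
```

With this toy data, “trial” and “jury” both pull in “court”, which appears only in the first definition, so the public-official sense is selected. A real system would also need to revisit the choice as later sentences add context.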
Sandy can store the data, based on the established topic, in the components that make up the thought, i.e. it indexes the Adjunct, Subject, Verb, Subject complement, Indirect object, Direct object, and Object complement. Indexing the data in a way that reflects the sentence patterns allows Sandy to search on any given component. For example, given the following thought: “With almost 330,000 employees worldwide and revenues of 91 billion annually IBM is the biggest information technology company in the world”, one could ask “What is IBM” and Sandy would return “IBM is the biggest information technology company in the world”.
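As a rough sketch of what indexing by component might look like, the Python below stores the IBM thought under each of its components and answers “What is IBM” from the Subject index. The dictionary layout, the key names, and the functions store and what_is are all assumptions made for illustration; the actual table design is not described here.

```python
from collections import defaultdict

# component name -> lower-cased component text -> thoughts containing it
index = defaultdict(lambda: defaultdict(list))


def store(thought):
    """Index a thought (a dict of component name -> text) under every component."""
    for component, text in thought.items():
        index[component][text.lower()].append(thought)


def what_is(subject):
    """Answer a 'What is X' question from stored thoughts whose subject is X."""
    matches = index["subject"].get(subject.lower(), [])
    # Rebuild a simple sentence from subject, verb and complement; the adjunct is dropped.
    return [f'{t["subject"]} {t["verb"]} {t["subject_complement"]}'
            for t in matches if "subject_complement" in t]


store({
    "adjunct": "With almost 330,000 employees worldwide and revenues of 91 billion annually",
    "subject": "IBM",
    "verb": "is",
    "subject_complement": "the biggest information technology company in the world",
})
print(what_is("IBM"))
```

Because every component is a key, the same structure could be searched by verb or adjunct as easily as by subject.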
Sandy recognizes two types of thoughts: the informative and the interrogative. If the thought is informative, Sandy processes the data as stated. If the thought is interrogative, Sandy performs a local search in an attempt to answer the question. As part of the database build-out, Sandy takes each word and relates it to its underlying definition to generate a well-formed sentence or thought. These statements are then digested into thought component tables.
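How Sandy tells the two types of thought apart is not spelled out here. As a minimal sketch, assuming a purely surface-level heuristic (a trailing question mark or a leading question word), the distinction could be drawn as follows. The QUESTION_WORDS set and the function name thought_type are hypothetical.

```python
# Hypothetical list of words that open a question.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def thought_type(text):
    """Classify a thought as 'interrogative' or 'informative' by surface cues."""
    stripped = text.strip()
    words = stripped.split()
    if not words:
        raise ValueError("empty thought")
    if stripped.endswith("?") or words[0].lower() in QUESTION_WORDS:
        return "interrogative"
    return "informative"


print(thought_type("What is IBM"))
print(thought_type("IBM is the biggest information technology company in the world"))
```

A heuristic this simple misses yes/no questions written without a question mark (for example “Is IBM a good buy”), which is one reason a pattern-based approach would be preferable.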
Once the English dictionary has been indexed, we start to see the power of the product. One might inquire “What is carbon-dioxide” and the response would be “a heavy, odorless, colorless gas formed during respiration and by the decomposition of organic substances; absorbed from the air by plants in photosynthesis”. This is no different from looking up the word in a dictionary. The real power comes from indexing the word “Black”, where there is a specific definition for “Joseph Black”: “A British chemist who identified carbon dioxide and who formulated the concepts of specific heat and latent heat”. Having indexed this thought, one could not only inquire about a subject, carbon-dioxide, but more powerfully ask questions related to carbon-dioxide, e.g. “Who discovered carbon-dioxide?” The information returned would be “Joseph Black identified carbon dioxide”. One could go on to ask “Who was Joseph Black”.
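The carbon-dioxide question can be sketched as a lookup on the Verb and Direct object components rather than on the Subject. The Python below is illustrative only: the two stored thoughts are derived from the Joseph Black definition quoted above, while the VERB_SYNONYMS table (linking “discovered” to “identified”) is a hypothetical stand-in for the verb relationships the Ontologies would provide, and the function names are assumptions.

```python
# Hypothetical synonym table standing in for the ontology's verb relations.
VERB_SYNONYMS = {"discovered": {"discovered", "identified"}}

# Simple thoughts derived from the definition of "Joseph Black".
thoughts = [
    {"subject": "Joseph Black", "verb": "identified",
     "direct_object": "carbon dioxide"},
    {"subject": "Joseph Black", "verb": "formulated",
     "direct_object": "the concepts of specific heat and latent heat"},
]


def normalize(text):
    """Treat 'carbon-dioxide' and 'carbon dioxide' as the same term."""
    return text.lower().replace("-", " ")


def who(verb, obj):
    """Answer 'Who <verb> <obj>' by matching the verb and direct object."""
    verbs = VERB_SYNONYMS.get(verb, {verb})
    return [f'{t["subject"]} {t["verb"]} {t["direct_object"]}'
            for t in thoughts
            if t["verb"] in verbs and normalize(t["direct_object"]) == normalize(obj)]


print(who("discovered", "carbon-dioxide"))
```

The answer comes from a definition filed under “Black”, not under “carbon-dioxide”, which is the point of indexing every thought by its components.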
This is a simple example of indexing well-established information. You might have been able to find the same information by looking up carbon-dioxide in your favorite encyclopedia and scanning through the page. Wikipedia has a reference to a Scottish physician named Joseph Black. After indexing all of Wikipedia, one could argue that Sandy has knowledge of the universe as we know it.
If you move this process to data that is not so well established (just about everything else), you have a new way to look at indexing free-form information. For example, if Sandy indexes a list of financial websites, one could ask questions about potential investments, e.g. “Who thinks IBM is a good buy”. Sandy would reply with a list of people who have stated that IBM is a positive investment. In this example Sandy accomplishes this dynamically by leveraging the Ontologies for the words and definitions in the question: “IBM”, “good” and “buy”.
Sandy represents a way of organizing information that can refine the data to produce a more meaningful response to a search or question. Having a program that can interpret the English language opens the door to using everyday conversation as a user interface. Sandy is well on the road to being an application that is conversational on human terms.
Thank you for your time. If you have any interest, or advice on where to go from here, please contact me at this email address. Regards, Tom Cowley, sandypondfarm@hotmail.com