Representing language in human robots
When dealing with language, there are many kinds of AI software that try to represent it. Among the most popular categories are language parsers, discrete mathematics, and semantic models. None of these fields (or any combination of them) has produced a machine that can fully understand language the way human beings do. Designing a machine that can learn language requires a lot of imagination and creativity. My design for representing language comes from two sources: animation and videogames. Mostly videogames, because that is where my key ideas come from.
Common sense knowledge expressed through language is very hard to represent on a computer because it's "all or nothing": either the computer understands language the way human beings do, or it doesn't understand the language at all. People who clean rooms for a living need not only knowledge about cleaning rooms but also the common knowledge that all humans have. Basic things like: if you drop something, it falls to the ground; if you break the law, you will go to jail; if you throw an egg, it will fall and break; if you don't eat, you will get hungry. These are basic facts that every human knows. Machines, on the other hand, have to be fed this knowledge manually, unless someone builds a learning machine similar to a human brain. Even universal learning programs like neural networks require programmers to manually feed in the rules and data in order for them to work. Like I said, it's "all or nothing".
Suppose there is a robot janitor whose function is to clean the house. What happens when it's mowing the lawn and it begins to rain? Common sense tells a real human to take shelter. The robot janitor, however, doesn't know that it's raining, unless you program it to take shelter when it rains. Another example: what if the janitor accidentally drops food on the ground? Does it know that the food is contaminated? This is why it is very important to build a machine that works like a human brain in order for it to do anything human. The only way to build such a machine is by making software that can understand language.
Language is important because the robot needs to learn things from society. The only way humans can communicate with robots is if both have some form of common language, so that both parties understand each other. People who speak English can understand each other because they share the same grammar and words. Think of language as the communication interface between human robots and human beings.
There are basically 3 things that the AI software has to represent in language: objects, hidden objects, and time. I don't use English grammar, because English grammar is a learned thing; these 3 things are a better way to represent language. If you think of objects as nouns and hidden objects as verbs, then that is roughly what I'm trying to represent.
One day, while playing a game for the PlayStation 2, I couldn't help noticing that the game was repeating itself over and over again. When the characters jumped, the same images appeared on the screen. When the enemies attacked, the same images appeared on the screen. These repeated images gave me the idea that I could treat all the images on the screen like image layers in Photoshop. I can use patterns to find which sequences of images belong to which objects. When the 360-degree images of one object have been formed, I can use a fixed noun to represent that object (I call this 360-degree image sequence a floater). For example, if I have the 360-degree floater for a hat, I can assign the letters "hat" to the floater. If I have the 360-degree floater for a dog, I can assign the letters "dog" to the floater. The image processor dissects the image layers out, and the AI program determines what the sequential image layers are. This is done by averaging the data in memory -- taking similar training data and finding their average. When the averaging is finished, the floater has a range of how "fuzzy" the object can be.
(If you are wondering how I got the name floaters, here it is: one day I went for an eye exam and asked the doctor what the cell-like thing that occasionally blocks my left eye is. The doctor said that it's a floater.)
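The averaging idea behind a floater can be sketched in a few lines of code. This is only an illustration under invented names (`Floater`, `tolerance`, `matches`): similar training frames of one object are averaged into a template, and the spread seen during training becomes the "fuzzy" range within which a later frame still counts as the same object.

```python
# Sketch of a floater: average similar training frames into one template,
# and record per-pixel deviation so the template knows how "fuzzy" a match
# can be. Frames are flattened grayscale pixel lists of equal length.

class Floater:
    def __init__(self, frames):
        n = len(frames)
        width = len(frames[0])
        # Average each pixel position across the training frames.
        self.template = [sum(f[i] for f in frames) / n for i in range(width)]
        # Fuzziness range: the largest deviation seen during training.
        self.tolerance = [max(abs(f[i] - self.template[i]) for f in frames)
                          for i in range(width)]

    def matches(self, frame):
        # A frame matches if every pixel falls within the learned fuzzy range.
        return all(abs(p - t) <= tol
                   for p, t, tol in zip(frame, self.template, self.tolerance))

# Three noisy training views of the same "hat" object.
hat = Floater([[10, 200, 30], [12, 198, 31], [11, 202, 29]])
print(hat.matches([11, 199, 30]))   # within the fuzzy range: True
print(hat.matches([90, 10, 200]))   # a different object: False
```

A real image processor would of course work on full sequences of layers, not single flattened frames; the point here is only the average-plus-range mechanism.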
Things like cat, dog, hat, Dave, computer, pencil, TV, and book are objects that have set, defined boundaries. Things like hand, mall, United States, and universe don't have set boundaries: either they have no set boundaries, or they are encapsulated objects. One example is the foot: where does a foot begin and where does it end? Since a foot is part of a leg, it is considered an encapsulated object. Another example is a mall: where does the mall end and where does it begin? Since there are many stores, roads, and trees that make up the mall, we can't say where the mall ends and begins. The answer is that the computer will figure all this out by averaging the data in memory. Another thing is that some objects are so complex that you have to use sentences to represent what they are. The universe is one example: where does the universe begin and end? The answer is that we use complex intelligence to represent the meaning of the word "universe".
The two pictures below best illustrate the point about image layers and floaters. The first picture displays the videogame. This is an old game for the Nintendo Ultra 64. It was while playing this game that I discovered my theories. Many things are displayed in the game: the lights, the fireplace, the table, the ground, the walls, the characters, the breakable objects, and so forth. The image processor will dissect the most important image layers from the picture. It will then attempt to find a copy of each image layer in memory. Based on certain patterns within all the pixels and the relationships between them, the AI will understand which image layers belong together "sequentially" -- consistency and repetition are the key. The computer will normalize all the image layers (including encapsulated image layers) until it comes to an agreement on what is considered an object and what are encapsulated objects. Below is an example of 3 major image layers (objects) that the computer has found: the ninja, the monster, and the background.
The second picture shows the 360-degree floater of the ninja character. All the possible moves of the character are stored as sequences in this floater. If the game is in 360 degrees, like the one below, then the floater will have a 360-degree image layer for each possible outcome. If the game is a 2-D game, then the floater will have only the possible outcomes of the character. "The creation of the floater is kind of like reverse engineering a videogame programmer's work, or reverse engineering an animator's work -- what does the videogame programmer consider an object, and what are the animator's cel layers?"
The next step is to take the floater and treat it as an object. This is how I represent objects visually in my program: using patterns to find the 360-degree images of an object and all its possible moves. The rules program will bring the word "Ninja" and the floater of the ninja together. The target object is the word "Ninja" and the floater is the element object. Once the association passes the assign threshold, the word "Ninja" has the same meaning as the floater. At this point, any sequence from the floater, whether it's one frame or 300 frames, is still considered the same object. You can stare at a table for hours, but the table will still be a table. You can also walk around the table while staring at it; the sequential images you see are still a table. The question people ask is: what happens if you break the table, or what happens when other objects make up a table? The answer is that the AI will normalize the objects and output the most likely identification.
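The assign-threshold step of the rules program can be sketched as a co-occurrence counter. This is a hedged illustration, not the actual program: the threshold value, the floater id, and the class name are all invented. Each time a word and a floater are experienced together, their association strengthens; once it passes the threshold, the word is treated as the floater's name.

```python
# Sketch of the rules program's assign threshold: repeated co-occurrence
# of a word and a floater eventually assigns the word as that floater's
# meaning. The threshold of 3 is an arbitrary choice for the sketch.

ASSIGN_THRESHOLD = 3

class RulesProgram:
    def __init__(self):
        self.strength = {}   # (word, floater_id) -> co-occurrence count
        self.meaning = {}    # word -> floater_id, once assigned

    def observe(self, word, floater_id):
        key = (word, floater_id)
        self.strength[key] = self.strength.get(key, 0) + 1
        if self.strength[key] >= ASSIGN_THRESHOLD:
            self.meaning[word] = floater_id

rules = RulesProgram()
for _ in range(3):                     # three separate trainings
    rules.observe("Ninja", "floater_17")
print(rules.meaning.get("Ninja"))      # the word is now assigned: floater_17
```

One or two observations are not enough to cross the threshold, which matches the idea that a single coincidence should not fix a word's meaning.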
(The image processor in my program basically takes a still picture and, without the help of a human being, dissects the images from the picture. It's kind of like cutting and pasting images in Photoshop. The only difference is that no human intelligence is needed to cut the images out of the picture. The image processing program is almost done. Currently, it works very well. Things like dissecting a human from a still picture are very easy for the software to do. To appreciate the fruit of the image processor, you can actually zoom in and find that the computer has dissected every hair strand from the human perfectly.)
There are other topics that concern objects, such as encapsulated objects (a human object can have thousands of encapsulated objects), priority of objects, and partially missing objects, but I won't get into those topics here. I cover all of them and more in the books.
Sometimes there are objects that don't have any physical characteristics. Action words are one example: things like walking, talking, jumping, running, throwing, go, towards, under, over, above, until, and so forth. These words are considered hidden objects because there is no image, sound, taste, or touch object that can represent them. The only way to represent these objects is through hidden data that is set up by the 5 senses. Let's call the 5 senses the current pathway -- the pathway that the computer is experiencing. To illustrate this point, I will refer only to the visual part of the current pathway.
Within the visual movie is hidden data that I have set up. This is done because I want the computer to find patterns within visual movies. Some of this hidden data includes the distance between pixels and the relationship between one image layer and another (I cover this part comprehensively in my books). Let's illustrate the point with a simple word: jump. The computer will take several training examples of jump sequences from the visual movie. As you might expect, the variations of a jump sequence grow exponentially. A person can be seen jumping from the front, the back, the side, at an angle, from the top, from 10 feet away, or from 100 yards away. The thing doing the jumping can be another object, such as a dog, a rat, a horse, or even a box. There are literally infinite ways that a jump sequence can appear in our environment. The computer will take all the similar training examples and average the hidden data out. Every time a piece of hidden data is repeated, the computer makes that hidden data stronger (hidden data are considered objects). The hidden data are also encapsulated, so that groups of common hidden data are combined into one object. As more and more training is done, the computer will end up with the same hidden data for the same fixed word: jump. The rules program will bring the word "jump" and the hidden data closer to one another. When the association passes the assign threshold, the word "jump" will be assigned the meaning (the hidden data).
Below is an example of how the word jump is assigned a meaning. First, the computer analyzes each jump sequence: R1, T1, and C1. It analyzes the hidden data that all three jump sequences share and groups those common traits into an object. Then the rules program takes the word "jump" and assigns it to the closest meaning.
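The grouping of common traits across R1, T1, and C1 can be sketched as a set intersection. The feature names below are invented stand-ins for the hidden data; the point is only the mechanism: traits seen in every training example survive, while object-specific traits (dog shape, box shape) drop away.

```python
# Sketch of grouping common "hidden data" across several jump sequences.
# Each sequence is reduced to a set of abstract traits; only traits shared
# by all examples become the candidate meaning for the word "jump".

def common_hidden_data(sequences):
    traits = set(sequences[0])
    for seq in sequences[1:]:
        traits &= set(seq)      # keep only traits seen in every example
    return traits

R1 = {"feet_leave_ground", "upward_motion", "downward_motion", "dog_shape"}
T1 = {"feet_leave_ground", "upward_motion", "downward_motion", "human_shape"}
C1 = {"feet_leave_ground", "upward_motion", "downward_motion", "box_shape"}

meaning_of_jump = common_hidden_data([R1, T1, C1])
print(sorted(meaning_of_jump))
# ['downward_motion', 'feet_leave_ground', 'upward_motion']
```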
The rules program is another thing I want to mention. When you train the robot, the timing of the training is crucial. The reason the word jump becomes associated with the jump sequence is that the jump sequence happens and, either during the sequence or closely timed with it, the word "jump" is heard. The close timing of the word jump and the jump sequence is what brings the two together. If the word "jump" is experienced and the jump sequence happens 2 hours later, the computer will not know that there is a relationship between the word "jump" and the jump sequence. This is how the machine will learn language: by analyzing closely timed objects. This is also a way to rule out coincidences and things that happen only once or twice.
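The closely-timed-association rule can be sketched as a time-window pairing over a stream of events. The window size, the repetition minimum, and the event format are all assumptions made for this example: a word and a visual sequence are paired only if they occur within a short window, and one-off pairings are filtered out by requiring repetition.

```python
# Sketch of learning by closely timed objects. Events are
# (timestamp_ms, kind, label) tuples. A word-scene pair counts only when
# the two fall within WINDOW_MS of each other, and it must repeat at
# least MIN_REPEATS times to rule out coincidences.

WINDOW_MS = 2000        # "closely timed" means within 2 seconds here
MIN_REPEATS = 2         # things that happen only once are ignored

def learn_associations(events):
    counts = {}
    words = [e for e in events if e[1] == "word"]
    scenes = [e for e in events if e[1] == "scene"]
    for wt, _, word in words:
        for st, _, scene in scenes:
            if abs(wt - st) <= WINDOW_MS:
                key = (word, scene)
                counts[key] = counts.get(key, 0) + 1
    return {k for k, n in counts.items() if n >= MIN_REPEATS}

events = [
    (1000, "word", "jump"), (1500, "scene", "jump_sequence"),
    (9000, "word", "jump"), (9800, "scene", "jump_sequence"),
    (9000, "scene", "dog_walks"),   # a one-off coincidence
]
print(learn_associations(events))   # {('jump', 'jump_sequence')}
```

Note how `dog_walks` is closely timed with "jump" once, but never repeats, so it is discarded exactly as the paragraph above describes.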
Time is another subject that has to be represented in terms of language. In my program there is no such thing as 1 second, 1 minute, 5 years, or 2 centuries. The time units we know are learned, and they aren't built into my program. What I have done is create an internal timer that runs indefinitely at intervals of 1 millisecond. The AI uses this internal clock to find out whether there are objects (words) that have relationships to it. A span of time on the AI clock can also be considered an object. For example, suppose someone says "1 second". After many training examples, the computer will find a pattern between "1 second" and roughly 1,000 milliseconds on the AI's internal clock. This internal span of 1,000 milliseconds becomes an object that has the same meaning as "1 second".
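Grounding a time word in the internal clock can be sketched as averaging tick counts across training examples. The numbers below are invented for illustration: each example records how many 1-millisecond ticks elapsed while the phrase applied, and the average becomes the duration-object assigned to the phrase.

```python
# Sketch of mapping a learned time word to the internal millisecond clock:
# average the observed tick counts from many training examples.

def learn_duration(training_ticks):
    return sum(training_ticks) / len(training_ticks)

# Tick counts observed whenever "1 second" was experienced (1 tick = 1 ms).
examples = [990, 1005, 1010, 995]
one_second = learn_duration(examples)
print(one_second)   # 1000.0
```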
The above information sums up how my program represents things like nouns, verbs, time, and grammar. When dealing with entire sentences, the computer has to do all the hard work by averaging all the training examples, looking for patterns, and assigning meaning to the words in the sentence. The sentence itself is considered a fixed movie sequence, while the meaning of the sentence changes as the robot learns more.
Patterns and language
Now that I have discussed the basics of how most words are represented, let's get into something more complex: finding patterns. Suppose a question like "Where is the bathroom?" is asked. This form of question looks within memory and uses the data there (such as distance and length, timing, and searches of data in memory) to find the pattern that answers the question. Things like "Where is the book?", "Where is the sofa?", "Where is McDonald's?", "Where is the university?", "Where is Dave?" -- all these questions rely on a universal question-answer pathway. The AI will look into memory and find that there is a relationship between the question and a specific type of search data in memory. It will find that it has to know where the robot is currently located (this is done by looking around and identifying its current location). Then the computer will look into memory for the bathroom in the current location. If the bathroom's location is found in memory, it will output the answer: "the bathroom is located -----". If it doesn't know (there is no bathroom memory for the current location), it will either say it doesn't know or attempt to find more information to answer the question.
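The question-answer pathway described above can be sketched as a memory lookup keyed by the current location. The memory contents and function names here are made up for the example: identify where the robot is, search memory for the asked-about object in that location, and fall back to "I don't know" when nothing is found.

```python
# Sketch of the universal question-answer pathway for "where is X?" questions:
# look up the object in memory, scoped to the robot's current location.

memory = {
    ("home", "bathroom"): "down the hall, second door on the left",
    ("home", "sofa"): "in the living room",
}

def answer_where(obj, current_location):
    place = memory.get((current_location, obj))
    if place is not None:
        return f"the {obj} is located {place}"
    return "I don't know"   # or go gather more information

print(answer_where("bathroom", "home"))
print(answer_where("bathroom", "office"))   # no memory for this location
```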
This pattern finding doesn't just apply to questions and answers, but also to statements and orders. Suppose someone said: "Remember to buy cheese at the supermarket." This statement follows a recurring pattern, and it takes many training examples for the AI to find it. The pattern is that when the robot gets to the supermarket, sometime during the purchase of goods, the statement pops up in memory: "remember to buy cheese". Sometimes the robot forgets (either a learned thing, or the pattern wasn't trained properly).
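The reminder pattern can be sketched as a statement stored with a trigger context that "pops up" when the robot later finds itself in that context. The structure and names are illustrative only; in the actual design the trigger would itself be a learned pattern rather than a hand-coded key.

```python
# Sketch of the reminder pattern: a statement heard earlier is stored with
# a trigger context and surfaces when the robot arrives in that context.

reminders = []

def hear_statement(trigger_context, statement):
    reminders.append((trigger_context, statement))

def arrive_at(context):
    # Return every stored statement whose trigger matches the new context.
    return [s for c, s in reminders if c == context]

hear_statement("supermarket", "remember to buy cheese")
print(arrive_at("supermarket"))   # ['remember to buy cheese']
print(arrive_at("library"))       # []
```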
As you can see, this whole human-level artificial intelligence program is all about finding patterns. I set up the different kinds of patterns to look for, and the computer uses the universal AI program to find those patterns and assign them to language. Language will always be fixed (unless society changes it), but the patterns that represent language change from one time period to the next. There are also multiple meanings for fixed words.
This type of machine for representing language is considered "universal" because the program can be applied to all languages, including sign language. Different languages use different words to represent the same things: "cat" in English, "neko" in Japanese, and "mau" in Chinese all refer to the same object. Different verbs in English, German, or Latin are all talking about the same actions. Even sign language uses fixed sequential hand motions to represent words and phrases. Grammar, too, relies on patterns and on different ways of stringing words together to mean something. This is easily handled by the AI program, because finding patterns is what it was designed to do. As long as the grammar in a language repeats itself or follows some kind of rule (regardless of how complex), the pattern will be recognized by the AI.
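The universality claim can be sketched in one small step: words from different languages are simply different labels assigned to the same underlying floater object. The floater id below is an invented placeholder.

```python
# Sketch of cross-language universality: "cat", "neko", and "mau" are
# different labels bound to the same floater object.

labels = {}   # word -> floater id

def assign(word, floater_id):
    labels[word] = floater_id

for word in ["cat", "neko", "mau"]:
    assign(word, "floater_cat")

print(labels["cat"] == labels["neko"] == labels["mau"])   # True
```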
Copyright 2007 (All rights reserved)