- Sun Oct 16, 2011 1:40 pm
#38326
The following is written for programmers. Others may have trouble understanding what I'm talking about.
Help Bot currently uses the Levenshtein algorithm (http://en.wikipedia.org/wiki/Levenshtein_distance) to calculate similarity between sentences and determine a proper response. The problem is that the algorithm doesn't work well on whole sentences; its primary application is recognizing individual words. When it is run letter by letter on a whole sentence, there is far too much room for misidentification.
If Intelli will take some advice, this thing can get fixed quickly.
1st) Use the Damerau-Levenshtein algorithm instead. (http://en.wikipedia.org/wiki/Damerau%E2 ... n_distance) It also counts a swap of two adjacent letters as a single edit, which improves recognition at lower maximum distances. (btw, I can't believe they named these things. We used to just call it the spell-check algorithm.)
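To show what I mean in point 1, here is a minimal sketch in Python. It is the restricted "optimal string alignment" variant of Damerau-Levenshtein, not the unrestricted one, and the name osa_distance is mine. I have no idea what Help Bot's actual code looks like, so treat it as an illustration only.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters, each as one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "hlep" -> "help".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

With this, a typo like "hlep" is 1 edit away from "help", where plain Levenshtein would count 2. That is why a lower maximum distance still catches the common typing mistakes.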
2nd) The algorithm needs to have 2 separate layers: word recognition and sentence recognition.
The first step is to parse the sentence into individual words. Run the algorithm on the letters in each word separately. This should result in words like "bake" and "BiKe" being identified as the same. It's ok, that's normal.
Once the words have been identified, look for any keywords like "realm", "help", "bones" and a plethora of swears. Looking for the appropriate keywords will prevent "How do I bake?" from being misidentified as "How do I fly?"
If any keywords are present, run the algorithm again, but this time on the words in the sentence, treating each whole word as a single unit. For instance: "This cake is dry." should only be 1 substitution from "This cake is yellow." (substituting the word yellow for the word dry)
Once correct sentences are identified, respond appropriately.
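The two layers in point 2 could be sketched roughly like this in Python. Everything named here (RESPONSES, KEYWORDS, nearest_word, match_sentence, the canned replies) is made up for the example. Requiring the candidate sentence to contain the same keywords is my own reading of the keyword step, not something taken from Help Bot.

```python
import re
from typing import Optional, Sequence

def edit_distance(a: Sequence, b: Sequence) -> int:
    """Optimal string alignment distance over any sequence:
    letters of a word, or whole words of a sentence."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

# Hypothetical sentence bank and keyword list, for illustration only.
RESPONSES = {
    "how do i bake": "Use an oven.",
    "how do i fly": "Flying is staff only.",
    "this cake is yellow": "It is a lemon cake.",
}
KEYWORDS = {"bake", "fly", "cake"}

def tokenize(sentence: str) -> list:
    return re.findall(r"[a-z']+", sentence.lower())

VOCABULARY = sorted({w for s in RESPONSES for w in tokenize(s)})

def nearest_word(word: str, max_dist: int = 1) -> str:
    """Layer 1: compare letters, snap a word to the closest known word."""
    best = min(VOCABULARY, key=lambda known: edit_distance(word, known))
    return best if edit_distance(word, best) <= max_dist else word

def match_sentence(sentence: str, max_word_edits: int = 1) -> Optional[str]:
    words = [nearest_word(w) for w in tokenize(sentence)]
    found = KEYWORDS.intersection(words)
    if not found:
        return None  # no keyword present, so stay quiet
    best = None
    for known, reply in RESPONSES.items():
        known_words = tokenize(known)
        if not found.issubset(known_words):
            continue  # keeps "bake" questions away from "fly" answers
        # Layer 2: compare whole words instead of letters.
        dist = edit_distance(words, known_words)
        if dist <= max_word_edits and (best is None or dist < best[0]):
            best = (dist, reply)
    return best[1] if best else None
```

Here "How do I bike?" gets snapped to "bake" in the first layer and then matches the bake sentence exactly, while "This cake is dry." lands 1 word substitution away from "This cake is yellow." A sentence with no keyword in it returns None, so the bot says nothing.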
3rd) Use a variable maximum distance based on the length of the word/sentence. Short words should allow fewer edits than long ones. That's just common sense really.
4th) You're going to need a pretty large bank of words/sentences before the Help Bot is at full functionality. This is unavoidable, time-consuming, and will never stop growing. I hope you realized that when you started this project.
5th) Don't hook Help Bot back up to Cleverbot. The purpose of Help Bot is to disseminate specific information only when needed. One of the purposes of Cleverbot is to respond to all incoming stimuli. These are mutually exclusive goals. The fact that Cleverbot has additional goals only makes the issue more complicated and the result less useful.
Well Intelli, I hope you take some advice from an old-timer. I would really like to see Help Bot live up to its potential.
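For point 3, something along these lines would do. The 25% ratio and the cap of 3 are numbers I picked out of the air for the example; they would need tuning against real chat logs. The function names are hypothetical too.

```python
def max_distance(length: int, ratio: float = 0.25, ceiling: int = 3) -> int:
    """Allowed number of edits for a word or sentence of the given length.

    Short inputs get no slack, longer ones get proportionally more,
    up to a fixed ceiling. The ratio and ceiling are assumed values.
    """
    return min(ceiling, int(length * ratio))

def is_close_enough(distance: int, length: int) -> bool:
    """True if an already computed edit distance is within the limit."""
    return distance <= max_distance(length)
```

With these numbers a 3 letter word has to match exactly, a 4 letter word may be off by 1 edit, and an 8 letter word by 2. The same function works for the sentence layer if you pass in the number of words instead of the number of letters.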
EDIT:
Intelli wrote: The help bot was not hooked up to cleverbot...
I just assumed it was hooked up to Cleverbot or something like it. The learning/responses were very similar.
Last edited by CirJohn on Sun Oct 16, 2011 2:54 pm, edited 1 time in total.
The only real thing you have on the internet is your reputation.
Decisions should be based on facts, not friendships.