Advanced CAPTCHA Techniques
The basic principle of a CAPTCHA is to prevent automated submissions, so that consideration is given only to submissions deemed to have come from a human. It is a means of implementing a test, also known as a reverse Turing test, that most humans can solve yet is exceedingly difficult or impossible for the computer software of the time.
The most egregious spammers use software, also known as robots, to post unsolicited material via the Internet. Most site owners are aware of the importance of moderating public and anonymous posts. However, sifting through hundreds or even thousands of inappropriate posts generated by a robot is far from ideal. A good CAPTCHA system can reasonably determine whether submission attempts are being generated by a robot or a human, and discard or prevent the robot's submissions.
Down With The Illegible CAPTCHA!
Down with the squiggly characters and phrases! Why not a system easy enough for a young child yet overwhelmingly difficult for today's robots? Optical Character Recognition (OCR) is not a recent concept; in fact, it has been around much longer than the World Wide Web. Object recognition, on the other hand, is still largely something of the future. I began to think about how my two-and-a-half-year-old sees the world, and about the incredible processing power of the human intellect even at such a young age. Even a very young child can distinguish thousands of objects in countless variations, easily picking out which parts of an image are significant.
When deciding how I would design the system, my first thought was how annoyed I had been in the past with traditional methods: severely distorted characters or phrases that may or may not be possible to read even with perfect vision. Then, of course, there is the greater annoyance of carefully entering each character only to discover that the system claims you failed and must try again.
Non-Traditional Approach
The idea for the CAPTCHA used on this site is not entirely unique, as I have studied advanced CAPTCHA techniques based on similar principles. What I like about this system is the simple yet highly effective use of common everyday images with a very simple question. It is a lightweight system, easily updated and modified, yet a great enough challenge that it is highly unlikely a robot will crack it any time soon.
Some spammers will include partial human interaction in an attempt to crack a CAPTCHA system. For this reason, a variety of possible questions may exist for the same image, so that an answer a human supplied for a particular image cannot simply be replayed. The implementation of this design also restricts the number of alternate CAPTCHA tests that may be requested, in order to thwart attempts to harvest the database. The number of incorrect answers is limited as well: a robot will "get the boot" for an unspecified length of time if too many failed attempts are made.
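To make the three safeguards above more concrete, here is a minimal Python sketch of how they might fit together. It is not the code running on this site. The image file names, the questions and answers in the pool, the class name CaptchaSession, and the limits (3 replacement tests, 5 wrong answers, a 10 minute lockout) are all made-up values chosen for illustration.

```python
import random
import time

# Hypothetical pool: each image carries several (question, answer) pairs, so
# knowing one answer for an image is not enough. All entries are placeholders.
QUESTION_POOL = {
    "people.jpg": [
        ("How many of these people are wearing black suits?", "3"),
        ("How many people are in this picture?", "5"),
    ],
    "fruit.jpg": [
        ("How many oranges are in this picture?", "1"),
        ("How many apples are in this picture?", "4"),
    ],
}

MAX_REFRESHES = 3      # assumed cap on replacement tests per visitor
MAX_FAILURES = 5       # assumed cap on wrong answers before a lockout
LOCKOUT_SECONDS = 600  # assumed lockout length; the real value is unspecified


class CaptchaSession:
    """Tracks one visitor's current test, refresh count, and failures."""

    def __init__(self, rng=None, clock=time.monotonic):
        self.rng = rng or random.Random()
        self.clock = clock
        self.refreshes = 0
        self.failures = 0
        self.locked_until = 0.0
        self.image = None
        self.question = None
        self._answer = None

    def is_locked(self):
        return self.clock() < self.locked_until

    def new_challenge(self):
        """Return (image, question), or None if the request is refused."""
        if self.is_locked():
            return None
        if self.image is not None:
            # Swapping an unsolved test counts against the refresh limit,
            # which slows down attempts to harvest the question pool.
            if self.refreshes >= MAX_REFRESHES:
                return None
            self.refreshes += 1
        self.image = self.rng.choice(sorted(QUESTION_POOL))
        self.question, self._answer = self.rng.choice(QUESTION_POOL[self.image])
        return self.image, self.question

    def check(self, response):
        """Return True only for a correct answer from an unlocked session."""
        if self.is_locked() or self._answer is None:
            return False
        if response.strip().lower() == self._answer:
            # A solved test is used up and cannot be replayed.
            self.failures = 0
            self.image = self.question = self._answer = None
            return True
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            # Too many wrong answers: "get the boot" for a while.
            self.locked_until = self.clock() + LOCKOUT_SECONDS
            self.failures = 0
        return False
```

In a real site the session would be stored server side and keyed to the visitor, and the answer would never be sent to the browser. The clock and random generator are passed in only so the behavior is easy to test.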
Consider the following image. A robot would have to “comprehend” the question presented and then have the capacity to isolate the appropriate aspects of the photograph. A huge amount of processing power would be required before it had any chance of solving the simple question, “How many of these people are wearing black suits?” Yet a child not yet in kindergarten could solve the question with ease.
Consider the next image. My two-year-old immediately recognized apples and, specifically, one orange. Object recognition for this particular image may not be that far off, but a robot must also know what question is being asked. What if the fruit were in a basket without a clear distinction between them? That is not enough to fool my two-year-old, but it adds an entirely new level of complexity for object recognition software.
Most websites, especially those with public interaction, can expect some type of attack. A good CAPTCHA system can significantly reduce the number of inappropriate submission attempts on your website but is merely an efficiency tool on the front line. Proper security measures should always be in place.