Tuesday, March 18, 2008

Creating an Army of Free Captcha Typers

People are finally starting to catch on to this technique. However, I’m finding a lack of tact in how to accomplish it successfully. The objective is simple: how do we get a bunch of other people to type in Captchas for us willingly? First we’ll dive into why this is useful. Let’s say you’re spamming Myspace or Yahoo accounts, for instance. You could either attempt to defeat the Captcha through various OCR techniques or hire people from India to type them in for you. Captcha decoding is very tough to master and eats loads of server resources. Reliable workers are tough to find, and their cost sets a margin that you must beat. So now you’re starting to consider your options. There are still some out there, but the best one I can recommend is to find a way to get others to do it for you by giving them an incentive.


What Will You Need

First you will need to create the script that will actually grab the Captcha and output it so you can display it to the user without them knowing the real reason they are typing it in. Next you will need to actually require the user to type it in in order to perform an action on your site. Then your script will need to grab what they type in and check to see if it was correct. Let’s say you’re spamming Myspace. You display the Captcha on another site of yours, and when the user types it in, a second script actually makes the account using what they said the Captcha was. Make sense?


The Incentive

I’ve seen a few ideas float around about how to do this. Some suggest adding the Captcha to your blog comments. It doesn’t really do anything, and if they don’t type it in, it will still allow the comment. However, for anyone who does type it in, the script will create the account and boom, you’re in like Flynn. However, each of the ideas I’ve heard has one major flaw: lack of traffic. Even if your blog or other site gets 20 comments/day, that still isn’t very many captcha types and is hardly worth the trouble. I’ve also heard some ideas about offering free porn, but the problem is still the same, wouldn’t you agree? So you’ll need a site that will grab major traffic and lots of pageviews, and most of all one where the users will be obligated to type the captchas.


Here’s My Proposal

Try creating a webproxy. A web proxy is designed to bypass filtering restrictions through a web interface. For example, in a university, the IT department blocks a lot of harmless websites simply because of their popularity, so people use webproxies to access those websites. Creating a webproxy is great because they draw MAJOR traffic and TONS of pageviews very quickly. They don’t require much promotion to become very popular. They are also very easy to set up. First you’ll need to find a good webproxy script. I’d recommend CGIProxy or PHP Proxy. Use whatever is most comfortable for you to set up and edit. Next you need to do a tad bit of promotion on it. The best way is to sign it up on a few proxy top sites lists or free proxy directories. You’ll find out quickly that you won’t need much promotion to get some serious daily traffic. I recently set up a webproxy (Unblock Myspace) and it was getting 400 unique visitors/day and 20,000 pageviews/day within days of opening with very little promotion. Currently it’s getting about 700 unique visitors/day and approx 72,000 pageviews/day. To put that in the math sense: if I display a Captcha that the visitor must type in on roughly every 25th pageview before they can continue to the page they wanted, that comes out to 72,000 / 25 = about 2880 Captchas/day. In case you were paying attention, that’s A LOT of Myspace accounts (money). :)


So what’s the downside? Well, first, web proxies are a big strain on servers. A lot of hosting providers don’t allow them for that reason. An easy way to solve this is to get your own virtual private server/dedicated server, or get a buddy to donate a dedicated box. Another would be to have multiple hosting providers and have the script disperse the traffic amongst them evenly to lighten the load. Where there’s a will (money) there’s a way.

Captchas Captchas Captchas

Guess what I’m in the mood to talk about? You guessed it. Captchas! In fact I feel like dedicating a whole week, maybe more depending on if any downtime occurs :), to talking about nothing but captcha breaking. We’ll break every captcha in the book, and by the end of this post even the captchas that haven’t been created yet. Furthermore, for this week only I am accepting any and all captcha related guest posts. So if you got a captcha solved or want to discuss techniques for breaking them, feel free to write up a guest post and email it to ELI at BLUEHATSEO.COM in html form. You can stay anonymous, and not only will I put it up but I’m also willing to put up any ad you’d like. Pick any text or banner ad you’d like to put up with your post and I’ll include it. With as many readers as this place has I’m sure it’ll get clicked. Also be sure to include your paypal address. If I really like your guest post I may even send you $100 as a thank you. Also, all you bloggers are welcome to repost any of the captcha related posts on this blog. I now declare any captcha related posts on this blog public domain and republishable under full rights. For some odd reason I feel like blowing the captcha breaking industry the fuck up. Like my favorite saying goes, if you’re going to wreck a room you might as well WRECK it. Let’s begin by revisiting one of my first captcha related posts: the Army Of Captcha Typers.


The Army of Captcha Typers is a great technique because it doesn’t require loads of programming and is 100% adaptable to any captcha. I suggest you go back and reread it, but in the interest of keeping this short here’s a quick summary.



You use a service (I used a proxy site as an example) to get the users to type in the captchas for you. It records what the user typed in as the solution to the captcha and you use that to solve it. The more pageviews the service provides per user, the more effective it is at breaking captchas. Why pay typists or tediously code it yourself?


Normally I like to leave most of the code and creative portion out of the written technique, in the interest of not ruining the technique and to help the methods be more effective through use of spins and unique code. I don’t write this blog to ruin techniques, and those people who claim I do are just insecure and like to claim they already know everything. As common sense as most of the stuff I post is, I haven’t met a person yet who hasn’t in some way learned something from this blog. That truth brags a lot louder than most SEO blogs I’ve seen. But! If we’re going to wreck something, let’s wreck it. In that spirit I see no reason why every newbie on the planet shouldn’t be able to easily throw up their own web proxy site that solves captchas for them, so here’s the script to do it.


Captcha Solving Web Proxy


This is a modified version of CGIPROXY, which I mentioned in the post. Basically you install it following the included instructions (README file). Then you set up your web proxy site. Target a niche such as kids behind a school proxy or something similar. There is an extra file included called captcha.cgi. Upload it to the cgi-bin in the same folder as the nph-proxy.cgi and give it 755 chmod permissions. Make a folder one directory below your cgi-bin called captchas. Give it read/write permissions (777 should work if all else fails). Then anytime you’ve got a captcha to solve, upload it to that directory with a unique filename. This can be done automatically with whatever script you’re using to spam a captcha protected site. On the very next pageview the webproxy will require the person to type in the captcha and disguise it as a human check to prevent abuse. Any captcha works. Once it gets their response it’ll delete the captcha from the folder and write out the solution along with the filename to a new file called solved.txt. Format: characters|image.jpg\n . Remember to put some kind of reminder or code in the filename so you know which image is which when you go to use the solutions. Get enough users to your webproxy (which is very easy) and you can solve any captcha in moments.


Enjoy!

Captcha Breaking W/ PHPBB2 Example

This is a fantastic guest post by Harry over at DarkSEO Programming. His blog has some AWESOME code examples and tutorials along with an even deeper explanation of this post so definitely check it out and subscribe so he’ll continue blogging.


This post is a practical explanation of how to crack phpBB2 easily. You need to know some basic programming but 90% of the code is written for you in free software.


Programs you Need


C++/Visual C++ express edition - On Linux everything should compile simply. On windows everything should compile simply, but it doesn’t always (normally?). Anyway the best tool I found to compile on windows is Visual C++ express edition. Download


GOCR - this program takes care of the character recognition. It also splits the characters up for us ;) . It’s pretty easy to do that manually but hey. Download


ImageMagick - this comes with Linux. ImageMagick lets us edit images very easily from C++, php etc. Install this with the development headers and libraries. Download from here


A (modified) phpbb2 install - phpBB2 will lock you out after a number of registration attempts so we need to change a line in it for testing purposes. After you have it all working you should have a good success rate and it will be unlikely to lock you out. Find this section of code: (it’s in includes/usercp_register.php)


if ($row = $db->sql_fetchrow($result))

{

if ($row['attempts'] > 3)

{

message_die(GENERAL_MESSAGE, $lang['Too_many_registers']);

}

}

$db->sql_freeresult($result);



Make it this:


if ($row = $db->sql_fetchrow($result))

{

//if ($row['attempts'] > 3)

//{

// message_die(GENERAL_MESSAGE, $lang['Too_many_registers']);

//}

}

$db->sql_freeresult($result);


Possibly a version of php and maybe apache web server on your desktop PC. I used php to automate the downloading of the captcha because it’s very good at interpreting strings and downloading static web pages.


Getting C++ Working First


The problem on windows is there is a vast number of C++ compilers, and they all need setting up differently. However I wrote the programs in C++ because it seemed the easiest language to quickly edit images with ImageMagick. I wanted to use ImageMagick because it allows us to apply a lot of effects to the image if we need to remove different types of backgrounds from the captcha.


Once you’ve installed Visual C++ 2008 express (not C#, I honestly don’t know if C# will work) you need to create a Win32 Application. In the project properties set the include path to something like (depending on your imagemagick installation) C:\Program Files\ImageMagick-6.3.7-Q16\include and the library path to C:\Program Files\ImageMagick-6.3.7-Q16\lib. Then add these to your additional library dependencies CORE_RL_magick_.lib CORE_RL_Magick++_.lib CORE_RL_wand_.lib. You can now begin typing the programs below.


If that all sounds complicated don’t worry about it. This post covers the theory of cracking phpBB2 as well. I just try to include as much code as possible so that you can see it in action. As long as you understand the theory you can code this in php, perl, C or any other language. I’ve compiled a working program at the bottom of this post so you don’t need to get it all working straight away to play with things.


Getting started


Ok this is a phpBB2 captcha:



It won’t immediately be interpreted by GOCR because GOCR can’t work out where the letters start and end. Here’s the weakness though. The background is lighter than the text so we can exclude it by getting rid of the lighter colors. With ImageMagick we can do this in a few lines of C++. Type the program below and compile/run it and it will remove the background. I’ll explain it below.




#include <Magick++.h>

using namespace Magick;


int main( int /*argc*/, char ** argv)

{


// Initialize ImageMagick install location for Windows

InitializeMagick(*argv);


// load in the unedited image

Image phpBB("test.png");


// remove noise

phpBB.threshold(34000);


// save image

phpBB.write("convert.pnm");


return 0; // 0 signals success to the calling shell or script

}


All this does is load in the image and then call the function threshold attached to the image. Threshold turns every pixel lighter than the given level white and every darker pixel black, which wipes out the light background. On linux you have to save the image as a .png, whereas on windows GOCR will only read .pnm files, so on linux we have to use this line instead:




// save image

phpBB.write("convert.png");




The background removed.
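To make the thresholding step concrete without needing ImageMagick installed, here is a minimal sketch of the same idea on a plain buffer of 16-bit grayscale samples. The function name threshold_gray and the flat-vector layout are assumptions made for illustration; they are not part of Magick++ and this is not how the library implements it internally.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Magick++ call): binarize 16-bit grayscale samples.
// Samples brighter than `level` become white (65535); everything else becomes black (0).
std::vector<std::uint16_t> threshold_gray(const std::vector<std::uint16_t>& gray,
                                          std::uint16_t level)
{
    std::vector<std::uint16_t> out;
    out.reserve(gray.size());
    for (std::uint16_t v : gray)
        out.push_back(v > level ? 65535 : 0);
    return out;
}
```

Calling threshold_gray(samples, 34000) mirrors the 34000 level used in the program above: a light background sample ends up white and the darker text samples end up black.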


Ok that’s one part sorted. Problem 2: we now have another image where GOCR won’t be able to tell where letters start and end. It’s too grainy. What we notice, though, is that each unjoined dot in a letter that has other dots within 3 pixels of it should probably be connected to them. So I add a piece of code onto the above program that looks 3 pixels to the right and 3 pixels below. If it finds any black dots it fills in the gaps. We now have chunky letters. GOCR can now identify where each letter starts and ends :D . We’re nearly done.




#include <Magick++.h>

using namespace Magick;


void fill_holes(PixelPacket * pixels, int cur_pixel, int size_x, int size_y)

{

int max_pixel, found;


///////////// pixels to right /////////////////////

found = 0;

max_pixel = cur_pixel+3; // the furthest we want to search

// set a limit so that we can't go over the end of the picture and crash

if(max_pixel>=size_x*size_y)

max_pixel = size_x*size_y-1;


// first of all are we a black pixel, no point if we are not

if(*(pixels+cur_pixel)==Color("black"))

{

// start searching from the right backwards

for(int index=max_pixel; index>cur_pixel; index--)

{

// should we be coloring?

if(found)

*(pixels+index)=Color("black");


if(*(pixels+index)==Color("black"))

found=1;

}

}


///////////// pixels to bottom /////////////////////

found = 0;

max_pixel = cur_pixel+(size_x*3);

if(max_pixel>=size_x*size_y)

max_pixel = size_x*size_y-1;


if(*(pixels+cur_pixel)==Color("black"))

{

for(int index=max_pixel; index>cur_pixel; index-=size_x)

{

// should we be coloring?

if(found)

*(pixels+index)=Color("black");


if(*(pixels+index)==Color("black"))

found=1;

}

}


}


int main( int /*argc*/, char ** argv)

{


// Initialize ImageMagick install location for Windows

InitializeMagick(*argv);


// load in the unedited image

Image phpBB("test.png");


// remove noise

phpBB.threshold(34000);


/////////////////////////////////////////////////////////////////////////////////////////////////////

// Beef up "holey" parts

/////////////////////////////////////////////////////////////////////////////////////////////////////

phpBB.modifyImage(); // Ensure that there is only one reference to

// underlying image; if this is not done, then the

// image pixels *may* remain unmodified. [???]

Pixels my_pixel_cache(phpBB); // allocate an image pixel cache associated with my_image

PixelPacket* pixels; // 'pixels' is a pointer to a PixelPacket array


// define the view area that will be accessed via the image pixel cache

// literally below we are selecting the entire picture

int start_x = 0;

int start_y = 0;

int size_x = phpBB.columns();

int size_y = phpBB.rows();


// return a pointer to the pixels of the defined pixel cache

pixels = my_pixel_cache.get(start_x, start_y, size_x, size_y);


// go through each pixel and if it is black and has black neighbors fill in the gaps

// this calls the function fill_holes from above

for(int index=0; index<size_x*size_y; index++)

fill_holes(pixels, index, size_x, size_y);


// now that the operations on my_pixel_cache have been finalized

// ensure that the pixel cache is transferred back to my_image

my_pixel_cache.sync();


// save image

phpBB.write("convert.pnm");


return 0; // 0 signals success to the calling shell or script

}


I admit this looks complicated at first glance. However, you definitely don’t have to do this in C++ if you can find an easier way to perform the same task. All it does is remove the background and join close dots together.


I’ve given the C++ source code because that’s what was easier for me, however the syntax can be quite confusing if you’re new to C++. Especially the code that accesses blocks of memory to edit the pixels. This is more a study of how to crack the captcha, but in case you want to code it in another language here’s the general idea of the algorithm that fills in the holes in the letters:


1. Go through each pixel in the picture. Remember where we are in a variable called cur_pixel

2. Start three pixels to the right of cur_pixel. If it’s black color the pixels between this position and cur_pixel black.

3. Work backwards one by one until we reach cur_pixel again. If any pixels we land on are black then color the space in between them and cur_pixel black.

4. Go back to step 1 until we’ve been through every pixel in the picture


NOTE: Just make sure you don’t let any variables go over the edge of the image otherwise you might crash your program.


I used the same algorithm but modified it slightly so that it also looked 3 pixels below, however the steps were exactly the same.
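The steps above can be sketched in standalone C++ on a simple black/white grid. This is an illustration under stated assumptions, not the code from the post: the name fill_gaps, the std::vector<bool> layout and the reach parameter are all made up here. It also differs from the in-place version in two deliberate ways: it reads from a snapshot of the image so newly filled pixels do not trigger further fills, and it checks row and column bounds so a search to the right never wraps onto the next row.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: close gaps of up to `reach` pixels between black pixels,
// looking right and down from every black pixel.
// `black` is row-major with width * height entries; true means black.
void fill_gaps(std::vector<bool>& black, int width, int height, int reach = 3)
{
    const std::vector<bool> src = black;  // snapshot: fills do not cascade
    auto at = [width](int x, int y) {
        return static_cast<std::size_t>(y) * width + x;
    };

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (!src[at(x, y)]) continue;  // only start from black pixels

            // Look right, farthest first, never past the end of this row.
            for (int d = reach; d > 1; --d) {
                if (x + d < width && src[at(x + d, y)]) {
                    for (int k = 1; k < d; ++k) black[at(x + k, y)] = true;
                    break;
                }
            }
            // Look down, farthest first, never past the last row.
            for (int d = reach; d > 1; --d) {
                if (y + d < height && src[at(x, y + d)]) {
                    for (int k = 1; k < d; ++k) black[at(x, y + k)] = true;
                    break;
                }
            }
        }
    }
}
```

Searching from the farthest pixel inward matches steps 2 and 3: the first black pixel found at distance 3 or 2 decides how much of the gap gets colored.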


Training GOCR


The font we’re left with is not recognized natively by GOCR so we have to train it. It’s not recognized partly because it’s a bit jagged.



Assuming our cleaned up picture is called convert.pnm and our training data is going to be stored in a directory called data/, we’d type this.


gocr -p ./data/ -m 256 -m 130 convert.pnm


Just make sure the directory data/ exists (and is empty). I should point out that you need to open up a command prompt to do this from. It doesn’t have nice windows. Which is good because it makes it easier to integrate into php at a later date.


Any letters it doesn’t recognize it will ask you what they are. Just make sure you type the right answer. -m 256 means use a user defined database for character recognition. -m 130 means learn new letters.


You can find my data/ directory in the zip at the end of this post. It just saves you the time of going through checking each letter and makes it all work instantly.


Speeding it up


Downloading, converting, and training for each phpbb2 captcha takes a little while. It can be sped up with a simple bit of php code but I don’t want to make this post much longer. You’ll find my script at the end in my code package. The php code runs from the command prompt though by typing “php filename.php”. It’s sort of conceptual in the sense that it works, but it’s not perfect.


Done


Ok once GOCR starts getting 90% of the letters right we can reduce the required accuracy so that it guesses the letters it doesn’t know.


Below I’ve reduced the accuracy requirement to 25% using -a 25. Otherwise GOCR prints the default underscore character even for slightly different looking characters that have already been entered. -m 2 means don’t use the default letter database. I probably could have used this earlier but didn’t. Ah well, it doesn’t do a whole lot.


gocr -p ./data/ -m 256 -m 2 -a 25 convert.pnm


We can get the output of gocr in php using:


echo exec("/full/path/gocr -p ./data/ -m 256 -m 2 -a 25 convert.pnm");


Alternatives


In some instances you may not have access to GOCR or you may not want to use it, although it should be usable if you have access to a dedicated server. In this case I would separate the letters out manually and resize them all to the same size. I would then put them through a php neural network which can be downloaded from here FANN download


It would take a bit of work but it should hopefully be as good as using GOCR. I don’t know how well each one reacts to letters which are rotated though. Neural networks simply memorize patterns. I haven’t checked the inner workings of GOCR. It looks complicated.


My code


All the code can be found here to crack phpBB2 captcha.


Zip Download


In conclusion to this tutorial, it’s a nightmare trying to port over all my code from linux to windows unless it’s written in Java :D . If only Java was small and quick as well.


It’s worth stating that phpbb2 was easy to crack because the letters didn’t touch or overlap. If they had touched or overlapped it would probably have been very hard to crack.


I plan to look at that line and square captcha that comes with phpBB3 over on my site and document how secure it is.


Thanks for the awesome guest post Harry.