[C#] How do I get started with OCR? And how reliable is it for anything that isn’t a scanned piece of paper?
I'll try to be succinct:
I'm a novice programmer learning C#, and I recently decided to build an app that reads a screenshot of FIFA's post-match screen and inserts all of the stats into a database, so I can later run a statistical analysis in SPSS.
I got the database part working really well with SQLite, but now I'm stuck on how to read the screenshot and extract the data. I thought it would be a simple use case for OCR, so I started trying to understand Tesseract/tessnet, but before going further I tried ready-made OCR software on the screenshots to make sure the approach would actually work.
Well, it only sort of worked. It's not accurate enough; at least FreeOCR (which is powered by Tesseract) isn't. A lot of the numbers, which look very distinct and clear to my human eyes, come out as a jumbled mess after OCR. Here's an example of a post-match screen. FreeOCR's reading of the column on the right looks like this. Notice how the 5 is simply ignored and everything after "yellow cards" gets really confused.
Is there a way to make this work, or is OCR technology just not there yet? I find that hard to believe… Would it work better to divide the image and process it in chunks? Even when I isolate a "0", FreeOCR can't seem to tell whether it's a 0, a D or an O. Could some sort of image preprocessing help? Is there another solution I haven't thought of? Maybe there's a way to tell it that I only want digits, or even roughly specify what the font looks like?
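A minimal sketch of the chunking, preprocessing and digits-only ideas, assuming the screenshot has already been loaded into a grayscale `byte[,]` (loading it, e.g. with System.Drawing or another imaging library, is left out). `OcrCleanup`, `Crop`, `Binarize` and `NormalizeDigits` are hypothetical helpers, not part of Tesseract or tessnet, and the threshold and the letter-to-digit mapping are assumptions to tune against real screenshots. The cropped, binarized region would then be handed to Tesseract, which also has a `tessedit_char_whitelist` variable for restricting output to given characters (how well it is honored depends on the Tesseract version and engine mode).

```csharp
using System;
using System.Text;

// Hypothetical helpers (not part of Tesseract or any OCR library) sketching
// cleanup steps around an OCR call on a game screenshot.
public static class OcrCleanup
{
    // Copy a rectangular sub-region (e.g. one stat column) out of a larger
    // grayscale image indexed as [row, column]. The caller must keep the
    // rectangle inside the image bounds.
    public static byte[,] Crop(byte[,] img, int top, int left, int height, int width)
    {
        var result = new byte[height, width];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                result[y, x] = img[top + y, left + x];
        return result;
    }

    // Convert grayscale (0 = black, 255 = white) to pure black/white:
    // pixels brighter than the threshold become white, the rest black.
    // With invert = true the mapping is flipped, turning light text on a
    // dark menu background into dark text on white, which Tesseract
    // generally handles better.
    public static byte[,] Binarize(byte[,] gray, byte threshold, bool invert)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var result = new byte[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                bool bright = gray[y, x] > threshold;
                if (invert) bright = !bright;
                result[y, x] = bright ? (byte)255 : (byte)0;
            }
        return result;
    }

    // For a field that can only contain digits, map common OCR look-alikes
    // to digits (assumed mapping) and drop every other non-digit character.
    public static string NormalizeDigits(string raw)
    {
        var sb = new StringBuilder();
        foreach (char c in raw)
        {
            char mapped;
            switch (c)
            {
                case 'O': case 'o': case 'D': case 'Q': mapped = '0'; break;
                case 'I': case 'l': case '|': mapped = '1'; break;
                case 'S': case 's': mapped = '5'; break;
                case 'B': mapped = '8'; break;
                default: mapped = c; break;
            }
            // Keep only ASCII digits (char.IsDigit would also accept other scripts).
            if (mapped >= '0' && mapped <= '9') sb.Append(mapped);
        }
        return sb.ToString();
    }
}
```

Cropping each stat column separately means every OCR call sees only one kind of field, so `NormalizeDigits` can be applied to the columns that hold numbers without mangling text labels such as "Yellow Cards", which never change and can be read separately or matched by position.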
Sorry if the answer is really obvious or this is the wrong place to post this, but I'm stuck and would really like to make progress on this. Thanks in advance!
Submitted July 17, 2017 at 02:43PM by Greedish
via reddit http://ift.tt/2uBBCww