Document scanning is the process of replicating a paper document in a digital format for use on computing systems. Generally this is accomplished through an optical scanning process using a scanner or camera.

PDF (Portable Document Format) is the document format originally developed by Adobe and most closely associated with Adobe Acrobat. To scan your documents properly and format the resulting data in a PDF file, it is important to understand how the PDF format treats text and images, and likewise how a scanned document treats them. This is because text is not always text; sometimes it is just an image of text.

This is the case in scanned images. The main functional difference between text and an image of text is that text can easily be edited and exported to different formats, whereas an image of text makes both more difficult. Both can still be done, but doing so requires translating the image of text into actual text.

The output of document scanning doesn't necessarily have to be converted to actual text when creating PDF files, though. PDF files display both text and images, and if no editing needs to be done, the image of text is sometimes acceptable. In such a case, all that needs to be done is to scan the document and paste or import it into a PDF-capable editor.

Even if the text does not need to be edited, in some cases it is still desirable to convert the image of text into actual text. This is because of the way that PDF viewers draw text and images to the screen. While a PDF viewer can scale images, increasing the zoom creates no additional information (resolution). Once the zoom level exceeds the native resolution of the image, each image pixel is stretched across several screen pixels and the image becomes pixelated (blocky). The same thing happens with images of text, so the readability of the text suffers as the zoom level increases.
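The arithmetic behind that pixelation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, and it assumes the viewer shows one inch of page as `screen_ppi` screen pixels at 100% zoom, with 96 used as a typical (not universal) value.

```python
def screen_pixels_per_image_pixel(scan_dpi, zoom_percent, screen_ppi=96.0):
    """How many screen pixels each scanned pixel is stretched across.

    Assumption: at 100% zoom the viewer maps one inch of page to
    `screen_ppi` screen pixels. Values above 1.0 mean the scan is being
    enlarged beyond its native resolution, so blockiness can appear.
    """
    if scan_dpi <= 0 or zoom_percent <= 0 or screen_ppi <= 0:
        raise ValueError("all arguments must be positive")
    return screen_ppi * (zoom_percent / 100.0) / scan_dpi


def max_zoom_without_upscaling(scan_dpi, screen_ppi=96.0):
    """Zoom level (in percent) at which one scanned pixel maps to one screen pixel."""
    if scan_dpi <= 0 or screen_ppi <= 0:
        raise ValueError("all arguments must be positive")
    return scan_dpi / screen_ppi * 100.0


# A 150 DPI scan viewed at 400% zoom on the assumed 96 PPI display:
# each scanned pixel covers roughly 2.56 screen pixels in each direction.
ratio = screen_pixels_per_image_pixel(150, 400)
limit = max_zoom_without_upscaling(150)
```

Under these assumptions, a 150 DPI scan can be zoomed to about 156% before the viewer has to enlarge individual pixels; scanning at a higher resolution pushes that limit up proportionally.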

To some extent, the same holds true for reduced zoom levels. When the image is displayed smaller than its native resolution, several image pixels must be combined into each screen pixel, so some of the image's information is lost. Depending on the image reduction algorithm used by the viewer, this may cause images that contain text to become blurry or even unreadable at low zoom levels.
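To see why the reduction algorithm matters, here is a minimal sketch comparing two common approaches on a tiny grayscale image. It is not how any particular PDF viewer works; the function names and the 4x4 sample "page" are invented for illustration. Nearest-neighbor reduction keeps every Nth pixel and can drop a thin stroke entirely, while box averaging blends each block and turns the stroke gray (blurry) instead.

```python
def downscale_nearest(img, factor):
    """Keep every `factor`-th pixel in each direction (nearest-neighbor style)."""
    return [row[::factor] for row in img[::factor]]


def downscale_box(img, factor):
    """Average each factor x factor block of pixels (box filter)."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor)
                     for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Hypothetical 4x4 scan: white (255) background with a one-pixel-wide
# black (0) vertical stroke in the second column, like the stem of a letter.
page = [
    [255, 0, 255, 255],
    [255, 0, 255, 255],
    [255, 0, 255, 255],
    [255, 0, 255, 255],
]

print(downscale_nearest(page, 2))  # [[255, 255], [255, 255]]  stroke vanishes
print(downscale_box(page, 2))      # [[127.5, 255.0], [127.5, 255.0]]  stroke turns gray
```

Neither result preserves the original stroke: one loses it outright and the other smears it, which is the unreadable-versus-blurry trade-off described above.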

PDF viewers do not have this problem (at least not to the same extent) when displaying native text, though. This is because when the zoom level changes, the size at which the font is drawn changes as well, meaning that at any given zoom level the text is redrawn at that resolution rather than stretched from a fixed grid of pixels. There may still be some readability problems at very low zoom levels, but only if the font itself is unreadable at that size. At high zoom levels, the text will remain crisp and true to its font rather than becoming blocky and pixelated.
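The following sketch illustrates the idea for scalable (outline) fonts, which is an assumption about the font type rather than something every PDF guarantees. The helper names, the 96 PPI screen value, and the sample outline point are all hypothetical. The point is that the glyph's shape is stored once in abstract font units and is rescaled to the current pixel size before drawing, so no fixed pixel grid is ever enlarged.

```python
POINTS_PER_INCH = 72.0  # one typographic point is 1/72 of an inch


def em_size_px(font_size_pt, zoom_percent, screen_ppi=96.0):
    """Pixel size of the font's em square at a given zoom level.

    Assumption: 100% zoom maps one inch of page to `screen_ppi` pixels.
    """
    return font_size_pt / POINTS_PER_INCH * screen_ppi * zoom_percent / 100.0


def scale_outline(points, units_per_em, em_px):
    """Convert outline points from font units to pixel coordinates."""
    scale = em_px / units_per_em
    return [(x * scale, y * scale) for x, y in points]


# Hypothetical outline point at (500, 700) in a font with 1000 units per em.
outline = [(500, 700)]

# 12 pt text: the em square is about 16 px at 100% zoom and about 64 px at 400%.
at_100 = scale_outline(outline, 1000, em_size_px(12, 100))
at_400 = scale_outline(outline, 1000, em_size_px(12, 400))
```

Because the outline is recomputed for each zoom level, the edges of the letter are drawn with as many screen pixels as the zoom level provides.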

So how do you go about extracting the text from images created by document scanning, so that the text can be edited and natively displayed in PDF files? That process is called optical character recognition (OCR). There are many OCR products, and some may even be bundled with the scanner you are using. If not, there are plenty of freely available OCR programs and web services that can handle the task for you. Be forewarned, though: not all OCR is the same, and even the best OCR output will likely need to be proofread and corrected afterward, as errors tend to creep in. This is especially true for smaller fonts or blurry text, so make sure the document scanning is done at a suitable resolution and as clearly as possible. Once you have run the scanned document through OCR, the resulting text can be pasted or imported into the PDF editor just like any other text.
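As a rough way to judge whether a scan resolution suits small fonts, the sketch below converts a font size in points to a height in scanned pixels, using the fact that one point is 1/72 of an inch. The function names are invented here, and the 30-pixel minimum is only an illustrative assumption, not a requirement of any particular OCR product; check your OCR tool's documentation for its own guidance. Note also that the point size measures the font's full em height, so the visible letters are somewhat smaller than the number returned.

```python
import math

POINTS_PER_INCH = 72.0  # one typographic point is 1/72 of an inch


def font_height_px(font_size_pt, scan_dpi):
    """Approximate em height of the text, in scanned pixels."""
    if font_size_pt <= 0 or scan_dpi <= 0:
        raise ValueError("font size and DPI must be positive")
    return font_size_pt / POINTS_PER_INCH * scan_dpi


def min_dpi_for(font_size_pt, min_height_px=30.0):
    """Smallest whole-number DPI giving at least `min_height_px` per em.

    `min_height_px` is an assumed target chosen for illustration only.
    """
    if font_size_pt <= 0 or min_height_px <= 0:
        raise ValueError("font size and minimum height must be positive")
    return math.ceil(min_height_px * POINTS_PER_INCH / font_size_pt)


# 8 pt footnote text scanned at 150 DPI is only about 17 pixels tall,
# while the same text at 300 DPI is about 33 pixels tall.
small = font_height_px(8, 150)
large = font_height_px(8, 300)
needed = min_dpi_for(8)  # 270 under the assumed 30-pixel target
```

A check like this can be run before scanning a batch of documents, so the smallest font on the page determines the resolution rather than the body text.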