PDF OCR · 6 min read · April 29, 2026

How to OCR a PDF for Free — Extract Text from Scanned Documents

Scanned PDFs look like normal documents, but you can't select, search, or copy any of the text. OCR fixes that by reading the image and extracting the text — and you can do it for free without uploading your file anywhere.

You receive a scanned contract, a photographed receipt, or a digitized book chapter as a PDF. It looks perfectly readable. But the moment you try to select a word, copy a sentence, or search for a phrase — nothing happens. The PDF is just a stack of images pretending to be a document.

OCR (Optical Character Recognition) solves this. It analyzes the image of each page, identifies the letters and words, and extracts them as actual text you can copy, search, and edit. This guide covers what OCR is, when you need it, and how to do it for free without uploading sensitive files to a server.

What Is OCR and How Does It Work?

OCR is a technology that converts images of text into machine-readable text. When a document is scanned or photographed, the resulting PDF contains pictures of pages — not actual text data. An OCR engine looks at those images, detects the shapes of individual characters, matches them against known letter patterns, and outputs the recognized text.
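To make the "match shapes against known letter patterns" idea concrete, here is a deliberately tiny sketch. It is a toy, not how Tesseract or any production engine works: the 3x5 bitmaps, the three-letter alphabet, and the `classify` helper are all invented for illustration. It simply picks the template that differs from the scanned glyph in the fewest pixels.

```python
# Toy illustration only: real OCR engines use far richer models than this.
# The 3x5 glyph bitmaps and the helper names are made up for this sketch.

TEMPLATES = {
    "I": ["111",
          "010",
          "010",
          "010",
          "111"],
    "L": ["100",
          "100",
          "100",
          "100",
          "111"],
    "T": ["111",
          "010",
          "010",
          "010",
          "010"],
}


def distance(glyph, template):
    """Count the pixels where the scanned glyph and the template differ."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def classify(glyph):
    """Return the template letter with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda letter: distance(glyph, TEMPLATES[letter]))


# A slightly noisy "L": one stray pixel in the top-right corner.
noisy_l = ["101",
           "100",
           "100",
           "100",
           "111"]
print(classify(noisy_l))  # prints "L"
```

The stray pixel costs the "L" template one mismatch, while "I" and "T" mismatch in many more places, so the noisy glyph is still read correctly. Heavier noise, such as stains or blur, is what pushes a glyph closer to the wrong template.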

Modern OCR engines are remarkably accurate on clean, printed text. They handle a wide range of fonts, sizes, and layouts. Where they struggle is with handwritten text, severely skewed pages, extremely low-resolution scans, or documents where the text is obscured by stains, stamps, or heavy formatting.

How to Tell If Your PDF Needs OCR

Open the PDF in any viewer and try to select text with your cursor. If you can highlight individual words and copy them to the clipboard, the PDF already contains real text and you don't need OCR. If clicking and dragging selects nothing, or selects the entire page as a single image, you have a scanned PDF that needs OCR to extract the text.

Another quick test: press Ctrl+F (or Cmd+F on Mac) and search for a word you can see on the page. If the search finds nothing, the PDF has no text layer and OCR is the solution.
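If you have many files to check, the same test can be automated. The sketch below only covers the decision step: it assumes you have already pulled the text of each page out with a PDF library of your choice (for example, pypdf's `page.extract_text()`), and the `needs_ocr` name and the 20-character threshold are assumptions for illustration, not part of any tool.

```python
def needs_ocr(page_texts, min_chars_per_page=20):
    """Return True if most pages carry little or no extractable text.

    page_texts: one string (or None) per page, as returned by whatever
    PDF text extractor you use.
    min_chars_per_page: hypothetical threshold; tune it for your documents.
    """
    if not page_texts:
        return False  # no pages, so there is nothing to recognize
    nearly_empty = sum(
        1 for text in page_texts
        if len((text or "").strip()) < min_chars_per_page
    )
    # Treat the file as scanned when more than half of its pages are empty.
    return nearly_empty > len(page_texts) / 2


print(needs_ocr(["", "", ""]))
print(needs_ocr(["This page has a real text layer with plenty of words."]))
```

The first call reports that OCR is needed and the second that it is not. A per-page threshold is used instead of a plain "is it empty" check because some scanned PDFs carry a few stray characters, such as a page number stamped by the scanner.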

How to OCR a PDF for Free with pdfmerger.io

Go to pdfmerger.io/ocr. The entire process takes four steps:

Step 1: Select your scanned PDF

Click "Select PDF" or drag your file into the drop area. Nothing is actually uploaded: the file stays on your device and is not transmitted to any server. The tool loads the document directly into your browser's memory.

Step 2: Wait for the OCR engine to process

The OCR engine renders each page at high resolution and analyzes it for text. You'll see progress as each page is processed. A typical 10-page document takes 30–60 seconds depending on your device's speed. Longer documents take proportionally longer since each page is processed individually.
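For a rough idea of how long a larger file will take, you can scale the figure above linearly. This is a back-of-the-envelope sketch: the per-page range comes straight from the 30 to 60 seconds quoted for 10 pages, and the function name is made up for this example. Your device may be faster or slower.

```python
# Per-page range derived from the figure of 30-60 seconds for 10 pages.
SECONDS_PER_PAGE_LOW = 30 / 10
SECONDS_PER_PAGE_HIGH = 60 / 10


def estimate_ocr_seconds(page_count):
    """Return a (low, high) estimate in seconds, assuming linear scaling."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return (page_count * SECONDS_PER_PAGE_LOW, page_count * SECONDS_PER_PAGE_HIGH)


low, high = estimate_ocr_seconds(50)
print(f"50 pages: roughly {low / 60:.1f} to {high / 60:.1f} minutes")
# prints "50 pages: roughly 2.5 to 5.0 minutes"
```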

Step 3: Preview the extracted text

Once processing completes, the extracted text is displayed so you can review it. Skim through to make sure the recognition looks correct — especially numbers, proper nouns, and any text near images or borders where accuracy can dip.
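When the output is long, a small script can point you at the spots most worth checking. The sketch below is one possible approach, not a feature of the tool: it lists every token that contains a digit and marks the ones where a digit sits next to a letter that OCR often confuses with a number (O and 0, l and 1, and so on). The look-alike set and the function name are assumptions; proper nouns still need a human eye.

```python
# Letters that are easily confused with digits in OCR output (O/0, l/1, S/5, ...).
DIGIT_LOOKALIKES = set("OoIlSBZ")


def tokens_to_review(text):
    """Return (token, reason) pairs for tokens that deserve a manual check."""
    results = []
    for raw in text.split():
        token = raw.strip(".,;:()\"'")  # drop surrounding punctuation
        if not any(ch.isdigit() for ch in token):
            continue  # only tokens containing a digit are considered
        if any(ch in DIGIT_LOOKALIKES for ch in token):
            results.append((token, "digit mixed with look-alike letter"))
        else:
            results.append((token, "number"))
    return results


sample = "Invoice 1O24 dated 2026-04-29, total $1,250.00."
for token, reason in tokens_to_review(sample):
    print(f"{token}: {reason}")
```

On the sample line, `1O24` is marked as a likely misread (a capital O where a zero belongs), while the date and the amount are listed as plain numbers to verify against the scan.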

Step 4: Download the result

Click "Download .txt" to save the extracted text as a plain text file. From there you can paste it into Word, Google Docs, an email, a spreadsheet — wherever you need it.

Tips for Getting the Best OCR Accuracy

OCR accuracy depends heavily on the quality of the source material. Here's what makes the difference between clean extraction and garbled output:

Scan at 300 DPI or higher. If you're scanning the document yourself, use at least 300 DPI. Lower resolutions produce blurry characters that confuse the OCR engine. 300 DPI is the sweet spot — sharp enough for accurate recognition without creating enormous file sizes.
Use black text on a white background. High contrast between text and background dramatically improves accuracy. Colored paper, watermarks, and background patterns all introduce noise that can cause misreads.
Keep pages straight. Skewed or rotated pages reduce accuracy because the OCR engine expects horizontal text lines. If your scan is crooked, use the Rotate PDF tool to straighten it before running OCR.
Avoid photographs of screens. Taking a phone photo of a computer screen introduces moiré patterns and glare that confuse OCR. If the document exists digitally, export or print-to-PDF instead of photographing it.
Clean originals matter. Coffee stains, fold marks, staple shadows, and handwritten annotations over printed text all reduce accuracy in the affected areas. The cleaner the original, the better the output.
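If you are not sure what resolution an existing scan has, you can work it out from the image size: DPI is simply pixels divided by inches. The helper names below are made up for this sketch, and the 300 DPI default mirrors the recommendation above.

```python
def effective_dpi(pixel_width, page_width_inches):
    """Resolution of a scan: pixels across the page divided by its width in inches."""
    if page_width_inches <= 0:
        raise ValueError("page_width_inches must be positive")
    return pixel_width / page_width_inches


def meets_recommendation(pixel_width, page_width_inches, minimum_dpi=300):
    """True if the scan reaches the recommended minimum resolution."""
    return effective_dpi(pixel_width, page_width_inches) >= minimum_dpi


# US Letter is 8.5 inches wide.
print(effective_dpi(2550, 8.5))         # 300.0
print(meets_recommendation(1275, 8.5))  # False, since this scan is only 150 DPI
```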

Why Privacy Matters for OCR

Most online OCR tools upload your PDF to a remote server, process it there, and send the text back. This means a copy of your document — potentially a contract, medical record, tax return, or legal filing — sits on someone else's server, even if temporarily.

pdfmerger.io takes a fundamentally different approach. The OCR engine is Tesseract.js, a WebAssembly port of the widely trusted open-source Tesseract OCR library. It runs entirely inside your browser using your device's processor and memory. Your PDF never leaves your computer. There is no upload, no server processing, and no copy stored anywhere. When you close the tab, everything is gone.

This is especially important for documents like scanned IDs, financial statements, signed contracts, and medical records — anything you wouldn't want sitting on a third-party server.

Common Uses for PDF OCR

Scanned contracts and legal documents — Extract text to quote specific clauses, search for terms, or paste sections into emails.
Receipts and invoices — Pull out amounts, dates, and vendor names for expense reports or bookkeeping without manual retyping.
Old or archived documents — Digitize text from scanned books, academic papers, or historical records that only exist as image PDFs.
Immigration and government forms — Extract information from scanned official documents you need to reference or resubmit.
Photographed whiteboards or notes — Convert photos of handwritten or printed meeting notes into editable text (works best with printed text).

How pdfmerger.io Compares to Other OCR Tools

There are several ways to OCR a PDF. Here's how the main options compare:

Adobe Acrobat Pro

Adobe offers excellent OCR with high accuracy and the ability to create searchable PDF layers. The downside: it requires a paid subscription ($22.99/month or more) and a desktop application. Overkill if you just need to extract text from a few pages occasionally.

Google Drive

Upload a scanned PDF to Google Drive and open it with Google Docs — Google will attempt OCR automatically. It's free and fairly accurate, but you're uploading your document to Google's servers. The formatting often comes out messy, and it struggles with multi-column layouts.

Other online OCR websites

Sites like OnlineOCR.net and OCR.space offer free tiers but typically impose file size limits, page limits, or daily usage caps. All of them upload your file to their servers. Many inject ads or push you toward a paid plan after a few uses.

pdfmerger.io

Free with no limits, no sign-up, and no upload. Accuracy is comparable to other Tesseract-based solutions: excellent for clean printed text, decent for lower-quality scans. The key advantage is privacy: everything happens in your browser, unlike the upload-based options above.

Limitations to Be Aware Of

Browser-based OCR is powerful, but it has practical limits. Processing happens on your device, so older or low-powered devices will be slower — a 50-page scanned document might take several minutes on a budget laptop. Very large PDFs (hundreds of pages) may push the browser's memory limits.
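To see why memory becomes a concern, consider how large one rendered page is. The sketch below assumes a page is held as an uncompressed RGBA bitmap (4 bytes per pixel) at 300 DPI. Both figures are assumptions for illustration; the resolution the tool actually renders at, and how many pages it keeps in memory at once, are not stated here.

```python
def rendered_page_bytes(width_inches, height_inches, dpi=300, bytes_per_pixel=4):
    """Size of one uncompressed bitmap for a page rendered at the given DPI."""
    width_px = round(width_inches * dpi)
    height_px = round(height_inches * dpi)
    return width_px * height_px * bytes_per_pixel


per_page = rendered_page_bytes(8.5, 11)  # US Letter at an assumed 300 DPI
print(f"{per_page / 1024**2:.1f} MiB per page")  # prints "32.1 MiB per page"
```

At roughly 32 MiB per page under these assumptions, holding even a few dozen rendered pages at once would add up quickly, which is why processing pages one at a time matters.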

The output is plain text, not a searchable PDF with an invisible text layer. If you specifically need a searchable PDF (where the original scan is preserved with a hidden text layer on top), desktop tools like Adobe Acrobat are the right choice. pdfmerger.io's OCR tool is designed for extracting text — getting the words out of the image so you can use them elsewhere.

Handwritten text is mostly unsupported. OCR engines are optimized for printed typefaces. If your document is handwritten, expect poor results regardless of which tool you use — this is a fundamental limitation of current OCR technology for general-purpose tools.

Frequently Asked Questions

Is OCR the same as converting PDF to text?

Not exactly. If a PDF already contains real text (created digitally in Word, for example), you can extract that text directly without OCR. OCR is specifically needed when the PDF contains images of text — scanned pages or photographs — where no actual text data exists in the file.

What languages does the OCR tool support?

The tool uses Tesseract.js, which supports over 100 languages including English, Spanish, French, German, Chinese, Japanese, Arabic, and Hindi. It works best with Latin-script languages but handles other scripts well for clearly printed text.

Can I OCR a multi-page PDF?

Yes. The tool processes every page in the document sequentially. Each page is rendered and analyzed independently. The extracted text from all pages is combined into a single output file.
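The page-by-page flow can be sketched in a few lines. This is an illustration of the general pattern, not the tool's actual code: `recognize` stands in for any OCR call that turns one page image into text, and the "--- Page N ---" separator is an assumption made here so the combined file stays easy to navigate.

```python
def ocr_document(page_images, recognize):
    """Run `recognize` on each page in order and join the results.

    recognize: any callable that takes one page image and returns its text.
    """
    page_texts = []
    for number, image in enumerate(page_images, start=1):
        text = recognize(image).strip()
        page_texts.append(f"--- Page {number} ---\n{text}")
    return "\n\n".join(page_texts)


# Stand-in recognizer for demonstration; a real one would call an OCR engine.
fake_pages = ["  first page text ", "second page text\n"]
print(ocr_document(fake_pages, recognize=lambda image: image))
```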

Why is the OCR output not 100% accurate?

OCR accuracy depends on scan quality, font clarity, and page layout. No OCR engine achieves 100% accuracy on every document. For clean, high-resolution scans of standard printed text, expect 95–99% accuracy. For degraded scans, unusual fonts, or complex layouts, accuracy will be lower. Always review the output before relying on it.
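If you want to put a number on accuracy for your own documents, a common approach is to retype a short passage by hand and compare it with the OCR output character by character. The sketch below uses edit distance (the count of inserted, deleted, or substituted characters) to do that. It is one standard way to measure character accuracy, not necessarily how the 95 to 99% figure above was obtained, and the function names are made up for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_output):
    """1 minus the character error rate, floored at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))


# Two misread characters ("1" as "l", "0" as "O") in an 18-character string.
print(round(character_accuracy("Total due: 1024.50", "Total due: l024.5O"), 3))
# prints 0.889
```

Note how two wrong characters in a short amount already drop the score below 90%, and both errors land in the part of the line that matters most. That is why reviewing numbers is worth the time even when overall accuracy looks high.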

Bottom line

If you have a scanned PDF and need the text out of it, go to pdfmerger.io/ocr. It's free, it's private, and it takes about a minute. No account, no upload, no catches.
