PDF to Markdown API

Converts PDFs to clean markdown. Actually preserves code blocks and tables.

I built this because existing converters suck:

→ They add fake strikethrough (~~) to tables

→ Code blocks lose their syntax highlighting

→ Images get extracted 5 times from the same PDF

→ You get raw text dumps instead of structured data

I was preparing PDFs for RAG pipelines and kept hitting these issues. So I fixed them.

What this does differently

Auto-detects code languages

Uses Pygments to figure out if code blocks are Python, JavaScript, SQL, etc. Most converters don't do this.
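As a rough illustration of the idea (a sketch, not necessarily how this API does it internally), Pygments exposes `guess_lexer`, which inspects a snippet and returns its best-matching lexer. The helper name `fence_with_language` is made up for this example, and the guess is heuristic, so very short snippets can be misclassified.

```python
from pygments.lexers import guess_lexer
from pygments.util import ClassNotFound

FENCE = "`" * 3  # built this way so the sketch can sit inside a fenced block


def fence_with_language(code: str) -> str:
    """Wrap code in a markdown fence, tagged with Pygments' best guess.

    Hypothetical helper for illustration only.
    """
    try:
        lexer = guess_lexer(code)
        tag = lexer.aliases[0] if lexer.aliases else ""
    except ClassNotFound:
        tag = ""  # no lexer matched: fall back to an untagged fence
    return f"{FENCE}{tag}\n{code.rstrip()}\n{FENCE}"


snippet = "import os\n\ndef main():\n    print(os.getcwd())\n"
print(fence_with_language(snippet))
```

The tag comes from the lexer's first alias, which is the short name markdown renderers usually expect after the opening fence.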

Deduplicates images

Extracts each unique image once, not 5 times. Saves them with proper metadata.
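One common way to deduplicate is to hash each image's bytes and keep a single entry per hash. The sketch below assumes the images have already been pulled out of the PDF as `(page_number, bytes)` pairs. The `UniqueImage` type and `dedupe_images` function are hypothetical names, and the real service may store different metadata.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class UniqueImage:
    digest: str                      # SHA-256 of the raw image bytes
    data: bytes
    pages: list[int] = field(default_factory=list)  # pages where it appears


def dedupe_images(images: list[tuple[int, bytes]]) -> list[UniqueImage]:
    """Keep one entry per distinct byte payload, recording every page it is on."""
    seen: dict[str, UniqueImage] = {}
    for page, data in images:
        digest = hashlib.sha256(data).hexdigest()
        entry = seen.setdefault(digest, UniqueImage(digest, data))
        if page not in entry.pages:
            entry.pages.append(page)
    return list(seen.values())


# Placeholder byte strings standing in for real image data.
logo = b"\x89PNG-logo-bytes"
chart = b"\x89PNG-chart-bytes"
unique = dedupe_images([(1, logo), (2, logo), (2, chart), (3, logo)])
for img in unique:
    print(img.digest[:12], img.pages)
```

This only catches byte-identical copies. The same picture re-encoded at a different size or quality would hash differently and be kept as a separate image.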

Fixes pymupdf4llm bugs

Automatically removes the fake strikethrough that pymupdf4llm adds to tables.
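A minimal sketch of this kind of cleanup, assuming the stray markers only need removing on table rows (lines that start with `|`) and that strikethrough elsewhere is intentional. The function name and the exact rule are assumptions, not the service's actual implementation.

```python
import re

# Matches a ~~wrapped~~ span and captures the text inside it.
STRIKE = re.compile(r"~~(.*?)~~")


def strip_table_strikethrough(markdown: str) -> str:
    """Remove ~~...~~ wrappers from table rows, leaving other lines untouched."""
    cleaned = []
    for line in markdown.splitlines():
        if line.lstrip().startswith("|"):
            line = STRIKE.sub(r"\1", line)
        cleaned.append(line)
    return "\n".join(cleaned)


sample = (
    "| ~~Name~~ | ~~Qty~~ |\n"
    "|---|---|\n"
    "| ~~apple~~ | 3 |\n"
    "\n"
    "This is ~~intentionally~~ struck."
)
print(strip_table_strikethrough(sample))
```

Restricting the substitution to table rows keeps real strikethrough in body text intact.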

Returns structured JSON

Get markdown plus metadata (page count, word count, code blocks found, etc.) in one call.
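To make the shape of such a response concrete, here is a sketch that bundles markdown with a few derived counts. The field names (`markdown`, `metadata`, `page_count`, `word_count`, `code_blocks`) are illustrative guesses, not the documented response schema, and the word count is a simple whitespace split that also counts markdown syntax tokens.

```python
import json

FENCE = "`" * 3  # built this way so the sketch can sit inside a fenced block


def build_response(markdown: str, page_count: int) -> str:
    """Bundle markdown and simple derived metadata into one JSON string.

    Field names are hypothetical; check the real API docs for the schema.
    """
    fence_lines = sum(1 for line in markdown.splitlines() if line.startswith(FENCE))
    payload = {
        "markdown": markdown,
        "metadata": {
            "page_count": page_count,           # supplied by the PDF parser
            "word_count": len(markdown.split()),  # rough, whitespace-based
            "code_blocks": fence_lines // 2,     # opening + closing fence per block
        },
    }
    return json.dumps(payload, indent=2)


doc = "# Title\n\nSome text here.\n\n" + FENCE + "python\nprint(1)\n" + FENCE + "\n"
print(build_response(doc, page_count=3))
```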


Use it in your code

curl -X POST https://pdf-to-md-30rq.onrender.com/api/v1/convert \
  -F "file=@document.pdf" \
  -F "include_content=true"
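The same request can be sketched in Python using only the standard library. The endpoint and the two form fields (`file`, `include_content`) come from the curl command above. The helper names, the 60-second timeout, and the assumption that the response body is a JSON object are mine.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://pdf-to-md-30rq.onrender.com/api/v1/convert"


def build_multipart(filename: str, pdf_bytes: bytes, include_content: bool = True):
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    flag = "true" if include_content else "false"
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="include_content"\r\n\r\n'
        f"{flag}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + pdf_bytes + tail, f"multipart/form-data; boundary={boundary}"


def convert_pdf(path: str) -> dict:
    """POST a PDF to the convert endpoint and return the parsed JSON reply."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(os.path.basename(path), fh.read())
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Assumed timeout; conversions of large PDFs may need longer.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `convert_pdf("document.pdf")` would send the file and return whatever JSON the service replies with. Inspect the keys of the result rather than relying on any particular field name.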

No API key needed (for now). Free tier: 100 conversions/month.

Full API documentation →

Who uses this

RAG pipelines: Convert technical docs to markdown for LLM context

Research tools: Extract text from academic papers with code/equations preserved

Document processing: Batch convert PDFs to searchable text

Built by Eswar Sethu in Melbourne

Questions? Email api@eswarsethu.dev

This is free because I needed it for my own projects. If it helps you, great.