PDF to Markdown API

Converts PDFs to clean markdown. Actually preserves code blocks and tables.

I built this because existing converters suck:

→ They add fake strikethrough (~~) to tables

→ Code blocks come out untagged, so syntax highlighting is lost

→ The same image gets extracted 5 times from one PDF

→ You get raw text dumps instead of structured data

I was preparing PDFs for RAG pipelines and kept hitting these issues. So I fixed them.

Uses Pygments to figure out if code blocks are Python, JavaScript, SQL, etc. Most converters don't do this.
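
Pygments exposes this kind of detection through `guess_lexer`, which scores a snippet against its lexers and returns the best match. The sketch below tags a markdown fence that way. It assumes Pygments is installed (`pip install pygments`). `pygments_language` and `fence_code` are hypothetical helper names, not this API's code. `guess_lexer` is heuristic, so short snippets can be mislabeled.

```python
from typing import Callable, Optional

# Built this way so the snippet is easy to embed in markdown.
FENCE = "`" * 3


def pygments_language(code: str) -> str:
    """Hypothetical helper: guess a fence tag for `code` using Pygments."""
    from pygments.lexers import guess_lexer
    from pygments.util import ClassNotFound

    try:
        lexer = guess_lexer(code)
    except ClassNotFound:
        return ""
    # Aliases are short names such as "python", "javascript" or "sql".
    alias = lexer.aliases[0] if lexer.aliases else ""
    # The plain-text fallback lexer means "no idea", so leave the fence untagged.
    return "" if alias == "text" else alias


def fence_code(code: str, detect: Optional[Callable[[str], str]] = None) -> str:
    """Hypothetical helper: wrap `code` in a fence tagged with its detected language."""
    language = (detect or pygments_language)(code)
    return f"{FENCE}{language}\n{code.rstrip()}\n{FENCE}"
```

The `detect` parameter lets you swap in a different detector, or a stub for testing. With no argument, it falls back to Pygments.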

Extracts each unique image once, not 5 times. Saves them with proper metadata.
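
A simple way to keep one copy per unique image is to hash the raw image bytes and keep only the first occurrence of each digest. This is a minimal sketch, not necessarily how this API does it. It assumes your PDF library hands you `(page_number, image_bytes)` pairs. `ImageRecord` and its fields are a hypothetical stand-in for "metadata", not the API's actual schema.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical metadata kept for each unique image."""
    digest: str  # SHA-256 of the raw image bytes
    size_bytes: int
    pages: list[int] = field(default_factory=list)  # every page it appeared on


def dedupe_images(images):
    """Collapse (page_number, image_bytes) pairs into one record per unique image."""
    unique: dict[str, ImageRecord] = {}
    for page, data in images:
        digest = hashlib.sha256(data).hexdigest()
        record = unique.get(digest)
        if record is None:
            # First sighting: this is the copy you would write to disk.
            record = ImageRecord(digest=digest, size_bytes=len(data))
            unique[digest] = record
        if page not in record.pages:
            record.pages.append(page)
    return list(unique.values())
```

Byte hashing only catches exact duplicates. The same picture re-encoded at a different resolution gets a different digest.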

Automatically removes the fake strikethrough that pymupdf4llm adds to tables.
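
Here's a minimal sketch of that cleanup. It assumes the unwanted markers show up as paired `~~` inside markdown table rows (lines starting with `|`). `strip_table_strikethrough` is a hypothetical name, and this isn't necessarily how the service implements it.

```python
import re

# Paired ~~...~~ markers; non-greedy so several struck cells in one row are handled.
_STRIKE = re.compile(r"~~(.*?)~~")


def strip_table_strikethrough(markdown: str) -> str:
    """Hypothetical helper: drop ~~ markers from table rows only."""
    out = []
    for line in markdown.split("\n"):  # split("\n") keeps a trailing newline intact
        if line.lstrip().startswith("|"):
            line = _STRIKE.sub(r"\1", line)
        out.append(line)
    return "\n".join(out)
```

Strikethrough outside tables is left alone. A lone, unpaired `~~` in a row isn't touched either.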

Get markdown plus metadata (page count, word count, code blocks found, etc.) in one call.
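
Most of these counts can be computed from the markdown itself. This sketch makes a few assumptions. Word count means whitespace-separated tokens outside fence marker lines. Page count comes from the PDF parser. The dict keys are hypothetical and may not match the API's actual response fields.

```python
import re

# Matches an opening or closing code fence (``` or ~~~, optionally indented).
_FENCE = re.compile(r"^\s*(`{3,}|~{3,})")


def markdown_stats(markdown: str, page_count: int) -> dict:
    """Hypothetical metadata helper; key names are illustrative only."""
    word_count = 0
    code_blocks = 0
    in_fence = False
    for line in markdown.split("\n"):
        if _FENCE.match(line):
            if not in_fence:
                code_blocks += 1  # count each block once, at its opening fence
            in_fence = not in_fence
            continue  # fence markers themselves aren't words
        word_count += len(line.split())
    return {"page_count": page_count, "word_count": word_count, "code_blocks": code_blocks}
```

This is a simplification. Any fence line toggles the state, without checking that the closing fence matches the opening character and length.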

Limits: up to 50MB and 500 pages per PDF.

Example request:

curl -X POST https://pdf-to-md-30rq.onrender.com/api/v1/convert \
  -F "file=@document.pdf" \
  -F "include_content=true"
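
If you'd rather call the API from Python without extra dependencies, you can build the same multipart request with the standard library. This sketch assumes the endpoint accepts the same two form fields as the curl example (`file` and `include_content`) and that "50MB" means 50 MiB. `build_convert_request` is a hypothetical helper. The response format isn't documented here, so treat the body as opaque.

```python
import urllib.request
import uuid

API_URL = "https://pdf-to-md-30rq.onrender.com/api/v1/convert"
MAX_BYTES = 50 * 1024 * 1024  # assumption: the "50MB" limit means 50 MiB


def build_convert_request(pdf_bytes: bytes, filename: str,
                          include_content: bool = True,
                          url: str = API_URL) -> urllib.request.Request:
    """Hypothetical helper: build (but don't send) a POST mirroring the curl example."""
    if len(pdf_bytes) > MAX_BYTES:
        raise ValueError(f"{filename} is over the 50MB upload limit")

    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field, like curl's -F "include_content=true".
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="include_content"',
        b"",
        b"true" if include_content else b"false",
        # File field, like curl's -F "file=@document.pdf".
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        f"--{boundary}--".encode(),  # closing boundary
        b"",
    ]
    body = b"\r\n".join(parts)  # multipart sections are CRLF-delimited
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To send it, pass the result to `urllib.request.urlopen` and read the body. For batch jobs, build one request per file.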

No API key needed (for now). Free tier: 100 conversions/month.

RAG pipelines: Convert technical docs to markdown for LLM context

Research tools: Extract text from academic papers with code/equations preserved

Document processing: Batch convert PDFs to searchable text

Built by Eswar Sethu in Melbourne

This is free because I needed it for my own projects. If it helps you, great.