Welcome to depdf’s documentation!

depdf

A PDF file disintegration tool. DePDF extracts tables and paragraphs from PDF pages into structured markup (e.g. HTML). It can also convert a single page or a whole PDF to HTML.

depdf.convert_pdf_to_html(pdf, **kwargs)
Parameters:
  • pdf – path to the PDF file
  • kwargs – config keyword arguments
Returns:
  HTML string for the whole PDF
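
As a minimal sketch of how this call might be used, the helper below converts a PDF and writes the result to disk. The helper name save_pdf_as_html, its convert parameter (a hook for swapping in a stand-in converter) and the file names in the comment are assumptions for illustration, not part of depdf. Keyword arguments are passed through untouched because the accepted config keys are not listed here.

```python
from pathlib import Path


def save_pdf_as_html(pdf_path, out_path, convert=None, **config):
    """Convert a whole PDF to HTML and write it to out_path.

    `convert` is a hypothetical hook: pass a stand-in callable to try the
    helper without depdf installed. By default the real function is used.
    """
    if convert is None:
        # Imported lazily so this module loads even when depdf is absent.
        from depdf import convert_pdf_to_html as convert
    html = convert(pdf_path, **config)
    # str() is a no-op when the result is already a string.
    Path(out_path).write_text(str(html), encoding="utf-8")
    return out_path


# Hypothetical usage (file names are placeholders):
# save_pdf_as_html("report.pdf", "report.html")
```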

depdf.convert_page_to_html(pdf, pid, **kwargs)
Parameters:
  • pdf – path to the PDF file
  • pid – page number, starting from 1
  • kwargs – config keyword arguments
Returns:
  HTML string for the page
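
Because pid counts from 1 rather than 0, an off-by-one is easy to introduce when looping over pages. The sketch below converts a chosen set of pages and rejects any pid below 1 before calling the library. The helper name convert_pages and its convert_page hook are assumptions made for this example.

```python
def convert_pages(pdf_path, pids, convert_page=None, **config):
    """Return {pid: html} for the requested 1-based page numbers."""
    if convert_page is None:
        # Lazy import: only needed when no stand-in converter is supplied.
        from depdf import convert_page_to_html as convert_page
    html_by_page = {}
    for pid in pids:
        if pid < 1:
            # Page numbers start from 1, so 0 and negatives are rejected here.
            raise ValueError(f"pid must be >= 1, got {pid}")
        html_by_page[pid] = convert_page(pdf_path, pid, **config)
    return html_by_page


# Hypothetical usage (file name is a placeholder):
# first_two = convert_pages("report.pdf", [1, 2])
```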

depdf.extract_page_tables(pdf, pid, **kwargs)
Parameters:
  • pdf – path to the PDF file
  • pid – page number, starting from 1
  • kwargs – config keyword arguments
Returns:
  list of tables found on the page
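
The structure of the items in the returned list is not described here, so the following sketch relies only on the result being a list. It reports how many tables each page yields and skips pages with none. The helper name pages_with_tables and its extract_tables hook are hypothetical, and treating an empty result as "no tables" is an assumption.

```python
def pages_with_tables(pdf_path, pids, extract_tables=None, **config):
    """Return {pid: table_count} for pages that yield at least one table."""
    if extract_tables is None:
        # Lazy import: only needed when no stand-in extractor is supplied.
        from depdf import extract_page_tables as extract_tables
    counts = {}
    for pid in pids:
        tables = extract_tables(pdf_path, pid, **config)
        if tables:  # assumption: an empty (or missing) result means no tables
            counts[pid] = len(tables)
    return counts


# Hypothetical usage (file name is a placeholder):
# print(pages_with_tables("report.pdf", range(1, 6)))
```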

depdf.extract_page_paragraphs(pdf, pid, **kwargs)
Parameters:
  • pdf – path to the PDF file
  • pid – page number, starting from 1
  • kwargs – config keyword arguments
Returns:
  list of paragraphs found on the page
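
Since this function works one page at a time, gathering paragraphs from a span of pages takes a loop. The sketch below walks an inclusive 1-based range and keeps each paragraph paired with the page it came from. The helper name collect_paragraphs and its extract_paragraphs hook are assumptions for illustration, and the paragraph items are stored as returned without assuming their type.

```python
def collect_paragraphs(pdf_path, first_pid, last_pid,
                       extract_paragraphs=None, **config):
    """Return [(pid, paragraph), ...] for pages first_pid..last_pid inclusive."""
    if extract_paragraphs is None:
        # Lazy import: only needed when no stand-in extractor is supplied.
        from depdf import extract_page_paragraphs as extract_paragraphs
    collected = []
    # range() excludes its stop value, hence last_pid + 1.
    for pid in range(first_pid, last_pid + 1):
        for paragraph in extract_paragraphs(pdf_path, pid, **config):
            collected.append((pid, paragraph))
    return collected


# Hypothetical usage (file name is a placeholder):
# for pid, paragraph in collect_paragraphs("report.pdf", 1, 3):
#     print(pid, paragraph)
```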
