Welcome to depdf’s documentation!
depdf
An ultimate PDF file disintegration tool. DePDF is designed to extract the tables and paragraphs embedded in PDF pages into a structured markup language (e.g. HTML). It can also convert a single page or a whole PDF to HTML.
depdf.convert_pdf_to_html(pdf, **kwargs)
Parameters:
- pdf – path to the PDF file
- kwargs – configuration keyword arguments
Returns: HTML string for the whole PDF
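A minimal sketch of saving the whole-document HTML to disk, assuming only what is documented above: `convert_pdf_to_html` takes a file path plus config keyword arguments and returns a string. The helper name `save_pdf_as_html` and its `convert` and `out_path` parameters are hypothetical; `convert` defaults to `depdf.convert_pdf_to_html` and can be replaced with a stub when depdf is not installed.

```python
from typing import Callable, Optional


def save_pdf_as_html(
    pdf: str,
    out_path: str,
    convert: Optional[Callable[..., str]] = None,
    **kwargs,
) -> str:
    """Convert a PDF to HTML and write it to out_path (hypothetical helper)."""
    if convert is None:
        # Imported lazily so the helper can be exercised without depdf installed.
        import depdf

        convert = depdf.convert_pdf_to_html
    # Config keyword arguments are forwarded unchanged to the converter.
    html = convert(pdf, **kwargs)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return out_path
```

For example, `save_pdf_as_html("report.pdf", "report.html")` would write the converted document next to the input; both file names are placeholders.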
depdf.convert_page_to_html(pdf, pid, **kwargs)
Parameters:
- pdf – path to the PDF file
- pid – page number, starting from 1
- kwargs – configuration keyword arguments
Returns: HTML string for the page
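Because `pid` counts from 1 rather than from 0, a loop over pages should use `range(1, n + 1)` instead of Python's usual `range(n)`. The sketch below converts several pages and rejects a 0 or negative `pid` early. The helper `convert_pages` is hypothetical, and how to obtain the page count `n` is not covered by this reference, so it is left to the caller.

```python
from typing import Callable, Dict, Iterable


def convert_pages(
    pdf: str,
    pids: Iterable[int],
    convert_page: Callable[..., str],
    **kwargs,
) -> Dict[int, str]:
    """Map each 1-based page number to its HTML string (hypothetical helper)."""
    result: Dict[int, str] = {}
    for pid in pids:
        # Pages are numbered from 1, so 0 or negative values are rejected.
        if pid < 1:
            raise ValueError(f"pid must start from 1, got {pid}")
        result[pid] = convert_page(pdf, pid, **kwargs)
    return result
```

With depdf installed, `convert_pages("report.pdf", range(1, 4), depdf.convert_page_to_html)` would convert the first three pages of a placeholder file.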
depdf.extract_page_tables(pdf, pid, **kwargs)
Parameters:
- pdf – path to the PDF file
- pid – page number, starting from 1
- kwargs – configuration keyword arguments
Returns: list of the tables on the page
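This reference only says that a list of tables is returned and does not describe the table objects themselves. The sketch below therefore treats them as opaque values and simply gathers them per page, skipping pages where the list is empty. The helper `tables_by_page` is hypothetical; pass `depdf.extract_page_tables` as `extract_tables` to use it with the library.

```python
from typing import Any, Callable, Dict, Iterable, List


def tables_by_page(
    pdf: str,
    pids: Iterable[int],
    extract_tables: Callable[..., List[Any]],
    **kwargs,
) -> Dict[int, List[Any]]:
    """Collect tables per 1-based page, skipping pages without tables (hypothetical helper)."""
    found: Dict[int, List[Any]] = {}
    for pid in pids:
        tables = extract_tables(pdf, pid, **kwargs)
        # Table objects are kept as returned; their structure is not assumed.
        if tables:
            found[pid] = tables
    return found
```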
depdf.extract_page_paragraphs(pdf, pid, **kwargs)
Parameters:
- pdf – path to the PDF file
- pid – page number, starting from 1
- kwargs – configuration keyword arguments
Returns: list of the paragraphs on the page
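Since paragraphs are returned one page at a time, a caller that wants a single stream across pages has to tag each paragraph with its source page. The following is a minimal sketch of that, again treating the paragraph objects as opaque because their type is not documented here. The generator `iter_paragraphs` is hypothetical; `depdf.extract_page_paragraphs` would be passed as `extract_paragraphs`.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


def iter_paragraphs(
    pdf: str,
    pids: Iterable[int],
    extract_paragraphs: Callable[..., List[Any]],
    **kwargs,
) -> Iterator[Tuple[int, Any]]:
    """Yield (pid, paragraph) pairs in the order of pids (hypothetical helper)."""
    for pid in pids:
        # Each paragraph is tagged with the 1-based page it came from.
        for paragraph in extract_paragraphs(pdf, pid, **kwargs):
            yield pid, paragraph
```

Using a generator keeps memory flat for long documents, since each page is only extracted when the consumer reaches it.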