Extract Image from PDF
Extract Images from PDFs Using PyMuPDF in Python
Input: a PDF file
Output: the images embedded in that PDF, saved as individual files
Why PyMuPDF?
PyMuPDF (imported as fitz) is a Python library for PDF processing. It gives direct access to the elements of a PDF, including text, images, and metadata, which makes it a good fit for tasks that read or manipulate PDFs.
Prerequisites
pip install pymupdf
Code Steps
1. Importing Required Libraries
import fitz # PyMuPDF
import os
2. Function to Extract Images from a PDF
def extract_images_from_pdf(pdf_path, output_folder):
    # Use the file name without its extension as a prefix for the saved images
    filename = os.path.splitext(os.path.basename(pdf_path))[0]
    # Create the output directory if it doesn't exist
    os.makedirs(output_folder, exist_ok=True)
    # Open the PDF file; the context manager closes it when done
    with fitz.open(pdf_path) as pdf_document:
        # Loop through each page
        for page_number in range(len(pdf_document)):
            page = pdf_document.load_page(page_number)
            images = page.get_images(full=True)
            for image_index, image in enumerate(images):
                # The first item of each entry is the image's xref (its object number in the PDF)
                xref = image[0]
                # Extract the image bytes
                base_image = pdf_document.extract_image(xref)
                image_bytes = base_image["image"]
                image_ext = base_image["ext"]  # image extension, e.g., png, jpeg
                # Save the image
                image_filename = f"{filename}_page_{page_number + 1}_image_{image_index + 1}.{image_ext}"
                image_filepath = os.path.join(output_folder, image_filename)
                with open(image_filepath, "wb") as image_file:
                    image_file.write(image_bytes)
                print(f"Saved image {image_filename}")
    print(f"Extraction complete. Images saved to {output_folder}")
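One thing to watch for: page.get_images() lists every image a page references, so an image reused across pages (a logo in a header, for example) is reported once per page and the function above saves it again each time. Below is a minimal sketch of one way to skip the repeats, assuming the xref is a good enough identity for an image in your files. The helper name first_occurrences is hypothetical and not part of PyMuPDF, and the sample data is made up so the snippet runs without a PDF.

```python
def first_occurrences(image_lists):
    """Yield (page_number, image_index, xref) the first time each xref is seen.

    image_lists: one list per page, in page order, each shaped like the
    result of page.get_images(full=True) (tuples whose first item is the xref).
    """
    seen = set()
    for page_number, images in enumerate(image_lists, start=1):
        for image_index, image in enumerate(images, start=1):
            xref = image[0]
            if xref in seen:
                continue  # already yielded earlier in the document
            seen.add(xref)
            yield page_number, image_index, xref


# Stand-in data: two pages that both reference xref 7 (hypothetical values;
# real entries carry more fields, but only the first one is used here)
pages = [
    [(7, 0, 100, 50), (9, 0, 640, 480)],
    [(7, 0, 100, 50), (12, 0, 32, 32)],
]
unique = list(first_occurrences(pages))
print(unique)
```

In extract_images_from_pdf, the per-page lists could be built with [page.get_images(full=True) for page in pdf_document], and extract_image(xref) would then be called only for the xrefs this helper yields.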
3. Example Usage
pdf_folder = "/data/imageExtraction/PDFs"  # Path to your folder containing PDFs
output_folder = "/data/imageExtraction/pdfImages"  # Folder where images will be saved
# Loop through each PDF in the folder and extract images
for filename in os.listdir(pdf_folder):
    # Skip anything that is not a PDF, since fitz.open() may fail on other files
    if not filename.lower().endswith(".pdf"):
        continue
    pdf_path = os.path.join(pdf_folder, filename)
    extract_images_from_pdf(pdf_path, output_folder)
Conclusion
Extracting images from PDFs using PyMuPDF is not only easy but also highly customizable. This script provides a basic framework that you can adapt for various needs, such as filtering images by size, resolution, or type.
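As a rough illustration of the filtering mentioned above, here is a minimal sketch of a size and type filter. It relies on the dictionary returned by extract_image(), which carries "width", "height", and "ext" alongside "image". The function name keep_image and the default thresholds are assumptions chosen for this example, not PyMuPDF settings, and the sample dictionaries are made up so the snippet runs without a PDF.

```python
def keep_image(base_image, min_width=100, min_height=100, allowed_exts=("png", "jpeg")):
    """Return True if an extract_image() result passes the size and type filters."""
    big_enough = (
        base_image["width"] >= min_width and base_image["height"] >= min_height
    )
    wanted_type = base_image["ext"] in allowed_exts
    return big_enough and wanted_type


# Stand-in dictionaries with only the keys the filter reads (hypothetical values)
icon = {"width": 32, "height": 32, "ext": "png"}
photo = {"width": 1200, "height": 800, "ext": "jpeg"}
scan = {"width": 2480, "height": 3508, "ext": "jpx"}

for name, info in [("icon", icon), ("photo", photo), ("scan", scan)]:
    print(name, keep_image(info))
```

Inside the extraction loop, calling keep_image(base_image) right after extract_image(xref) and skipping the write when it returns False would apply the filter.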