Apache PDFBox

PDFBox
Developer(s)	Apache Software Foundation
Stable release
1.8.x:	1.8.17 / 15 September 2022
2.0.x:	2.0.32 / 24 July 2024
3.0.x:	3.0.3 / 8 August 2024
Repository	PDFBox Repository (Mirror)
Written in	Java
Operating system	Cross-platform
Type	Portable Document Format (PDF)
License	Apache License 2.0
Website	pdfbox.apache.org

Apache PDFBox is an open-source, pure-Java library that can be used to create, render, print, split, merge, alter, and verify PDF files, and to extract their text and metadata.

As of March 2017, Open Hub reports over 11,000 commits (since the start as an Apache project) by 18 contributors, representing more than 140,000 lines of code. PDFBox has a well-established, mature codebase maintained by an average-size development team with increasing year-over-year commits. Based on the COCOMO model, the codebase represents an estimated 46 person-years of effort.^[2]

Structure

Apache PDFBox has these components:

PDFBox: the main part
FontBox: handles font information
XmpBox: handles XMP metadata
Preflight (optional): checks PDF files for PDF/A-1b conformance

History

PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to be able to extract text from PDF files for Lucene.^[3] It became an Apache Incubator project in 2008 and an Apache top-level project in 2009.^[4]

Preflight was originally named PaDaF and was developed by Atos Worldline, which donated it to the project in 2011.^[5]

In February 2015, Apache PDFBox was named an Open Source Partner Organization of the PDF Association.^[6]

References

[1] "Apache PDFBox - Blog". pdfbox.apache.org. Apache Software Foundation. Retrieved 2024-10-30.
[2] "The Apache PDFBox Open Source Project on Open Hub". openhub.net. 2017-03-18. Retrieved 2017-03-18.
[3] Apache PDFBox and FontBox 1.0.0 released, The H Open, 16 February 2010
[4] PDFBox Project Incubation Status
[5] PaDaF Preflight Codebase Intellectual Property (IP) Clearance Status
[6] Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association, February 3, 2015

External links

Apache PDFBox Project


