
Automatic PDF Digitization

400% Return on Investment through AI | A Case Study

FIXTEST is a medium-sized company that specializes in the manufacture of contact pins. These essential electrical contact parts are primarily used in connectors and are crucial for connecting electronic components in many industries such as automotive, aerospace, and medical technology.

Precision-engineered contact pins by FIXTEST used in connectors for automotive, aerospace, and medical industries, showcasing high-quality manufacturing.

Due to the wide applicability and the variety of requirements that different industries impose, FIXTEST has developed an extensive portfolio. The company offers thousands of contact pin designs, all documented in a detailed product catalog.

Motivation

FIXTEST faced the challenge of modernizing its extensive product database. Relying on unstructured PDF files as the sole source of technical product data had become increasingly inefficient and inadequate.

To gain a better overview of its product range and to automatically provide customers with technical metadata alongside CAD data on request, FIXTEST needed to capture this data in a Product Data Management (PDM) system. The step was an investment in responding faster to market requirements while improving customer service. The company had already begun digitizing the product PDFs manually, an effort estimated at one person-year.

Unstructured PDFs: A Common Scenario

In practice, this meant that an employee opened each PDF, read the values it contained, and entered them into FIXTEST's PDM system. The task was not only tedious but also required expertise, because the PDFs contain technical data that a layperson would not necessarily understand. The documents had grown organically over the years and varied from one to the next, leading to the typical problems that so often cause classic automation to fail:

  • Incomplete or additional information

  • Inconsistent and unclear table structures

  • No uniform naming for the same content; use of synonyms or similar terms

  • Different units of measurement and occasional typos

Screenshot of an unstructured PDF from FIXTEST displaying detailed technical data and diagrams of contact pins before AI-driven digitalization.

This situation is typical for SMEs, which often have rich but poorly structured datasets. Conventional software makes such processes extremely difficult to automate: one would have to define precise rules for where and how each piece of data appears in the unstructured PDFs, which is practically unworkable. The task therefore had to be performed manually by a qualified employee, which, as mentioned, would have cost about a full person-year.
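
To illustrate why fixed rules break down on such documents, here is a minimal sketch in Python. The rule, the field name, and the sample lines are invented for illustration and are not taken from FIXTEST's PDFs; the point is only that a single hand-written pattern covers one spelling and silently misses synonyms, other units, and typos.

```python
import re
from typing import Optional

# Hypothetical rule: expects exactly "Diameter: <number> mm".
RULE = re.compile(r"Diameter:\s*(\d+(?:\.\d+)?)\s*mm")


def extract_diameter_mm(text: str) -> Optional[float]:
    """Return the diameter in mm if the rigid rule matches, else None."""
    match = RULE.search(text)
    return float(match.group(1)) if match else None


# Invented sample lines covering the problem classes listed above.
samples = [
    "Diameter: 1.5 mm",    # the one spelling the rule was written for
    "Dia. 1.5mm",          # synonym / abbreviation
    "Diameter: 0.059 in",  # different unit of measurement
    "Diamter: 1.5 mm",     # typo
]

results = [extract_diameter_mm(line) for line in samples]
print(results)
```

Only the first sample is recognized. Each additional variant would need its own rule, and the number of rules grows with every field and every document layout.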

Our Solution: AI for Digitization

To automate PDF digitization, we developed a tailor-made AI solution based on OpenAI's GPT technology. Through iterative refinement, we designed a customized prompt that correctly transforms the diverse PDF documents into a structured format. The AI not only recognizes various technical terms and synonyms but also copes with the frequent inconsistencies in how technical features are presented.
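
The actual prompt is not published in this case study, but the general shape of such a prompt can be sketched. The following Python example is an assumption-laden illustration: the field names, descriptions, and wording are hypothetical, and the call to the language model itself is deliberately left out. It shows how field definitions and raw PDF text might be combined into one instruction, and how a JSON reply could be reduced to the expected fields.

```python
import json
from typing import Any, Dict

# Hypothetical field definitions; the real field list is not part of the case study.
FIELDS: Dict[str, str] = {
    "diameter_mm": "Pin diameter in millimetres (may appear as 'Dia.' or 'Ø')",
    "length_mm": "Overall pin length in millimetres",
    "plating": "Surface plating material, for example gold or tin",
}


def build_prompt(pdf_text: str) -> str:
    """Combine the field definitions and the raw PDF text into one prompt."""
    field_lines = "\n".join(f"- {name}: {desc}" for name, desc in FIELDS.items())
    return (
        "Extract the following fields from the datasheet text below.\n"
        f"{field_lines}\n"
        "Convert values to the stated units. Use null for missing fields.\n"
        "Reply with a single JSON object and nothing else.\n"
        "--- DATASHEET TEXT ---\n"
        f"{pdf_text}"
    )


def parse_response(raw: str) -> Dict[str, Any]:
    """Parse the model's JSON reply and keep only the expected fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {name: data.get(name) for name in FIELDS}


prompt = build_prompt("Dia. 1.5mm, gold plated")
record = parse_response('{"diameter_mm": 1.5, "plating": "gold", "note": "ignored"}')
```

Keeping the field definitions in one place makes iterative refinement easier: a description can be adjusted after reviewing failures without touching the rest of the code.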

Another essential part of our solution was the design of a specific output format for the data. This format defined the required features, optional fields, preferred units and ranges, and conditional relationships between features. Additionally, we implemented a validation layer that checked the outputs of the GPT model to ensure compliance with the defined data format.
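
A minimal sketch of what such an output format and validation layer could look like is shown below in Python. The field names, units, ranges, and the conditional rule are hypothetical placeholders, since the real FIXTEST schema is not published. The sketch covers the four aspects named above: required and optional fields, preferred units, value ranges, and conditional relationships.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    required: bool
    unit: Optional[str] = None         # preferred unit, if the field has one
    min_value: Optional[float] = None  # inclusive lower bound
    max_value: Optional[float] = None  # inclusive upper bound


# Hypothetical schema; fields and ranges are invented for illustration.
SCHEMA: Dict[str, FieldSpec] = {
    "diameter": FieldSpec(required=True, unit="mm", min_value=0.1, max_value=10.0),
    "length": FieldSpec(required=True, unit="mm", min_value=1.0, max_value=100.0),
    "plating": FieldSpec(required=False),
    "plating_thickness": FieldSpec(required=False, unit="um", min_value=0.0, max_value=50.0),
}

# Conditional relationship: a key may only be present if its value is present too.
REQUIRES: Dict[str, str] = {"plating_thickness": "plating"}


def validate(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors: List[str] = []
    for name in record:
        if name not in SCHEMA:
            errors.append(f"unknown field: {name}")
    for name, spec in SCHEMA.items():
        if name not in record:
            if spec.required:
                errors.append(f"missing required field: {name}")
            continue
        entry = record[name]
        if not isinstance(entry, dict):
            errors.append(f"{name}: expected an object with a 'value' key")
            continue
        if spec.unit is not None and entry.get("unit") != spec.unit:
            errors.append(f"{name}: expected unit {spec.unit}, got {entry.get('unit')}")
        if spec.min_value is None and spec.max_value is None:
            continue  # no numeric range to check for this field
        value = entry.get("value")
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: value must be numeric")
            continue
        if spec.min_value is not None and value < spec.min_value:
            errors.append(f"{name}: {value} is below {spec.min_value}")
        if spec.max_value is not None and value > spec.max_value:
            errors.append(f"{name}: {value} is above {spec.max_value}")
    for dependent, prerequisite in REQUIRES.items():
        if dependent in record and prerequisite not in record:
            errors.append(f"{dependent} requires {prerequisite}")
    return errors


good = {
    "diameter": {"value": 1.5, "unit": "mm"},
    "length": {"value": 20.0, "unit": "mm"},
}
bad = {
    "diameter": {"value": 1.5, "unit": "in"},
    "plating_thickness": {"value": 2.0, "unit": "um"},
}
good_errors = validate(good)
bad_errors = validate(bad)
```

Returning a list of problems rather than raising on the first one makes it possible to log every issue with a record at once, or to feed the list back into a retry.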

Process and Components of the Solution

Workflow diagram showing the AI-driven process of extracting and structuring data from unstructured PDFs into a Product Data Management system, highlighting steps from PDF extraction to data validation.

Our system consists of several main components, as shown in the diagram above:

  • Text extraction: First, the entire text is extracted from the PDFs using traditional software. The resulting text is unstructured and may contain extraction errors.

  • AI-driven data extraction: We then combine the extracted text with carefully designed prompts for the AI. These instructions precisely explain which information is of interest and how to extract it from the raw PDF text in the desired format.

  • Validation of the AI outputs: The outputs of the AI model are checked to ensure they conform to the specified data structure.

  • Data import into PDM: After successful validation, the structured data can be directly imported into FIXTEST's Product Data Management System.

  • Comparison and correction: The accuracy of the extracted data is validated by comparing it with data already manually entered in the PDM. We check whether the AI data matches the data entered by humans.
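
The comparison step at the end of this list can be sketched as follows. This is a simplified Python illustration under assumptions: records are flat dictionaries, the field names are hypothetical, and the matching rules (a small numeric tolerance, case-insensitive string comparison) are one plausible choice rather than the rules actually used in the project.

```python
import math
from typing import Any, Dict, Tuple


def values_match(ai_value: Any, manual_value: Any, rel_tol: float = 1e-6) -> bool:
    """Compare two values, tolerating float noise and case/whitespace differences."""
    numeric = (int, float)
    if isinstance(ai_value, numeric) and isinstance(manual_value, numeric):
        return math.isclose(ai_value, manual_value, rel_tol=rel_tol)
    if isinstance(ai_value, str) and isinstance(manual_value, str):
        return ai_value.strip().casefold() == manual_value.strip().casefold()
    return ai_value == manual_value


def compare_records(
    ai: Dict[str, Any], manual: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (ai_value, manual_value)} for every field that disagrees."""
    mismatches: Dict[str, Tuple[Any, Any]] = {}
    for field in sorted(set(ai) | set(manual)):
        ai_value, manual_value = ai.get(field), manual.get(field)
        if not values_match(ai_value, manual_value):
            mismatches[field] = (ai_value, manual_value)
    return mismatches


# Invented example records for one product.
ai_record = {"diameter_mm": 1.5, "length_mm": 20.0, "plating": "Gold"}
manual_record = {"diameter_mm": 1.50, "length_mm": 21.0, "plating": "gold "}

mismatches = compare_records(ai_record, manual_record)
print(mismatches)
```

A mismatch does not say which side is wrong. It only flags the record for review, which matters because the manually entered data can contain typos as well.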

Results

  • Efficiency and time savings: Through AI-driven digitization, FIXTEST saved an entire person-year. Our solution transferred data quickly and precisely into the Product Data Management System, significantly improving overall efficiency.

  • Cost savings and ROI: The cost of our AI solution was significantly lower than a year's salary for manual data entry, resulting in a return on investment of about 400%. This confirms the effectiveness of the investment.

  • Quality improvement: The data quality with AI was at least as high as with manual processing and often better, as errors such as typos were reduced. This improved the usability and accessibility of product data for customers.

  • Employee satisfaction: By replacing repetitive tasks with AI, FIXTEST was able to increase job satisfaction and thus contribute to employee retention, which is particularly valuable in a time of skilled labor shortages.
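
For readers who want to interpret the 400% figure, the following is a sketch assuming the conventional definition of ROI and assuming that the avoided cost is the person-year of manual data entry. The case study does not publish the underlying amounts, so no concrete numbers are given here.

```latex
\mathrm{ROI} = \frac{\text{avoided cost} - \text{solution cost}}{\text{solution cost}} \times 100\%
\qquad\Longrightarrow\qquad
\mathrm{ROI} = 400\% \iff \text{solution cost} = \tfrac{1}{5}\,\text{avoided cost}
```

Under these assumptions, an ROI of 400% corresponds to a solution costing roughly one fifth of the manual effort it replaced.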

Overall, the AI-based digitization at FIXTEST has led to significant improvements in efficiency, cost savings, data quality, and employee satisfaction. This reaffirms the potential of AI technologies in medium-sized enterprises.

Scattered array of gold-plated contact pins produced by FIXTEST, essential for electrical connections in connectors across various high-tech industries.

Are you also ready for AI?

If you are also interested in optimizing your processes and achieving similar results in your company, then do not wait any longer.
