Intelligent Extraction
Extraction | Interpretation | Object Management | Resource Saving
elevait and PyMuPDF
elevait work on the intelligent digitization of data in the construction industry. To support this, they created a reliable solution for analyzing the PDF output generated by technical CAD software. Here's how PyMuPDF helps them achieve their goals.
Intelligent Extraction
- Imagine your map contains all the data that a human eye can scan, interpret and conceptualize.
- Imagine maps & diagrams with rich visualizations and multiple references and notes, beautifully presented.
- Imagine we want to extract certain elements and groups from what we see, or turn layers of information on and off.
- Imagine again that we have multiple PDF files which need to be considered together in order to understand the big picture.
Finally, imagine that this data is perhaps not so easily understood by the software running your required technology.
We need software that is able to directly read, interpret, and assign semantic meaning to objects in such PDF documents.
This is where elevait comes in with "intelligent extraction".
Extraction
It is critical to ensure that all important data can be extracted from an input file. By nature, PDF files present data that is first and foremost meant to be human readable. Files are mostly designed by humans and are therefore intended for the human eye. Because of this "designed by humans" aspect, the data is effectively "unstructured" from the point of view of a "computer eye".
Unstructured -> Structured data
Using PyMuPDF alongside bespoke AI, we are able to derive structured data from unstructured content. As well as extracting text, PyMuPDF's extraction API also extracts vector graphics data (see: page.get_drawings()). Next, by analysing the text and vector graphics, this content can be interpreted. PyMuPDF is an integral part of the pipeline, as it pre-processes the PDF data for the subsequent elevait AI algorithms.
Interpretation
elevait employ semantic interpretation on the output data. This enables transformation of unstructured data into semantic data models.
By utilizing the information within these data models, we are then able to build more intelligent and better-informed mapping solutions for use across a wide range of scenarios and systems. For example, an elevait solution can be viewed within a web-app environment.
Additionally, map "key" or "legend" data, which often requires considerable human interpretation to understand, can be isolated from an input file. This allows for improved semantic models for use in your toolchain.
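One simple way to picture legend isolation: if the legend's bounding box on the page is already known, the words inside it can be separated from the rest of the plan. The sketch below assumes word tuples in the layout returned by PyMuPDF's page.get_text("words"). The legend rectangle, the sample words and the helper name split_legend_words are hypothetical, and locating the legend automatically is outside the scope of this sketch.

```python
def split_legend_words(words, legend_rect):
    """Split word tuples into (texts inside the legend, texts outside it).

    words: tuples shaped like page.get_text("words") output,
           i.e. (x0, y0, x1, y1, text, block_no, line_no, word_no).
    legend_rect: (x0, y0, x1, y1) of the legend box, assumed to be known.
    """
    lx0, ly0, lx1, ly1 = legend_rect
    inside, outside = [], []
    for word in words:
        x0, y0, x1, y1, text = word[:5]
        # Decide by the word's center so each word lands in exactly one group.
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if lx0 <= cx <= lx1 and ly0 <= cy <= ly1:
            inside.append(text)
        else:
            outside.append(text)
    return inside, outside


# Hypothetical sample data: two legend entries and one label on the plan itself.
sample_words = [
    (510, 700, 540, 712, "Water", 0, 0, 0),
    (510, 715, 535, 727, "Gas", 0, 1, 0),
    (120, 300, 160, 312, "DN100", 1, 0, 0),
]
legend_words, plan_words = split_legend_words(sample_words, (500, 690, 590, 740))
print(legend_words)  # ['Water', 'Gas']
print(plan_words)    # ['DN100']
```

When working on a real page, page.get_text() also accepts a clip rectangle, which restricts extraction to a region directly.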
Object Management
Project data can be enriched so that the contained geometries carry semantic meaning. This data can then be exported and used in other systems such as CAD, GIS or BIM.
For example, through this kind of object analysis, multiple plans can be stitched together to form a connected plan covering a complete construction area.
As such, the solution can take multiple inputs from multiple PDF sources, interpret them and identify the critical relationships between them (this is important when stitching plans together). Results can then be accurately placed on top of an OpenStreetMap view to get an idea of the "big picture".
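To make the idea of enriched, exportable geometries more concrete, here is a minimal sketch that attaches a label and a source file name to drawings from two plans and serializes the result as JSON. It assumes drawing dicts with the "color" and "rect" keys that page.get_drawings() provides. The color-to-label lookup, the file names and the helper name enrich_drawings are hypothetical stand-ins for illustration, not elevait's AI-based interpretation.

```python
import json

# Hypothetical legend: stroke color (RGB, 0..1) -> semantic label.
COLOR_LABELS = {
    (0.0, 0.0, 1.0): "water pipe",
    (1.0, 0.0, 0.0): "gas pipe",
}


def enrich_drawings(drawings, source):
    """Attach a semantic label and the source file name to each drawing."""
    enriched = []
    for d in drawings:
        # Fill-only paths have no stroke color, so "color" may be None.
        color = tuple(d["color"]) if d.get("color") else None
        enriched.append({
            "source": source,
            "label": COLOR_LABELS.get(color, "unknown"),
            "bbox": [float(v) for v in d["rect"]],
        })
    return enriched


# Hypothetical drawings from two plans, reduced to the keys used here.
plan_a = [{"color": (0.0, 0.0, 1.0), "rect": (10, 10, 200, 12)}]
plan_b = [
    {"color": (1.0, 0.0, 0.0), "rect": (200, 10, 400, 12)},
    {"color": None, "rect": (0, 0, 5, 5)},
]

objects = enrich_drawings(plan_a, "plan_a.pdf") + enrich_drawings(plan_b, "plan_b.pdf")
print(json.dumps(objects, indent=2))
```

Keeping the source file name on every object is one simple way to preserve the relationship between plans once their geometries are combined. A JSON export like this would still need conversion into whatever format the target CAD, GIS or BIM system expects.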
Resource Saving
In summary: if PyMuPDF understands the "trees", then elevait creates the "wood" (or perhaps even a forest!). Together, this allows for the best of both micro and macro analysis.
Construction costs can be estimated more accurately, and the solution saves time in the realization of construction projects.
This is just one example of how PyMuPDF can be utilized within your software to help create more intelligent solutions to give better results and improve efficiency.
For more on the solution with elevait see: https://www.elevait.de/en/blog/extraction-pdf-plan-f6768.