Transcribe Geometry Model Data from a PDF report to an ASCII file Helicopter #2
completed by: Vladimir Kuznetsov
mentors: Gauravjeet Singh, Ishwerdas
We have PDF scans of a number of reports documenting early geometric models in the COMGEOM format (a now-obsolete format, but the models are interesting nonetheless). These reports contain the actual geometry defining the model as pages and pages of numbers and letters. Unfortunately, the scan quality is poor enough that optical character recognition (OCR) has a very high error rate.
This task is to attempt the manual transcription of a portion of the Black Hawk Helicopter model described in the report ''Computer Description of Black Hawk Helicopter'' (see the References list below for the link that will let you download the PDF). One possible approach is to use Acrobat Reader or some other PDF reader to select and copy the OCR text, paste it into a text file as a starting point, and then manually correct it. There may also be patterns that allow for semi-automated processing (for example, if 5 zeros in a row are commonly replaced with the character ''O'' instead of 0, a search and replace is in order). However you wish to approach it is fine, but remember that the goal is not just to extract the OCR text but to produce an accurate transcription of the file. The OCR text can be used as a starting point, but it will NOT be accurate.
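The search-and-replace idea can be narrowed so that it only touches tokens that look like numbers, leaving words that legitimately contain the letter O alone. The following is a minimal sketch, not a prescribed tool: the function name `fix_letter_o` and the rule for what counts as "numeric-looking" are assumptions made for illustration, and the result still has to be checked by eye against the scan.

```python
import re

# Assumption: a token is "numeric-looking" if it consists only of an optional
# sign, digits, the letter O, and at most one decimal point.
NUMERIC_LIKE = re.compile(r"[+-]?[0-9O]*\.?[0-9O]+")


def fix_letter_o(line: str) -> str:
    """Replace the letter O with the digit 0 inside numeric-looking tokens."""

    def repair(match: re.Match) -> str:
        token = match.group(0)
        if NUMERIC_LIKE.fullmatch(token):
            return token.replace("O", "0")
        return token  # words such as TOR or OR are left untouched

    # Only non-whitespace runs are rewritten, so spacing is preserved and the
    # line keeps its original length.
    return re.sub(r"\S+", repair, line)


if __name__ == "__main__":
    print(fix_letter_o("RPP 1O.5 OOOOO -2.O TOR OR"))
```

Because the replacement is character for character, column positions in the pasted text do not shift. A token that is a lone letter O would also be converted, so any line where that letter could be legitimate needs a manual look.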
The preferred format for the transcribed pages is a comma-separated value (CSV) ASCII text file, which is suitable for post-processing.
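As a rough illustration of producing such a file, the sketch below turns corrected, whitespace-separated lines into CSV text using Python's standard `csv` module. The helper name `rows_to_csv` and the sample rows are hypothetical, and splitting on whitespace is an assumption; the real tables may need a different column split.

```python
import csv
import io


def rows_to_csv(lines):
    """Convert whitespace-separated text lines into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        fields = line.split()  # assumption: columns are separated by spaces
        if fields:             # skip blank lines
            writer.writerow(fields)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample rows, not taken from the report.
    sample = ["1 RPP 0.0 10.5", "", "2 BOX -1.0 2.0"]
    print(rows_to_csv(sample), end="")
```

Writing through the `csv` module rather than joining with commas by hand means any field that happens to contain a comma or quote is escaped correctly.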
The eventual goal is to have a file that can be fed to BRL-CAD's comgeom-g importer to generate an accurate .g file. The description of this target runs to a couple of hundred pages of text (which will take much longer than a single GCI task if you're doing correctness checking!), so there will be multiple tasks for pieces of the file. For this task, please submit a CSV file with the content of the tables on pages
Please discuss your progress with the developers.
Additional information on comgeom
- Source at: src/conv/comgeom