P-OSRA
Engineering
CMPS 116, Software Design Project
Recognizing polymer images in the literature and in patents is key to automatically interpreting the wealth of polymer data already published. Tools such as OSRA (Optical Structure Recognition Application) can identify and interpret images of chemical structures, but they do not yet work for polymers. Our tool, Polymer OSRA (P-OSRA), extends OSRA’s capabilities by recognizing brackets and parentheses in chemical diagrams. To date, P-OSRA pre-scans the image; records the brackets and parentheses it finds and removes them from the diagram; calls OSRA; and then collects the resulting SMILES string from OpenBabel and edits it to reflect the changes needed to describe a polymer. P-OSRA then populates a data model that supports subsequent querying of polymer substructures (such as repeat units).
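The abstract does not specify the schema of the data model, so the following is only a minimal sketch of what a record holding a recognized polymer and its repeat units might look like, and how it could be queried. Every name here (`RepeatUnit`, `PolymerRecord`, `find_by_repeat_unit`) is hypothetical and not taken from P-OSRA, the SMILES strings are illustrative values rather than actual tool output, and the plain substring match is a stand-in for a real chemical substructure search.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepeatUnit:
    """One bracketed portion of a polymer diagram (hypothetical schema)."""
    smiles: str            # SMILES fragment for the bracketed atoms
    bracket: str           # delimiter seen in the image: "[]" or "()"
    subscript: str = "n"   # repeat-count label next to the closing bracket


@dataclass
class PolymerRecord:
    """One recognized polymer image (hypothetical schema)."""
    source_image: str
    full_smiles: str
    repeat_units: List[RepeatUnit] = field(default_factory=list)


def find_by_repeat_unit(records: List[PolymerRecord], fragment: str) -> List[PolymerRecord]:
    """Return records with a repeat unit whose SMILES contains `fragment`.

    Assumption: a substring test on the SMILES text. This is NOT a true
    substructure match, since the same structure can be written as many
    different SMILES strings.
    """
    return [
        record
        for record in records
        if any(fragment in unit.smiles for unit in record.repeat_units)
    ]


# Illustrative records; file names and SMILES are made up for the example.
records = [
    PolymerRecord("polystyrene.png", "*CC(*)c1ccccc1",
                  [RepeatUnit("CC(c1ccccc1)", "[]")]),
    PolymerRecord("polyethylene.png", "*CC*",
                  [RepeatUnit("CC", "()")]),
]

aromatic = find_by_repeat_unit(records, "c1ccccc1")
print([record.source_image for record in aromatic])
```

Keeping the repeat units as separate objects, rather than only as markers inside one SMILES string, is one way to make queries such as "all polymers whose repeat unit contains an aromatic ring" straightforward. A production version would likely replace the substring test with a proper substructure matcher.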