Introduction
Labguru’s compound registration system includes powerful automated features to maintain high-quality chemical data and understand the relationships between different forms of the same molecule. This guide explains three key features:
- Structure Validation - Automated checking to prevent invalid structures from entering the database
- Structure Standardization - Automatic formatting of structures according to industry standards
- Parent Structure Calculation - Intelligent grouping of related compound forms under a single and unique parent structure
Labguru’s structure validation, standardization, and parent structure features work together to:
- Ensure high-quality chemical data by blocking invalid structures
- Create consistency through automated standardization
- Reveal relationships by automatically grouping related compound forms
- Prevent duplicate work by showing all existing forms of a molecule
- Enable better analysis through improved searchability and data aggregation
These features work automatically in the background, requiring minimal effort while delivering powerful chemical intelligence and data quality improvements.
Understanding Labguru’s Compound Hierarchy
Labguru organizes compounds in a three-level hierarchy that reflects real-world chemical relationships:
- Parent Structure (Level 1)
  A Parent Structure is the standardized “core” of a molecule used to group all related forms (e.g., free base, salts, solvates, isotopic variants) under one umbrella.
  It is derived by removing salts/solvents and isotopes and attempting neutralization where possible.
- Compound Version (Level 2)
  A Compound Version is the compound item record registered in the compounds collection in your inventory, often a specific form (salt, stereochemical variant, charged form, etc.).
  Each version (compound inventory item) is automatically linked to one Parent Structure (no manual assignment).
- Stock / Batch (Level 3)
  Stock entries represent the physical sample for a specific compound inventory item. Stocks may vary in concentration, weight, volume, or storage location.
The Parent Structure calculation and its linkage to the relevant compound versions are automatic and are included in every compound registration flow: manual registration in the inventory compounds collection, Excel import, SDF import, or API call.
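As an illustration only, the three levels can be pictured as nested records. The sketch below is not Labguru's data model; the class names, field names, and the "PS-1" identifier are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stock:
    """Level 3: a physical sample of one compound version."""
    name: str
    amount: float
    unit: str
    location: str


@dataclass
class CompoundVersion:
    """Level 2: a registered compound item, e.g. a specific salt form."""
    name: str
    smiles: str
    parent_id: Optional[str] = None  # set by the system, never by hand
    stocks: List[Stock] = field(default_factory=list)


@dataclass
class ParentStructure:
    """Level 1: the standardized core shared by all related forms."""
    parent_id: str
    smiles: str
    versions: List[CompoundVersion] = field(default_factory=list)

    def link(self, version: CompoundVersion) -> None:
        # Each version points to exactly one parent.
        version.parent_id = self.parent_id
        self.versions.append(version)


# Hypothetical example data: a free base and its hydrochloride salt.
parent = ParentStructure(parent_id="PS-1", smiles="CCN")
free_base = CompoundVersion(name="Ethylamine", smiles="CCN")
hcl_salt = CompoundVersion(name="Ethylamine hydrochloride", smiles="CCN.Cl")
hcl_salt.stocks.append(Stock("Vial A", 25.0, "mg", "Freezer 2"))

for version in (free_base, hcl_salt):
    parent.link(version)
```

Both versions end up under the same parent, while the stock belongs only to the salt form it was weighed out from.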
The calculation process flow consists of three steps:
A. Structure Validation: Ensuring Data Quality
What Is Structure Validation?
The validation step is designed to identify errors and potential issues in chemical structures before they are added to Labguru.
During the compound registration process, the system automatically checks each chemical structure for a range of possible issues:
Critical Errors:
- Unreadable or corrupted structure files
- Unknown or illegal chemical elements
- Empty or missing structures
- Chemical reactions submitted as individual compounds
- Molecules with impossible chemical bonding or valence states
Serious Errors:
- Significant stereochemistry mismatches between different representations
- Multiple atoms with exactly the same coordinates
- Structures with 3D coordinates but no valid 2D representation
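To make a few of these checks concrete, here is a minimal sketch that works on SMILES text only. It is not Labguru's validator: the function name and error messages are assumptions, and it covers just three of the critical errors (empty input, reaction notation, unknown element symbols in bracket atoms). Valence, stereochemistry, and coordinate checks need a real cheminformatics toolkit.

```python
import re

# Symbols of the 118 known elements.
ELEMENTS = set(
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co "
    "Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb "
    "Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re "
    "Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es "
    "Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og".split()
)

# Element symbol at the start of a bracket atom, after an optional isotope.
BRACKET_ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)")


def validate_smiles(smiles):
    """Return a list of error messages; an empty list means no issue found."""
    text = (smiles or "").strip()
    if not text:
        return ["Empty or missing structure"]

    errors = []
    # ">" only appears in reaction SMILES (reactants>agents>products).
    if ">" in text:
        errors.append("Chemical reaction instead of compound")
    for symbol in BRACKET_ATOM.findall(text):
        # Aromatic atoms are lowercase in SMILES, e.g. [nH] or [se].
        if symbol.capitalize() not in ELEMENTS:
            errors.append(f"Illegal or unknown element: {symbol}")
    return errors
```

For example, `validate_smiles("CCO>>CC=O")` reports a reaction, while `validate_smiles("CCO")` returns an empty list.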
What Happens If Validation Fails?
When a structural error is detected, the system takes the following actions:
- Registration is blocked - The compound will not be created in Labguru
- Error message displayed - A clear explanation of the issue is shown
- Data preserved - All entered information remains so the issue can be corrected
- Edit and resubmit - The structure can be corrected and resubmitted without losing other data
B. Structure Standardization: Creating Consistency
What Is Structure Standardization?
Structure standardization is an automated process that converts chemical structures into a consistent, standardized format based on FDA and IUPAC guidelines. This ensures that structures are represented uniformly across the database, making them easier to search, compare, and analyze.
Important: The standardization process creates a calculated version of the structure – it does not replace or modify the original structure drawing.
How Does Standardization Work?
The standardization process follows these steps:
- Exclusion Check
  First, the system checks if the molecule should be excluded from standardization. Certain types of compounds (like organometallics or molecules with many boron atoms) are excluded because standard rules don’t apply well to them.
- Stereochemistry Resolution
  Resolves ambiguous drawing styles by converting “wiggly” bonds to proper stereo representations and standardizing cis/trans bond notation.
- Clean-up
  Removes unnecessary structural information, converts aromatic rings to a consistent Kekulé representation (explicit alternating single and double bonds), and removes explicit hydrogen atoms unless they’re chemically necessary (like isotopic labels or chiral centers).
- Normalization
  Converts functional groups into their standard chemical representations, such as:
  - Fixing hypervalent nitro groups
  - Correcting incorrect amide tautomers
  - Converting ionic metals to their proper ionic forms
  - Ensuring correct charges on quaternary nitrogens and trivalent oxygens/sulfurs
- Neutralization
  Attempts to neutralize the molecule by adding, removing, or moving hydrogen atoms where chemically appropriate.
- Geometry Correction
  Straightens triple bonds and allenes for proper visual and computational representation.
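The order of the steps above, and the fact that the original drawing is kept untouched, can be sketched as a small pipeline. This shows the control flow only: every step is a placeholder that returns its input unchanged, the boron threshold is a made-up number, and none of the names come from Labguru.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

MAX_BORON_ATOMS = 4  # hypothetical threshold; the real cut-off is not stated


def is_excluded(smiles: str) -> bool:
    # Counts uppercase boron symbols only: "B" not followed by a lowercase
    # letter, so Br, Ba, Be and Bi are skipped. Aromatic boron is ignored.
    return len(re.findall(r"B(?![a-z])", smiles)) > MAX_BORON_ATOMS


@dataclass
class StandardizationResult:
    original: str                       # the submitted structure, never modified
    standardized: Optional[str] = None  # stays None when the molecule is excluded
    applied: List[str] = field(default_factory=list)


def placeholder_step(smiles: str) -> str:
    """Stand-in for a real transformation; returns the input unchanged."""
    return smiles


STEPS = [
    ("stereochemistry resolution", placeholder_step),
    ("clean-up", placeholder_step),
    ("normalization", placeholder_step),
    ("neutralization", placeholder_step),
    ("geometry correction", placeholder_step),
]


def standardize(smiles: str, steps=STEPS) -> StandardizationResult:
    result = StandardizationResult(original=smiles)
    if is_excluded(smiles):
        return result  # exclusion check comes first and short-circuits
    current = smiles
    for name, step in steps:
        current = step(current)
        result.applied.append(name)
    result.standardized = current
    return result
```

Keeping `original` and `standardized` as separate fields mirrors the point that standardization produces a calculated version rather than overwriting the drawing.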
Where Can the Standardized Structure Be Found?
The standardized structure is saved in SMILES notation in a read-only field called “Standardized SMILES” that appears in:
- Compound show pages (view/info tab)
- Compound index tables (if enabled by an admin)
- Excel import/export templates
Note: The Standardized SMILES field is automatically calculated and cannot be edited manually. To change it, the original structure must be modified.
C. Parent Structure Calculation: Understanding Compound Families
What Is Parent Structure Calculation?
Parent structure calculation is an intelligent background process that automatically identifies the “core” molecule from each compound registered and creates (or links to) a parent structure record. This happens seamlessly during compound registration without any user action required.
How Does It Work?
After a compound passes validation and standardization, Labguru calculates the parent structure through three key steps:
- Stripping Salts and Solvents
  The system identifies and removes salts and solvents by comparing molecular components against a predefined list based on the USAN Council standards.
- Isotope Removal
  Specific isotopic labels (like deuterium or carbon-13) are removed to create a standardized form with natural isotopic distribution.
- Neutralization
  After removing counterions, the system attempts to neutralize the remaining molecule. For example, if a sodium cation was removed, a hydrogen is added to neutralize a negatively charged carboxylate group. However, permanent charges (like quaternary nitrogens) remain charged.
Once calculated, Labguru checks if an identical parent structure already exists. If it does, the compound is linked to the existing parent. If not, a new parent structure record is created.
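The following is a rough, text-level sketch of the first two steps and of the "link to existing or create new" lookup. It is an assumption-heavy illustration, not Labguru's implementation: the salt/solvent set is a tiny made-up subset, the "PS-n" identifiers are invented, neutralization is left out because it needs real chemistry rather than string edits, and SMILES strings are assumed to be canonical already so that plain string comparison is meaningful.

```python
import re

# Illustrative subset only; the actual list is based on USAN Council standards.
SALTS_AND_SOLVENTS = {"Cl", "Br", "O", "[Na+]", "[K+]", "[Cl-]"}


def strip_salts_and_solvents(smiles):
    # Disconnected components are separated by "." in SMILES.
    components = smiles.split(".")
    kept = [c for c in components if c not in SALTS_AND_SOLVENTS]
    # If every component is on the list (e.g. NaCl itself), keep the input
    # unchanged rather than return an empty structure.
    return ".".join(kept) if kept else smiles


def remove_isotopes(smiles):
    # An isotope label is the number right after "[", as in [13C] or [2H].
    return re.sub(r"\[\d+", "[", smiles)


def calculate_parent(smiles):
    # Neutralization is deliberately omitted in this sketch.
    return remove_isotopes(strip_salts_and_solvents(smiles))


class ParentRegistry:
    """Maps each distinct parent structure to one identifier."""

    def __init__(self):
        self._ids_by_structure = {}

    def link(self, smiles):
        parent = calculate_parent(smiles)
        if parent not in self._ids_by_structure:
            # No identical parent yet: create a new record.
            new_id = f"PS-{len(self._ids_by_structure) + 1}"
            self._ids_by_structure[parent] = new_id
        return self._ids_by_structure[parent]
```

With this sketch, registering ethylamine ("CCN") and its hydrochloride ("CCN.Cl") through the same `ParentRegistry` returns the same identifier for both, which is the grouping behavior described above. A real implementation would re-canonicalize the structure after each step.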
What Information Does a Parent Structure Contain?
Each parent structure record includes:
- Parent Structure ID – A unique identifier for the parent
- Name – Initially set to the name of the first compound that created it (this field is editable)
- Structure – The calculated parent structure in SMILES format (read-only)
- Structure Image – Visual representation of the parent structure
- Chemical Properties – Auto-calculated properties like molecular weight, LogP, polar surface area, etc.
- Linked Compounds – A searchable table showing all compound versions that share this parent
How to Access Parent Structure Information
From a Compound Record:
In any compound show page, a “Parent Structure ID” field displays the linked parent’s ID. This ID is clickable and will navigate directly to the parent structure page.
On the Parent Structure Page:
The parent structure page has two tabs:
- Info Tab – Displays the parent structure information, properties, and metadata
- Compounds Tab – Shows a searchable, filterable table of all compound versions linked to this parent. Any compound can be clicked to view its full record.
FAQ - Understanding System Behavior
“The compound saved, but Parent Structure ID is empty”
This is normal behavior. Parent structure calculation and standardization run asynchronously in the background after a compound is successfully created. This design ensures compound registration isn’t delayed.
What to do:
- Wait a few moments and refresh the compound page
- If the field remains empty after several minutes, the background calculation may have encountered an issue (this doesn’t affect the compound’s validity)
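For scripted workflows, the same "wait and refresh" advice amounts to polling with a timeout. The sketch below assumes a caller-supplied `fetch_compound` function that returns the compound record as a dictionary, and a `parent_structure_id` key; both are hypothetical and do not describe the Labguru API.

```python
import time


def wait_for_parent_id(fetch_compound, compound_id, timeout=300.0,
                       interval=5.0, sleep=time.sleep):
    """Poll until the parent structure ID is populated or the timeout passes.

    Returns the ID, or None if it is still empty when the timeout is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        record = fetch_compound(compound_id)
        parent_id = record.get("parent_structure_id")
        if parent_id:
            return parent_id
        if time.monotonic() >= deadline:
            # Still empty: the background calculation may have failed,
            # which does not affect the compound record itself.
            return None
        sleep(interval)
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default is fine.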
“Attempts to set Parent Structure ID via Excel didn’t work”
Both “Parent Structure ID” and “Standardized SMILES” are automatically calculated fields. The system ignores any values entered in these columns during Excel import, just as it does for the auto-calculated chemical properties.
What to do:
Simply leave these fields blank in import templates. Labguru will automatically calculate and populate them.
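If import files are generated by a script, one option is to drop the calculated columns before writing the sheet. A minimal sketch, assuming each row is a dictionary keyed by column header (the "Name" and "Structure" headers in the usage example are placeholders):

```python
CALCULATED_COLUMNS = {"Parent Structure ID", "Standardized SMILES"}


def drop_calculated_columns(row):
    """Return a copy of an import row without the auto-calculated columns."""
    return {key: value for key, value in row.items()
            if key not in CALCULATED_COLUMNS}


row = {
    "Name": "Ethylamine hydrochloride",
    "Structure": "CCN.Cl",
    "Parent Structure ID": "PS-1",   # would be ignored on import anyway
    "Standardized SMILES": "CCN",    # would be ignored on import anyway
}
clean_row = drop_calculated_columns(row)
```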
“Can the parent structure linkage be manually changed?”
No. Parent structure linkage is fully automated and based on the chemical structure. This ensures consistency and prevents arbitrary groupings. The parent structure’s name can be edited, but not its structure or which compounds link to it.
“What if structure validation or standardization fails?”
If structure validation fails (critical error), compound registration is blocked, and an error message is displayed. If standardization or parent calculation fails, the compound is still created, but the Standardized SMILES and Parent Structure ID fields will remain empty.
This design prioritizes getting compounds registered while still providing enhanced features when possible.
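The difference between a blocking failure and a non-blocking one can be sketched as follows. This is a simplified, synchronous illustration: in Labguru the standardization and parent calculation run in the background after creation. The function name, the dictionary keys, and the three callables passed in are all assumptions.

```python
def register_compound(data, validate, standardize, calculate_parent):
    """Sketch of the registration outcome for one compound.

    data is a dict with at least a "structure" key. validate returns a list
    of error messages; standardize and calculate_parent return strings or
    raise an exception on failure.
    """
    errors = validate(data["structure"])
    if errors:
        # Blocked: nothing is created and the submitted data is kept intact
        # so it can be corrected and resubmitted.
        return {"created": False, "errors": errors, "data": data}

    record = dict(data, standardized_smiles=None, parent_structure_id=None)
    try:
        record["standardized_smiles"] = standardize(data["structure"])
        record["parent_structure_id"] = calculate_parent(
            record["standardized_smiles"]
        )
    except Exception:
        # Not blocking: the compound is still created, and any field that
        # could not be calculated simply stays empty.
        pass
    return {"created": True, "errors": [], "data": record}
```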
Troubleshooting: Common Validation Issues
Empty or Missing Structure
What it means: The structure field is empty or wasn’t provided in a supported format.
How to fix: Add a valid structure using the Ketcher editor or provide structure notation (SMILES or InChI where supported).
Illegal or Unknown Elements
What it means: The structure contains unknown elements or cannot be processed (e.g., aromatic bond processing failure).
How to fix: Redraw the structure carefully. Ensure all atoms are valid chemical elements and bonds are correctly defined.
Chemical Reaction Instead of Compound
What it means: The system detected reaction arrows or multiple reactant/product components suggesting a chemical reaction rather than a single compound.
How to fix: Register each reactant or product as a separate compound. Do not include reaction arrows.
Stereochemistry Mismatch
What it means: There are significant inconsistencies in how stereochemistry is represented (e.g., conflicting wedge bonds or incorrect 3D coordinates).
How to fix: Review and correct stereochemistry. Ensure wedge bonds are used correctly and consistently. If copying from another source, try redrawing the structure.
Invalid Valence or Bonding
What it means: The structure contains atoms with impossible valence states or chemically unrealistic bonding.
How to fix: Check that all atoms have chemically valid bonding. For example, in a neutral molecule carbon should have four bonds, oxygen two, and nitrogen three. Review radical species and charges.
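The rule of thumb above can be expressed as a simple check on a list of atoms and bonds. This is a toy sketch, not the check Labguru runs: it only flags neutral C, N, O, and H atoms whose explicit bond orders exceed the usual valence, and it ignores charges, radicals, and aromaticity.

```python
# Usual valences for neutral atoms, matching the examples in the text.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}


def find_valence_errors(atoms, bonds):
    """Return (index, symbol, total bond order) for each over-bonded atom.

    atoms: list of element symbols.
    bonds: list of (i, j, order) tuples, where i and j index into atoms.
    """
    totals = [0] * len(atoms)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order
    return [
        (index, symbol, totals[index])
        for index, symbol in enumerate(atoms)
        if symbol in MAX_VALENCE and totals[index] > MAX_VALENCE[symbol]
    ]


# Ethanol heavy atoms (C-C-O): nothing exceeds its usual valence.
ethanol_errors = find_valence_errors(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])

# A carbon drawn with five single bonds is flagged.
bad_atoms = ["C", "H", "H", "H", "H", "H"]
bad_bonds = [(0, k, 1) for k in range(1, 6)]
bad_errors = find_valence_errors(bad_atoms, bad_bonds)
```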
For more details, please contact us at support@labguru.com