Text mining

  • extraction of process-environment-organism associations buried in the scientific literature, in free-text descriptions of biological database records, and on dedicated community web pages
  • development of a named entity recognition module to identify environmental process mentions in text
  • scoring of association strength based on co-mention statistics
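
To illustrate co-mention scoring, the sketch below computes pointwise mutual information (PMI) between entities tagged in the same document. PMI is a standard co-occurrence measure chosen here as an assumption; it is not necessarily the scoring scheme used in PREGO. The function name `comention_scores` and the input layout (one set of entity names per document) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def comention_scores(documents):
    """Score entity pairs by pointwise mutual information (PMI).

    `documents` is an iterable of sets; each set holds the entities
    (processes, environments, organisms) tagged in one document.
    Assumption: PMI stands in for whatever co-mention statistic is used.
    """
    docs = [set(d) for d in documents]
    n_docs = len(docs)
    single = Counter()  # number of documents mentioning each entity
    pair = Counter()    # number of documents mentioning both entities
    for entities in docs:
        single.update(entities)
        pair.update(combinations(sorted(entities), 2))

    scores = {}
    for (a, b), observed in pair.items():
        # Co-mentions expected if a and b occurred independently.
        expected = single[a] * single[b] / n_docs
        scores[(a, b)] = log2(observed / expected)
    return scores


docs = [
    {"nitrification", "soil"},
    {"nitrification", "soil"},
    {"methanogenesis", "sediment"},
    {"soil"},
]
scores = comention_scores(docs)
```

Pairs are keyed in sorted order, so a positive score for `("nitrification", "soil")` means the two terms are mentioned together more often than independent occurrence would predict.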

Knowledge gathering and association extraction

  • collection of process-environment-organism evidence from public data record metadata and computational analysis record annotations
  • confidence score assignment to each annotation based on its evidence source
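
A source-based confidence assignment can be as simple as a lookup table. In the sketch below, the evidence source names, the confidence values, and the `score_annotation` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical evidence sources and confidence values (illustration only).
SOURCE_CONFIDENCE = {
    "curated_record_metadata": 0.9,
    "submitter_free_text": 0.6,
    "computational_annotation": 0.4,
}
DEFAULT_CONFIDENCE = 0.1  # assumed fallback for unrecognised sources


def score_annotation(annotation):
    """Return a copy of `annotation` with a `confidence` field added.

    The score depends only on the annotation's evidence source.
    """
    confidence = SOURCE_CONFIDENCE.get(annotation["source"], DEFAULT_CONFIDENCE)
    return {**annotation, "confidence": confidence}


scored = score_annotation(
    {
        "organism": "example organism",
        "environment": "marine sediment",
        "source": "computational_annotation",
    }
)
```

Keeping the mapping in one table makes the scoring policy easy to review and adjust without touching the collection code.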

Homology-based annotation transfer

  • prediction (based on sequence homology) of related processes and environments for novel sequences and/or sequences with insufficient metadata
  • facilitation of sequence-based searches against the PREGO platform
  • upload of in-house sequences (along with their metadata) to the PREGO platform
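
The annotation transfer step can be sketched as a filter over sequence-search hits. The example below assumes hits have already been produced by an external search tool; the `Hit` fields, the identity and coverage thresholds, and the use of percent identity as a transfer weight are illustrative assumptions, not PREGO's documented procedure.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str   # identifier of the matched reference sequence
    identity: float   # percent identity, 0-100
    coverage: float   # fraction of the query aligned, 0-1


def transfer_annotations(hits, annotations, min_identity=90.0, min_coverage=0.8):
    """Transfer terms from reference sequences to a query sequence.

    `annotations` maps a reference sequence id to its process or
    environment terms. Only hits passing both (assumed) thresholds
    contribute; each term keeps the weight of its best-supporting hit.
    """
    transferred = {}
    for hit in hits:
        if hit.identity < min_identity or hit.coverage < min_coverage:
            continue
        weight = hit.identity / 100.0
        for term in annotations.get(hit.subject_id, ()):
            transferred[term] = max(weight, transferred.get(term, 0.0))
    return transferred


hits = [Hit("s1", 98.0, 0.95), Hit("s2", 92.0, 0.90), Hit("s3", 70.0, 0.99)]
annotations = {
    "s1": ["marine sediment"],
    "s2": ["marine sediment", "denitrification"],
    "s3": ["hot spring"],
}
predicted = transfer_annotations(hits, annotations)
```

Here the low-identity hit `s3` is discarded, so its term is not transferred to the query.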

Association unification

  • unification of the calculated process-environment-organism associations
  • design and implementation of an overall confidence score for the PREGO associations
  • management and storage of the unified associations in a database
  • periodic update of the PREGO associations and liaison to emerging data infrastructures
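
One possible way to derive an overall confidence score is to combine the per-channel scores (text mining, knowledge gathering, homology transfer) with a noisy-OR rule, so that independent lines of evidence reinforce each other. The noisy-OR rule, the channel names, and the tuple layout below are assumptions made for illustration; the actual PREGO score design is not specified here.

```python
from math import prod


def combine_confidences(scores):
    """Noisy-OR combination: 1 - product of (1 - s) over all scores."""
    scores = list(scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"confidence out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


def unify(associations):
    """Merge per-channel associations into one score per (organism, term).

    `associations` is an iterable of (organism, term, channel, confidence)
    tuples. If a channel reports the same pair twice, the last score wins.
    """
    grouped = {}
    for organism, term, channel, confidence in associations:
        grouped.setdefault((organism, term), {})[channel] = confidence
    return {
        key: combine_confidences(by_channel.values())
        for key, by_channel in grouped.items()
    }


unified = unify(
    [
        ("organism A", "denitrification", "text_mining", 0.5),
        ("organism A", "denitrification", "homology", 0.5),
        ("organism B", "hot spring", "knowledge", 0.4),
    ]
)
```

With noisy-OR, two channels that each report 0.5 yield 0.75 for the pair, while a pair supported by a single channel keeps that channel's score.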

Association presentation

  • provision of PREGO associations to researchers via a web platform
  • facilitation of text and sequence searches against the PREGO process-environment-organism associations
  • display of confidence and supporting evidence for retrieved associations
  • enrichment analysis and network-based association exploration
  • programmatic access and bulk download options