NIF-11655

species-centric phenotype distribution via orthology as a service

Details

    • NIF

    Description

      We want to analyze (graph, or otherwise operate on) sets of genes/proteins to obtain the list of phenotypes attributed to them, either directly or via annotated genotypes/alleles. We also want to operate on phenotypes annotated to orthologs of the query set.

      This functionality should meet the following requirements:
      1. It should be accessible via our monarch-app: a text box or upload lets the user enter a set of ids, and then we perform the analysis function.
      2. It could perhaps be precomputed: run regularly (as with jenkins), or precomputed in a disco view.
      3. Results should be available as a list (which the user could download), as a columnar graph, or both. The graphs should be similar to what was on the ASHG poster, showing the proportion of orthologs to our query species set in each target species, as well as the proportion of those with phenotypes. Eventually, we could get creative with this operation, for example by interactively faceting the data included in the graphs.
      4. By default, this will include all genes/proteins in our query species and all of the "major" target species. The species currently permitted would be: human, rat, mouse, fish, fly, worm.
      5. This ought to use the NIF REST services, but that will require boundless fetching. We can implement this right away using SQL queries directly against the disco_crawler database, but should look toward using the NIF services directly. We need to decide what will be best performance-wise: should we completely precompute the species-centric phenotype orthology mapping and then just query that, or should this be done on the fly, with the joins happening on the Monarch server? The latter would be preferable if performance can be guaranteed.
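As a rough illustration of the per-species proportions mentioned in item 3, here is a minimal Python sketch. It assumes one interpretation of the graph: for each target species, the fraction of query genes that have at least one ortholog there, and the fraction of those whose orthologs carry at least one phenotype. The function name `summarize_by_species` and the row layout are hypothetical, not part of any existing Monarch or NIF code.

```python
from collections import defaultdict


def summarize_by_species(query_genes, ortholog_rows):
    """Summarize ortholog and phenotype coverage per target species.

    query_genes: set of query gene/protein ids.
    ortholog_rows: iterable of (query_gene, target_species, ortholog_id,
        phenotype_id) tuples; phenotype_id is None when the ortholog has
        no annotated phenotype (hypothetical layout).
    """
    with_ortholog = defaultdict(set)   # species -> query genes with an ortholog
    with_phenotype = defaultdict(set)  # species -> query genes whose ortholog has a phenotype

    for query_gene, species, _ortholog_id, phenotype_id in ortholog_rows:
        if query_gene not in query_genes:
            continue  # ignore rows outside the query set
        with_ortholog[species].add(query_gene)
        if phenotype_id is not None:
            with_phenotype[species].add(query_gene)

    total = len(query_genes)
    summary = {}
    for species, genes in with_ortholog.items():
        n_orth = len(genes)
        n_pheno = len(with_phenotype.get(species, set()))
        summary[species] = {
            "ortholog_fraction": n_orth / total if total else 0.0,
            "phenotype_fraction": n_pheno / n_orth if n_orth else 0.0,
        }
    return summary
```

The resulting dictionary could feed either the downloadable list or the columnar graph; whether the denominators above match the ASHG poster is an open assumption.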

      To perform the data fetching:
      1. Given ids (or *) for a single organism (the query organism), map these to uniprot ids (those are the ids used in PANTHER for ortholog mapping), or to gene ids in the relevant MOD.
      2. Get orthologs and their annotated phenotypes for all desired target species (either direct gene annotations, or via genotypes or alleles). Use a left outer join so that all ortholog gene/protein ids are returned, including those with no phenotypes, which will have null phenotype values.
      3. Get the query species phenotypes in the same format.
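The left outer join in step 2 can be sketched as follows, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`orthologs`, `phenotypes`, `query_id`, and so on) are invented for illustration and do not reflect the actual disco_crawler schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orthologs (query_id TEXT, target_species TEXT, ortholog_id TEXT)"
    )
    conn.execute("CREATE TABLE phenotypes (gene_id TEXT, phenotype_id TEXT)")
    conn.executemany(
        "INSERT INTO orthologs VALUES (?, ?, ?)",
        [("P1", "mouse", "M1"), ("P1", "fly", "F1"), ("P2", "mouse", "M2")],
    )
    conn.executemany(
        "INSERT INTO phenotypes VALUES (?, ?)",
        [("M1", "MP:0001"), ("M1", "MP:0002")],
    )
    return conn


def fetch_ortholog_phenotypes(conn, query_ids):
    """Return one row per (ortholog, phenotype) pair.

    Orthologs without phenotypes are kept by the LEFT OUTER JOIN and
    come back with phenotype_id set to None (SQL NULL).
    """
    query_ids = list(query_ids)
    if not query_ids:
        return []
    placeholders = ",".join("?" for _ in query_ids)
    sql = f"""
        SELECT o.query_id, o.target_species, o.ortholog_id, p.phenotype_id
        FROM orthologs AS o
        LEFT OUTER JOIN phenotypes AS p ON p.gene_id = o.ortholog_id
        WHERE o.query_id IN ({placeholders})
        ORDER BY o.query_id, o.target_species, o.ortholog_id, p.phenotype_id
    """
    return conn.execute(sql, query_ids).fetchall()
```

With an inner join, the fly ortholog `F1` in the demo data would disappear from the result because it has no phenotype rows; the outer join keeps it, which is what the ortholog counts need.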

      The data should be marked with the species to which it belongs, and flagged as either belonging to QUERY or NON-QUERY for downstream analysis.
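One possible shape for the marked-up records is sketched below. The `PhenotypeRecord` type, its field names, and the `tag_records` helper are hypothetical; the only parts taken from the description are the species label and the QUERY / NON-QUERY flag.

```python
from typing import NamedTuple, Optional


class PhenotypeRecord(NamedTuple):
    gene_id: str
    species: str
    phenotype_id: Optional[str]  # None when the gene has no annotated phenotype
    role: str                    # "QUERY" or "NON-QUERY"


def tag_records(rows, query_species):
    """Attach the species label and a QUERY / NON-QUERY flag to each row.

    rows: iterable of (gene_id, species, phenotype_id) tuples (assumed layout).
    """
    return [
        PhenotypeRecord(
            gene_id,
            species,
            phenotype_id,
            "QUERY" if species == query_species else "NON-QUERY",
        )
        for gene_id, species, phenotype_id in rows
    ]
```

Keeping query and target rows in one uniform record type means downstream analysis can filter on `role` or group on `species` without special cases.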

          People

            cmungall Chris Mungall
            nlw Nicole Washington