Details
-
Task
-
Resolution: Canceled
-
Major
-
None
-
None
-
None
-
NIF
-
Issues closed as MONARCH has transitioned from UCSD services
Description
One of the things we'll need to precompute and have available for larger analyses is the set of genes that overlap or, more generally, sequence features that overlap in genomic space. This ticket does not take transcription into account, so it covers only static genomic analysis and excludes any expression analysis.
This could start from GFF3 files as input; an even easier start might be the Ensembl toolkit, which has built-in functions for finding overlapping genes. The goal is to produce a list of overlapping feature pairs in genomic space. Precomputing this information will let us use it more quickly at analysis time. The features might include:
(pretty easy)
genes (overlapping on opposite strands, genes hidden in introns, genes that share exons)
miRNAs and their targets
part_of operons
(somewhat harder)
regulatory regions (enhancers, silencers, shared TF binding sites)
(much harder)
known network groups (Barbara Wold's ENCODE data?)
shared histone marks (this might be better in another ticket)
early/late replication groups
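As a rough illustration of the GFF3 starting point, the following is a minimal sketch of how overlapping feature pairs might be precomputed. It assumes standard 9-column, tab-separated GFF3 with 1-based inclusive coordinates; the names `Feature`, `parse_gff3`, and `overlapping_pairs` are hypothetical and not part of any existing toolkit. It uses a simple sort-and-sweep per sequence rather than the Ensembl built-in functions mentioned above.

```python
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple, Tuple


class Feature(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive (GFF3 convention)
    end: int     # 1-based, inclusive
    strand: str
    ftype: str
    fid: str


def parse_gff3(lines: Iterable[str], wanted_types=("gene",)) -> Iterator[Feature]:
    """Yield features of the wanted types from GFF3 text lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] not in wanted_types:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Fall back to a coordinate-based label when no ID attribute exists.
        fid = attrs.get("ID", f"{cols[0]}:{cols[3]}-{cols[4]}")
        yield Feature(cols[0], int(cols[3]), int(cols[4]), cols[6], cols[2], fid)


def overlapping_pairs(features: Iterable[Feature]) -> Iterator[Tuple[Feature, Feature]]:
    """Yield every pair of features on the same sequence that share at least one base."""
    by_seq = defaultdict(list)
    for f in features:
        by_seq[f.seqid].append(f)
    for seq_features in by_seq.values():
        seq_features.sort(key=lambda f: (f.start, f.end))
        active = []  # earlier features that may still overlap later ones
        for f in seq_features:
            # Drop features that end before the current one starts.
            active = [a for a in active if a.end >= f.start]
            for a in active:
                yield a, f
            active.append(f)
```

Each yielded pair could then be classified (for example, opposite strands when `a.strand != b.strand`, or one gene nested inside another when `a.start <= b.start and b.end <= a.end`) before being stored for use at analysis time.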
Eventually, we will need to build a pipeline for the more general task of overlapping any genomic variant with these categories of genomic features. That way we can ask questions like "what are all of the genomic features that are affected by variation Q?", "are there any enhancers disrupted by variation X, and what genes might that affect?", or "what pathways would be disrupted by variation Y that affects a specific TF's DNA-binding domain?"
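A minimal sketch of the variant-versus-feature query is shown here, under the assumption that both variants and features can be reduced to intervals with 1-based inclusive coordinates. The names `Region` and `features_hit_by`, and the example records, are hypothetical. A linear scan is used for clarity; a real pipeline would more likely use an interval index.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Region(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    name: str
    kind: str   # e.g. "gene", "enhancer", "variant"


def features_hit_by(variant: Region,
                    features: Iterable[Region],
                    kinds: Optional[Set[str]] = None) -> List[Region]:
    """Return features on the variant's sequence that share at least one base with it.

    If `kinds` is given, only features of those kinds are returned.
    """
    return [
        f for f in features
        if f.seqid == variant.seqid
        and f.start <= variant.end and variant.start <= f.end
        and (kinds is None or f.kind in kinds)
    ]


# Hypothetical records, for illustration only.
features = [
    Region("chr1", 100, 500, "geneA", "gene"),
    Region("chr1", 450, 480, "enhancer1", "enhancer"),
    Region("chr1", 900, 950, "enhancer2", "enhancer"),
]
variant_x = Region("chr1", 470, 470, "variationX", "variant")

disrupted_enhancers = features_hit_by(variant_x, features, kinds={"enhancer"})
```

Linking a disrupted enhancer to the genes it might affect would rely on the precomputed overlap pairs or on separate regulatory annotations, which this sketch does not model.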