Details
Type: Task
Resolution: Canceled
Priority: Major
Component: NIF
Issues closed as MONARCH has transitioned from UCSD services
Description
Until owlsim is fully integrated as services wrapping the entire NIF db, we need to do periodic data dumps onto our Jenkins host so that we can regularly run the owlsim precompute pipeline.
These queries will use the NIF REST services to pull data matching a given set of guidelines. Overall, the jobs are:
1. Fetching data directly from the NIF genotype/variant and phenotype tables (as specified by me), producing a series of 2-column files. Specific types/resources will be made into sub-task tickets.
2. Either (a) filtering the output of (1) using our "filters", or (b) running a fresh query for each "filter", embedding the filtering at query time. Of special interest here is how to build the filtering mechanism. I would like these to be available via services too, perhaps even as part of our monarch-app, but I am not sure about the implementation here. Chris, we should discuss strategies for filter implementation soon. Kent, examples of these filters are "get me all genotype ids where there is variation in only one gene", or "get me all the genotype ids where the variation(s) are generated with treatments (such as RNAi/Morpholino)". The filter types will also be separate sub-tasks. A rough sketch of the fetch step and an example filter follows below.
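As a starting point, here is a minimal Python sketch of what job (1) plus one filter from job (2) could look like. The service URL, the key/count parameter names, the JSON response shape, and the column names are all assumptions to be replaced with the real values from the sub-task tickets.

```python
# Minimal sketch only: endpoint URL, parameter names, response shape, and
# column names are assumptions, not the real NIF specifics.
import csv
import requests

NIF_BASE = "https://nif-services.neuinfo.org/servicesv1/v1/federation/data"  # assumed URL

def fetch_two_column_dump(resource_id, col_a, col_b, out_path, api_key=None):
    """Pull two columns from a NIF federation resource and write a 2-column TSV."""
    params = {"q": "*", "count": 1000}          # 1000-record cap currently in place
    if api_key:
        params["key"] = api_key                  # parameter name is an assumption
    resp = requests.get(f"{NIF_BASE}/{resource_id}", params=params)
    resp.raise_for_status()
    rows = resp.json().get("result", {}).get("rows", [])   # response shape is assumed
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        for row in rows:
            writer.writerow([row.get(col_a, ""), row.get(col_b, "")])
    return rows

def single_gene_genotypes(rows, genotype_col="genotype_id", gene_col="gene_id"):
    """Example filter: keep genotype ids whose variation involves exactly one gene."""
    genes_per_genotype = {}
    for row in rows:
        genes_per_genotype.setdefault(row[genotype_col], set()).add(row[gene_col])
    return {g for g, genes in genes_per_genotype.items() if len(genes) == 1}
```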
I would like the code to be generic enough that we can give it a resource id, data types, and filter(s) as arguments, and then the code will do its business. I think the NIF-specific resource ids should probably live in an external configuration file that is not committed with the public code, considering we will need to keep an API key in there too.
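For that generic driver, something like the following could work: a small JSON config file (the file name and field names here are placeholders) kept out of version control for the API key and resource ids, and a command-line interface that takes the logical resource name, data types, and filter names as arguments.

```python
# Sketch of the driver interface; config path, config layout, and option names
# are placeholders, not an agreed design.
import argparse
import json

def load_config(path="conf/nif_resources.json"):
    # Example contents (file is NOT committed to the public repo):
    # {
    #   "api_key": "SECRET",
    #   "resources": {"zfin_genotype_phenotype": "nlx_XXXXX-1"}
    # }
    with open(path) as fh:
        return json.load(fh)

def main():
    parser = argparse.ArgumentParser(description="Dump 2-column files from NIF resources")
    parser.add_argument("resource", help="logical resource name defined in the config file")
    parser.add_argument("--types", nargs="+", default=[], help="data types to pull")
    parser.add_argument("--filters", nargs="+", default=[],
                        help="named filters to apply, e.g. single-gene, no-treatment")
    parser.add_argument("--config", default="conf/nif_resources.json")
    args = parser.parse_args()

    conf = load_config(args.config)
    resource_id = conf["resources"][args.resource]
    # fetch_two_column_dump(resource_id, ..., api_key=conf.get("api_key"))  # see sketch above

if __name__ == "__main__":
    main()
```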
NIF has nearly finished the requirements for the API-key infrastructure necessary for us to use the services to pull data without limits. In the meantime, we can write the methods against the 1000-record cap that currently exists, knowing that it will go away soon. Besides, I suppose we might need a fallback mechanism built in anyway, to iterate through the data in 1000-record chunks, in case the API key gets corrupted on one end or the other.
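A sketch of that fallback, assuming the service accepts offset/count paging parameters and reports a total row count (both are assumptions until the API-key details are settled):

```python
# Fallback pagination sketch; "offset", "count", "key", and "resultCount" are
# assumed parameter/field names, not confirmed NIF API details.
import requests

def fetch_all_rows(resource_id, api_key=None, chunk=1000,
                   base="https://nif-services.neuinfo.org/servicesv1/v1/federation/data"):
    """Iterate through a NIF resource in fixed-size chunks, yielding each row."""
    offset = 0
    while True:
        params = {"q": "*", "offset": offset, "count": chunk}
        if api_key:
            params["key"] = api_key               # assumed parameter name
        resp = requests.get(f"{base}/{resource_id}", params=params)
        resp.raise_for_status()
        result = resp.json().get("result", {})    # response shape is assumed
        rows = result.get("rows", [])
        if not rows:
            break
        yield from rows
        offset += chunk
        # Stop once we have walked past the reported total, if the service provides one.
        total = result.get("resultCount")
        if total is not None and offset >= int(total):
            break
```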
I will attach the SQL that I currently use to produce the tables to the specific tickets.