Data Entry and Management
Objective: To ensure accurate and consistent data entry and management within GeneNetwork.
Scope: This SOP covers the procedures for entering, updating, and managing data in GeneNetwork.
Standard Operating Protocol (Data Entry in GeneNetwork)
Last updated: 1/13/21 by A. Centeno
1. Introduction
Before entering new data into GeneNetwork (GN), we need to determine if the data belongs to an existing group or if a new group needs to be created. Additionally, we must check if annotations are available or if new ones need to be created.
2. Steps for Data Entry
- Determine if the data belongs to an existing group or if a new group needs to be created.
- Check if annotations are available or if new ones need to be created.
- Ensure the data belongs to an existing study by referencing the ProbeFreeze table.
- If all conditions are met, proceed to introduce the new data into the ProbeSetFreeze table, linking it to an existing study (ProbeFreezeID).
3. Terminal Commands
To begin, open a terminal and type the following commands:
ssh user@ip_address -i /Users/username/key
Enter the passphrase for the key when prompted.
Next, log into MariaDB:
mysql -u gn_db_user -p
Enter the password when prompted.
4. Database Operations
Once logged into MariaDB, select the database db_webqtl:
MariaDB [(none)]> use db_webqtl;
Query the ProbeFreeze table to check for existing studies:
MariaDB [db_webqtl]> select * from ProbeFreeze where Id=423;
If the new dataset belongs to an existing study (e.g., study 423), create a new record in the ProbeSetFreeze table. First look up the highest Id currently in use:
MariaDB [db_webqtl]> select max(Id) from ProbeSetFreeze;
Insert the new record with the next Id (1027 in this example), linked to study 423:
INSERT INTO ProbeSetFreeze VALUES(1027,423,24,"UTHSC_BXD_All_Ages_Eye_RNAseq_TPM_Log","UTHSC_BXD_All_Ages_Eye_RNAseq_TPM_LogNov20","UTHSC BXD All Ages Eye RNA-Seq (Nov20) TPM Log2","UTHSC BXD All Ages Eye RNA-Seq (Nov20) TPM Log2",CURDATE(),3,1,1,"williamslab,labwilliams","log2");
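The lookup-then-insert sequence above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for MariaDB, the tables carry a simplified, hypothetical subset of columns, and the helper names (make_demo_db, study_exists, next_freeze_id, add_dataset) are not part of GeneNetwork.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ProbeFreeze (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute(
        "CREATE TABLE ProbeSetFreeze ("
        "Id INTEGER PRIMARY KEY, ProbeFreezeId INTEGER, Name TEXT)"
    )
    conn.execute("INSERT INTO ProbeFreeze VALUES (423, 'example study')")
    conn.execute("INSERT INTO ProbeSetFreeze VALUES (1026, 423, 'previous dataset')")
    return conn


def study_exists(conn, probe_freeze_id):
    """True if ProbeFreeze has a row with this Id."""
    row = conn.execute(
        "SELECT 1 FROM ProbeFreeze WHERE Id = ?", (probe_freeze_id,)
    ).fetchone()
    return row is not None


def next_freeze_id(conn):
    """Return max(Id) + 1 from ProbeSetFreeze, or 1 if the table is empty."""
    (max_id,) = conn.execute("SELECT MAX(Id) FROM ProbeSetFreeze").fetchone()
    return 1 if max_id is None else max_id + 1


def add_dataset(conn, probe_freeze_id, name):
    """Insert a dataset row only if the parent study exists; return its Id."""
    if not study_exists(conn, probe_freeze_id):
        raise ValueError(f"study {probe_freeze_id} not found in ProbeFreeze")
    new_id = next_freeze_id(conn)
    conn.execute(
        "INSERT INTO ProbeSetFreeze (Id, ProbeFreezeId, Name) VALUES (?, ?, ?)",
        (new_id, probe_freeze_id, name),
    )
    return new_id


if __name__ == "__main__":
    demo = make_demo_db()
    print(add_dataset(demo, 423, "UTHSC_BXD_All_Ages_Eye_RNAseq_TPM_Log"))
```

Checking the parent study before inserting mirrors the order of the manual steps. Naming the columns in the INSERT, rather than relying on positional VALUES, makes the statement less fragile if the table layout changes.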
5. Data Preparation
Ensure the top row of the data contains the correct strain or case names (e.g., C57BL/6J). Compare the number of ProbeSetIDs against the annotation file from the ProbeFreeze and ProbeSet tables.
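These checks can be scripted before upload. The sketch below assumes a tab-delimited file whose first column holds the ProbeSet identifiers and whose header row holds strain names; the function name check_expression_table, the file layout, and the example identifiers are illustrative assumptions, not part of the GeneNetwork tooling.

```python
import csv
import io


def check_expression_table(text, known_strains, annotated_ids):
    """Compare a tab-delimited expression table against known strains and IDs.

    Assumes the first column is the ProbeSet identifier and the remaining
    header cells are strain or case names.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    strains = header[1:]
    data_ids = [r[0] for r in body]
    return {
        "unknown_strains": [s for s in strains if s not in known_strains],
        "n_data_ids": len(data_ids),
        "n_annotated_ids": len(annotated_ids),
        "missing_annotation": sorted(set(data_ids) - set(annotated_ids)),
        "missing_data": sorted(set(annotated_ids) - set(data_ids)),
    }


if __name__ == "__main__":
    # Made-up example: one misspelled strain name, one ID mismatch each way.
    table = (
        "ProbeSetID\tC57BL/6J\tDBA/2J\tBXD1x\n"
        "probe_a\t8.1\t7.9\t8.0\n"
        "probe_b\t9.0\t9.2\t9.1\n"
    )
    report = check_expression_table(
        table,
        known_strains={"C57BL/6J", "DBA/2J", "BXD1"},
        annotated_ids={"probe_a", "probe_c"},
    )
    for key, value in report.items():
        print(key, value)
```

Equal counts alone can hide mismatches, so the sketch also lists the identifiers found on only one side.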
If the data contains only raw values, perform the classic zScore+8 normalization and calculate the standard error (SE) values before entering the data.
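The SOP names the transform only as zScore+8, so the following is a minimal sketch that takes the name literally: each value is converted to a z-score and shifted by 8. Several choices here are assumptions to confirm against the lab's convention: the use of the sample standard deviation, the set of values the z-score is computed over, and the scale parameter, which is included only in case the z-score is multiplied by a factor before the offset is added. The SE is computed as the sample standard deviation of the replicates divided by the square root of their count.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_plus_8(values, scale=1.0, offset=8.0):
    """Return scale * z + offset for each value (z uses the sample SD).

    The defaults implement a literal 'zScore+8'; scale is an assumption
    exposed for conventions that stretch the z-score before shifting it.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [scale * (v - mu) / sd + offset for v in values]


def standard_error(replicates):
    """Standard error of the mean; None when fewer than two replicates."""
    n = len(replicates)
    if n < 2:
        return None
    return stdev(replicates) / sqrt(n)


if __name__ == "__main__":
    print(zscore_plus_8([1, 2, 3]))
    print(standard_error([2.0, 4.0]))
```

Returning None for a single replicate keeps a missing SE distinguishable from an SE of zero.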
6. Moving Data to the Server
Move the data to the server lily.uthsc.edu under the directory /home/acenteno/GN-Data. Create a new folder for the dataset (e.g., GN1027_EyeRNA-Seq-Nov20).
Three scripts are needed to enter the data:
- readProbeSetSE_TUX01_v9_Mouse_ONLY.py (for entering standard error values)
- readProbeSetMean_TUX01_v9_Mouse_ONLY.py (for entering normalized average data)
- QTL_Reaper_v7.py (for running QTL Reaper)
Run the scripts using the following commands:
python readProbeSetMean_TUX01_v9_Mouse_ONLY.py
python QTL_Reaper_v7.py 1027
7. Calculating the Mean
After running the scripts, calculate the mean using the following command in the MariaDB terminal:
update ProbeSetXRef set mean = (select AVG(value) from ProbeSetData where ProbeSetData.Id = ProbeSetXRef.DataId) where ProbeSetXRef.ProbeSetFreezeId = 1027;
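To see what this statement does, the sketch below runs the same correlated-subquery UPDATE against a small in-memory SQLite database. The tables are reduced to the columns the statement touches, the rows are made-up values, and the helper names (build_demo, update_means) are hypothetical; the production database is MariaDB, so treat this only as an illustration of the logic.

```python
import sqlite3

# Same statement as in the SOP, with the dataset Id as a bound parameter.
UPDATE_MEAN = """
UPDATE ProbeSetXRef
SET mean = (SELECT AVG(value) FROM ProbeSetData
            WHERE ProbeSetData.Id = ProbeSetXRef.DataId)
WHERE ProbeSetXRef.ProbeSetFreezeId = ?
"""


def build_demo():
    """Create simplified tables with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ProbeSetXRef (ProbeSetFreezeId INTEGER, DataId INTEGER, mean REAL);
        CREATE TABLE ProbeSetData (Id INTEGER, StrainId INTEGER, value REAL);
        INSERT INTO ProbeSetXRef VALUES (1027, 1, NULL), (1027, 2, NULL), (1026, 3, NULL);
        INSERT INTO ProbeSetData VALUES
            (1, 10, 8.0), (1, 11, 10.0),
            (2, 10, 7.0), (2, 11, 7.5),
            (3, 10, 5.0);
        """
    )
    return conn


def update_means(conn, freeze_id):
    """Fill in mean for one dataset and return {DataId: mean}."""
    conn.execute(UPDATE_MEAN, (freeze_id,))
    rows = conn.execute(
        "SELECT DataId, mean FROM ProbeSetXRef WHERE ProbeSetFreezeId = ?",
        (freeze_id,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(update_means(build_demo(), 1027))
```

Each ProbeSetXRef row in the dataset receives the average of the ProbeSetData values that share its DataId. Rows belonging to other datasets are left untouched because of the WHERE clause on ProbeSetFreezeId.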
8. Entering a New Record in the DBList Table
Add a new record to the DBList table to allow switching between GN2 and GN1. First inspect the table's structure:
MariaDB [db_webqtl]> desc DBList;
Insert the new record, then confirm that it is present:
MariaDB [db_webqtl]> select * from DBList where FreezeId=1027;
For GN1, run the following script to generate the selectDatasetMenu.js file:
/gnshare/gn/web/webqtl/maintainance/genSelectDatasetJS.py
9. Conclusion
This SOP covers the basic steps for entering new data into GeneNetwork. In the next version of the SOP (Intermediate), we will discuss entering phenotypes, genotypes, and case attributes.
Note: This dataset does not require entering phenotypes, genotypes, or case attributes as it is part of the BXD group.