Basic Unix for Biologists EP2 aims to help anyone who would like to learn basic Unix programming. This tutorial teaches the basics of the Unix shell and does not require any installation: instead of downloading software onto your own computer, you simply use RStudio Cloud in your web browser.
Open RStudio Cloud and Launch Terminal
Step A: Open RStudio Cloud and launch the Terminal
Once you log in to RStudio Cloud, your browser should show a window similar to the picture above. Click the button in the top right corner to create a new RStudio project. Next, click the “Terminal” tab; after you click it, your screen should look like the picture below.
Download example files (If you have done this for EP1, you can skip this part.)
/cloud/project$ svn export https://github.com/NatPombubpa/Binder_Intro_Unix/trunk/unix_intro
/cloud/project$ svn export https://github.com/NatPombubpa/Binder_Intro_Unix/trunk/data-shell
If everything worked, you are ready for the tutorial.
Very useful commands
We will now learn some commands that are used often in bioinformatics.
/cloud/project$ cd unix_intro/six_commands/
We’ll be working with gene_annotations.tsv, a tab-separated file with the columns gene_ID, genome, KO_ID, and KO_annotation (KO stands for KEGG Orthology, a functional database).
Let’s check out the file. By default, head prints the first 10 lines:
/cloud/project/unix_intro/six_commands$ head gene_annotations.tsv
gene_ID genome KO_ID KO_annotation
1 CC9311 K02338 DPO3B; DNA polymerase III subunit beta [EC:2.7.7.7]
2 CC9311 NA NA
3 CC9311 K01952 purL; phosphoribosylformylglycinamidine synthase [EC:6.3.5.3]
4 CC9311 K00764 purF; amidophosphoribosyltransferase [EC:2.4.2.14]
5 CC9311 K02469 gyrA; DNA gyrase subunit A [EC:5.99.1.3]
6 CC9311 NA NA
7 CC9311 K18979 queG; epoxyqueuosine reductase [EC:1.17.99.6]
8 CC9311 NA NA
9 CC9311 NA NA
To print a specific number of lines, use the -n option. For example, to see just the first 3 lines:
/cloud/project/unix_intro/six_commands$ head -n 3 gene_annotations.tsv
gene_ID genome KO_ID KO_annotation
1 CC9311 K02338 DPO3B; DNA polymerase III subunit beta [EC:2.7.7.7]
2 CC9311 NA NA
We can also count the number of lines in the file with wc -l:
/cloud/project/unix_intro/six_commands$ wc -l gene_annotations.tsv
101 gene_annotations.tsv
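Note that wc -l counts every line, so the 101 above includes the header row. If you want only the data rows, a minimal sketch is to skip the first line with tail -n +2 before counting. To keep the example self-contained, it writes a temporary sample file holding the header and the first three data rows from the head output above, rather than using gene_annotations.tsv itself.

```shell
# Temporary sample: the header plus the first 3 data rows from the head output above
sample="$(mktemp)"
printf 'gene_ID\tgenome\tKO_ID\tKO_annotation\n' > "$sample"
printf '1\tCC9311\tK02338\tDPO3B; DNA polymerase III subunit beta [EC:2.7.7.7]\n' >> "$sample"
printf '2\tCC9311\tNA\tNA\n' >> "$sample"
printf '3\tCC9311\tK01952\tpurL; phosphoribosylformylglycinamidine synthase [EC:6.3.5.3]\n' >> "$sample"

# wc -l counts every line, header included
wc -l < "$sample"                  # → 4

# tail -n +2 starts output at line 2, so the header is skipped
tail -n +2 "$sample" | wc -l       # → 3
```

On the real file, tail -n +2 gene_annotations.tsv | wc -l should print one less than the 101 above.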
cut command
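cut extracts columns (fields) from a file. By default it splits each line on tab characters, so it works directly on a tab-delimited file like gene_annotations.tsv. Use -f to choose which field to keep: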
/cloud/project/unix_intro/six_commands$ cut -f 1 gene_annotations.tsv
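This prints the whole column. To print just the first few lines, pipe the output into head. You can also select several fields, either as a list (1,3) or as a range (1-3):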
/cloud/project/unix_intro/six_commands$ cut -f 1 gene_annotations.tsv | head
/cloud/project/unix_intro/six_commands$ cut -f 1,3 gene_annotations.tsv | head
/cloud/project/unix_intro/six_commands$ cut -f 1-3 gene_annotations.tsv | head
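The difference between a list and a range is easiest to see on a tiny input. A minimal sketch, using a temporary sample with the header and first data row from the head output above:

```shell
# Temporary sample: the header and first data row from the head output above
sample="$(mktemp)"
printf 'gene_ID\tgenome\tKO_ID\tKO_annotation\n' > "$sample"
printf '1\tCC9311\tK02338\tDPO3B; DNA polymerase III subunit beta [EC:2.7.7.7]\n' >> "$sample"

# A comma-separated list keeps individual fields: gene_ID and KO_ID
cut -f 1,3 "$sample"

# A hyphenated range keeps every field from 1 to 3: gene_ID, genome and KO_ID
cut -f 1-3 "$sample"
```

cut always prints fields in their original order, so -f 3,1 gives the same output as -f 1,3. Lists and ranges can also be combined, for example -f 1,3-4.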
However, if a file uses a different delimiter, such as the commas in a CSV file, we have to specify it with -d:
/cloud/project/unix_intro/six_commands$ cut -d "," -f 1-3 example_gene_annotations.csv | head
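A minimal sketch of -d on a small, hypothetical CSV (it is not the real example_gene_annotations.csv). It also shows one caution: cut splits on every comma, so it does not understand CSV fields that contain quoted commas. The quoted annotation below is made up for illustration.

```shell
# Hypothetical CSV sample (not the real example_gene_annotations.csv)
csv="$(mktemp)"
printf 'gene_ID,genome,KO_ID,KO_annotation\n' > "$csv"
printf '1,CC9311,K02338,DPO3B; DNA polymerase III subunit beta [EC:2.7.7.7]\n' >> "$csv"

# -d "," makes cut split on commas instead of tabs;
# the output keeps the comma between the selected fields
cut -d "," -f 1-3 "$csv"
# → gene_ID,genome,KO_ID
#   1,CC9311,K02338

# Caveat: a comma inside quotes still splits the field (hypothetical row)
printf '5,CC9311,"ABC transporter, ATP-binding"\n' | cut -d "," -f 3
# → "ABC transporter
```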
cut command practice
Create a new file that contains 2 columns: gene_ID and KO_annotation. Hint: > is a redirector; it sends a command’s output into a file instead of to the screen.
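One possible solution, sketched on a temporary sample file so it runs anywhere: select fields 1 and 4 and redirect the result. With the real data this would be something like cut -f 1,4 gene_annotations.tsv > gene_KO_annotation.tsv, where the output file name is only an example.

```shell
# Temporary sample: the header and first two data rows from the head output above
sample="$(mktemp)"
printf 'gene_ID\tgenome\tKO_ID\tKO_annotation\n' > "$sample"
printf '1\tCC9311\tK02338\tDPO3B; DNA polymerase III subunit beta [EC:2.7.7.7]\n' >> "$sample"
printf '2\tCC9311\tNA\tNA\n' >> "$sample"

# gene_ID is field 1 and KO_annotation is field 4;
# > writes cut's output to a file instead of the screen
out="$(mktemp)"
cut -f 1,4 "$sample" > "$out"

# Check the new file
head "$out"
```

Be careful: > overwrites the target file if it already exists, while >> appends to it.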
grep command
grep (short for “global regular expression print”) searches through a text file and prints every line that matches a pattern.
/cloud/project/unix_intro/six_commands$ grep re colors.txt
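grep matches the pattern anywhere in a line, so re also matches words that merely contain those two letters. A minimal sketch with a hypothetical list of colors (the real colors.txt may contain different entries):

```shell
# Hypothetical color list, one color per line
colors="$(mktemp)"
printf 'red\ngreen\nblue\norange\nyellow\n' > "$colors"

# Prints red and green: both contain "re"; blue, orange and yellow do not
grep re "$colors"
```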
Let’s imagine we’re looking for genes predicted to encode the enzyme epoxyqueuosine reductase. When we search for it on the KEGG Orthology (KO) website, we find two KO_IDs linked to it: K09765 and K18979. Use grep to find these IDs:
/cloud/project/unix_intro/six_commands$ grep K09765 gene_annotations.tsv
/cloud/project/unix_intro/six_commands$ grep K18979 gene_annotations.tsv
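Instead of running grep once per ID, you can search for both in a single command. A minimal sketch on a temporary sample holding rows 1 and 7 from the head output above; only row 7 (K18979) should be printed, since neither ID appears in row 1:

```shell
# Temporary sample: the header and rows 1 and 7 from the head output above
sample="$(mktemp)"
printf 'gene_ID\tgenome\tKO_ID\tKO_annotation\n' > "$sample"
printf '1\tCC9311\tK02338\tDPO3B; DNA polymerase III subunit beta [EC:2.7.7.7]\n' >> "$sample"
printf '7\tCC9311\tK18979\tqueG; epoxyqueuosine reductase [EC:1.17.99.6]\n' >> "$sample"

# Each -e adds one pattern; a line is printed if it matches any of them
grep -e K09765 -e K18979 "$sample"

# Same result with an extended regular expression, where | means "or"
grep -E 'K09765|K18979' "$sample"
```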
To report how many lines match the pattern, add the -c flag:
/cloud/project/unix_intro/six_commands$ grep -c K18979 gene_annotations.tsv
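-c counts matching lines, not individual matches, so a line that contains the pattern twice is counted once. A minimal sketch with a hypothetical three-line input; the second command relies on -o, which GNU and BSD grep provide but POSIX does not require:

```shell
# Hypothetical input: the first line contains K18979 twice
ids="$(mktemp)"
printf 'K18979 K18979\nK02338\nK18979\n' > "$ids"

# -c counts matching lines, so the line with two matches counts once
grep -c K18979 "$ids"              # → 2

# -o prints each match on its own line, so wc -l counts every occurrence
grep -o K18979 "$ids" | wc -l      # → 3
```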
grep command practice
Using grep and cut, print out just column 2 (genome) for the rows that have the K18979 annotation. Hint: | is a pipe; it sends the output of one command into the next.
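One possible solution, sketched on a temporary sample holding rows 6 and 7 from the head output above. On the real data the same pipeline would be grep K18979 gene_annotations.tsv | cut -f 2.

```shell
# Temporary sample: the header and rows 6 and 7 from the head output above
sample="$(mktemp)"
printf 'gene_ID\tgenome\tKO_ID\tKO_annotation\n' > "$sample"
printf '6\tCC9311\tNA\tNA\n' >> "$sample"
printf '7\tCC9311\tK18979\tqueG; epoxyqueuosine reductase [EC:1.17.99.6]\n' >> "$sample"

# grep keeps only the lines containing K18979;
# | passes them to cut, which keeps field 2 (genome)
grep K18979 "$sample" | cut -f 2
# → CC9311
```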
References