join of two files does not work

Status
Not open for further replies.

psolar

Programmer
Apr 5, 2016
Hi all,

I have these two .dat files (I only show the first 20 lines for both):

Code:
GO:0005509	PDCD6
GO:0004672	CDK1
GO:0005524	CDK1
GO:0005634	CDK1
GO:0005737	CDK1
GO:0006468	CDK1
GO:0005615	SERPINB6
GO:0006629	APOC2
GO:0006869	APOC2
GO:0008047	APOC2
GO:0042627	APOC2
GO:0043085	APOC2
GO:0001932	TADA2L
GO:0003677	TADA2L
GO:0005671	TADA2L
GO:0006357	TADA2L
GO:0007067	TADA2L
GO:0008270	TADA2L
GO:0016573	TADA2L
(...)

Code:
GO:0000001	mitochondrion inheritance
GO:0000002	mitochondrial genome maintenance
GO:0000003	reproduction
GO:0000005	ribosomal chaperone activity
GO:0000006	high affinity zinc uptake transmembrane transporter activity
GO:0000007	low-affinity zinc ion transmembrane transporter activity
GO:0000008	thioredoxin
GO:0000009	alpha-1,6-mannosyltransferase activity
GO:0000010	trans-hexaprenyltranstransferase activity
GO:0000011	vacuole inheritance
GO:0000012	single strand break repair
GO:0000014	single-stranded DNA specific endodeoxyribonuclease activity
GO:0000015	phosphopyruvate hydratase complex
GO:0000016	lactase activity
GO:0000017	alpha-glucoside transport
GO:0000018	regulation of DNA recombination
GO:0000019	regulation of mitotic recombination
GO:0000020	negative regulation of recombination within rDNA repeats
(...)

When I try to join the two files, I only get a few results (exactly 10). The complete code is:
Code:
ls *gene_association* | while read file;
do
echo;
echo @@@ File: $file;
echo;

# New file "assoc_specie.txt"
IFS='_' read -r -a array <<< "$file"
SPECIE=${array[2]}

#Filtering comments (!comment...)
cat $file | grep -v '!' > assoc_$SPECIE.txt;
gawk 'BEGIN{OFS="\t";FS="\t"}{print $5, $3}' assoc_$SPECIE.txt > goTerms_$SPECIE.dat;
join goTerms_$SPECIE.dat gene_ontology.dat > join.dat

echo
done;

I don't know what I am doing wrong, but it's obvious that join is not showing all the results.

Thanks in advance

PS: The assoc_specie.txt file has this format (only the first line is shown):

Code:
UniProtKB	A0A024QZ42	PDCD6		GO:0005509	GO_REF:0000002	IEA	InterPro:IPR002048	F	HCG1985580, isoform CRA_c	A0A024QZ42_HUMAN|PDCD6|hCG_1985580	protein	taxon:9606	20160312	InterPro
(...)
 
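Both files must be sorted on the join field, so it looks like your first .dat file needs to be sorted before you run join. When join reads unsorted input it silently skips the lines it cannot pair, which is why you only get a few results.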
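
A minimal sketch of that fix, under some assumptions: the ontology rows are copied from the excerpt in the question, while the gene symbols GENE_A and GENE_B, the temporary directory, and the *.sorted.dat file names are placeholders made up for illustration. It also passes a tab as the field separator to both sort and join, because join splits on blanks by default and the descriptions contain spaces.

```shell
#!/bin/sh
# Sketch: sort both inputs on the join field, then join on tab-separated fields.
# GENE_A/GENE_B and all file names below are hypothetical placeholders.
set -eu

# sort and join must agree on the collation order, so pin the locale.
LC_ALL=C
export LC_ALL

tab=$(printf '\t')
workdir=$(mktemp -d)

# Gene associations (GO ID, gene symbol), deliberately NOT in GO ID order.
printf 'GO:0000016\tGENE_A\nGO:0000003\tGENE_B\nGO:0000011\tGENE_A\n' \
    > "$workdir/goTerms.dat"

# Ontology (GO ID, description); rows taken from the excerpt in the question.
printf 'GO:0000003\treproduction\nGO:0000011\tvacuole inheritance\nGO:0000016\tlactase activity\n' \
    > "$workdir/gene_ontology.dat"

# Optional diagnosis: sort -c exits non-zero when the input is out of order.
if ! sort -c -t "$tab" -k1,1 "$workdir/goTerms.dat" 2>/dev/null; then
    echo "goTerms.dat is not sorted on field 1" >&2
fi

# Sort both files on the first (join) field.
sort -t "$tab" -k1,1 "$workdir/goTerms.dat"       > "$workdir/goTerms.sorted.dat"
sort -t "$tab" -k1,1 "$workdir/gene_ontology.dat" > "$workdir/gene_ontology.sorted.dat"

# Join on field 1, using tab as both the input and the output separator.
join -t "$tab" "$workdir/goTerms.sorted.dat" "$workdir/gene_ontology.sorted.dat" \
    > "$workdir/join.dat"

cat "$workdir/join.dat"
# The files stay in $workdir for inspection; remove it with rm -r when done.
```

In the loop from the question, the equivalent change would be to sort goTerms_$SPECIE.dat (and, to be safe, gene_ontology.dat, since only its first lines are shown) before the join line. With the sample data above, all three association rows come out paired with their descriptions, and each multi-word description stays in a single tab-separated column.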

In order to understand recursion, you must first understand recursion.
 