                INSTALL instructions for Calanguize
          (last updated 11/30/2020 - mm/dd/yyyy format)

AUTHORS
-=-=-=-

Thieres Tayroni Martins da Silva (thierestayroni@gmail.com)
Francisco Pereira Lobo (franciscolobo@gmail.com)


First things first: dependencies
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

A - Third-party software
-=-=-=-=-=-=-=-=-=-=-=-=

You need to install the following third-party software to use Calanguize:

- Busco  - https://busco.ezlab.org/

- Interproscan - https://www.ebi.ac.uk/interpro/download.html

B - Busco Datasets

You need to dowload the busco dataset according to the group being studied. Busco datasets can be found in: https://busco.ezlab.org/

C - Assembly summary

You need a summary file of the RefSeq or Genbank
refseq - ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt
genbank - ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/assembly_summary_genbank.txt

D - Perl modules
-=-=-=-=-=-=-=-=

You also need to have Perl, BioPerl, and some of its modules installed. You can
check if you have them installed in your machine with:

> perl -e 1 -M<module> 
It will return an error message if it isn't installed.

Use the commands below to check if they are installed:

perl -e 1 -MBio::SeqIO
perl -e 1 -MCwd
perl -e 1 -MData::Dumper
perl -e 1 -MFile::Spec::Functions
perl -e 1 -MFile::Basename
perl -e 1 -MGetopt::Long
perl -e 1 -MParallel:ForkManager
You can install these modules through the CPAN, or manually download
(http://search.cpan.org/) and compile them. To use CPAN, you can do by 
writing:

> perl -MCPAN -e 'install "<module>"'

Use the commands below to install with CPAN:

perl -MCPAN -e 'install "Bio::SeqIO"'
perl -MCPAN -e 'install "Cwd"'
perl -MCPAN -e 'install "Data::Dumper"'
perl -MCPAN -e 'install "File::Spec::Functions"'
perl -MCPAN -e 'install "File::Basename"'
perl -MCPAN -e 'install "Getopt::Long"'
perl -MCPAN -e 'install "Parallel:ForkManager"'

To install manually, search for the most recent version of these modules and,
for each, download and type the following (should work most of the time, except
for BioPerl):

> tar -zxvf <module.tar.gz>
> perl Makefile.PL
> make
> make test
> make install


Setting Calanguize environment
-=-=-=-=-=-=-=-=-=-=-=-=-=

Editing 'calanguize_config'
-=-=-=-=-=-=-=-=-=-=-

phyton=		#path to python; same as you used to compile Busco
busco =		#path to run_BUSCO.py 
interpro = 	#path to interproscan.sh

D - Assembly Summary Refseq

https://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt
