[Repost] ForCon 1.0 manual

Posted by 郑振寰 on 2007-6-6 10:04

Source: bioinformatics.psb.ugent.be

Installing ForCon

After downloading ForCon, you have a .zip file. Unzip it into a temporary directory using WinZip or PKUNZIP. Double-click the setup.exe file and follow the instructions.

General description

ForCon is a user-friendly software tool for the easy conversion of nucleic and amino acid sequence alignments into different formats.

At the moment, ForCon is able to convert (in both directions, i.e. reading and writing) the following formats (or formats used by the following software packages):

  • CLUSTAL
  • EMBL
  • FASTA
  • GCG/MSF
  • Hennig86
  • MEGA
  • NBRF/PIR
  • PAUP/Nexus
  • Parsimony Jackknifer
  • PHYLIP
  • TREECON

Software packages not included in the list are usually able to read one of the formats mentioned. For the publication of sequence alignments, a format with codon positions can be generated ("Pretty").

ForCon supports both sequential and interleaved formats (see the next section).

File formats

The use of correct formats is extremely important: incorrect formats cannot be correctly interpreted by the program. For this reason a description and example of all the formats is presented below.

Overall, two major types of formats exist: interleaved and noninterleaved (sequential). In the interleaved format, sequences are written in the form of an alignment:

Usually the symbol for missing data is 'N' (nucleotides) or 'X' (proteins). For insertions/deletions ('gaps') the most commonly used symbol is a hyphen '-'.
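
As a rough illustration of the difference between the two layouts, the following Python sketch cuts a sequential alignment into interleaved blocks. The function name `to_interleaved` and its `width` parameter are hypothetical and are not part of ForCon.

```python
def to_interleaved(alignment, width):
    """Split aligned sequences into blocks of at most `width` columns.

    `alignment` is a list of (name, sequence) pairs of equal length.
    Returns a list of blocks; each block is a list of (name, chunk) pairs.
    """
    if width < 1:
        raise ValueError("width must be positive")
    lengths = {len(seq) for _, seq in alignment}
    if len(lengths) > 1:
        raise ValueError("all sequences in an alignment must have the same length")
    total = lengths.pop() if lengths else 0
    blocks = []
    for start in range(0, total, width):
        # Every block repeats all sequence names with the next slice of columns.
        blocks.append([(name, seq[start:start + width]) for name, seq in alignment])
    return blocks


alignment = [
    ("Homo_sapiens", "AGUCGAGUC---GCAG"),
    ("Pan_paniscus", "AGUCGCGUCG--GCAG"),
]
for block in to_interleaved(alignment, 10):
    for name, chunk in block:
        print(name, chunk)
    print()
```

In the sequential layout each sequence is written in full before the next one starts, so no such splitting across sequences is needed.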

Regarding the different formats:

    1) CLUSTAL

    CLUSTAL is a program for creating sequence alignments.
    The CLUSTAL format can be described as follows:

    - the word CLUSTAL should be on the first non-space line of the file
    - the alignment is displayed in blocks of a fixed length
    - each line in the block corresponds to one sequence
    - the line starts with the sequence name (of any length), followed by at least one space character
    - then the sequence itself is displayed (upper- or lowercase) ( '-' : gaps )
    (optional : residue number at the end)
    - in between blocks: line with conservation info ( ForCon only writes stars for now ; for more info: https://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/#G )
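
The rules above can be turned into a small reader. This is only a sketch in Python, not ForCon's own code; the name `parse_clustal` is hypothetical, and it assumes that conservation lines start with whitespace and that sequence names contain no blanks.

```python
def parse_clustal(text):
    """Parse CLUSTAL-style text into a list of (name, sequence) pairs."""
    lines = text.splitlines()
    # The first non-blank line must contain the word CLUSTAL.
    body_start = None
    for i, line in enumerate(lines):
        if line.strip():
            if "CLUSTAL" not in line:
                raise ValueError("first non-blank line must contain 'CLUSTAL'")
            body_start = i + 1
            break
    if body_start is None:
        raise ValueError("empty input")

    sequences = {}  # dicts keep insertion order in Python 3.7+
    for line in lines[body_start:]:
        # Skip blank lines and conservation lines (assumed to start with whitespace).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        # A third field, if present, is the optional residue number; it is ignored.
        sequences[name] = sequences.get(name, "") + chunk
    return list(sequences.items())
```

The chunks of each block are appended to the sequence with the same name, which rebuilds the full sequences from the interleaved blocks.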

Example :

2) EMBL

The EMBL database is the primary nucleotide database in Europe.
The format is described in detail at: https://www.ebi.ac.uk/ebi_docs/embl_db/usrman/structure_entry.html

Multiple sequence files also follow these rules. They are separated by the '//' that ends each entry.
Only the information used in multiple sequence alignments is used by ForCon.

Example ( as generated by ForCon; for input, any EMBL file is allowed ):


3) FASTA

The FASTA program is used for database searches.
The format is described at : https://www.ncbi.nlm.nih.gov/BLAST/fasta.html

Example:

4) GCG/MSF

The Multiple Sequence File format by the Genetics Computer Group Wisconsin package is thoroughly described in their user manual. In brief:

- on the first line : file type identifier like '!!AA_MULTIPLE_ALIGNMENT 1.0',
'!!NA_MULTIPLE_ALIGNMENT 1.0' or 'PileUp'. ( optional )
- second line: optional title/description
- dividing line with obligatory 'MSF: sequence length', checksum value and two points '..'
- name/weight section with checksum
- separating line : //
- alignment : interleaved

Example ( as generated by ForCon )

5) Hennig86

The parsimony phylogeny program by Farris uses an unusual format: the different IUPAC nucleotide letter codes are replaced by a number code. ForCon uses the following standard translation :

      A     to:  0
      U,T   to:  1
      G     to:  2
      C     to:  3
      N     to:  ?
When converting from the Hennig86 format, the user will be prompted to enter his/her own translation preferences.
The format is a sequential format. The first line contains the word 'xread', which is used to recognize the file. On the following line a title/description can be placed between single quotes. The third line gives the sequence length and the number of sequences. After the alignment (in sequential format), the file is closed by a semicolon (;). The symbol used for missing data is '?'. There is no separate character for defining gaps.
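
A minimal Python sketch of the translation and file layout described above. The names `HENNIG86_CODES`, `encode_hennig86` and `write_hennig86` are hypothetical. Two points are assumptions of this sketch rather than documented ForCon behaviour: gaps and any other unlisted symbols are written as '?', and each sequence name is placed on the same line as its data.

```python
# Standard translation listed above. Anything else (including gaps) becomes '?',
# which is an assumption of this sketch.
HENNIG86_CODES = {"A": "0", "U": "1", "T": "1", "G": "2", "C": "3", "N": "?"}


def encode_hennig86(sequence):
    """Replace nucleotide letters by the Hennig86 number codes."""
    return "".join(HENNIG86_CODES.get(base.upper(), "?") for base in sequence)


def write_hennig86(title, alignment):
    """Return Hennig86-style text for a list of (name, sequence) pairs."""
    length = len(alignment[0][1]) if alignment else 0
    # Header: 'xread', quoted title, then sequence length and number of sequences.
    lines = ["xread", "'" + title + "'", f"{length} {len(alignment)}"]
    for name, seq in alignment:
        # Name and data on one line is a layout assumption, not taken from ForCon.
        lines.append(f"{name} {encode_hennig86(seq)}")
    lines.append(";")  # the file is closed by a semicolon
    return "\n".join(lines) + "\n"
```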

Example:

6) MEGA

The Molecular Evolutionary Genetics Analysis program by Kumar, Tamura & Nei is a tree construction program based on distance and parsimony methods.
The format is described in the MEGA manual. In brief:
The format exists in an interleaved and a noninterleaved variant.
Regardless of the variant, the file always starts with the word '#mega' on the first line. On the following line, a title can be stated, preceded by the term 'TITLE:'. Between the title and the sequence data, a description or extra comments can be placed. Even inside the sequences, comments are allowed between double quotes (""). The sequence names are preceded by a '#'.

Examples:

#mega
TITLE: Four Anthropoidea

The interleaved format

#Homo_sapiens AGUCGAGUC---GCAGAAACGCAUGAC-GACC
#Pan_paniscus AGUCGCGUCG--GCAGAAACGCAUGACGGACC
#Gorilla_gorilla AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
#Pongo_pigmaeus AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

#Homo_sapiens ACAUUUU-CCUUGCAAAG
#Pan_paniscus ACAUCAU-CCUUGCAAAG
#Gorilla_gorilla ACAUCAUCCCUCGCAGAG
#Pongo_pigmaeus ACAUCAUCCCUUGCAGAG

---

#mega
TITLE: Four Anthropoidea

The noninterleaved format

#Homo_sapiens
AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG
#Pan_paniscus
AGUCGCGUCG--GCAGAAACGCAUGACGGACCACAUCAU-CCUUGCAAAG
#Gorilla_gorilla
AGUCGCGUCG--GCAGAUACGCAUCACGGAC-ACAUCAUCCCUCGCAGAG
#Pongo_pigmaeus
AGUCGCGUCGAAGCAGA--CGCAUGACGGACCACAUCAUCCCUUGCAGAG
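
Both MEGA layouts shown above can be read with the same loop. The following Python sketch is one possible reader under the rules described in this section; `parse_mega` is a hypothetical name, and the sketch assumes '#mega' is literally the first line of the text.

```python
import re


def parse_mega(text):
    """Parse interleaved or noninterleaved MEGA text into (name, sequence) pairs."""
    lines = text.splitlines()
    if not lines or lines[0].strip().lower() != "#mega":
        raise ValueError("file must start with '#mega'")
    # Drop comments placed between double quotes, even inside sequences.
    body = re.sub(r'"[^"]*"', "", "\n".join(lines[1:]))
    sequences = {}
    current = None
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line[1:].split()
            if not fields:
                continue
            current = fields[0]
            # Interleaved: data follows the name on the same line.
            sequences[current] = sequences.get(current, "") + "".join(fields[1:])
        elif current is not None:
            # Noninterleaved: data lines follow the name line.
            sequences[current] += "".join(line.split())
        # Lines before the first '#name' (title, description) are skipped.
    return list(sequences.items())
```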

7) NBRF/PIR

The format of this large protein database is similar to the FASTA format. Each sequence, though, starts with a '>[sequence type code];', followed by the sequence name and a description ( on the next line ).
This description is ignored by ForCon.
On the following line the actual sequence is written and is ended with an asterisk (*).

The sequence type codes are as follows:


Code   Sequence type
P1     Protein (complete)
F1     Protein (fragment)
DL     DNA (linear)
DC     DNA (circular)
RL     RNA (linear)
RC     RNA (circular)
N3     tRNA
N1     other functional RNA

ForCon accepts all these codes, but only writes codes P1, DL and RL.

Example :

>RL;Homo sapiens
Homo sapiens RNA sequence
AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG*
>RL;Pan paniscus
Pan paniscus RNA sequence
AGUCGCGUCG--GCAGAAACGCAUGACGGACCACAUCAU-CCUUGCAAAG*
>RL;Gorilla gorilla
Gorilla gorilla RNA sequence
AGUCGCGUCG--GCAGAUACGCAUCACGGAC-ACAUCAUCCCUCGCAGAG*
>RL;Pongo pigmaeus
Pongo pigmaeus RNA sequence
AGUCGCGUCGAAGCAGA--CGCAUGACGGACCACAUCAUCCCUUGCAGAG*
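
A short Python sketch of a reader for the layout shown in this example. The function name `parse_pir` is hypothetical; like ForCon, the sketch ignores the description line.

```python
def parse_pir(text):
    """Parse NBRF/PIR text into a list of (type_code, name, sequence) tuples."""
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line.startswith(">"):
            continue
        # Header looks like '>RL;Homo sapiens': type code, semicolon, name.
        code, _, name = line[1:].partition(";")
        next(lines, None)  # description line, ignored
        chunks = []
        for seq_line in lines:
            seq_line = seq_line.strip()
            if seq_line.endswith("*"):
                # The asterisk terminates the sequence.
                chunks.append(seq_line[:-1])
                break
            chunks.append(seq_line)
        sequence = "".join("".join(chunks).split())
        entries.append((code.strip(), name.strip(), sequence))
    return entries
```

The inner loop reads from the same iterator as the outer one, so a sequence may span several lines before its closing asterisk.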

8) PAUP/NEXUS

The Nexus format is used by several programs: PAUP, MacClade, Spectrum, etc.
For a detailed description of the format, I'd like to refer to the article by Maddison et al.:

Maddison, D.R., Swofford, D.L., Maddison, W.P. (1997) NEXUS: An extensible file format for systematic information. Syst. Biol. 46, 590-621.

ForCon is limited in the use of this extremely versatile format. Only the information on the alignment itself is used and generated, although any NEXUS file can be used as input file. The program will ignore all information that is not used.
Here is an example of a NEXUS file generated by the ForCon program:

#NEXUS
[TITLE: Four Anthropoidea]

begin data;
dimensions ntax=4 nchar=50;
format datatype=RNA missing=N gap=-;

matrix
Homo_sapiens
AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG
Pan_paniscus
AGUCGCGUCG--GCAGAAACGCAUGACGGACCACAUCAU-CCUUGCAAAG
Gorilla_gorilla
AGUCGCGUCG--GCAGAUACGCAUCACGGAC-ACAUCAUCCCUCGCAGAG
Pongo_pigmaeus
AGUCGCGUCGAAGCAGA--CGCAUGACGGACCACAUCAUCCCUUGCAGAG
;
endblock;
begin assumptions;
options deftype=unord;

---

#NEXUS
[TITLE: Four Anthropoidea]

begin data;
dimensions ntax=4 nchar=50;
format interleave datatype=RNA missing=N gap=-;

matrix
Homo_sapiens AGUCGAGUC---GCAGAAACGCAUGAC-GAC
Pan_paniscus AGUCGCGUCG--GCAGAAACGCAUGACGGAC
Gorilla_gorilla AGUCGCGUCG--GCAGAUACGCAUCACGGAC
Pongo_pigmaeus AGUCGCGUCGAAGCAGA--CGCAUGACGGAC

Homo_sapiens CACAUUUU-CCUUGCAAAG
Pan_paniscus CACAUCAU-CCUUGCAAAG
Gorilla_gorilla -ACAUCAUCCCUCGCAGAG
Pongo_pigmaeus CACAUCAUCCCUUGCAGAG
;

endblock;
begin assumptions;
options deftype=unord;
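
The data block of the noninterleaved example can be produced with a few lines of Python. This is a sketch that mirrors the layout shown above, not ForCon's implementation: `write_nexus` and its parameters are hypothetical, the assumptions block is left out, and replacing blanks in names by underscores is an assumption based on the names in the example.

```python
def write_nexus(title, alignment, datatype="RNA", missing="N", gap="-"):
    """Return a noninterleaved NEXUS data block for (name, sequence) pairs."""
    ntax = len(alignment)
    nchar = len(alignment[0][1]) if alignment else 0
    lines = [
        "#NEXUS",
        f"[TITLE: {title}]",
        "",
        "begin data;",
        f"dimensions ntax={ntax} nchar={nchar};",
        f"format datatype={datatype} missing={missing} gap={gap};",
        "",
        "matrix",
    ]
    for name, seq in alignment:
        # Assumption: blanks in names are written as underscores, as in the example.
        lines.append(name.replace(" ", "_"))
        lines.append(seq)
    lines += [";", "endblock;"]
    return "\n".join(lines) + "\n"


print(write_nexus("Four Anthropoidea", [("Homo sapiens", "AGUC"), ("Pan paniscus", "AG-C")]))
```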

Original poster | 郑振寰 posted on 2007-6-6 10:11
9) Parsimony Jackknifer

The program by Farris is a parsimony program that also implements the jackknife method to test the reliability of branches.
The format is similar to the MEGA format. On the first line a title/description is placed between single quotes. The alignment can be written in sequential or interleaved format, but the sequence names must be placed between parentheses. Blanks are not allowed in the names; they should be replaced by underscores ( _ ). The file ends with a semicolon.

Examples:

' Four Anthropoidea '
(Homo_sapiens) AGUCGAGUC---GCAGAAACGCAUGAC-GAC
CACAUUUU-CCUUGCAAAG
(Pan_paniscus) AGUCGCGUCG--GCAGAAACGCAUGACGGAC
CACAUCAU-CCUUGCAAAG
(Gorilla_gorilla) AGUCGCGUCG--GCAGAUACGCAUCACGGAC
-ACAUCAUCCCUCGCAGAG
(Pongo_pigmaeus) AGUCGCGUCGAAGCAGA--CGCAUGACGGAC
CACAUCAUCCCUUGCAGAG
;

---

' Four Anthropoidea '
(Homo_sapiens) AGUCGAGUC---GCAGAAACGCAUGAC-GAC
(Pan_paniscus) AGUCGCGUCG--GCAGAAACGCAUGACGGAC
(Gorilla_gorilla) AGUCGCGUCG--GCAGAUACGCAUCACGGAC
(Pongo_pigmaeus) AGUCGCGUCGAAGCAGA--CGCAUGACGGAC
(Homo_sapiens) CACAUUUU-CCUUGCAAAG
(Pan_paniscus) CACAUCAU-CCUUGCAAAG
(Gorilla_gorilla) -ACAUCAUCCCUCGCAGAG
(Pongo_pigmaeus) CACAUCAUCCCUUGCAGAG
;

10) PHYLIP

The PHYLIP package is a tree construction package that implements parsimony, distance and maximum likelihood.
The format is fairly straightforward: the first line gives the number of sequences and their length (in characters). The alignment then follows in interleaved or sequential format. Sequence names may contain blanks, but may not be longer than 10 characters. The interleaved format differs slightly from the other formats in that the sequence names appear only in the first block, whereas other interleaved formats repeat the names in every block.

For example:

4 50
Homo sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
Pan panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
Gorilla go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
Pongo pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC

ACAUUUU-CCUUGCAAAG
ACAUCAU-CCUUGCAAAG
ACAUCAUCCCUCGCAGAG
ACAUCAUCCCUUGCAGAG

The sequential format looks like this:

4 50
Homo sapie AGUCGAGUC---GCAGAAACGCAUGAC-GACC
ACAUUUU-CCUUGCAAAG
Pan panisc AGUCGCGUCG--GCAGAAACGCAUGACGGACC
ACAUCAU-CCUUGCAAAG
Gorilla go AGUCGCGUCG--GCAGAUACGCAUCACGGAC-
ACAUCAUCCCUCGCAGAG
Pongo pigm AGUCGCGUCGAAGCAGA--CGCAUGACGGACC
ACAUCAUCCCUUGCAGAG

You can find more info in the PHYLIP package documentation.
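
The interleaved PHYLIP layout, with names only in the first block, could be written along these lines in Python. The function name `write_phylip_interleaved` and the `width` parameter are hypothetical. Padding each name to 10 characters and adding one separating blank follows the example above and is otherwise an assumption of this sketch.

```python
def write_phylip_interleaved(alignment, width):
    """Return interleaved PHYLIP text for a list of (name, sequence) pairs."""
    if width < 1:
        raise ValueError("width must be positive")
    ntax = len(alignment)
    nchar = len(alignment[0][1]) if alignment else 0
    lines = [f"{ntax} {nchar}"]  # number of sequences, then their length
    for start in range(0, nchar, width):
        if start > 0:
            lines.append("")  # blank line between blocks
        for name, seq in alignment:
            chunk = seq[start:start + width]
            if start == 0:
                # Names are cut or padded to 10 characters, first block only.
                lines.append(f"{name[:10]:<10} {chunk}")
            else:
                lines.append(chunk)
    return "\n".join(lines) + "\n"


print(write_phylip_interleaved([("Homo sapiens", "AGUCGA"), ("Pan paniscus", "AGUCGC")], 4))
```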

11) TREECON

TREECON is a software package for construction and drawing of phylogenetic trees on the basis of
evolutionary distances.
A full description of the TREECON format can be found right here.

Example:

50
Homo sapiens
AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG
Pan paniscus
AGUCGCGUCG--GCAGAAACGCAUGACGGACCACAUCAU-CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG--GCAGAUACGCAUCACGGAC-ACAUCAUCCCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGAAGCAGA--CGCAUGACGGACCACAUCAUCCCUUGCAGAG

Walk-through

After successfully installing ForCon, run the forcon.exe executable (or just double-click the shortcut).

The start-up screen appears:

Pressing the 'Enter' button will continue the program.
First, you will be asked to specify the formats of the input and output files:

Just select the format from each list and press OK.
After doing this, you will be prompted to specify the input file.

After this, the program will ask you for the blocksize/cutoff. Here you can specify the number of characters that each block/sequence line will consist of.

Fill in the text box and press OK.

If your input file was a Hennig86 file, you are asked for the 'translation':

So, in this case, every 0 is translated into an A, 1 to T, etc.
Make sure to enter just one character in each box!
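
The back-translation can be pictured as a simple character lookup. In this Python sketch the names `decode_hennig86` and `DEFAULT_TRANSLATION` are hypothetical; the default table is the reverse of the standard translation given in the Hennig86 section, and leaving unlisted characters unchanged is an assumption.

```python
# Reverse of the standard translation from the Hennig86 section.
DEFAULT_TRANSLATION = {"0": "A", "1": "T", "2": "G", "3": "C", "?": "N"}


def decode_hennig86(coded, translation=DEFAULT_TRANSLATION):
    """Translate Hennig86 number codes back into letters."""
    for code, letter in translation.items():
        # Mirrors the rule above: exactly one character per box.
        if len(code) != 1 or len(letter) != 1:
            raise ValueError("enter exactly one character for each code")
    # Characters without a translation are kept unchanged (assumption).
    return "".join(translation.get(ch, ch) for ch in coded)


print(decode_hennig86("0123?"))
```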

Specify the file you would like to save the new alignment in:

After doing this, you can select the sequences you would like to convert.


Click a name in the list to select that sequence. To select multiple sequences, hold down the Control key on your keyboard while selecting. Large blocks of sequences can be selected using the Shift key. Use the Select All button to select all the sequences at once. The Deselect All button does the opposite.
After you have made your selection, press OK.

You now get the chance to select certain positions of the alignment:

You can choose between 4 options:

  • use all of the alignment ( no change )
  • use the 1st and 2nd codon positions, e.g. AAU GCU ACU ACG becomes AAGCACAC
  • only use the third codon positions, e.g. AAU GCU ACU ACG becomes UUUG
  • use specific user-defined codon positions: to cut parts out of your alignment; areas should be separated by commas.
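
The first three options amount to keeping certain positions of every codon, and the last one to cutting out areas. A Python sketch of both follows; the function names are hypothetical, and the `1-3,10-12` range syntax used for the user-defined areas is an assumption, since the exact input syntax ForCon expects is not described here.

```python
def select_codon_positions(sequence, positions):
    """Keep only the given codon positions (1, 2 or 3) of every codon."""
    wanted = set(positions)
    if not wanted <= {1, 2, 3}:
        raise ValueError("codon positions must be 1, 2 or 3")
    return "".join(base for i, base in enumerate(sequence) if i % 3 + 1 in wanted)


def select_areas(sequence, spec):
    """Cut out comma-separated areas such as '1-3,10-12' (1-based, inclusive).

    The range syntax is hypothetical, not taken from ForCon.
    """
    parts = []
    for area in spec.split(","):
        start, _, end = area.strip().partition("-")
        end = end or start  # a single number selects one column
        parts.append(sequence[int(start) - 1:int(end)])
    return "".join(parts)


codons = "AAUGCUACUACG"  # AAU GCU ACU ACG
print(select_codon_positions(codons, [1, 2]))  # AAGCACAC
print(select_codon_positions(codons, [3]))     # UUUG
print(select_areas(codons, "1-3,10-12"))       # AAUACG
```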

Just check the button of your choice, press OK, and we're off to:

The end.
You can find your file in the directory you specified earlier.

Disclaimer

This software is distributed freely 'as-is'. The programmer cannot be held responsible for any damage that may occur. You can distribute the program among your friends, colleagues, etc. in the original .ZIP file. Please always register your program, if you should get a copy. It's free, won't take much of your time, and you will be notified of any new releases or bugs.
