From: ldg (三十三块★breathe), Board: Biology
Subject: Re: About blast
Posted at: Unknown Space - 未名空间 (Sat Sep 13 22:46:36 2003), local post
Below are my notes from one of my lectures:
Repetitive elements in query sequences can cause spurious database
matches. These sequences produce artificially high alignment scores
against database sequences that are actually unrelated to the query.
Examples include Alu repeats and low-complexity regions (LCRs), i.e.
short-period repeats or stretches of overrepresented residues. In fact,
one half of the protein sequences in databases contain at least one LCR.
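To make "low-complexity region" concrete, here is a minimal sketch of one way to flag and mask such regions in a DNA sequence: slide a fixed window along the sequence, compute the Shannon entropy of the residue composition in each window, and replace every residue in a low-entropy window with N. This is a simplified illustration in the spirit of compositional filters, not the actual DUST or SEG algorithm; the function names, the window size (12) and the entropy threshold (1.5 bits) are hypothetical choices for the example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 1.5,
                        mask_char: str = "N") -> str:
    """Mask every residue covered by a window whose entropy is below threshold.

    Hypothetical parameters for illustration; these are NOT DUST or SEG defaults.
    """
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            # Overwrite the whole low-complexity window with the mask character.
            masked[start:start + window] = mask_char * window
    return "".join(masked)


def percent_identity(a: str, b: str) -> float:
    """Percent of positions (ungapped, equal-length sequences) with identical residues."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# A made-up query: unique flank + (CA)10 dinucleotide repeat + unique flank.
query = "GATTACAGCTTGCA" + "CA" * 10 + "TGGCATCGAATCGT"
masked = mask_low_complexity(query)
print(masked)
print(percent_identity(masked, query))  # below 100: masked N's never match
```

If one assumes that masked positions are scored as mismatches, comparing the masked query with its own unmasked copy gives less than 100% identity, which mirrors the symptom described in the quoted question.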
Why LCRs cause false hits: similarity between LCRs does not come from
residue-by-residue sequence conservation and therefore does not reflect
an evolutionary relationship. Methods for measuring the statistical
significance of an alignment assume a certain degree of randomness in the
sequences, but the patterns in LCRs, shared by otherwise unrelated
sequences, violate this assumption.
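The randomness assumption can be illustrated with a short calculation. Under a simple model in which two unrelated sequences are independent strings with residue frequencies p and q, the probability that an aligned pair of positions is identical by chance is sum_i p_i * q_i. For uniform DNA this is 0.25, but for two unrelated sequences that share a skewed composition, such as an AT dinucleotide repeat, it is much higher, so chance alignments look more similar than the uniform model expects. The i.i.d. model and the helper names below are assumptions for illustration, not how BLAST computes its statistics.

```python
from collections import Counter


def composition(seq: str) -> dict:
    """Relative frequency of each residue in seq."""
    counts = Counter(seq.upper())
    n = len(seq)
    return {res: c / n for res, c in counts.items()}


def chance_identity(seq_a: str, seq_b: str) -> float:
    """Probability that two aligned positions match by chance, assuming each
    sequence is an i.i.d. string with its own observed composition."""
    p, q = composition(seq_a), composition(seq_b)
    return sum(freq * q.get(res, 0.0) for res, freq in p.items())


balanced = "ACGT" * 3   # uniform composition
at_repeat = "AT" * 6    # low-complexity: only A and T
print(chance_identity(balanced, balanced))    # 0.25
print(chance_identity(at_repeat, at_repeat))  # 0.5
```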
【 feizj (cornell) wrote: 】
: Why do all the public BLAST servers (NCBI, TIGR, etc.) use the default filters
: (DUST and SEG) to mask sequences such as repeats?
: If I compare a DNA sequence containing several repeats against itself with the
: above filters on, it does not give me 100% identity, sometimes even less than
: 90% identity. That is not right. But there must be some reason for BLAST to do
: this. Can anyone tell me? Thanks.
--
※ Source: Unknown Space - 未名空间 mitbbs.com [FROM: 165.91.]