Retrieve sequences from NCBI

Description

This tool retrieves sequences from the NCBI sequence databases based on the given query terms. The tool is based on the NCBI EDirect package.

You can search for protein or nucleotide sequence entries whose annotation matches the given search terms. A search term must be a single word consisting of letters and/or numbers. You can also use the wildcard character * to match any string. The search is case-insensitive: Mus and mus produce the same matches.

In the case of sequence length, a range should be defined with the syntax from:to. For example: 120:125
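
The term and range rules above can be sketched as two small checks. This is only an illustration, not the tool's own validation code: the function names are hypothetical, and restricting "letters" to ASCII and requiring from <= to are assumptions.

```python
import re
from typing import Tuple

# Assumption: a term is one word of ASCII letters/digits, with * allowed as a wildcard.
TERM_RE = re.compile(r"[A-Za-z0-9*]+")
# Sequence length range in the from:to syntax, e.g. 120:125.
RANGE_RE = re.compile(r"(\d+):(\d+)")


def is_valid_term(term: str) -> bool:
    """Return True if the term is a single word of letters, digits and/or *."""
    return TERM_RE.fullmatch(term) is not None


def parse_length_range(text: str) -> Tuple[int, int]:
    """Parse 'from:to' into a (from, to) pair of integers."""
    match = RANGE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"expected from:to, got {text!r}")
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:  # assumption: the lower bound must come first
        raise ValueError(f"range start {low} is larger than range end {high}")
    return low, high
```

For example, is_valid_term("Mus musculus") is False because the term contains a space, while parse_length_range("120:125") yields the pair (120, 125).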

The available search fields are:
  • All fields
  • Keywords
  • Author
  • Organism
  • Accession
  • Gene name
  • Protein name
  • Sequence length

One search can include 1-3 search terms that are combined using the given logical operator (AND, OR, NOT).
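
As a rough sketch of how such a search could be expressed as an Entrez query string: the mapping from the field names above to Entrez field tags, and the function name build_query, are assumptions made for illustration. The query this tool actually sends to NCBI is not documented here.

```python
from typing import List, Tuple

# Assumed mapping from the tool's search fields to Entrez field tags.
FIELD_TAGS = {
    "All fields": "ALL",
    "Keywords": "KYWD",
    "Author": "AUTH",
    "Organism": "ORGN",
    "Accession": "ACCN",
    "Gene name": "GENE",
    "Protein name": "PROT",
    "Sequence length": "SLEN",
}

OPERATORS = ("AND", "OR", "NOT")


def build_query(terms: List[Tuple[str, str]], operator: str = "AND") -> str:
    """Join 1-3 (field, term) pairs into one query using a single operator."""
    if not 1 <= len(terms) <= 3:
        raise ValueError("a search takes 1-3 search terms")
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {OPERATORS}")
    # Each term is suffixed with its field tag in square brackets.
    parts = [f"{term}[{FIELD_TAGS[field]}]" for field, term in terms]
    return f" {operator} ".join(parts)
```

With these assumptions, build_query([("Organism", "Mus"), ("Sequence length", "120:125")]) gives Mus[ORGN] AND 120:125[SLEN].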

Details

As retrieving data from the NCBI server takes time, the maximum number of hits to be retrieved is set to 50 000. However, in the case of nucleotides, 50 000 hits may already be far too large a data set to download, as one "hit" may be a complete chromosome or genome.
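
Because of this, it can be useful to check how many entries a query matches before retrieving anything. The sketch below only builds the request URL for the NCBI E-utilities ESearch service with rettype=count and compares a hit count against the limit. It does not contact NCBI, the function names are hypothetical, and how the tool itself handles the limit is not specified here.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
MAX_HITS = 50_000  # limit stated in this manual page


def count_url(db: str, query: str) -> str:
    """Build an ESearch URL that asks only for the number of matching entries."""
    params = {"db": db, "term": query, "rettype": "count"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def exceeds_limit(hit_count: int) -> bool:
    """Return True if the query matches more entries than the retrieval limit."""
    return hit_count > MAX_HITS
```

A query that returns a count above the limit, or a nucleotide query that matches whole genomes, is a sign that more search terms are needed to narrow the search.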

Output

The matching sequences are saved in FASTA format by default, but the native GenBank formats, which also include the sequence annotation, can be used as well.
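
For readers who fetch entries themselves through NCBI E-utilities, the format choice roughly corresponds to the EFetch rettype parameter. The sketch below is an assumption about that correspondence, not a description of this tool's internals; the function name and the format labels "fasta" and "genbank" are hypothetical.

```python
from typing import Dict, List

# EFetch rettype per database and output format:
# GenBank flat file is "gb" for nucleotides and "gp" (GenPept) for proteins.
RETTYPES = {
    ("nuccore", "fasta"): "fasta",
    ("nuccore", "genbank"): "gb",
    ("protein", "fasta"): "fasta",
    ("protein", "genbank"): "gp",
}


def efetch_params(db: str, output_format: str, ids: List[str]) -> Dict[str, str]:
    """Build EFetch query parameters for the chosen database and format."""
    rettype = RETTYPES[(db, output_format.lower())]
    return {
        "db": db,
        "id": ",".join(ids),  # comma-separated list of identifiers
        "rettype": rettype,
        "retmode": "text",
    }
```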