Revise documentation and download strategy.
Signed-off-by: Anurag Priyam <[email protected]>
yeban committed Feb 17, 2016
1 parent b469083 commit a9e9b64
Showing 1 changed file with 17 additions and 7 deletions.
24 changes: 17 additions & 7 deletions lib/ncbi-blast-dbs.rake
@@ -4,15 +4,16 @@ require 'net/ftp'
 # local copy is older than at the given URL, or if the local copy is corrupt.
 def download(url)
   file = File.basename(url)
-  # Download tarball if the local copy is older than at the given URL or fetch
-  # it for the first time.
+  # Resume an interrupted download or fetch the file for the first time. If
+  # the file on the server is newer, then it is downloaded from start.
+  sh "wget -Nc #{url}"
+  # If the local copy is already fully retrieved, then the previous command
+  # ignores the timestamp. So we check with the server again if the file on
+  # the server is newer and if so download the new copy.
   sh "wget -N #{url}"
-  # Resume aborted download. Do nothing if the file is already fully retrieved
-  # (at the cost is a round trip to server).
-  sh "wget -c #{url}"

-  # Always download md5 and verify the tarball. Re-download tarball if corrupt;
-  # extract otherwise.
+  # Immediately download md5 and verify the tarball. Re-download tarball if
+  # corrupt; extract otherwise.
   sh "wget #{url}.md5 && md5sum -c #{file}.md5" do |matched, _|
     if !matched
       sh "rm #{file} #{file}.md5"; download(url)
@@ -22,6 +23,11 @@ def download(url)
   end
 end

+# Connects to NCBI's FTP server, gets the URL of all database volumes and
+# returns them grouped by database name:
+#
+# {'nr' => ['ftp://...', ...], 'nt' => [...], ...}
+#
 def databases
   host, dir = 'ftp.ncbi.nlm.nih.gov', 'blast/db'
   usr, pswd = 'anonymous', ENV['email']
@@ -35,10 +41,14 @@ def databases
   end
 end

+# Create user-facing task for each database to drive the download of its
+# volumes in parallel.
 databases.each do |name, files|
   multitask(name => files.map { |file| task(file) { download(file) } })
 end

+# List name of all databases that can be downloaded if executed without
+# any arguments.
 task :default do
   puts databases.keys.join(', ')
 end
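
The verification step in `download` shells out to `md5sum -c`. For readers who want to see what that check amounts to, here is a minimal pure-Ruby sketch of the same comparison. The helper name `md5_matches?` is hypothetical and not part of this commit, and the sketch assumes the `.md5` file uses md5sum's "<hex digest>  <file name>" layout.

```ruby
require 'digest/md5'

# Hypothetical helper, not part of the commit: returns true when the MD5
# digest of `file` equals the digest recorded in `md5_file`.
# Assumption: the checksum file starts with the hex digest, as md5sum writes it.
def md5_matches?(file, md5_file = "#{file}.md5")
  expected = File.read(md5_file).split.first
  return false if expected.nil?

  # Digest::MD5.file streams the file, so a large tarball is not read
  # into memory in one piece.
  Digest::MD5.file(file).hexdigest == expected.downcase
end
```

A caller could use it in place of the `md5sum -c` exit status, deleting the tarball and its checksum file and retrying when it returns false, as the commit does.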

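The comment on `databases` describes its return value as volume URLs grouped by database name, but the method body is not visible in this diff. One way such a grouping could be done is sketched below. The helper `group_by_database` is hypothetical, and it assumes each volume is named `<database>.<NN>.tar.gz`, so the database name is everything before the first dot.

```ruby
# Hypothetical helper; the actual body of `databases` is not shown in the diff.
# Assumption: the database name is the part of the file name before the
# first dot, e.g. "nr" for "nr.00.tar.gz".
def group_by_database(urls)
  urls.group_by { |url| File.basename(url).split('.').first }
end

urls = [
  'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.00.tar.gz',
  'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.01.tar.gz',
  'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.00.tar.gz'
]
grouped = group_by_database(urls)
puts grouped.keys.join(', ')
```

The resulting hash has the shape that the `databases.each do |name, files|` loop expects: one key per database and an array of volume URLs as the value.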