
Need better way of specifying software versions #4

Open

tpiscitell (Collaborator) opened this issue Jul 2, 2015 · 1 comment

Keeping a hard-coded version number in common.sh is tedious because as new versions are released, old ones are purged from the Apache mirrors. It would be nice if this didn't break every time a new version of HBase/Storm/Hadoop/etc. was released.

Idea:

  1. If any version of $software exists in vagrant/resources/tmp, use that.
  2. If not, get the closest Apache mirror of $software and see what versions are available.
  3. Do some "fuzzy matching" to identify which version to use (e.g. version 0.98.* of HBase; a sketch follows this list).
  4. wget the identified version.
  5. Carry on...
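
A minimal sketch of the fuzzy matching in step 3 (Python 2, to match the script in the comment below; pick_version and the sample version list are hypothetical):

import fnmatch

def pick_version(available, pattern):
    # Keep only versions matching a shell-style pattern like '0.98.*'.
    matches = [v for v in available if fnmatch.fnmatch(v, pattern)]
    if not matches:
        return None
    # Compare numerically, so '0.98.12' beats '0.98.9'.
    numeric = lambda v: tuple(int(p) for p in v.split('.') if p.isdigit())
    return max(matches, key=numeric)

print pick_version(['0.94.27', '0.98.9', '0.98.12', '1.0.1'], '0.98.*')
# -> 0.98.12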

chokha commented Jul 6, 2015

Let me know if this works. It addresses steps 2 and 3: given a mirror URL, it walks the directory listing and prints a valid link to the first available binary it finds.

Usage example:
python list_apache.py http://www.webhostingreviewjam.com/mirror/apache/hbase
python list_apache.py http://www.webhostingreviewjam.com/mirror/apache/hive

import sys
import urllib
import re

# Match one line of an Apache directory listing:
# a link, a "dd-Mon-yyyy hh:mm" timestamp, and a size ('-' for a directory).
parse_re = re.compile(r'href="([^"]*)".*(..-...-.... ..:..).*?(\d+[^\s<]*|-)')

response_code = -1

def list_apache_dir(url):
    # Walk an Apache mirror listing recursively and print the first
    # *bin.tar.gz link that answers with HTTP 200.
    global response_code
    try:
        html = urllib.urlopen(url).read()
    except IOError as e:
        print 'error fetching %s: %s' % (url, e)
        return
    if not url.endswith('/'):
        url += '/'
    dirs = []
    for name, date, size in parse_re.findall(html):
        if name.endswith('/'):
            dirs.append(name)
            continue
        if name.endswith('bin.tar.gz'):
            print url + name
            response_code = urllib.urlopen(url + name).code
            if response_code == 200:  # found a live binary, stop searching
                return
    for subdir in dirs:
        print
        list_apache_dir(url + subdir)
        if response_code == 200:  # a deeper call already found one
            return

for url in sys.argv[1:]:
    print
    response_code = -1  # reset between command-line URLs
    list_apache_dir(url)
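
For step 2 (finding the closest mirror without hard-coding one), apache.org's closer.cgi service can be asked directly; a sketch in the same Python 2 style, assuming the as_json=1 parameter and the "preferred"/"path_info" fields it returned at the time (the /hbase/ path is just an example):

import json
import urllib

# Ask apache.org for the mirror closest to this client; with as_json=1
# the response is JSON rather than an HTML landing page.
info = json.load(urllib.urlopen(
    'https://www.apache.org/dyn/closer.cgi?path=/hbase/&as_json=1'))
print info['preferred'] + info['path_info']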
