
Simple Command Line Interface (CLI) #68

Open
wants to merge 6 commits into
base: master

buren
Contributor

@buren buren commented Jun 13, 2018

Side note: First of all, thank you for an awesome gem. Over the past years I've reached for this gem numerous times for purposes big and small, and it's always a joy to use. Thank you! 🙌


Simple Command Line Interface (CLI)

Rationale
I often find myself wanting to do a "quick and dirty" crawl of different websites, for example to find pages that return 4XX or 5XX responses. So far I've written small Ruby scripts that use spidr for the things I need.
Many of these use cases could be solved with a fairly simple CLI.

Examples

spidr https://example.com

It supports all Spidr::Agent arguments:

spidr --limit=10 --user-agent=myagent https://example.com

You can output multiple values (CSV-style); each name in the --columns argument maps to a method on page:

spidr --columns=code,url,title,content_type,meta_redirect? https://example.com
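
One way the column mapping could work, sketched under assumptions: each column name is sent to the page object as a method call, and the results are written as one CSV row. `FakePage` and `page_row` are hypothetical names used so the sketch runs without the spidr gem; in the actual CLI the object would presumably be a Spidr::Page.

```ruby
require 'csv'

# Hypothetical stand-in for Spidr::Page so the sketch runs without the gem.
FakePage = Struct.new(:code, :url, :title, :content_type) do
  def meta_redirect?
    false
  end
end

# Builds one output row by calling each requested column as a method on the page.
def page_row(page, columns)
  columns.map { |column| page.public_send(column) }
end

columns = %w[code url title content_type meta_redirect?]
page    = FakePage.new(200, 'https://example.com/', 'Example Domain', 'text/html')

puts CSV.generate_line(columns)                  # header row
puts CSV.generate_line(page_row(page, columns))  # one row per crawled page
```

Because the column names come from user input, a real implementation would probably want to check them against an allow-list before calling `public_send`.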

Usage

Usage: spidr [options] <url>
        --columns=[val1,val2]        Columns in output
        --content-types=[val1,val2]  Formats to output (html, javascript, css, json, ..)
        --[no-]header                Include the header
        --open-timeout=val           Optional open timeout
        --read-timeout=val           Optional read timeout
        --ssl-timeout=val            Optional ssl timeout
        --continue-timeout=val       Optional continue timeout
        --keep-alive-timeout=val     Optional keep_alive timeout
        --proxy-host=val             The host the proxy is running on
        --proxy-port=val             The port the proxy is running on
        --proxy-user=val             The user to authenticate as with the proxy
        --proxy-password=val         The password to authenticate with
        --default-headers=[key1=val1,key2=val2]
                                     Default headers to set for every request
        --host-header=val            The HTTP Host header to use with each request
        --host-headers=[key1=val1,key2=val2]
                                     The HTTP Host headers to use for specific hosts
        --user-agent=val             The User-Agent string to send with each request
        --referer=val                The Referer URL to send with each request
        --queue=[val1,val2]          The initial queue of URLs to visit
        --history=[val1,val2]        The initial list of visited URLs
        --limit=val                  The maximum number of pages to visit
        --max-depth=val              The maximum link depth to follow
        --[no-]robots                Respect Robots.txt
    -h, --help                       How to use
        --version                    Show version
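
As a rough illustration of how a few of these options could be parsed with Ruby's standard OptionParser, here is a minimal sketch. `parse_cli_options` and the default values are assumptions for illustration, not code from this PR.

```ruby
require 'optparse'

# Hypothetical helper: parses a small subset of the options listed above.
# Returns the options Hash and the first positional argument (the URL).
def parse_cli_options(argv)
  # Defaults here are assumptions made for the sketch.
  options = { header: true, robots: false, columns: ['url'] }

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: spidr [options] <url>'

    opts.on('--columns=val1,val2', Array, 'Columns in output') do |value|
      options[:columns] = value
    end
    opts.on('--limit=val', Integer, 'The maximum number of pages to visit') do |value|
      options[:limit] = value
    end
    opts.on('--[no-]header', 'Include the header') do |value|
      options[:header] = value
    end
    opts.on('--[no-]robots', 'Respect Robots.txt') do |value|
      options[:robots] = value
    end
  end

  # parse returns the arguments that were not consumed as options.
  remaining = parser.parse(argv)
  [options, remaining.first]
end

p parse_cli_options(%w[--limit=10 --columns=code,url --no-header https://example.com])
```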

todo

  • Communicate that the --[no-]robots option requires gem install robots?

If you don't want to include this here, it could be a separate gem, something like spidr_cli (with your blessing, unless you object?). However, it would probably be easier for others to find if it's here.

Thanks!

@buren
Contributor Author

buren commented Jul 1, 2018

I've created a spidr_cli gem which includes the above-mentioned functionality, plus accept/reject arguments for hosts, ports, links and urls, and the ability to choose which method to use: Spidr::site|host|start_at.

@postmodern
Owner

postmodern commented Aug 26, 2021

Sorry for not noticing this. If I were to add a CLI, it would need to be a class called Spidr::CLI. It would also need to catch Interrupt and Errno::EPIPE exceptions (see how command_kit handles this). It would also need a --format or --output-format option to select plain text, CSV, or JSON output, plus specs that invoke the command and use RSpec's .to output(...).to_stdout matcher.
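
A minimal sketch of what such a class might look like, assuming the requirements above. Nothing here is spidr's actual code: the `rows:` argument is a hypothetical stand-in for crawl results so the sketch runs without HTTP requests, and the exit statuses (130 for Interrupt by the shell convention 128 + SIGINT, 0 for a closed pipe) are assumptions rather than a description of what command_kit does.

```ruby
require 'csv'
require 'json'
require 'optparse'
require 'stringio'

module Spidr
  # Hypothetical sketch only; spidr does not ship this class.
  class CLI
    FORMATS = %w[text csv json].freeze

    # rows stands in for crawl results as [code, url] pairs (an assumption).
    def initialize(stdout: $stdout, stderr: $stderr, rows: [])
      @stdout = stdout
      @stderr = stderr
      @rows   = rows
      @format = 'text'
    end

    # Returns an exit status instead of calling exit, which keeps it testable.
    def run(argv)
      option_parser.parse(argv)
      @rows.each { |code, url| @stdout.puts(format_row(code, url)) }
      0
    rescue Interrupt
      130 # Ctrl-C: conventional 128 + SIGINT(2)
    rescue Errno::EPIPE
      0   # reader closed the pipe early (e.g. `| head`); treated as success here
    rescue OptionParser::ParseError => error
      @stderr.puts("spidr: #{error.message}")
      1
    end

    private

    def option_parser
      OptionParser.new do |opts|
        # Passing FORMATS restricts the accepted values to that list.
        opts.on('--format=FORMAT', FORMATS, 'text, csv, or json') do |value|
          @format = value
        end
      end
    end

    def format_row(code, url)
      case @format
      when 'csv'  then CSV.generate_line([code, url]).chomp
      when 'json' then JSON.generate(code: code, url: url)
      else "#{code} #{url}"
      end
    end
  end
end

out = StringIO.new
cli = Spidr::CLI.new(stdout: out, rows: [[200, 'https://example.com/']])
status = cli.run(%w[--format=json https://example.com])
puts out.string
```

In a spec, a call like this could be wrapped in RSpec's `expect { ... }.to output(...).to_stdout`, provided the CLI object is created inside the block so that it picks up the captured `$stdout`.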

Not to plug my own code too much, but you might want to consider using command_kit for your spidr-cli gem?

@postmodern
Owner

If you want to get this merged, check out the CLI class from wordlist.rb. Feel free to copy its zero-dependency boilerplate CLI code.
