swineherd capture
Philip (flip) Kromer edited this page May 9, 2012
·
1 revision
Logging:
- All output from the launched workflow should go to a workflow log file
- Hadoop output is special and should be pulled down from the jobtracker
- jobconf.xml
- job details page
The workflow should specify a logdir, which defaults to workdir + '/logs'.
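A minimal sketch of the logdir default in Python. The function name `resolve_logdir` and its parameters are hypothetical, chosen only for illustration; they are not part of swineherd.

```python
import os

def resolve_logdir(workdir, logdir=None):
    """Return the explicitly configured logdir, or workdir + '/logs' if none is set."""
    # Hypothetical helper: an explicit logdir wins, otherwise fall back to the default.
    return logdir if logdir else os.path.join(workdir, "logs")
```

For example, `resolve_logdir("/data/wf")` falls back to the `logs` directory under `/data/wf`, while `resolve_logdir("/data/wf", "/var/log/wf")` keeps the explicit setting.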
Fetching Hadoop job stats:
- Get job id
- Use curl to fetch the latest logs listing: "http://jobtracker:50030/logs/history/"
- Parse the logs listing and pull out the two urls we want (something-jobid.xml, something-jobid....)
- Fetch the two urls we care about and dump into the workflow's log dir.
- Possibly parse the results into an ongoing workflow-statistics.tsv file
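The steps above could be sketched roughly as follows. This sketch uses Python's `urllib` in place of curl, and it assumes the history listing is an HTML page whose links carry the job id in their `href`; that layout is an assumption, not something confirmed here. The function names `find_job_urls` and `fetch_job_logs` are hypothetical.

```python
import os
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Listing URL taken from the notes above.
HISTORY_URL = "http://jobtracker:50030/logs/history/"

def find_job_urls(listing_html, job_id, base_url=HISTORY_URL):
    """Return absolute URLs for every link in the listing whose href mentions job_id."""
    # Assumption: the listing is HTML with double-quoted href attributes.
    hrefs = re.findall(r'href="([^"]+)"', listing_html)
    return [urljoin(base_url, href) for href in hrefs if job_id in href]

def fetch_job_logs(job_id, logdir, base_url=HISTORY_URL):
    """Download the history files for job_id into logdir and return the saved paths."""
    os.makedirs(logdir, exist_ok=True)
    # Fetch the latest logs listing.
    with urlopen(base_url) as resp:
        listing = resp.read().decode("utf-8", errors="replace")
    saved = []
    # Fetch each matching file and dump it into the workflow's log dir.
    for url in find_job_urls(listing, job_id, base_url):
        dest = os.path.join(logdir, os.path.basename(url))
        with urlopen(url) as resp, open(dest, "wb") as out:
            out.write(resp.read())
        saved.append(dest)
    return saved
```

Keeping the parsing in `find_job_urls` separate from the network calls makes it possible to check the link matching against a saved copy of the listing without a running jobtracker.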
Other output:
Output that would otherwise go to the terminal (nohup.out or some such) should be collected and dumped into the logdir as well.
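One way to collect terminal output into the logdir, sketched with Python's `subprocess` module. The function name `run_captured` and the default file name `workflow.log` are hypothetical choices for illustration.

```python
import os
import subprocess

def run_captured(cmd, logdir, logname="workflow.log"):
    """Run cmd, appending its stdout and stderr to logdir/logname.

    Returns (exit_code, path_to_log).
    """
    os.makedirs(logdir, exist_ok=True)
    logpath = os.path.join(logdir, logname)
    # Append so repeated launches of the same workflow share one log file.
    with open(logpath, "ab") as log:
        # stderr is merged into stdout so nothing is left on the terminal.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode, logpath
```

Calling `run_captured(["hadoop", "version"], "/data/wf/logs")` would append whatever that command prints to `/data/wf/logs/workflow.log` instead of the terminal.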