ch_frb_rfi: A plugin-based framework for processing channelized intensity data in radio astronomy.
This manual isn't 100% complete, but there should be enough examples to get started!
There are currently two ways to use `ch_frb_rfi`.
First, you can write standalone python scripts which run the pipeline through the python interface in the `ch_frb_rfi` module.
Second, you can write python scripts which output json files, and add them to the "inventory" in `ch_frb_rfi/json_files`.
You can then use command-line tools (`rfp-run`, `rfp-time`, `rfp-analyze`) which form pipelines by chaining together json files.
In this section, we list some examples of the first method (running the pipeline through standalone python scripts). See comments in the scripts for details.
- `./example.py`: a few minutes of 1K-channel incoherent-beam data.
- `./example2.py`: an acquisition with a real pulsar.
- `./example3.py`: five minutes of 16K-channel Crab pulsar (26m) data.
- `./s1.py` ... `s13.py`: some examples of interesting subsets of data, collected by Masoud (17-08-11).
- `./s1-offline.py`: re-analyzing s1.py using offline variance estimation.
The second way of using ch_frb_rfi is to write short scripts which output json file "building blocks",
then use command-line tools (`rfp-run`, `rfp-time`, `rfp-analyze`) to chain these building blocks together.
It is also possible to make "run lists" which allow multiple pipelines to be run in batch mode.
For more information on the command-line tools, see
"Command-line utilities" in rf_pipelines/MANUAL.md.
The ch_frb_rfi repository already contains some json file "building blocks" in `json_files/`.
The scripts which generate them are in the parallel directory `json_scripts/`.
For a current inventory of json files, see the next section "JSON inventory".
For now, we just give a few examples of interesting runs which can be done using these building blocks.
In all of these examples, the pipeline runs consist of three json files: an acquisition, an RFI chain, and a dedisperser.
- Analyze the "17-08-11 Masoud examples". The following command line uses 6 threads in parallel and takes a few minutes to complete. The result will appear in the web viewer under `masoud_examples_s1`, ..., `masoud_examples_s10`.

      cd ch_frb_rfi/json_files
      rfp-run -w masoud_examples -t 6 acqs/17-08-11-masoud-examples-runlist-first10.json rfi_1k/17-10-24-first-try.json bonsai_1k/bonsai_nfreq1024_7tree_f512_v3.json

- Analyze the 16k-upchannelized data. Note that currently, we only have ~2.5 hours of data in total! This uses 4 threads and takes ~4 hours to complete. The result will appear in the web viewer under `upchannelized_part0`, ..., `upchannelized_part6`.

      cd ch_frb_rfi/json_files
      rfp-run -w upchannelized -t 4 acqs/17-04-25-utkarsh-26m-16k-runlist.json rfi_16k/17-10-24-first-try.json bonsai_16k/bonsai_production_noups_nbeta1_v2.json

- Analyze all of the incoherent-beam pathfinder data. The following command line should work but I haven't actually tried it yet. It will take a few days to finish! The results will appear in the web viewer under `everything_runX_partY`.

      cd ch_frb_rfi/json_files
      rfp-run -w everything -t 6 acqs/17-02-08-incoherent-data-avalanche-runlist.json rfi_1k/17-10-24-first-try.json bonsai_1k/bonsai_nfreq1024_7tree_f512_v3.json

- Timing the pipeline. The following command should be run on a 20-core compute node (e.g. frb-compute-0.physics.mcgill.ca), and is intended to be representative of the real-time search with 16 beams/node, where each core is responsible for packet decoding, RFI removal, and dedispersion for one beam. We run 20 timing threads in parallel (one for each core).

      cd ch_frb_rfi/json_files   # on a 20-core compute node, not frb1!
      rfp-time -t 20 toy_streams/chime_network_nfreq16k_nt100k.json rfi_16k/17-10-24-first-try-noplot.json bonsai_16k/bonsai_production_noups_nbeta1_v2-noplot.json

  Note that the timing pipeline consists of:

  - `toy_streams/chime_network_nfreq16k_nt100k.json`: dummy network stream, runs the packet decoding kernel but does not receive real packets.
  - `rfi_16k/17-10-24-first-try-noplot.json`: current 16k RFI removal chain; note the "noplot", which removes the plotter_transforms.
  - `bonsai_16k/bonsai_production_noups_nbeta1_v2-noplot.json`: 16k dedispersion; note the "noplot", which disables plotting.

- "Analyzing" the pipeline. The `rfp-analyze` utility shows some diagnostic info: buffer latencies and memory footprints. It could use more documentation, so the output may be cryptic, but we include an example here for completeness! Note that the transform chain here is the same as the timing example above (16k, noplot).

      rfp-analyze -r toy_streams/chime_network_nfreq16k_nt100k.json rfi_16k/17-10-24-first-try-noplot.json bonsai_16k/bonsai_production_noups_nbeta1_v2-noplot.json

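The `rfp-run` examples above all have the same shape: a run name (`-w`), a thread count (`-t`), and three json files (acquisition, RFI chain, dedisperser). As a minimal sketch, a small helper can assemble that command line so the three building blocks are easy to swap. The function name `build_rfp_run_command` is hypothetical and not part of ch_frb_rfi; only the `-w` and `-t` flags shown in the examples above are used.

```python
import shlex


def build_rfp_run_command(run_name, nthreads, acq_json, rfi_json, bonsai_json):
    """Return the argv list for an rfp-run call of the form used in this manual.

    Hypothetical helper: it only assembles strings, it does not run anything.
    """
    if nthreads < 1:
        raise ValueError("nthreads must be >= 1")
    return [
        "rfp-run",
        "-w", run_name,        # name under which results appear in the web viewer
        "-t", str(nthreads),   # number of threads to run in parallel
        acq_json,              # acquisition (or run-list) json file
        rfi_json,              # RFI transform chain json file
        bonsai_json,           # dedisperser json file
    ]


cmd = build_rfp_run_command(
    "masoud_examples",
    6,
    "acqs/17-08-11-masoud-examples-runlist-first10.json",
    "rfi_1k/17-10-24-first-try.json",
    "bonsai_1k/bonsai_nfreq1024_7tree_f512_v3.json",
)

# Print the command as a single shell-quoted string.
print(shlex.join(cmd))
```

If you want to launch the run from python, the resulting list could be passed to `subprocess.run(cmd, cwd="ch_frb_rfi/json_files")`, since the json paths are relative to that directory.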
The `json_files/` directory contains json files which correspond to pipeline_objects, and also contains some "run-lists".
The pipeline_object json files are created by scripts in `json_scripts/`.
The run-list json files are simple enough that I usually just write them by hand.
Reminder: to read an rf_pipelines json file, the syntax is

    j = rf_pipelines.json_read(filename)
    p = rf_pipelines.pipeline_object.from_json(j)   # returns an object of type rf_pipelines.pipeline_object

To write an rf_pipelines json file, the syntax is

    j = p.jsonize()   # where p is an object of type rf_pipelines.pipeline_object
    rf_pipelines.json_write(filename, j)

- `json_files/acqs/17-04-25-utkarsh-26m-*`
Currently, this is our only 16K-channel dataset! It is from a 26-m run on 17-04-25, recorded in baseband, and upchannelized by Utkarsh's code. There is ~2.5 hours of data, divided into 7 parts (i.e. 7 json files).
We save json files for the 16k-channelized acqs, and for 1k-channelized acqs obtained by downsampling to 1024 frequencies. These latter acqs have 1/16 of the data volume and are sometimes useful, e.g. for quick plotting or experimenting with tweaks to the "1k" part of the RFI transform chain.
In hindsight, it would have been better to generate a single json file for all 7 parts combined, but I haven't done this yet! (Due to details of how the upchannelization code works, there will be ~1 second gaps at boundaries between the 7 parts, but that's OK.)
- `json_files/acqs/17-08-11-masoud-examples`
Some examples of "interesting" incoherent-beam CHIME pathfinder data (also in scripts/s*.py). From Masoud (17-08-11). Note that the runlist `json_files/acqs/17-08-11-masoud-examples-runlist-first10.json` only contains the first 10, since #11 is a long acq.
- s1: A faint source at low DM.
- s2: An RFI storm which results in a single false positive: (DM, SNR) = (77.62, 10.64).
- s3: A periodic variation in intensity, combined with RFI, makes it very hard to suppress all the false positives.
- s4: This sample contains two major RFI events which result in a large fraction of zero weights along the time axis.
- s5: This example contains a pulsar. In addition, there seems to be a strange variation in the overall intensity which can be revealed by using (detrender_niter, clipper_niter) = (1, 3).
- s6: This short sample represents a highly active RF environment!
- s7: Another highly active RF environment.
- s8: Another highly active RF environment.
- s9: This is an interesting example: It contains an RFI storm, a few false positives, and some significant changes in the running variance.
- s10: B0329 (can be analyzed with rfi_level = 1).
- s11: 6 hours of data.
- s12: 5 min of data.
- s13: 4 min of data.
- `json_files/acqs/17-02-08-incoherent-data-avalanche`
Acquisition data files for the ~1000 hours of incoherent-beam pathfinder data in frb1:/data2/17-02-08-data-avalanche.
We define one json file per ~10 hours of data, so ~100 json files are created. When a "long" gap (more than a minute) occurs in the data, we start a new json file.
The runlist `json_files/acqs/17-02-08-incoherent-data-avalanche-runlist.json` contains all ~100 json files, and can be used to analyze all of the data with one `rfp-run` command (see example earlier in the manual). This will take a long time of course!
It would be useful to catalog all "interesting" subsets of the incoherent-beam data and put these into another runlist which takes less time to run, but will (hopefully) be just as useful as the full dataset for testing RFI transform chains.
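The splitting rule described above for the data avalanche (one json file per ~10 hours of data, with a new file started whenever a gap of more than a minute occurs) can be sketched as follows. This is an illustration only, not the script that actually generated the json files: the function name `group_chunks`, the representation of the data as `(start_time_sec, duration_sec)` pairs, and the exact handling of the 10-hour limit are all assumptions.

```python
def group_chunks(chunks, max_hours=10.0, max_gap_sec=60.0):
    """Group time-ordered data chunks into per-json-file groups.

    chunks: list of (start_time_sec, duration_sec) pairs, sorted by start time.
    A new group is started when the gap since the previous chunk exceeds
    max_gap_sec, or when adding the chunk would exceed max_hours of data.
    (Hypothetical sketch; the real json_scripts may differ in detail.)
    """
    groups = []
    current = []
    current_duration = 0.0
    prev_end = None

    for start, duration in chunks:
        gap = (start - prev_end) if prev_end is not None else 0.0
        too_long = (current_duration + duration) > max_hours * 3600.0

        # Close the current group on a "long" gap or when it is full.
        if current and (gap > max_gap_sec or too_long):
            groups.append(current)
            current = []
            current_duration = 0.0

        current.append((start, duration))
        current_duration += duration
        prev_end = start + duration

    if current:
        groups.append(current)
    return groups


# Toy data: three contiguous 1-hour chunks, a 5-minute gap, then one more chunk.
chunks = [(0, 3600), (3600, 3600), (7200, 3600), (11100, 3600)]
groups = group_chunks(chunks)
print([len(g) for g in groups])
```

In the toy data, the 5-minute gap before the last chunk is longer than a minute, so that chunk ends up in a group of its own.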
Currently there is only one possibility, but we anticipate adding more RFI transform chains soon.
- `rfi_1k/17-10-24-first-try.json`: based on 1K-channel RFI transform chain proposed by Masoud in the 17-08-11-examples.
- `rfi_16k/17-10-24-first-try.json`: simplest 16K-channel chain, obtained by wrapping the 1K-chain in a `wi_sub_pipeline`, and adding two 16K-detrenders (one along the time axis, and one along the frequency axis).
Note that `rfi_16k/17-10-24-first-try.json` is the same as the file `rfi_configs/rfi_production_v1.json` in the ch_frb_l1 repository.
The idea is that RFI transform chains can be developed in the ch_frb_rfi "laboratory" and copied to ch_frb_l1 when we want to use them in the real-time search.
The following 1K-channel json files are defined:
json_files/bonsai_1k/bonsai_nfreq1024_7tree_f384_v3-noplot.json
json_files/bonsai_1k/bonsai_nfreq1024_7tree_f384_v3.json
json_files/bonsai_1k/bonsai_nfreq1024_7tree_f512_v3-noplot.json
json_files/bonsai_1k/bonsai_nfreq1024_7tree_f512_v3.json
The `fXXX` part of the filename is the sampling rate, in FPGA counts per sample.
This is 512 for the incoherent-beam acqs (0.00131072 sec), or 384 for the 16K 26m acqs (0.00098304 sec).
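The two sample times quoted above both correspond to 2.56 microseconds per FPGA count (0.00131072 s / 512 = 0.00098304 s / 384 = 2.56e-6 s). The following sketch pulls the `fXXX` value out of a filename and converts it to seconds. The helper names and the regular expression are assumptions made for illustration, based only on the filenames listed above.

```python
import re

# Seconds per FPGA count, as implied by the numbers above (0.00131072 s / 512).
FPGA_COUNT_SEC = 2.56e-6


def fpga_counts_from_filename(filename):
    """Extract XXX from the '_fXXX_' part of a bonsai_1k json filename.

    Assumes the naming convention of the files listed above.
    """
    m = re.search(r"_f(\d+)_", filename)
    if m is None:
        raise ValueError("no _fXXX_ component in filename: %r" % filename)
    return int(m.group(1))


def sample_time_sec(fpga_counts_per_sample):
    """Convert FPGA counts per sample to a sample duration in seconds."""
    return fpga_counts_per_sample * FPGA_COUNT_SEC


for name in (
    "json_files/bonsai_1k/bonsai_nfreq1024_7tree_f512_v3.json",
    "json_files/bonsai_1k/bonsai_nfreq1024_7tree_f384_v3-noplot.json",
):
    counts = fpga_counts_from_filename(name)
    print("%d FPGA counts -> %.8f sec" % (counts, sample_time_sec(counts)))
```

This is a quick way to check that the bonsai config you pick matches the time sampling of the acquisition you are analyzing.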
The following 16K-channel json files are defined:
json_files/bonsai_16k/bonsai_production_noups_nbeta1_v2-noplot.json
json_files/bonsai_16k/bonsai_production_noups_nbeta1_v2.json
json_files/bonsai_16k/bonsai_production_ups_nbeta1_v2-noplot.json
json_files/bonsai_16k/bonsai_production_ups_nbeta1_v2.json
Filenames containing `ups` include an upsampled tree, which makes them closer to optimal for DM < 820 and pulse width < 1 ms, but they are more computationally expensive.
To use the bonsai json files, you must generate hdf5 files for each bonsai_config (with `bonsai-mkweight`, see bonsai MANUAL.md for more info) in `/data/bonsai_configs`.
This has already been done on frb1, and the CHIME compute nodes (frb-compute-X).
For no deep reason, the (variance, reweighting) timescales are currently (100, 100) in the 1K-channel bonsai configs, and (200, 400) in the 16K-channel configs. These values are chosen semi-arbitrarily, and we should do a study to determine optimal settings!
Note that `bonsai_configs/bonsai_production_XX.txt` is the same file as `bonsai_configs/bonsai_production_XX.txt` in the ch_frb_l1 repository.
The idea is that bonsai configs can be developed in the ch_frb_rfi "laboratory" and copied to ch_frb_l1 when we want to use them in the real-time search.
From the above discussion, the same bonsai configs are expected to appear in multiple places:
- in `git/ch_frb_rfi/bonsai_configs`
- in `git/ch_frb_l1/bonsai_configs`
- in `/data/bonsai_configs` (with "derived" hdf5 files created with `bonsai-mkweight`).
This raises the possibility of files becoming out of sync, if they are modified in one place and not updated in others, or if the .txt files are updated without regenerating the corresponding hdf5 files.
The script `scripts/check-bonsai-config-consistency.py` will check for inconsistencies, and should be run periodically.
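To illustrate the kind of check involved, here is a minimal sketch which compares same-named `.txt` files across several directories by content hash. It is not the actual `check-bonsai-config-consistency.py` script, whose logic is not described here: the function names are hypothetical, and the sketch does not cover the case of stale hdf5 files. The demo uses temporary directories and placeholder file contents rather than real bonsai configs.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_inconsistencies(config_dirs, pattern="*.txt"):
    """Return sorted filenames which are missing from some directory,
    or whose contents differ between directories. (Hypothetical helper.)"""
    dirs = [Path(d) for d in config_dirs]

    # Collect every config filename seen in any of the directories.
    names = set()
    for d in dirs:
        names.update(p.name for p in d.glob(pattern))

    bad = []
    for name in sorted(names):
        digests = set()
        for d in dirs:
            p = d / name
            # A missing file is recorded as None, so it counts as a mismatch.
            digests.add(file_digest(p) if p.is_file() else None)
        if len(digests) != 1:
            bad.append(name)
    return bad


# Demo with two temporary directories and placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "ch_frb_rfi")
    b = Path(tmp, "ch_frb_l1")
    a.mkdir()
    b.mkdir()
    (a / "bonsai_production_A.txt").write_text("placeholder v1\n")
    (b / "bonsai_production_A.txt").write_text("placeholder v1\n")   # in sync
    (a / "bonsai_production_B.txt").write_text("placeholder v1\n")
    (b / "bonsai_production_B.txt").write_text("placeholder v2\n")   # modified
    (a / "bonsai_production_C.txt").write_text("placeholder v1\n")   # missing in b
    result = find_inconsistencies([a, b])

print(result)
```

In the demo, file A is identical in both directories, while B differs and C exists in only one place, so B and C are reported.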
Sometimes we want to run a pipeline with random data, instead of reading an acquisition from disk.
- `toy_streams/gaussian_nfreqXX_ntXX.json`
Generates gaussian random intensities. The `ntXX` part of the filename is the number of time samples which will be generated before the stream ends.
- `toy_streams/chime_network_nfreqXX_ntXX.json`
A "dummy" network stream which runs the packet decoding kernel that is used in the real-time search to decode network packets, but does not actually read packets from the network. This is useful for timing (see `rfp-time` example earlier in the manual), since it represents the computational cost of the packet decoding kernel in the real-time search.
The following directories are located in `/frb-archiver-1/acq_data`. CHIMEFRB/frb_ops/issues track their technical details. See kmsmith137/ch_frb_l1/OPERATIONS_MANUAL.md for more info on how to prepare new acquisitions for the "offline pipeline".
Name | Size |
---|---|
Long runs | |
Galactic_20180330_1 | 4.6T |
frb_run_11_20180407_beams_111to118_130to137_146to150 | 1.5T |
frb_run_11_20180406_beams_110to114_119to122_133to141_144to148 | 1.2T |
frb_run_11_20180406_1230_1430_beams_130_133_135_146_148_plus_minus_2 | 678G |
frb_run_11_20180407_beams_108to115_119to123_129to139_144to148 | 507G |
frb_run_11_20180406_beams_135_148 | 492G |
frb_run_11_20180406_beams_120to125_137to144 | 417G |
frb_run_11_20180407_beams_111to118_129to137_146to150 | 377G |
frb_run_11_20180406_1230_1430_beams_130_133_135_146_148 | 24G |
frb_run_10_B0329_20180322 | 6.5G |
frb_run_10_B0329_20180322_1 | 7.7G |
frb_run_10_B0329_20180322_2 | 1.4G |
frb_run_10_B0329_20180322_3 | 313G |
frb_run_10_B0329_20180322_3_sub_copy | 6.6G |
Short transits | |
frb_B1953+50_2018-05-02-09-06 | 25G |
frb_B2022+50_2018-04-08-08-17 | 17G |
frb_B1953+50_2018-05-02-09-07 | 15G |
frb_J2027+4557_2018-04-08-08-18 | 3.6G |
frb_J2027+4557_2018-04-08-08-17 | 1.8G |
frb_J2001+42_2018-05-02-09-10 | 1.5G |
No gains | |
B0329_no_gains_20180329 | 58G |
B0329_no_gains_20180329_2 | 75G |
B0329_no_gains_20180329_3 | 24G |
B0329_no_gains_20180329_4 | 11G |
B0329_no_gains_20180329_5 | 334M |
Test runs | |
frb_run_9_B0329 | 79G |
frb_run_9 | 2.1G |
frb_run_9_night 3 | 70G |
frb_run_9_test_1 | 5.8G |
frb_run_9_test_2 | 1.1G |
frb_run_9_test_3 | 8.4G |
frb_run_incoherent_test | 284M |
frb_run_incoherent_test_2 | 5.9G |
frb_run_incoherent_test_3 | 6.5G |
frb_run_incoherent_test_4 | 45G |
frb_incoherent_coherent_run | 67G |
DM95_beam69 | 29G |
inf_snr_capture | 2.6G |