supporting-data-trips.csv - stuck agents + data structure #34

Open
val-ismaili opened this issue Feb 15, 2024 · 9 comments
@val-ismaili

The trips file has some issues with stuck agents: agents remain active past the 24-hour window. We should implement a way of accounting for this, perhaps as Elara does, by wrapping times back into a 24-hour period. This bug will affect most KPIs.

| person | trip_number | trip_id | dep_time | trav_time | wait_time | traveled_distance | euclidean_distance | main_mode | longest_distance_mode | ... | start_facility_id | start_link | start_x | start_y | end_facility_id | end_link | end_x | end_y | first_pt_boarding_stop | last_pt_egress_stop |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 190391 | 3 | 190391_3 | 31:59:32 | 00:00:01 | 00:00:00 | 0 | 0 | NaN | walk | ... | NaN | 369880 | 525078 | 297231 | NaN | 369880 | 525078 | 297231 | NaN | NaN |
| 40100 | 5 | 40100_5 | 31:47:32 | 00:00:01 | 00:00:00 | 0 | 0 | NaN | walk | ... | NaN | 137871 | 532367 | 226254 | NaN | 137871 | 532367 | 226254 | NaN | NaN |
| 16955 | 4 | 16955_4 | 31:41:54 | 00:06:36 | 00:00:00 | 332 | 255 | NaN | walk | ... | NaN | 201580 | 638498 | 277353 | NaN | 221750 | 638712 | 277493 | NaN | NaN |
| 38749 | 4 | 38749_4 | 31:02:20 | 00:37:27 | 00:00:00 | 1874 | 1441 | NaN | walk | ... | NaN | 5177455258260724183_5177455258272565597 | 651594 | 307107 | NaN | 5177457890266640769_5177457890865694291 | 652373 | 308320 | NaN | NaN |
| 183091 | 4 | 183091_4 | 30:49:57 | 00:07:30 | 00:00:00 | 377 | 290 | NaN | walk | ... | NaN | 5177097503427265221_5177097503428536281 | 559770 | 193725 | NaN | 232110 | 559481 | 193700 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
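As a rough illustration of how such trips could be flagged, here is a minimal Python sketch that converts `dep_time` strings to seconds and picks out trips departing at or after 24:00:00. The helper name `to_seconds` and the in-memory list are assumptions made for this example; this is not how Gelato or Elara do it.

```python
DAY_SECONDS = 24 * 3600


def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds; hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (trip_id, dep_time) pairs copied from the rows above, plus one
# made-up in-window trip for contrast.
trips = [
    ("190391_3", "31:59:32"),
    ("40100_5", "31:47:32"),
    ("16955_4", "31:41:54"),
    ("example_1", "08:15:00"),  # hypothetical row, not from the data
]

# Keep only the trips that depart at or after the end of the 24-hour window.
late_trips = [trip_id for trip_id, dep in trips if to_seconds(dep) >= DAY_SECONDS]
print(late_trips)
```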

Elara wraps times so that a trip at 25:00:00 is reported as 01:00:00. However we handle this in Gelato, we should state it clearly in the documentation.
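The wrapping behaviour described here can be sketched in a few lines of Python. This is only an illustration of the idea (time modulo 24 hours), not Elara's actual code, and `wrap_time` is a hypothetical helper name.

```python
DAY_SECONDS = 24 * 3600


def wrap_time(hms: str) -> str:
    """Wrap an 'HH:MM:SS' time of any length back into a 24-hour clock."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    total = (hours * 3600 + minutes * 60 + seconds) % DAY_SECONDS
    return f"{total // 3600:02d}:{(total % 3600) // 60:02d}:{total % 60:02d}"


print(wrap_time("25:00:00"))
print(wrap_time("31:59:32"))
```

Wrapping discards which day a trip belongs to, so a wrapped 07:59:32 is indistinguishable from a genuine morning trip unless a day offset is stored alongside it.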

Additionally, there are a lot of empty data columns. main_mode is all NaN, although we do have longest_distance_mode. All facility-related columns are empty. Can we also add an end_time column? We have the start time and duration, but it's always useful to have end_time, and it removes the need to calculate it each time.
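For reference, the calculation an end_time column would save is small. The Python sketch below adds `dep_time` and `trav_time`. It assumes `trav_time` already covers the whole trip including any waiting, and it does not wrap past 24 hours. The helper names are hypothetical.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to seconds; hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Format seconds as 'HH:MM:SS' without wrapping at 24 hours."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def add_end_time(row: dict) -> dict:
    """Return a copy of the trip row with end_time = dep_time + trav_time."""
    end = to_seconds(row["dep_time"]) + to_seconds(row["trav_time"])
    return {**row, "end_time": to_hms(end)}


# Trip 16955_4 from the table above.
trip = {"trip_id": "16955_4", "dep_time": "31:41:54", "trav_time": "00:06:36"}
print(add_end_time(trip)["end_time"])
```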

See TE gelato outputs

time java -Xmx120g -jar target/gelato-0.0.1-alpha-with-dependencies-eb19697.jar \
-mc /mnt/efs/simulations_refresh/10pc/workdist_5km_20230826/200/output_config.xml \
-mo /mnt/efs/simulations_refresh/10pc/workdist_5km_20230826/200 \
-o /mnt/efs/analysis/vi/gelato/baseline
val-ismaili added the bug, documentation, and enhancement labels on Feb 15, 2024
@divyasharma-arup
Contributor

@divyasharma-arup to check on the NaN issues Val identified, and whether this issue shows up in the Paris simulation too.

@steffenaxer
Contributor

If one starts wrapping entries that are >24 h, one is forced to do this everywhere in order to avoid confusing users. I found that doing this everywhere in the simulation is a Sisyphean task, so I discarded this approach.

@divyasharma-arup
Contributor

@syhwawa, would you be able to check what differs between the facilities file we use for Paris East and the one we use for TE? This seems to be an issue for our TE file, but we don't have the same problem of NaN values for facilities with Paris.

@syhwawa

syhwawa commented Feb 22, 2024

In the Paris East simulation, the trips log output looks more reasonable than the TE output. There are no missing values for main_mode or end_facility_id in the trips output. I wonder whether this issue is specific to the TE simulation.

main_mode and longest_distance_mode are not the same thing:

  • main_mode indicates the primary mode of transportation for the entire trip
  • longest_distance_mode specifies the mode of transportation used for the longest distance part of the trip

They may or may not coincide. The example below shows trips where they differ:

| trip_id | main_mode | modes | longest_distance_mode |
| --- | --- | --- | --- |
| 10000285_3 | car_passenger | walk-car_passenger-walk | walk |
| 10000500_3 | pt | walk-pt-walk | walk |
| 10000566_2 | car_passenger | walk-car_passenger-walk | walk |
| 10000566_3 | car_passenger | walk-car_passenger-walk | walk |
| 10000684_6 | pt | walk-pt-walk | walk |

Paris East Gelato submission command:

time java -Xmx48g -jar /mnt/efs/analysis/ys/gelato/target/gelato-0.0.2-alpha-with-dependencies-da45a9b.jar \
-mc /mnt/efs/simulation/output/baseline_paris_east_no_drt_fix_activity_20240125_30pct/output_config.xml \
-mo /mnt/efs/simulation/output/baseline_paris_east_no_drt_fix_activity_20240125_30pct \
-o /mnt/efs/analysis/ys/gelato_outputs/baseline

@divyasharma-arup
Contributor

Hi Yuhao, thanks for looking into this. What does primary mode mean, and how is that different from longest distance? Do you have an idea why it would be blank in the TE sim? For facilities, Kasia thinks the Paris East sim has a different synthesis pipeline that captures the facility IDs, which isn't the case for the TE sim. She's thinking of a way to address that in the code base.

@syhwawa

syhwawa commented Feb 23, 2024

@divyasharma-arup From my perspective, the "main mode" is the primary or most significant mode of the agent's trip, which is not necessarily the mode covering the longest distance. "Longest distance mode" is a purely quantitative measure.

For example, when an agent walks a long distance to a PT stop and then takes a short bus ride, the main mode is pt and the longest distance mode is walk (see trip_id 10000500_3 in my previous message).
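To make the distinction concrete, here is a small Python sketch. The leg distances and the `MODE_PRIORITY` ranking are invented for illustration, and taking the longest distance mode as the mode of the single longest leg is an assumption. MATSim's own main-mode identification may work differently.

```python
# Hypothetical ranking used only in this sketch: the earlier a mode
# appears in the list, the more "significant" it is considered.
MODE_PRIORITY = ["pt", "car", "car_passenger", "bike", "walk"]


def longest_distance_mode(legs):
    """Mode of the single leg that covers the greatest distance."""
    return max(legs, key=lambda leg: leg[1])[0]


def main_mode(legs):
    """Most significant mode among the legs, per MODE_PRIORITY."""
    return min((mode for mode, _ in legs), key=MODE_PRIORITY.index)


# A walk-pt-walk trip with made-up distances in metres:
# a long access walk, a short bus ride, and a short egress walk.
legs = [("walk", 900.0), ("pt", 400.0), ("walk", 300.0)]

print(main_mode(legs), longest_distance_mode(legs))
```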

@syhwawa

syhwawa commented Feb 28, 2024

The Gelato trips output, named supporting-data-trips.csv, should ideally be derived from the MATSim tabular outputs found in output_trips.csv.gz.

I've noticed that the main_mode column is missing in the output_trips.csv.gz file in our ABM project simulations, such as TE and the Sheffield project, but this issue does not occur in the Paris East simulation.
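A quick way to check this on other simulations is to scan the gzipped trips file for columns that never hold a value. The Python sketch below uses only the standard library. The function name `empty_columns` is hypothetical, the semicolon delimiter is an assumption about how the file is written, and the sample data is made up.

```python
import csv
import gzip
import io


def empty_columns(csv_gz_bytes: bytes, delimiter: str = ";") -> list[str]:
    """Return the names of columns that hold no value in any row."""
    with gzip.open(io.BytesIO(csv_gz_bytes), mode="rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        has_value = {name: False for name in reader.fieldnames or []}
        for row in reader:
            for name, value in row.items():
                if value not in (None, ""):
                    has_value[name] = True
    return [name for name, seen in has_value.items() if not seen]


# Made-up sample standing in for output_trips.csv.gz (delimiter assumed).
sample = (
    "person;trip_id;main_mode;longest_distance_mode\n"
    "190391;190391_3;;walk\n"
    "40100;40100_5;;walk\n"
)
payload = gzip.compress(sample.encode("utf-8"))
print(empty_columns(payload))
```

For a real file, the bytes could be read with `open(path, "rb").read()`, where `path` points at the simulation's output_trips.csv.gz.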

As Kasia pointed out, the discrepancy is likely due to differences in the synthesis pipeline. I suspect the issue arises when MATSim writes its post-simulation outputs, rather than in Gelato.

@syhwawa

syhwawa commented Feb 29, 2024

There is a GitHub issue about missing values in the main_mode column of output_trips.csv, which is related to the MATSim version. It was fixed in this PR

The missing values appear in our trips output because the version of MATSim we used for the TE simulation is too old to include that fix.

@divyasharma-arup
Contributor

divyasharma-arup commented Apr 2, 2024

To recap:

  • Gerry has opened a separate issue to address agents travelling beyond 24 hours
  • The main_mode column does not have data due to the version of MATSim being used for the TE project.
  • facility_id was also missing because no facilities.xml file was generated for TE; this is being addressed separately for future sims.
  • We'd like to add an end_time column to the trip logs to enable easy analysis.

@KasiaKoz, for a future release, would it be possible to address the below within the trips file?

4 participants