'speed' values are NA , but not really #260

Open
rafapereirabr opened this issue Jul 7, 2022 · 3 comments

Comments

@rafapereirabr
Member

I'm finding a strange behavior in gtfs2gps(). In the reprex below, I filter a single trip and convert it to a GPS-like table. The problem is that the gtfs2gps() function prints a message saying 'speed' values are NA for shape_id '60-1-b12-1.1.O'. This message seems to suggest all speed values in this trip are NA, but they are not. This seems to be a problem in the code. The message should not be printed in this case, right?

The function is able to calculate the speed correctly, as seen in the output. There is only one NA, in the last segment (as expected), so the function also prints the message "Some 'speed' values are NA in the returned data." As a rule, there will always be one NA in the last trip segment, right? So perhaps this message is unnecessary. What do you guys think?

reprex

library(gtfs2gps)
library(ggplot2)
library(gtfstools)
library(data.table)

# path to GTFS.zip file
gtfs_file <- system.file("extdata/irl_dub/irl_dub_gtfs.zip", package = "gtfs2emis")

# read GTFS
gtfs <- gtfstools::read_gtfs(gtfs_file)

# Drop services that run on Saturday or Sunday (keep weekday-only services)
gtfs <- gtfstools::filter_by_weekday(gtfs, 
                                     weekday = c('saturday', 'sunday'), 
                                     keep = FALSE)
# filter trip
id <- '6343.2.60-1-b12-1.1.O'
gtfs <- gtfstools::filter_by_trip_id(gtfs, trip_id =  id )


head(gtfs$trips)
head(gtfs$stop_times)
head(gtfs$stops)
head(gtfs$shapes)

# convert to gps
gps <- gtfs2gps(gtfs)
gps_sf <- gtfs2gps::gps_as_sflinestring(gps)

# plot
stops_df <- gtfstools::convert_stops_to_sf(gtfs)

ggplot() +
  geom_sf(data=gps_sf, aes(color=as.numeric(speed))) +
  geom_sf(data=stops_df, color='red') 
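To confirm where the NA values sit, the speed column can be summarised per shape_id. The snippet below is a minimal sketch in base R. It uses a toy data frame (`gps_toy`, a hypothetical stand-in with made-up speeds) instead of the real gtfs2gps() output, so the numbers are illustrative only.

```r
# Toy stand-in for the gtfs2gps() output (assumption: the real table has
# at least the columns shape_id and speed; these values are made up).
gps_toy <- data.frame(
  shape_id = rep("60-1-b12-1.1.O", 5),
  speed    = c(21.4, 19.8, 23.1, 20.5, NA)
)

# For each shape, count the NA speeds and check whether the only NA
# is the final row (the expected case described above).
na_summary <- do.call(rbind, lapply(split(gps_toy, gps_toy$shape_id), function(d) {
  na_idx <- which(is.na(d$speed))
  data.frame(
    shape_id     = d$shape_id[1],
    n_rows       = nrow(d),
    n_na         = length(na_idx),
    only_last_na = length(na_idx) == 1 && na_idx == nrow(d)
  )
}))

print(na_summary)
```

With the real output, replacing `gps_toy` with `gps` should show whether the single NA is indeed the last row of the shape.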

@Joaobazzo
Collaborator

Joaobazzo commented Jul 12, 2022

Behavior of function

The gtfs2gps function creates new coordinates for each stop_sequence, and adds extra points in order to match the last coordinate of gtfs$shapes.
In the printout below, you can see that the last coordinate of gtfs$shapes does not match the last coordinate of gtfs$stops.

Running the example you mentioned

Screenshot from 2022-07-12 11-47-38
In the map, you can see that the two points are very close, but not the same.
Screenshot from 2022-07-12 11-43-36

Problems

However, even if tail(gps,1) == tail(gtfs$shapes,1), NAs will still be added, because gtfs2gps::gtfs2gps creates new points for the stops. In the reprex below, I changed the coordinates of the last stop_id to match the last row of gtfs$shapes. The results are similar, because the stop_sequence coordinates in the output are no longer the same as those in gtfs$stops.

REPREX

library(gtfs2gps)
library(gtfstools)
library(data.table)

# path to GTFS.zip file
gtfs_file <- system.file("extdata/irl_dub/irl_dub_gtfs.zip", package = "gtfs2emis")

# read GTFS
gtfs <- gtfstools::read_gtfs(gtfs_file)

# Drop services that run on Saturday or Sunday (keep weekday-only services)
gtfs <- gtfstools::filter_by_weekday(gtfs, 
                                     weekday = c('saturday', 'sunday'), 
                                     keep = FALSE)
# filter trip
id <- '6343.2.60-1-b12-1.1.O'
gtfs <- gtfstools::filter_by_trip_id(gtfs, trip_id =  id )
# last stop equal to last shapes (lat,long)
gtfs$stops[.N,stop_lat := gtfs$shapes[.N,shape_pt_lat]]
gtfs$stops[.N,stop_lon := gtfs$shapes[.N,shape_pt_lon]]
# convert to gps
gps <- gtfs2gps(gtfs)
tail(gps)
gps_sf <- gtfs2gps::gps_as_sflinestring(gps)

Screenshot from 2022-07-12 12-08-22

Approaches

I can think of a few strategies to solve this problem:

  1. Not replacing the input stop_ids coordinates: by doing this we will no longer have this last NA (if, and only if, tail(gps,1) == tail(gtfs$shapes,1))
  2. Use some sort of tolerance in gtfs2gps: For instance, if the snapped point is within a certain distance (say 5 meters) of the input coordinates, we will not change the input value.

However, I don't know exactly how difficult it would be to implement such solutions.
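To make the tolerance idea (strategy 2) concrete, here is a rough sketch in base R. It is not how gtfs2gps snaps points internally. The function names (`haversine_m`, `snap_with_tolerance`), the `tol_m` argument, and the example coordinates are all hypothetical, chosen only to illustrate keeping the input coordinates when the snapped point lies within 5 meters.

```r
# Great-circle distance in meters between two lon/lat points (haversine).
haversine_m <- function(lon1, lat1, lon2, lat2) {
  r <- 6371000            # mean Earth radius in meters
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: keep the input stop coordinates when the snapped
# point is within tol_m meters, otherwise use the snapped coordinates.
snap_with_tolerance <- function(input, snapped, tol_m = 5) {
  d <- haversine_m(input$lon, input$lat, snapped$lon, snapped$lat)
  keep_input <- d <= tol_m
  data.frame(
    lon        = ifelse(keep_input, input$lon, snapped$lon),
    lat        = ifelse(keep_input, input$lat, snapped$lat),
    dist_m     = d,
    kept_input = keep_input
  )
}

# Made-up coordinates: the first snapped point is about 1 m away,
# the second is roughly 100 m away.
input   <- data.frame(lon = c(-6.26, -6.26), lat = c(53.35, 53.35))
snapped <- data.frame(lon = c(-6.26, -6.26), lat = c(53.35001, 53.351))
res <- snap_with_tolerance(input, snapped)
print(res)
```

In this sketch the first point keeps its input coordinates and the second takes the snapped ones. Whether 5 meters is a sensible default would need testing on real feeds.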

@pedro-andrade-inpe
Collaborator

Maybe we could only improve the message. Currently it says that:

paste0(na_values, " 'speed' values are NA for shape_id '", shapeid, "'.")

Possibly we could also say that such values are (i) in the beginning, (ii) in the end, or (iii) in different parts of the shape. We could also remove the message Some 'speed' values are NA in the returned data. as the previous messages are more informative.
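A rough sketch of what such a message could look like, assuming a hypothetical helper `na_speed_message()` that receives the speed vector of one shape. The base text follows the paste0() call quoted above; the position suffix and its wording are only a suggestion.

```r
# Hypothetical helper: build the NA message for one shape and say whether
# the NA values are at the beginning, in the middle, and/or at the end.
na_speed_message <- function(speed, shapeid) {
  is_na <- is.na(speed)
  na_values <- sum(is_na)
  if (na_values == 0) return(NULL)   # nothing to report

  if (all(is_na)) {
    where <- "in all points"
  } else {
    runs   <- rle(is_na)             # consecutive runs of NA / non-NA
    n_runs <- length(runs$values)
    at_start  <- runs$values[1]
    at_end    <- runs$values[n_runs]
    # NA runs that are neither the first nor the last run
    in_middle <- n_runs > 2 && any(runs$values[-c(1, n_runs)])
    where <- paste(
      c("at the beginning", "in the middle", "at the end")[
        c(at_start, in_middle, at_end)
      ],
      collapse = ", "
    )
  }

  paste0(na_values, " 'speed' values are NA for shape_id '", shapeid,
         "' (", where, ").")
}

msg_end   <- na_speed_message(c(20, 21, NA), "A")
msg_mixed <- na_speed_message(c(NA, 10, 12, NA, 11, NA), "B")
message(msg_end)
message(msg_mixed)
```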

@rafapereirabr
Member Author

Perhaps this message should only be printed if there are two or more NAs in the output of the shape.
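As a minimal sketch of that rule, assuming a hypothetical helper `should_report_na()` with a `min_na` threshold (neither exists in gtfs2gps):

```r
# Hypothetical rule: report a shape only when it has at least `min_na`
# NA speeds, so the single expected NA in the last segment stays silent.
should_report_na <- function(speed, min_na = 2) {
  sum(is.na(speed)) >= min_na
}

one_na <- should_report_na(c(20, 21, 19, NA))   # only the trailing NA
two_na <- should_report_na(c(20, NA, 19, NA))   # an extra NA mid-shape
print(c(one_na = one_na, two_na = two_na))
```

One caveat with a pure count threshold: a single NA that is not in the last segment would also be silenced, so combining the count with a position check may be safer.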
