The action sometimes fails #4

Open
matt-dray opened this issue Oct 6, 2020 · 11 comments
Labels
bug Something is broken

Comments

@matt-dray
Owner

matt-dray commented Oct 6, 2020

Examples of failure (red crosses) on the Actions tab: https://github.com/matt-dray/londonmapbot/actions

Image gets downloaded, but then:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Empty reply from server
Calls: <Anonymous> ... request_fetch -> request_fetch.write_memory -> <Anonymous>
Execution halted
Error: Process completed with exit code 1.
@mattkerlogue

I find GitHub Actions can be a bit hit and miss. Sometimes it just seems to fall over for no clear reason.

@matt-dray
Owner Author

Haha, last night it failed four times in a row for this repo! We talked about a possible work application for cron-based actions, but I'm not sure it's a great idea if you can't handle a fail in the workflow.

@mattkerlogue

Yeah. We should probably investigate the common cause of failures across our cron-based actions.

@mattkerlogue

Probably not related if the error above is the only error you're getting, but some of my action failures have been because R hasn't installed properly. However, I've just discovered that R comes pre-installed on the macOS GitHub runner as per this spec, so lines 17-18 of the yaml aren't necessary any more, which should at least speed up the action a little.

But on your curl issue, … suggests it's a problem with network connectivity. "Empty reply from server" suggests something is going wrong in Twitter server land (or in the way {rtweet} or its dependencies are making the request). The main thing is whether it's happening in the OAuth stage or the tweet-posting stage. Looking at the workflow output you link to above, it has got past the Mapbox download, so the problem is in the tweet stage of the script.

As you're running the script via Rscript, there's not much useful logging. You might want to add some messages to the script and/or specify Rscript --verbose {file} to make it "chatty".
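
To make the "add some messages" suggestion concrete, here is a minimal sketch of a logging wrapper. The name run_stage() is hypothetical and not part of londonmapbot or {rtweet}; it only uses base R. The idea is to wrap each step so the Actions log shows which stage was running when the error occurred.

```r
# Hypothetical helper (not from londonmapbot): run one stage of the script
# and write progress messages to stderr, which the Actions log captures.
run_stage <- function(name, expr) {
  message(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), " | starting: ", name)
  result <- tryCatch(
    expr,  # evaluated lazily here, so errors are caught by the handler below
    error = function(e) {
      message("failed in stage '", name, "': ", conditionMessage(e))
      stop(e)  # re-raise so Rscript still halts with a non-zero exit code
    }
  )
  message("finished: ", name)
  invisible(result)
}

# Illustrative usage (assumes txt and img already exist in the script):
# run_stage("post tweet", rtweet::post_tweet(status = txt, media = img))
```

With each step wrapped like this, a failure message should at least say whether it came from the download, the authentication or the posting step.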

@rexarski

rexarski commented Oct 9, 2020

I found two posts that might be related to the current issue.

In short, there might be a problem when download.file() is used on https:// URLs. Rather than figuring out what causes it, I just switched to httr::GET().

httr::GET(img_url, httr::write_disk(temp_file, overwrite = TRUE))

Currently testing this build.

Update: No, it doesn't work.

@mattkerlogue

"However, I've just discovered that R comes pre-installed on the macOS GitHub runner as per this spec, so lines 17-18 of the yaml aren't necessary any more, which should at least speed up the action a little."

I've just tested this and can confirm R 4.0.2 is available on the macOS-latest runner, log here. One caution, though: you'll need to use the repos argument in install.packages() (or set it via options()).
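
As a rough illustration of the repos point, the sketch below sets the CRAN mirror once via options() so later install.packages() calls don't need the argument. The mirror URL is the cloud mirror of CRAN; install_if_missing() is a hypothetical helper name, not something from the workflow.

```r
# Set the CRAN mirror once so install.packages() works non-interactively.
# The alternative is to pass it per call, e.g.
# install.packages("rtweet", repos = "https://cloud.r-project.org")
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Hypothetical helper: install only the packages that are not yet available.
install_if_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)  # names of the packages it tried to install
}

# Illustrative usage:
# install_if_missing(c("rtweet"))
```

Skipping packages that are already present is just a way to save a little time on the runner; a plain install.packages() call with repos set would do the same job.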

@scott-saunders

scott-saunders commented Oct 21, 2020

Hello. @matt-dray thanks for writing your blog post, I found it very helpful. I am experiencing this same issue with an rtweet bot, and I agree the issue is with rtweet or Twitter.

In my case post_tweet('some text', media = 'some_pic.png') sometimes works fine, but often runs for minutes and then fails with the same error:

curl::curl_fetch_memory(url, handle = handle) : 
  Empty reply from server

However, without the media, post_tweet('some_text') works instantly with no error.

Could Twitter be blocking the posts with media because it views them as suspicious or automated activity? At one point my bot got blocked by Twitter, which gave an explicit error to that effect. The account seems to be unblocked, but I wonder if Twitter still has some restrictions in place?

@matt-dray
Owner Author

Thanks @mattkerlogue, @rexarski and @scott-saunders.

Given the near-identical nature of londonmapbot's tweet contents, I figured that any blocking from Twitter's end would be all-or-nothing, but it looks like the action failures are more-or-less random. I also thought londonmapbot might be more susceptible to blocking when I started posting URLs, but haven't had any issues.

I'm not sure of Twitter's algorithm for detecting 'malicious bots', but I guess slight variation in the time taken for the action to run and post (and the fact it fails randomly!) might help to prevent it being flagged. This is worth a read in any case (might be slightly out of date): https://help.twitter.com/en/rules-and-policies/twitter-automation

@mattkerlogue

mattkerlogue commented Oct 22, 2020

Having looked back through the Actions logs, it seems that narrowbotR’s workflow, which uses its custom post_geo_tweet() function, has only had one curl error so far, and as with the others it got an Empty reply from server response.

I’ve checked the Twitter developer portal and I can’t see that I’ve done anything different in the app setup on their side to make this less suspicious to Twitter’s spam filters (if that is what’s happening).

Given the documentation you linked to, @matt-dray, these types of bots definitely seem to be in scope:

"Provided you comply with all other rules, you may post automated Tweets for entertainment, informational, or novelty purposes."

The mapbots only post two tweets an hour, so they're definitely well shy of the rate limits set out in Twitter’s documentation for POST statuses/update. @scott-saunders, I’m not sure if you’ve put anything in your covid bot (awesome work by the way!) that keeps a check on limits.

I still think it could be an issue with the GitHub Actions runners, as I’m not convinced they have perfect connectivity with the rest of the internet. The R install from r-lib and the package installs would periodically fall over on my Google scraping repo without reason. narrowbotR somehow failed to install {data.table} recently (I wasn’t aware it was a package dependency), which was a surprise since the code calls for it to install packages from the cloud mirror of CRAN.

@scott-saunders

Thanks @mattkerlogue! I haven't done anything to check the limit on tweets per hour, mostly because covid_data_bot isn't seeing that level of traffic, but that's a good point for the future. For now I think I'm also well under the 300 / 3hr limit.

Yeah, the GitHub Actions connectivity may be an issue (startup time and package install are sometimes wildly variable for me too), but in my code the bot first uses rtweet to search for tweets and reads in NYT data from GitHub, so connectivity isn't so bad that it causes an error there. I have also experienced the same post_tweet() problem using rtweet locally on my computer, so I don't think connectivity is the main issue.

Here's one thing from the POST documentation:

"For each update attempt, the update text is compared with the authenticating user's recent Tweets. Any attempt that would result in duplication will be blocked, resulting in a 403 error. A user cannot submit the same status twice in a row."

covid_data_bot was having issues with posting multiple tweets because of the weird overlapping timing of GitHub Actions. Perhaps Twitter could have been blocking the duplicate tweets? I have mostly fixed this duplicate posting issue since then, so we'll see if it continues. Could rare duplicate posts explain any of your issues? @mattkerlogue @matt-dray
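
One way to rule duplicates in or out would be to compare the new status against recent tweet text before posting. The sketch below is only an assumption about how such a guard might look: is_duplicate() is a hypothetical helper, and the whitespace normalisation is a guess, since the documentation quoted above doesn't say exactly how Twitter compares texts.

```r
# Hypothetical helper: TRUE if new_text matches any recent tweet text after
# trimming and collapsing whitespace (this normalisation is an assumption).
is_duplicate <- function(new_text, recent_texts) {
  normalise <- function(x) gsub("\\s+", " ", trimws(x))
  normalise(new_text) %in% normalise(recent_texts)
}

# Illustrative usage with made-up texts; in a bot, recent_texts would come
# from the account's own timeline:
# if (!is_duplicate(txt, recent_texts)) rtweet::post_tweet(status = txt)
```

If the guard fires on runs that would otherwise have failed, that would point towards the duplicate rule rather than connectivity.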

Let's say the Twitter API does throw the 403 error. Does anyone know what error rtweet / GitHub Actions would show?

@mattkerlogue

Interesting suggestion @scott-saunders, I've never had issues with post_tweet() from the console.

Nothing in my code (or I think @matt-dray's) attempts to post multiple tweets in quick succession. However, I don't know if the underlying code of {rtweet} tries to post multiple times if it gets a fail.
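
If the failures do turn out to be transient, one option for handling a fail in the workflow is to retry explicitly in the script rather than relying on whatever {rtweet} does internally. This is a minimal base-R sketch; with_retry() and its arguments are hypothetical names. One caveat is an assumption rather than something established in this thread: if the first request actually reached Twitter before the empty reply, a retry could run into the duplicate-status rule quoted above.

```r
# Hypothetical helper: call f() up to max_attempts times, waiting a little
# longer after each failure, and re-raise the last error if all attempts fail.
with_retry <- function(f, max_attempts = 3, wait_seconds = 5) {
  stopifnot(max_attempts >= 1)
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_attempts) Sys.sleep(wait_seconds * attempt)
  }
  stop(result)  # all attempts failed, so the workflow still goes red
}

# Illustrative usage (assumes txt and img already exist in the script):
# with_retry(function() rtweet::post_tweet(status = txt, media = img))
```

The messages from each failed attempt would also show in the Actions log how often the error is transient.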


4 participants