Failure: Page already exists #1067

GruenSein · 2024-12-18T07:32:46Z

Hi all,

as my usage of this nice tool intensifies, I have stumbled across another issue that I cannot seem to fix. When I push my documentation to Confluence, I receive the following error:

atlassian DATA:
unexpected rest response detected; retrying in 5 seconds...

sphinxcontrib.confluencebuilder error:

---
Unsupported Confluence API call

An unsupported Confluence API call has been made. See the following
details for more information:

REQ: POST
RSP: 400
URL: https://voith.atlassian.net/wiki/
API: rest/api/content
DATA: {
  "statusCode": 400,
  "data": {
    "authorized": true,
    "valid": true,
    "errors": [],
    "successful": true
  },
  "message": "com.atlassian.confluence.api.service.exceptions.BadRequestException: A page with this title already exists: A page already exists with the same TITLE in this space"
}
---

While the response is relatively clear, it unfortunately does not tell me which page caused the error. Additionally, I am a bit confused about how this can happen, because:

- I wiped the space before trying to publish when I first encountered this issue.
- I was under the impression that existing pages were supposed to be overwritten.
- The issue seems to occur with the same page every time, but the error message does not tell me which page it is.
- As I used the API myself before switching to Confluence Builder, I took care that all pages have unique names.

Is there some way to further investigate? Also, being able to skip pages with errors and continue would already help a lot. I have almost 10k pages about configuration files and the one failing is around number 3k leading to a situation in which 7k configs remain undocumented. Is this possible?

Update:

I have actually spent the time watching the publishing process to figure out which page is to blame. The page in question is also 800kb in size and therefore the biggest one I have. Is it possible that this is a problem? It is always this page that seems to cause the issue. It is interesting that there is a retry before and then it states that the page already exists. Is it possible that the first attempt actually succeeds and then the retry fails?

Update 2:

After increasing the timeout to a minute, my documentation is being published correctly again. This leads me to believe that there is some issue with the retry being initiated even though the primary request ends up succeeding. The other issue is that a single failure like this cancels the entire publishing process.
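For reference, a minimal conf.py sketch of the timeout change described above. It assumes the setting involved is the extension's confluence_timeout option (a value in seconds); the post does not name the option that was actually changed.

```python
# conf.py (fragment) -- assumes the timeout was raised through the
# extension's confluence_timeout option; the value is in seconds.
extensions = ['sphinxcontrib.confluencebuilder']

# Give Confluence up to a minute to answer each request, so a slow
# response for a large page is less likely to be treated as a failure.
confluence_timeout = 60
```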


jdknight · 2025-01-09T07:05:07Z

Is there some way to further investigate?

Running Sphinx with a higher verbosity may provide more insight.

Is it possible that the first attempt actually succeeds and then the retry fails?

This very well may be the case. Confluence sometimes reports 500 errors, in response to which this extension may perform a generic retry or some special retry handling. There is also a use case where a title conflict should fall back to a page update. My rough guess (from a fresh state) is that some page (maybe the large one you mentioned) is being published and accepted by Confluence, but then the instance reports a 500-like error. An automatic retry is made:

unexpected rest response detected; retrying in 5 seconds...

This repeats the same request to create a "new" page instead of updating it, and Confluence reports that the title already exists. The extension then tries to find the page with that title in order to perform a page update, but maybe it fails to find it? This in turn re-raises the original exception that a duplicate title exists.
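The suspected sequence can be sketched roughly as follows. This is only an illustration of the guess above, not the extension's actual code: FakeConfluence, ServerError, TitleConflict and publish are hypothetical names, and the fake server simply stores a page and then reports an error once.

```python
class TitleConflict(Exception):
    """Stands in for the 400 'title already exists' response."""


class ServerError(Exception):
    """Stands in for a 500-like response."""


class FakeConfluence:
    """In-memory stand-in for a Confluence space (hypothetical)."""

    def __init__(self, fail_first_create=False):
        self.pages = {}                       # title -> body
        self._fail_next = fail_first_create   # simulate "accepted, but error reported"

    def create(self, title, body):
        if title in self.pages:
            raise TitleConflict(title)
        self.pages[title] = body              # the page is stored...
        if self._fail_next:
            self._fail_next = False
            raise ServerError(title)          # ...but the caller sees an error

    def find(self, title):
        return title if title in self.pages else None

    def update(self, title, body):
        self.pages[title] = body


def publish(server, title, body, retries=1):
    """Create a page; on a title conflict, fall back to updating it."""
    for attempt in range(retries + 1):
        try:
            server.create(title, body)
            return 'created'
        except ServerError:
            if attempt == retries:
                raise
            continue                          # generic retry of the same request
        except TitleConflict:
            if server.find(title) is None:
                raise                         # lookup failed: re-raise the conflict
            server.update(title, body)
            return 'updated'


server = FakeConfluence(fail_first_create=True)
result = publish(server, 'big-page', 'x' * 10)
```

In this sketch the lookup succeeds, so the retry ends in an update. The reported failure would correspond to the branch where find returns nothing and the conflict is re-raised.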

Being able to skip pages with errors and continue would already help a lot... Is this possible?

No such capability exists at this time. We can try to introduce something like this, but it might get a bit complex (e.g. if a page fails to publish, how do we gracefully handle not publishing nested pages and attachments that could later fail as well?).
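To illustrate the complexity mentioned here, a rough sketch of one possible policy: when a page fails, skip it together with everything nested under it and carry on with the rest. Nothing like this exists in the extension today; publish_tree, the tree layout and the page names are all hypothetical, and attachments are ignored.

```python
def publish_tree(tree, root, publish_page):
    """Publish pages depth-first; skip the subtree of any page that fails.

    tree maps a page name to its child page names (hypothetical layout).
    publish_page is a callable that raises on failure.
    """
    published, skipped = [], []
    stack = [root]
    while stack:
        page = stack.pop()
        try:
            publish_page(page)
        except Exception:
            # Without the parent, children have nowhere to attach:
            # mark the whole subtree as skipped instead of aborting.
            pending = [page]
            while pending:
                name = pending.pop()
                skipped.append(name)
                pending.extend(tree.get(name, []))
            continue
        published.append(page)
        # Reverse so children are visited in their listed order.
        stack.extend(reversed(tree.get(page, [])))
    return published, skipped


def fake_publish(page):
    # Simulated failure for one large page.
    if page == 'configs/big':
        raise RuntimeError('simulated publish failure')


tree = {
    'index': ['configs/small', 'configs/big', 'about'],
    'configs/big': ['configs/big/details'],
}
published, skipped = publish_tree(tree, 'index', fake_publish)
```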

The extension is designed in a way where Sphinx can be re-run and only newer pages are updated. With respect to Confluence, there should be a hash check made on the page's metadata to verify if a new publish is warranted. However, this does not apply if you plan to start from a fresh state each time.
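The idea behind such a hash check can be sketched as below. The hash algorithm (SHA-256 here) and where the extension stores the value are assumptions for illustration; the thread does not describe them, and content_hash and needs_publish are hypothetical names.

```python
import hashlib


def content_hash(body):
    """Hash the rendered page body (SHA-256 is an assumption)."""
    return hashlib.sha256(body.encode('utf-8')).hexdigest()


def needs_publish(body, stored_hash):
    """Return True when the page should be (re)published.

    stored_hash is the hash recorded with the page on a previous
    publish, or None if the page does not exist yet (fresh state).
    """
    return stored_hash is None or content_hash(body) != stored_hash


body = '<p>config A</p>'
stored = content_hash(body)

unchanged = needs_publish(body, stored)                # same content
changed = needs_publish(body + '<p>new</p>', stored)   # content differs
fresh = needs_publish(body, None)                      # wiped space
```

With a wiped space there is no stored hash to compare against, so every page is published again.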

If the issue only occurs on one or a subset of "large" documents in the entire set, a workaround could be to run Sphinx twice -- once to publish most of the documents as normal and then again to publish only the large one with a larger delay. Granted, not ideal.
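One way to set up such a two-pass run is sketched below, assuming the extension's confluence_publish_allowlist, confluence_publish_denylist and confluence_timeout options are used for this. The PUBLISH_PASS environment variable and the document name are made up for the example.

```python
# conf.py (fragment) -- a sketch of the two-pass workaround.
import os

extensions = ['sphinxcontrib.confluencebuilder']

# Hypothetical document name for the one large page.
LARGE_DOCS = ['configs/large-page']

# PUBLISH_PASS is a made-up environment variable for this sketch:
#   unset / "normal" -> publish everything except the large page
#   "large"          -> publish only the large page, with a longer timeout
if os.environ.get('PUBLISH_PASS', 'normal') == 'large':
    confluence_publish_allowlist = LARGE_DOCS
    confluence_timeout = 60
else:
    confluence_publish_denylist = LARGE_DOCS
```

The build would then be run once as usual and a second time with PUBLISH_PASS=large set in the environment.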

The page in question is also 800kb in size and therefore the biggest one I have. Is it possible that this is a problem?

In theory, page sizes are not a concern for the extension itself. It is just a matter of how Confluence handles it. This extension tries to handle various undocumented and unexpected states reported by Confluence instances, but the process of handling all the corner cases is less than ideal.

If you can reproduce the error, running with confluence_publish_debug = 'headers' might give more information about where the retry occurs and what requests were made beforehand. This could help us to test and improve a workaround for such a scenario.
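In conf.py, that would look something like the fragment below. Only the confluence_publish_debug line comes from the suggestion above; the rest is surrounding context assumed for the sketch.

```python
# conf.py (fragment)
extensions = ['sphinxcontrib.confluencebuilder']

# Log request/response header details while publishing, to see where
# the retry happens and which requests preceded it.
confluence_publish_debug = 'headers'
```

Since request headers can carry credentials, it is worth reviewing the output before sharing it.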
