Stream cannot be read. Please send us the PDF file so that we can fix this (issues (at) pdfsharp.net). #200

Open
daharmon opened this issue Oct 31, 2024 · 6 comments

Comments

@daharmon

daharmon commented Oct 31, 2024

2024-Compliance-Supplement-V1.pdf

Reporting an Issue Here

Expected Behavior

The document should be read just fine. I can open it in Adobe, Chrome, Edge.

Actual Behavior

I get the error indicated in the title of this post.

Steps to Reproduce the Behavior

The stream comes directly from an Azure File Share. I can confirm that this code works with all of our other documents:
PdfDocument inputDocument = PdfReader.Open(streamcontent, PdfDocumentOpenMode.Import);

I'm using the latest preview version (6.2.0-preview-1) and have also tried the latest non-preview version. Both throw the same error.

Any ideas?

@martinossendorf

We did not get the error you reported, but we observed huge performance issues when reading the file. We aborted the process after two hours. For the next release of PDFsharp, we fixed the performance issue that slowed down the loading of objects from object streams. Loading the file now finishes after eight seconds.
Did you also observe performance problems with that file? If yes, how long did you wait to get this error message?

@daharmon
Author

daharmon commented Nov 5, 2024

I did not see performance issues because it immediately failed to load. It's interesting that you could open it. I was able to load it with iText just fine, but not with PDFsharp. Looping through thousands of PDF documents, this was the only one that failed, and it always failed with the error indicated in the issue title.

@ThomasHoevel
Member

> I did not see performance issues because it immediately failed to load.

Which version of PDFsharp are you using? GDI? WPF? Core? 6.1.0? 6.1.1? 6.2.0 Preview 1?

@martinossendorf

We have now tried to reproduce your issue by loading the file from Azure File Storage. However, we still did not get the error you reported. Instead, we got an Azure exception when PDFsharp tries to get the stream length:
System.NotSupportedException: 'Specified method is not supported.'
Azure.Core.Pipeline.RetriableStream.RetriableStreamImpl.Length.get() in RetriableStream.cs

We tried it with the "PDFsharp" NuGet packages in versions 6.1.1 and 6.2.0-preview-1.
To work around this issue, we copied the stream to a new MemoryStream and opened that with PDFsharp. With the stated versions we then ran into the expected PDFsharp performance issue, and with our development version we could load the file successfully.
Please check whether you get the reported error when loading the file locally. I would expect the performance issue to occur instead. When loading the file from Azure, please try copying the stream to a MemoryStream and using that. We want to improve the error messages, but we will surely not support loading all kinds of streams that may have various restrictions. So using a MemoryStream to get a fully supported stream is, and will remain, the way to go in most cases when loading files via streams from the internet.
Please let us know if you still get the reported error with either of these approaches.
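
The MemoryStream workaround described above could look roughly like the following. This is a minimal sketch, not PDFsharp's or Azure's own code: the class and method names (StreamBuffering, ToMemoryStreamAsync) are hypothetical, and only standard .NET stream calls are used.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamBuffering
{
    // Hypothetical helper (not part of PDFsharp or the Azure SDK):
    // copies any readable stream into a MemoryStream, which supports
    // Length, Position, and Seek.
    public static async Task<MemoryStream> ToMemoryStreamAsync(Stream source)
    {
        var buffer = new MemoryStream();
        await source.CopyToAsync(buffer);
        buffer.Position = 0; // rewind so the reader starts at the first byte
        return buffer;
    }
}
```

The buffered stream would then be passed to the call from the original report, for example: PdfReader.Open(await StreamBuffering.ToMemoryStreamAsync(streamcontent), PdfDocumentOpenMode.Import).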

@daharmon
Author

daharmon commented Nov 6, 2024

No worries. It's odd that all of the other files we retrieve from Azure File Storage worked fine coming straight from the stream. If it makes any difference, the method we're using to retrieve the stream from Azure is ShareFileClient.OpenReadAsync(). I would assume that copying the stream to a MemoryStream introduces additional memory overhead, which I'd like to avoid. Either way, I appreciate you looking into this!
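
If the memory overhead is the main concern, one possible alternative is to spool the download to a temporary file and open that instead. The sketch below rests on an untested assumption, namely that PDFsharp handles a seekable FileStream as well as it handles a MemoryStream. The names TempFileBuffering and ToTempFileStreamAsync are hypothetical.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class TempFileBuffering
{
    // Hypothetical helper: copies the source into a temporary file and
    // returns a seekable FileStream positioned at the start.
    // FileOptions.DeleteOnClose removes the file when the stream is disposed.
    public static async Task<FileStream> ToTempFileStreamAsync(Stream source)
    {
        string path = Path.GetTempFileName(); // creates an empty temp file
        var file = new FileStream(
            path, FileMode.Open, FileAccess.ReadWrite, FileShare.None,
            81920, FileOptions.DeleteOnClose | FileOptions.Asynchronous);
        try
        {
            await source.CopyToAsync(file);
            file.Position = 0; // rewind for the reader
            return file;
        }
        catch
        {
            file.Dispose(); // also deletes the temp file
            throw;
        }
    }
}
```

This trades memory for disk I/O. The caller owns the returned stream and should dispose it once the document is no longer needed.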

@martinossendorf

Interesting. Using ShareFileClient.OpenReadAsync() I get the error you described; using ShareFileClient.DownloadAsync() I get the NotSupportedException when accessing Length.
Well, in both cases copying the stream into a MemoryStream worked. So both streams obtained from Azure seem to have some kind of limitation. As stated before, we can't guarantee that every stream loaded from the internet is compatible with PDFsharp. So copying it to a MemoryStream, or finding another method that yields a compatible stream, would be the clean way.
But figuring out which stream is compatible by trying different methods may not be that easy. As you said, it works for you with all of the other files. So maybe success depends on the file size or on the order in which PDFsharp has to read the objects from the file.
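
As a rough first check when comparing streams, the basic capabilities a stream reports can be logged before handing it to PDFsharp. The helper below is a hypothetical sketch using only standard .NET members. A stream that reports CanSeek and a Length may still fail for other reasons, so this only catches obvious cases such as the NotSupportedException on Length mentioned above.

```csharp
using System;
using System.IO;

public static class StreamProbe
{
    // Hypothetical diagnostic helper: summarizes what a stream claims to support.
    public static string Describe(Stream stream)
    {
        string length;
        try
        {
            length = stream.Length.ToString();
        }
        catch (NotSupportedException)
        {
            // Some network-backed streams throw here instead of returning a length.
            length = "not supported";
        }
        return $"CanRead={stream.CanRead}, CanSeek={stream.CanSeek}, Length={length}";
    }
}
```

For a three-byte MemoryStream, Describe returns "CanRead=True, CanSeek=True, Length=3".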

3 participants