I have got dual ST8000NM000A-2KE101 - they have 0 bad sectors and errors but Raid 1 keep getting degraded - Intel® Optane™ Memory #123
Comments
Furkan,
I looked at the pictures: all the SMART data looks good and shows no errors. I did see that the drive is running at 6 Gb/s; does the controller support 12 Gb/s? Other kinds of errors to look for are "end-to-end CRC error" and PHY-layer errors. Also try a new data cable on the drive, and make sure it is not routed close to anything that could inject noise into the cable.
Try SeaChest_SMART to pull the drive's Device Statistics log; there should be more information in that log. I think smartctl has an option for this as well.
SeaChest_SMART -d /dev/sg<#> --deviceStatistics
Maybe also run a DST (drive self-test) to make sure the drive is healthy.
SeaChest_SMART -d /dev/sg<#> --shortDST --captive
SeaChest_SMART -d /dev/sg<#> --showDSTLog
Also look in the comprehensive error log and see whether it shows any errors.
SeaChest_SMART -d /dev/sg<#> --showSMARTErrorLog comprehensive
SeaChest_SMART -d /dev/sg<#> --showSMARTErrorLog summary --smartErrorLogFormat raw
I think those are all the ideas I can come up with today.
Tim Gilmer
Staff Engineer
Field Diags
Office: (720)-684-2624
Seagate Technology
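The diagnostic sequence in this reply can be collected into a small script. The following is a minimal Python sketch, not an official Seagate tool: it only assembles the command lines quoted above, using the flags exactly as written there, and does not execute anything. The helper name `build_smart_commands`, the `tool` default, and the example handle `/dev/sg2` are assumptions; the executable may be named differently in your install, and the device handle format depends on the operating system.

```python
import shlex


def build_smart_commands(handle, tool="SeaChest_SMART"):
    """Return the diagnostic commands from this reply as argv lists.

    `handle` is the device handle passed to -d (for example /dev/sg2).
    `tool` is the executable name; adjust it to match your install.
    """
    base = [tool, "-d", handle]
    return [
        base + ["--deviceStatistics"],        # Device Statistics log
        base + ["--shortDST", "--captive"],   # short drive self-test
        base + ["--showDSTLog"],              # self-test results
        base + ["--showSMARTErrorLog", "comprehensive"],
        base + ["--showSMARTErrorLog", "summary",
                "--smartErrorLogFormat", "raw"],
    ]


if __name__ == "__main__":
    # Print the commands for review; nothing is executed here.
    for argv in build_smart_commands("/dev/sg2"):
        print(shlex.join(argv))
```

To actually run them, each list could be handed to `subprocess.run(argv)` from an elevated shell, one drive at a time.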
…________________________________
From: Furkan Gözükara
Sent: Monday, September 25, 2023 6:07 PM
I can't solve this error.
The disks are genuine; I checked the QR codes on them.
The RAID 1 keeps getting degraded. Then I reset the disks to non-RAID and do a full check with chkdsk /f /r /x
0 errors found.
Then I do a byte-by-byte comparison of each file on each disk, and they are exactly identical.
But the RAID 1 keeps getting degraded.
How can I debug this issue?
Here are my disks and drivers.
I am using Windows 10.
[image]<https://user-images.githubusercontent.com/19240467/270498861-69eaf9bc-644b-44ec-83ff-7949780cc3f3.png>
[image]<https://user-images.githubusercontent.com/19240467/270498610-1b41c8ba-da47-4ff1-9456-dcfec77ffda9.png>
[image]<https://user-images.githubusercontent.com/19240467/270498637-e3b09929-444c-4211-bb9d-1e682549f88f.png>
[image]<https://user-images.githubusercontent.com/19240467/270498667-d2c08980-8532-4687-8b46-cab5545c8417.png>
[image]<https://user-images.githubusercontent.com/19240467/270498698-57268a7d-62e6-466f-b730-a8710088934d.png>
[image]<https://user-images.githubusercontent.com/19240467/270498734-9f7c2667-2225-47fb-a820-cb7c94a3c199.png>
[image]<https://user-images.githubusercontent.com/19240467/270498800-ee8d0389-9339-44f4-b697-5c96c177b850.png>
So, if DST passes, then the drives are fine. The only thing I see is a very high number of resets on the interface side. It also looks like you are seeing CRC errors from the interface. I would still change out the data cable and start looking at the controller card for any issues.
Tim Gilmer
…________________________________
From: Furkan Gözükara
Sent: Tuesday, September 26, 2023 9:41 AM
Thanks for the answers, @Swiss3003<https://github.com/Swiss3003>.
Here are some results. Any ideas?
[image]<https://user-images.githubusercontent.com/19240467/270714230-b0a69811-22e3-4a32-9dbd-fb7b1b055f80.png>
[image]<https://user-images.githubusercontent.com/19240467/270714318-761fd2f4-884e-416e-b228-6bb35d19d4c8.png>
[image]<https://user-images.githubusercontent.com/19240467/270714476-4b48a3ed-624c-4fbb-a8ed-470dda334e94.png>
[image]<https://user-images.githubusercontent.com/19240467/270714527-6ba2e383-5ac3-4bd4-9225-6bd17076bdfe.png>
[image]<https://user-images.githubusercontent.com/19240467/270714741-e351dc56-a439-423c-bdb4-f4fc27de6118.png>
[image]<https://user-images.githubusercontent.com/19240467/270714796-5224feff-8ce1-4f27-82d0-48157c02ff54.png>
Sounds good. DST shouldn't take long. Then pull the log, and that will tell you whether you have a bad drive or not.
CRC: a good way to think of it is as a checksum created by the host. The data is then pushed to the device, through the card and through the data cable (there is more to the stack than just those two). The device receives the data and creates its own checksum, and the two checksums are compared to make sure they are the same. If they are not, you get a CRC error. (That is the simple version.) The CRC check was created a long time ago for IDE, because of the noise issues seen in the data cable and in the chipsets of cheap controllers. As the signal got faster and sharper, the CRC check became more important. CRC also does more today than it used to: it plays a part in error recovery for the data, helps with the PHY signals and words, and helps with booting drives for the OS, just to name a few.
Cards: hmm. I don't see a lot of different ones anymore. It seems like I always have an LSI or Broadcom around to test with. I do have a few Broadcom 95** MegaRAID cards that seem to work nicely. Hope that helps.
Tim Gilmer
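The checksum comparison described in this reply can be illustrated in a few lines. This is a minimal sketch that uses Python's `zlib.crc32` as a stand-in; a real SATA link computes its own 32-bit CRC in hardware on each frame, so the function names and the simulated bit flip are illustrative assumptions only. Note that a CRC by itself only detects a damaged transfer; any recovery comes from retrying it.

```python
import zlib


def send(payload: bytes) -> tuple[bytes, int]:
    """Host side: return the payload together with its CRC-32."""
    return payload, zlib.crc32(payload)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate cable noise by inverting a single bit."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def receive(payload: bytes, sender_crc: int) -> bool:
    """Device side: recompute the CRC and compare. False means CRC error."""
    return zlib.crc32(payload) == sender_crc


if __name__ == "__main__":
    data, crc = send(b"sector contents " * 32)
    print("clean transfer ok:", receive(data, crc))
    print("noisy transfer ok:", receive(flip_bit(data, 100), crc))
```

A single flipped bit is enough to make the two checksums disagree, which is why a marginal cable or connector shows up as a rising CRC error count rather than as silently corrupted files.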
…________________________________
From: Furkan Gözükara
Sent: Wednesday, September 27, 2023 9:13 AM
@Swiss3003<https://github.com/Swiss3003> thanks.
I changed both ports and cables (brand new). Testing again right now.
Only the DST is left.
What do CRC errors mean?
Also, which controller card would you suggest for RAID 1?
I like the new cable and new port. Give it a try; the drives passed.
Watch the resets and the CRC errors.
Tim Gilmer
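One way to watch those counters is to note them before and after a test run and compare the two readings. A minimal sketch, assuming the values are copied by hand from the tool output; the function name, the counter names, and the numbers below are placeholders for illustration, not readings from these drives.

```python
def counter_increases(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return only the counters that went up, mapped to the size of the increase."""
    increases = {}
    for name, old_value in before.items():
        # A counter missing from the second reading is treated as unchanged.
        new_value = after.get(name, old_value)
        if new_value > old_value:
            increases[name] = new_value - old_value
    return increases


if __name__ == "__main__":
    # Placeholder values for illustration only.
    before = {"interface_crc_errors": 12, "hard_resets": 340, "command_timeouts": 5}
    after = {"interface_crc_errors": 15, "hard_resets": 340, "command_timeouts": 6}
    for name, delta in counter_increases(before, after).items():
        print(f"{name} rose by {delta}")
```

If nothing is reported after a long copy or a RAID rebuild, the new cable and port are probably doing their job; any counter that keeps rising points back at the interface.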
…________________________________
From: Furkan Gözükara
Sent: Wednesday, September 27, 2023 3:20 PM
@Swiss3003<https://github.com/Swiss3003> thank you so much for the answers.
I ran the DST tests and no errors were shown. So what do you think? By the way, I am currently on a new cable and a new port. I haven't tested RAID 1 yet.
Here are the results:
disk 1
[image]<https://user-images.githubusercontent.com/19240467/271118137-e7faa9b7-c490-4d4b-8a2d-cd3969dd7e3b.png>
disk 2
[image]<https://user-images.githubusercontent.com/19240467/271118628-76f0c7e3-981c-4145-ac1f-250a05620825.png>
Thank you @Swiss3003, but the disks are still not working properly. Sometimes they just freeze for too long.
I plan to purchase this RAID card and test it. What do you think? Marvell® 88SE9230
Look at this image; it is not normal at all. The disk is just 100% active with no reads or writes.
And here is another of my drives, a regular SATA 3 HDD. That HDD immediately starts reading and writing, as it is supposed to.
Well, if you have a slow-running drive, you have all kinds of options. These are some of the top ones:
1. New data cable *
2. New card *
3. Defrag the drive
4. Make sure you don't have a virus
5. Check and repair the partition
6. Fix the format
For the last four, there are a lot of tools that perform these tasks. But sometimes, on older or used drives, you might want to low-level the drive to clean it up. You can do this through an erase of the drive (ATA Security Erase). It writes a pattern to every LBA on the drive, cleaning it up. After that, give the drive time to perform "background" tasks. Then format the drive again and write the data to the drive again.
Now, I'm still thinking you are seeing something in the interface on U:. I bet the drive has dropped down to SATA300 again. Check the CRC error count and see if that number has increased. I also think that the command timeout count has increased. Check those numbers. I think you saw the drop in the PHY speed in your trace. If you swap in a different drive and the problem goes away, try replacing the drive.
Now drive D: I really don't see anything standing out on the drive, but it's running slow. No CRC errors, no big timeout counts, almost no ECC errors. It seems like it's still running at SATA600, going by the SMART picture you sent in. If it's busy all the time, then it might need to perform some background tasks. You can put the drive into idle for a few hours and see if it helps. Otherwise, I would low-level the drive (ATA Security Erase). If it's still slow no matter what, try replacing the drive.
The last thing to look into would be vibration within the system. A fan or the power system is sometimes the source of vibration. I didn't see anything in the SMART data that would point to vibration. That's why I suggested looking at the data cable for noise, because of the CRC errors and because U: was running at SATA300 instead of SATA600. Both drives did pass DST, so I really don't think it's a vibration issue, and I don't think the drives are bad. I think your clues are with the U: drive. If you did replace the cable and the card, it could be the drive. The best thing I could offer is pulling logs and doing a first-level FA on the drive to see. Is that something you would want to try?
Tim Gilmer
…________________________________
From: Furkan Gözükara
Sent: Friday, September 29, 2023 7:11 PM
Hello again
Any ideas how to debug this issue?
@FurkanGozukara , |
@DebabrataSTX thank you so much.
Currently the disks are non-RAID, but the speed is terrible for some reason that I can't find.
Normally I use the Intel RAID manager to create my RAID 1.
I changed the cables as @Swiss3003 said.
I have another HDD in the same area, in their previous slot, working perfectly fine. This also eliminates many of the other items @Swiss3003 listed.
Interestingly, I did another test today and the results are different from before.
It is still very slow compared to what it should be, but it didn't wait forever (tens of minutes) to start reading and writing like before.
I made this change recently.
Disk K is ST4000NM0024-1HT178.
Furkan,
As long as you are not seeing those hard resets and CRC errors tick up any more, the only thing I can come up with for slow drives is to erase the drive, let it sit with only power for two days, and then try it again. Alternatively, you could just let the drive sit for two days with no data cable plugged in and see if it runs faster.
See, the drives have been doing error recovery for some time. We know that from the CRC errors, hard resets, and timeout counts in the SMART data. Also, the drives have spent little time in idle mode. Idle mode is when a drive tries to do self-repair and DOS. These are called background tasks. These tasks take time and, with the error recovery that the drive was doing, they could be backed up and behind. So, if you let the drive sit idle for a day or so, it could finish all the background tasks and error recoveries. This would help with the speed of coming out of idle, could speed up the drive, and could keep the PHY running at SATA600 instead of dropping down to SATA300. Like I said, it "could" help.
The erase would also help with this. An erase would self-clean all the LBAs, set the drive back to factory, and clean the glist and plist on the drive, returning the drive to a healthy state and clearing all the cache on the drive. But the SMART logs showed you had no glist or plist entries and DST passed, so the only thing the erase would really do is clear all the background tasks and reallocate any bad sectors on the drive.
The only other thing you can do is look at the settings on the drive. Just use the -i option in the tools and it will print out all the features on the drive.
To me, the key is that the OS is dropping the drive down to SATA300. The OS is seeing errors and is slowing the PHY down to keep the signals clean, so it is no longer seeing the errors. Otherwise it would have slowed it down even more, to SATA150.
So, you might want to talk to the call center for more help. I'm sure they know more than I do. Good luck. Slow drives are hard to figure out.
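The SATA150, SATA300, and SATA600 names used above correspond to the 1.5, 3, and 6 Gb/s link rates. SATA uses 8b/10b encoding, so every data byte costs 10 bits on the wire, which is where the round numbers come from. A short worked example (the function name is an illustrative choice):

```python
def sata_payload_mb_per_s(line_rate_mbit: int) -> int:
    """Convert a SATA line rate in Mbit/s to payload bandwidth in MB/s.

    8b/10b encoding sends 10 line bits for each 8-bit data byte,
    so payload bytes per second = line bits per second / 10.
    """
    return line_rate_mbit // 10


if __name__ == "__main__":
    for gbit, name in [(1.5, "SATA150"), (3.0, "SATA300"), (6.0, "SATA600")]:
        mbit = int(gbit * 1000)
        print(f"{name}: {gbit} Gb/s link -> {sata_payload_mb_per_s(mbit)} MB/s")
```

So a link reported as 6 Gb/s is running at SATA600, and a drop to 3 Gb/s halves the interface ceiling to 300 MB/s.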
…________________________________
From: Furkan Gözükara
Sent: Tuesday, October 3, 2023 3:33 AM
[image]<https://user-images.githubusercontent.com/19240467/272215195-98c13551-fb88-48a7-9f4b-85dbd8fab8d4.png>
disk K is ST4000NM0024-1HT178
disk U and D are ST8000NM000A-2KE101
[image]<https://user-images.githubusercontent.com/19240467/272215660-749834b6-0aaf-484a-8f5b-458020973f4a.png>
@Swiss3003 thank you so much for the reply.
They have been idle the whole time since I opened this thread, but I didn't remove the cable.
I just did a test: still slow, but at least it starts working right away.
Also, the Intel RAID manager is displaying 6 Gb/s, so I think that is SATA 3?
Also, what technique do you suggest for erasing the disks?
The disks are very new, but I want to give the self-clean a try.
@FurkanGozukara ,
I have it. What command should I use?
Can you summarize exactly which operations or functions you want to perform on the drive? That way it will be easy for me to send the right command(s).
I want to do what @Swiss3003 said.
Try