Gentoo Forums
[solved] Something overwrote my partition table?
tholin
Apprentice


Joined: 04 Oct 2008
Posts: 184

Posted: Wed Sep 16, 2020 11:56 am    Post subject: [solved] Something overwrote my partition table?

I've encountered a bit of a mystery with a partition table that was zeroed.

The first sign of trouble came yesterday, when smartd reported a pending sector on sdb. I have 8 spinning disks, so a bad sector here and there isn't unusual. I confirmed the sector was bad with dd if=/dev/sdb of=/dev/null bs=4k skip=684709637 count=1. That block didn't map to anything important, so I tried to force a reallocation with dd of=/dev/sdb if=/dev/zero bs=4k skip=684709637 count=1, but that didn't work: the sector still gave I/O errors. I also ran ddrescue --idirect --verify-on-error --force /dev/sdb /dev/null /root/ddrescue_mapfile_sdb to check for other bad sectors, but there was only the one. I figured I'd have to get a replacement disk and went to bed.

Today when I started the computer, the UEFI firmware popped up a "CMOS has been cleared" message and asked if I wanted to load optimized defaults. WTF? Is the mobo battery going bad? But I have the power cable connected. As far as I could tell the settings were not cleared, so I just rebooted.

After the reboot I got a warning that sdb1 was missing, so the filesystem on it couldn't be mounted. fdisk confirmed that sdb had no partition table. Using a hex editor I could see that the first 17KiB of sdb had been overwritten with zeros. It was a GPT table with only one partition spanning the entire disk. The backup header at the end of the disk is still there, so it should be easy to restore. I can mount the disk by specifying --offset 2048 for the offset to the actual data partition, and the data is still there.
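
For reference, the numbers in this paragraph can be checked with a bit of shell arithmetic. This is only a sketch under assumptions the post does not state: 512-byte logical sectors, the usual 128 GPT partition entries of 128 bytes each, and the 2048 being a start sector rather than a byte count. Under those assumptions the primary GPT area (protective MBR, header and entry array) comes to exactly 17 KiB, the same size as the zeroed region.

```shell
#!/bin/sh
# Assumptions (not stated in the post): 512-byte logical sectors, 128 partition
# entries of 128 bytes each, and "2048" meaning the partition's start sector.
sector=512
entries_bytes=$((128 * 128))                    # partition entry array
gpt_area=$((sector + sector + entries_bytes))   # protective MBR + GPT header + entries
gpt_kib=$((gpt_area / 1024))
part_offset=$((2048 * sector))                  # byte offset of sector 2048

echo "primary GPT area: $gpt_area bytes ($gpt_kib KiB)"
echo "partition byte offset: $part_offset"
```

If those assumptions hold, tools that expect a byte offset (for example mount -o loop,offset= or losetup --offset) would want 1048576 rather than 2048.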

I don't believe in these kinds of coincidences, so what do these 3 problems have in common?

Before you suggest it, no, I did not accidentally overwrite the table with dd. I looked over my bash history and the only thing I did was the commands I pasted above.

A scary possibility is that the UEFI firmware does integrity checking of GPT tables and automatically tries to repair (actually corrupt) them: http://forum.asrock.com/forum_posts.asp?TID=10174
That guy had the start of his disk overwritten with zero bytes by the firmware.

Triggering an auto-repair function shouldn't have anything to do with CMOS clearing, but who knows with buggy UEFI code. But if it was the firmware that wiped the table, why did it happen now? The bad block was nowhere near the GPT tables.

A sensible person would probably just replace the broken disk and move on but random wiping of partition tables is a serious problem. I would like to know what happened.


Last edited by tholin on Wed Sep 16, 2020 3:32 pm; edited 1 time in total
alamahant
Guru


Joined: 23 Mar 2019
Posts: 551

Posted: Wed Sep 16, 2020 2:50 pm

Hi,
you mentioned you ran
Code:

dd of=/dev/sdb if=/dev/zero bs=4k skip=684709637 count=1

Wouldn't that have erased your whole disk?
The only way to have deleted the UEFI firmware is if you ran
Code:

rm -rf /sys/firmware/efi/efivars

and the efi vars were mounted rw.

Maybe your dd did that?
Please note that these are dangerous operations and may even brick your machine.
Be especially careful with rm -rf on a chroot and with anything involving dd.
I always mount efivars ro in fstab
Code:

efivarfs /sys/firmware/efi/efivars    efivarfs  ro,nosuid,nodev,noexec,relatime 0 0


and only remount it rw if needed.
:D
tholin
Apprentice


Joined: 04 Oct 2008
Posts: 184

Posted: Wed Sep 16, 2020 3:31 pm

Good news, I've figured out what is wrong. Bad news, it's my own fault...

alamahant wrote:
Hi,
you mentioned you ran
Code:

dd of=/dev/sdb if=/dev/zero bs=4k skip=684709637 count=1

Wouldn't that have erased your whole disk?

No. Since I set count=1 it copied only a single block, and with bs=4k that block was 4 KiB, so it wrote only 4 KiB. The skip=684709637 makes it write to the 684709637th block in the output... eh... fuck!
skip= skips blocks in the input, not the output. What I actually did was read the 684709637th block from /dev/zero and write it to the start of /dev/sdb. I should have used seek=684709637 instead. That also explains why the bad block wasn't reallocated: it was never written to.
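
For anyone who wants to see the skip/seek difference without risking a real disk, here is a minimal sketch on a scratch file. The temporary file, the 4-block size and the block_map helper are all made up for illustration. conv=notrunc is needed only because the target is a regular file, which dd would otherwise truncate; a block device cannot be truncated.

```shell
#!/bin/sh
# Sketch: dd skip= (input offset) vs seek= (output offset) on a scratch file.
# The scratch file is a hypothetical stand-in for /dev/sdb.
set -eu
LC_ALL=C; export LC_ALL
img=$(mktemp)
bs=4096

# Fill 4 blocks with 0xFF bytes so that zeroed blocks stand out.
fill() { dd if=/dev/zero bs=$bs count=4 2>/dev/null | tr '\000' '\377' > "$img"; }

# Print one letter per block: Z if the block is all zeros, F otherwise.
block_map() {
    map=""
    for i in 0 1 2 3; do
        nonzero=$(dd if="$1" bs=$bs skip=$i count=1 2>/dev/null | tr -d '\000' | wc -c)
        if [ "$nonzero" -eq 0 ]; then map="${map}Z"; else map="${map}F"; fi
    done
    echo "$map"
}

# skip=2 skips two blocks of the INPUT; the write still lands on output block 0.
fill
dd if=/dev/zero of="$img" bs=$bs skip=2 count=1 conv=notrunc 2>/dev/null
skip_result=$(block_map "$img")

# seek=2 skips two blocks of the OUTPUT; the write lands on output block 2.
fill
dd if=/dev/zero of="$img" bs=$bs seek=2 count=1 conv=notrunc 2>/dev/null
seek_result=$(block_map "$img")

echo "skip=2 -> $skip_result"
echo "seek=2 -> $seek_result"
rm -f "$img"
```

This should print ZFFF for skip=2 and FFZF for seek=2: with skip= the zeros end up at the very start of the target, which on a GPT disk is where the protective MBR and primary GPT header live.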

I can only assume the bogus "CMOS has been cleared" message was some UEFI bug triggered by the disappearance of the GPT header. UEFI probably stores a list of the known GPT partitions in efivars and checks them at boot, or something like that.

Mystery solved.

alamahant wrote:

The only way to have deleted the UEFI firmware is if you ran
Code:

rm -rf /sys/firmware/efi/efivars

and the efi vars were mounted rw.

I boot in legacy mode so there are no efivars to fiddle with.
All times are GMT