What witchcraft is this? System transfer problem
Posted: 26 Oct 2015, 04:22
I'm in deep trouble. It's quite probable you're going to facepalm a few times when reading this, but please refrain from caustic comments - I'm already quite exhausted after spending my entire Sunday (which I had other plans for) on this issue.
TL;DR: I do not have a working system, and the B3 seems to be doing weird stuff to the partition tables of drives installed internally.
I wanted to move to a new hard drive - the 3TB HGST proved to be waaay too loud, so I ordered a 2TB WD Red for the B3. Also: I am running Arch, installed using Sakaki's scripts.
I had a pendrive with Arch and Sakaki's scripts lying around since I first moved to Arch, so it's probably v1.0 or 1.1 or something around that. I checked the script - as you know, it creates a partition table, copies the system files, kernel etc., and then you're good to go.
I connected the WD Red over a USB enclosure to be able to transfer the ~1.5 TB of data while the server was still online - I just stopped Samba and AFP. Otherwise, transferring the system seemed simple - it looked like the same thing that Sakaki's script was doing.
Thing to note: my /home is a separate partition.
The data got transferred over during Saturday and Saturday night, and by Sunday I was ready to move. (Now, this might or might not be important: I ran pacman -Syu prior to the copy and did not reboot the system afterwards - I just forgot to do it.)
I installed the WD Red in the B3 and rebooted... only to be met with a purple blinkenlight. Weird. So I started the system from the aforementioned pendrive - imagine my surprise when I saw that the partition table was missing! (Or rather: there was something there, and the drive claimed to have an msdos partition table - I had used GPT - but at that time it didn't seem important to write it down. Now I know this was the first occurrence of witchcraft.) Well, the partition table might be missing, but I had just spent over 20 hours copying stuff, and the data was there - so I created a new partition table, recreated all of the partitions, and lo and behold - I could access all the data again, no problem. So I rebooted again.
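To make the "msdos vs. GPT" observation above easier to reproduce, here is a minimal Python sketch that inspects the first few kilobytes of a drive. It relies on two facts about the on-disk layout: an MBR (including the protective MBR that GPT disks carry) ends with the bytes 55 AA at offset 510, and a GPT header starts with the ASCII string "EFI PART" at LBA 1. Because the header sits at LBA 1, its byte offset depends on the logical sector size the drive reports, so the sketch tries both 512 and 4096 bytes. The function names and the device path in the usage note are hypothetical, and nothing here claims to explain what actually happened to the drives.

```python
MBR_SIGNATURE = b"\x55\xaa"   # last two bytes of sector 0
GPT_SIGNATURE = b"EFI PART"   # first eight bytes of the GPT header at LBA 1


def detect_partition_table(head, sector_sizes=(512, 4096)):
    """Classify the start of a disk.

    `head` holds at least the first 4104 bytes of the device.
    Returns ("gpt", sector_size), ("msdos", None) or ("none", None).
    """
    # GPT header lives at LBA 1, i.e. at byte offset == logical sector size.
    for size in sector_sizes:
        if head[size:size + 8] == GPT_SIGNATURE:
            return ("gpt", size)
    # No GPT header found: a bare boot signature means a plain MBR
    # (or a protective MBR whose GPT header is not where we looked).
    if head[510:512] == MBR_SIGNATURE:
        return ("msdos", None)
    return ("none", None)


def read_head(path, length=8192):
    """Read the first `length` bytes of a device or image file."""
    with open(path, "rb") as f:
        return f.read(length)
```

Usage would look like `detect_partition_table(read_head("/dev/sdX"))`, run as root against the drive in question (the path is a placeholder). Running it once with the drive in the USB enclosure and once with it installed internally would show whether the header is found at the same offset in both cases.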
B3 went up, network connectivity went up, all is nice and dandy... hey, why does it not accept my SSH password?
Another reboot with the pendrive - weird stuff. Journald shows "failed password" for every login attempt. I even copied the password entry into the pendrive's /etc/shadow to test it out, and it worked fine there. So, weird stuff. I enabled root login and rebooted. The root password did not work either.
So I connected the HGST via the USB enclosure to md5 the data and check for corruption. Imagine my surprise when the HGST did not have a partition table either! (Witchcraft, part II.) Well, I had dealt with this on the first drive, so I tried doing the same: recreate the partition table, create the partitions... only this time it did not work. Maybe I forgot how I had set them up in the first place; maybe there's something else at play. The data is still there, so I'll probably be able to recover it, but for now it was a no-go.
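The md5 comparison mentioned above could be sketched in Python roughly as follows, assuming both drives can be mounted side by side. The function names and the mount points in the usage note are hypothetical; this is just one way to do the check, not what was actually run.

```python
import hashlib
import os


def md5_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of one file, read in chunks to keep memory flat."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_tree(root):
    """Map each regular file's path (relative to root) to its MD5."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are hashed.
            if os.path.isfile(full) and not os.path.islink(full):
                sums[os.path.relpath(full, root)] = md5_file(full)
    return sums


def compare_trees(source_root, copy_root):
    """Return (files missing from the copy, files whose contents differ)."""
    src, dst = md5_tree(source_root), md5_tree(copy_root)
    missing = sorted(set(src) - set(dst))
    changed = sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p])
    return missing, changed
```

For example, `compare_trees("/mnt/hgst/home", "/mnt/wdred/home")` (placeholder mount points) would return two lists; both being empty means every file on the source exists on the copy with identical contents.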
Now come a few hours that are a blur, and it's hard for me to remember what exactly I did. Somewhere near the beginning, for no apparent reason, the network on the B3 stopped coming up. Neither LAN nor WAN came up. So any further troubleshooting was: change something, try to boot, wait, reconnect cables, reboot with the pendrive, change again and so on.
At some stage, I noticed that networking was not coming up because the script that gives eth0 and eth1 their MAC addresses was failing; it was failing because /dev/mtd0 was missing. Some other devices were missing as well. That seemed to be some kind of lead - I tried copying the device nodes from the pendrive (as in Sakaki's script) - no go; then I tried using old kernels from pacman's cache (running depmod afterwards in a chroot) - still no go. I tested quite a number of them.
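Two quick checks that fit this stage of the troubleshooting can be sketched in Python: which expected device nodes are absent, and whether a module directory exists for the kernel release that is actually running. The second check is an assumption on my part about what is worth ruling out after an upgrade without a reboot; the post does not establish that a kernel/module mismatch is the cause. The function names and the list of expected nodes are illustrative.

```python
import os


def missing_device_nodes(expected, dev_root="/dev"):
    """Return the names from `expected` that do not exist under dev_root."""
    return [
        name for name in expected
        if not os.path.exists(os.path.join(dev_root, name))
    ]


def has_module_tree(kernel_release, modules_root="/lib/modules"):
    """True if a module directory exists for the given kernel release."""
    return os.path.isdir(os.path.join(modules_root, kernel_release))
```

On the booted system, `missing_device_nodes(["mtd0", "sda"])` lists what is absent from /dev, and `has_module_tree(os.uname().release)` tells whether the running kernel can find its own modules.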
I can't even give you a log right now: I did copy fragments onto the pendrive, but I left the pendrive at home (I'm writing from the office).
There's clearly something I failed to do, but I don't know what it is.
And, above all: can someone explain what the B3 is doing with partition tables? (I also noticed it about half a year ago when disposing of an old HDD, but it didn't seem important back then - I had to reformat the drive anyway.)