Braggtown dot com

A Tangled Web

Data Transfer

My next objective at work is to transfer some data to the Library of Congress. I have the option of pushing data up the Abilene Network and will probably give network transfer a try, but the bulk of the data will move on two 1TB USB drives via FedEx. I’ll load them up and ship them to LC, where they’ll unload them and ship them back. Back and forth until they get everything.

I have an 11-page document specifying the organization, naming conventions, etc. The most important point is that I need to create a UTF-8 text file at the top-level directory containing the path and checksum (MD5 or SHA-1) for each file.
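Here is a rough sketch of how such a manifest could be generated in Python. The spec's exact line layout isn't reproduced in this post, so the "checksum, two spaces, relative path" format, the manifest.txt file name, and the function names are all assumptions made for illustration.

```python
import hashlib
import os


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash one file in fixed-size chunks so large files never sit in memory."""
    digest = hashlib.new(algorithm)  # "md5" or "sha1"
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, manifest_name="manifest.txt", algorithm="md5"):
    """Write one '<checksum>  <relative path>' line per file under root.

    The line layout and the manifest file name are assumptions, not
    taken from the transfer spec. Returns the number of files listed.
    """
    manifest_path = os.path.abspath(os.path.join(root, manifest_name))
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Skip the manifest itself if an older copy already exists.
            if os.path.abspath(full_path) == manifest_path:
                continue
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            entries.append((relative, file_checksum(full_path, algorithm)))

    entries.sort()  # stable ordering makes two manifests easy to diff
    with open(manifest_path, "w", encoding="utf-8", newline="\n") as out:
        for relative, checksum in entries:
            out.write(f"{checksum}  {relative}\n")
    return len(entries)
```

Calling build_manifest("/path/to/usb/mount", algorithm="sha1") would write the manifest at the top of that directory. Reading in 1 MB chunks keeps memory use flat no matter how large an individual file is.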

So, I have five 1TB ZFS slices sitting in a storage array in our server room. Here’s the df -h:

Filesystem        Size  Used  Avail  Use%  Mounted on
/storage/ndiipp1  977G  975G  1.9G   100%  /storage/ndiipp1
/storage/ndiipp2  977G  972G  5.2G   100%  /storage/ndiipp2
/storage/ndiipp3  977G  804G  174G   83%   /storage/ndiipp3
/storage/ndiipp4  977G  826G  152G   85%   /storage/ndiipp4
/storage/ndiipp5  977G  820G  158G   84%   /storage/ndiipp5

There are a couple of issues I need to think through. Hopefully I’ll have enough free space after formatting (I haven’t decided on a file system to use on the USB drives) to perform a 1:1 copy from partition to USB drive. Of course, I still need room for my UTF-8 manifest file. It seems like I’ll have space.

We’ve got 4GFC Fibre Channel switches in our storage area network, but I’ve only got a 100BASE-T LAN connection to my workstation. I’m very curious to find out how long it will take both to transfer the data from the SAN to the USB drive (probably using tar over SSH) and to checksum the up to 85,000 files in each partition. I’m sure I’ll be glad I kept my old Xeon workstation to chew through data. I think I’ll look around for a utility to grab some network statistics like collisions and resent packets.
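For a back-of-the-envelope lower bound on the network leg, the arithmetic can be sketched as below. It assumes the 100BASE-T link is the bottleneck, that df -h reports binary gigabytes, and that the wire runs at its full nominal 100 Mbit/s. Real throughput over SSH will be lower, so the efficiency parameter (a made-up knob, like the function name) is there to scale things down.

```python
def transfer_hours(size_gib, link_mbit_per_s=100.0, efficiency=1.0):
    """Estimate hours to move size_gib over a link of the given speed.

    efficiency is the assumed fraction of nominal bandwidth actually
    achieved (1.0 = theoretical best case, which never happens).
    """
    size_bytes = size_gib * 1024 ** 3
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8 * efficiency
    return size_bytes / bytes_per_second / 3600


# "Used" column from the df -h listing, in GiB.
USED_GIB = {
    "ndiipp1": 975,
    "ndiipp2": 972,
    "ndiipp3": 804,
    "ndiipp4": 826,
    "ndiipp5": 820,
}

if __name__ == "__main__":
    for name, gib in USED_GIB.items():
        best = transfer_hours(gib)
        guess = transfer_hours(gib, efficiency=0.7)  # 70% is only a guess
        print(f"{name}: at least {best:.1f} h, maybe {guess:.1f} h at 70%")
```

Even in the best case, the fullest slice works out to roughly a day of continuous copying, which is a good argument for running everything unattended.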

Luckily, this may be network, processor, and time intensive, but it’s pretty automation friendly. That’ll give me some time to figure out why Fedora doesn’t seem to want to deploy properly in Sun Java App Server. Then I can start mapping data models between the DSpace and Fedora repositories.
