Braggtown dot com

A Tangled Web: Archive

Archive for the ‘Work’ Category

 Remember the Alamo

Thursday, January 25th, 2007

This is the last full day of Open Repositories 2007 in San Antonio, Texas. It’s been fun and my presentation went well, I think. Unfortunately, Market Square is being partially renovated, but Mi Tierra, a Mexican bar, restaurant, and bakery, is awesome. Incredibly tacky and garish, but great. We’re staying across the street. Almost forgot: we’ve been tagging sites at del.icio.us/tag/or2007/.

James pointed this out to me this morning. I thought it was much funnier than mod_oai.

image

 Post Holiday Updates

Friday, January 5th, 2007

I have some photos to upload from Christmas, but they’re awaiting third-party review. I’ve been listening to my new Zen Nano Plus and I’m finally working through the backlog of audio casts I’ve accrued. Others might find this one interesting: an interview with Bill LeFurgy, program manager at the Library of Congress, discussing NDIIPP and digital preservation.

In other news, Raleigh-Durham made Wired Magazine’s list of Top Tech Towns. They even pointed out the strength of the Linux community, which includes Triangle Linux User Group, Duke LUG, and NCSU LUG. Both NC State and Duke have specialized Linux distributions, too. Of course, we also have Red Hat and ibiblio.

In other other news, Brandi just celebrated her one-year anniversary at Duke and I’m days away from my second at NC State.

 2GHz of Black Magic

Friday, December 22nd, 2006

I got a new Thinkpad T60 this week. It’s a beauty, too. It’s an understated, unassuming, dealer of digital death. With nmap, kismet, and dsniff, it’s a very dangerous weapon. Ubuntu 6.10 runs flawlessly on it, including the integrated fingerprint reader, and it’s light. Among the things that work with little work: automatic/manual frequency scaling of each Core Duo processor core, fingerprint authentication on login and resumption from Gnome Screensaver, sleep/hibernation, wireless promiscuous mode, and direct 3D rendering at 1400×1050 resolution. All of the buttons, such as speaker volume, work out of the box with on-screen display.

I ordered the Intel 950 Integrated Graphics since I won’t do business with ATI and I know the 950 supports both DRI 3D and hibernation. Also, the ATI graphics cards run much hotter, require faster fan speeds, which makes the laptop louder, and draw more power. I also chose the Intel 3945ABG wireless NIC after researching the support for it. I did have to compile kismet from source to use the Intel 3945 as a scanning source, but that took minutes. This laptop is known for its Linux-compatible hardware and I wasn’t disappointed.

thinkpad image

In other news, I’ve started working on my Cobalt Qube again, but I think I’ll try to install Debian 3.1 before working on a 2.6 kernel. I’m torn- it’s so loud that I think I’ll have to put it in the closet, but it’s too pretty to not be seen. Sigh.

 Paper Posted

Thursday, December 14th, 2006

I finally uploaded the paper, on which I’m a co-author, from Archiving 2006, the Society for Imaging Science and Technology conference. Have a look at Preservation of State and Local Government Digital Geospatial Data: The North Carolina Geospatial Data Archiving Project.

Additionally, I was gratified to see some of the data processed for ingest by the script I wrote for Special Collections in one of our DSpace development instances. I also had the opportunity to share that DSpace batch ingest, dsrun, doesn’t require item directories named item_001, item_002, etc. Instead, it just processes directories for ingest in the order in which it finds them and only requires that the specified structure be present, e.g. contents and dublin_core.xml at the root level of the item. I guess MIT isn’t paying for documentation development.
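For illustration, here is a minimal sketch of laying out one item directory with the two required files at its root. It is not the script mentioned above; the function name write_item, the item path, and the sample field values are all hypothetical. It assumes the usual shape of the batch-import files: a dublin_core root element holding dcvalue elements, and a contents file listing one bitstream file name per line.

```python
import os
import xml.etree.ElementTree as ET


def write_item(item_dir, fields, bitstream_names):
    """Write dublin_core.xml and contents at the root of item_dir.

    fields is a list of (element, qualifier, value) tuples. The bitstream
    files themselves are assumed to be copied into item_dir separately.
    """
    os.makedirs(item_dir, exist_ok=True)

    # Build <dublin_core><dcvalue element=... qualifier=...>value</dcvalue>...
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        dcvalue = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier}
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        os.path.join(item_dir, "dublin_core.xml"),
        encoding="utf-8",
        xml_declaration=True,
    )

    # contents lists one bitstream file name per line.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        for name in bitstream_names:
            fh.write(name + "\n")
```

A call such as write_item("import/brochure_a", [("title", "none", "Example title")], ["brochure_a.pdf"]) would produce a directory named for the item rather than item_001, which is the point made above: the name is arbitrary as long as the two files are present.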

 Converting tiff Images to PDF

Tuesday, December 12th, 2006

I’ve been working on a project with Special Collections to ingest some of their digital collections into various DSpace institutional repositories. My piece has been to automate the concatenation of images into multi-image PDFs, to move them into logical items, and to create the Dublin Core metadata for each item. I was given two text files, one describing the contents of each PDF and one for creating the metadata, and a directory of ~4,500 TIFF images.

I finally had all I could take today with ImageMagick’s convert application. The documentation wasn’t clear and there didn’t seem to be a way to reduce the resulting PDF file size or quality. It took hours, too. Today I switched the script to tiff2pdf from libtiff, which is included in the libtiff-tools Debian package.

For background, the process is running on Linux within a VMware Server virtual machine on Windows 2000. The Linux virtual machine mounts the remote file server on which the images reside via Samba. Every read and write has to traverse three network file sharing connections. Still, with tiff2pdf it’s fairly fast. The images are about 4.5MB each, with generally about 4 images to a PDF, and the script processes each image-to-PDF conversion in about 30 seconds. Pretty good performance considering it was taking hours for each process with ImageMagick’s convert.
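As a rough sketch of the tiff2pdf approach (not the script from the projects page), the snippet below builds the two command lines needed for one multi-image PDF. The post does not say how the images were combined, so this assumes the other libtiff tool, tiffcp, is used to merge the TIFFs into one multi-page TIFF before tiff2pdf converts it. The function names and file names are hypothetical.

```python
import subprocess


def build_commands(tiff_paths, merged_tiff, pdf_path):
    """Return the tiffcp and tiff2pdf command lines for one PDF."""
    # tiffcp concatenates the input TIFFs into one multi-page TIFF.
    merge_cmd = ["tiffcp", *tiff_paths, merged_tiff]
    # tiff2pdf writes the multi-page TIFF to the PDF named by -o.
    pdf_cmd = ["tiff2pdf", "-o", pdf_path, merged_tiff]
    return merge_cmd, pdf_cmd


def tiffs_to_pdf(tiff_paths, merged_tiff, pdf_path):
    """Run both commands, raising CalledProcessError if either fails."""
    for cmd in build_commands(tiff_paths, merged_tiff, pdf_path):
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from the subprocess call makes it easy to print the commands for a dry run before pointing the script at a few thousand images.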

I’ve uploaded the new script to the projects page under Catalog Processor - DSpace ingest workflow.

 NDIIPP Developments

Tuesday, December 5th, 2006

Last week a representative from Lockheed Martin visited to discuss collaboration on geospatial repository ingest processing for a preservation system LM is working on in parallel to the Electronic Records Archive for the National Archives and Records Administration. NARA, the branch of the US Government responsible for preserving the records of government, awarded a $308 million contract for a digital preservation system to Lockheed Martin, which apparently wants to develop a more robust system than the ERA specification calls for. They visited us to discuss the possible inclusion of my workflow processes into their repository framework.

January is shaping up to be busy for me. I have a Digital Preservation Partnership meeting at the San Diego Supercomputer Center. From there I’ll spend a couple of days in Ft. Worth, TX, with friends, then fly to San Antonio, TX, to present at Open Repositories 2007.

Speaking of repositories, if you, gentle reader, are interested in the position we’re offering, Data Repository Architect, I’d be glad to field questions about it. It’s my understanding that this position will work with me on our NDIIPP project, the North Carolina Geospatial Data Archiving Project, and possibly in collaboration on a project with the Renaissance Computing Institute, a joint supercomputing venture between NC State, Duke, and UNC-Chapel Hill.

Speaking of that, I believe I’ll have a new title soon: Data Repository Librarian. It’s a much more accurate description of what I do, so I’m pleased. I’m also pleased that DLI has moved back into its normal space after HVAC work. I’ve upgraded spaces and now have a wall of windows. Click on the picture below to see my cube.

Jim's cube

 Recursion and Recursion and …

Monday, October 30th, 2006

While working on some of the NDIIPP pre-ingest workflow scripting this morning, which I’ve been out of for a while, I achieved a minor victory. I wrote a recursive function that was short, sweet, and useful. The function creates a directory and all necessary parent directories. See below:

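from os import mkdir
from os.path import isdir, split

def checkDestinationExists(d):
   # Create d and any missing parents; the guards stop the recursion
   # at an empty or root path and skip mkdir after a trailing slash.
   if d and not isdir(d):
      l,r = split(d)
      if l != d:
         checkDestinationExists(l)
      if r:
         mkdir(d)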

Beautiful, right? Well, the problem is that shortly after patting myself on the back for writing something clever, I discovered that (of course) this function already exists in Python. The distutils.dir_util module contains mkpath, which does exactly what I wrote. At least I can finally look Jason Zych (departmental treasure!) in the eye. I can finally live down the Java recursion machine problem in CS125.
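A note for anyone trying this on a current Python: distutils was removed from the standard library in Python 3.12, so mkpath is no longer available there. A minimal sketch of the same behavior using os.makedirs follows; the wrapper name ensure_dir is just for illustration.

```python
import os


def ensure_dir(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
```

Because of exist_ok=True, calling ensure_dir twice on the same path is harmless, which matches the "check, then create" intent of the recursive version.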

 File Not Found

Monday, August 21st, 2006

A recent article in the technology section of Atlantic Monthly referred to the National Digital Information Infrastructure and Preservation Program, the Library of Congress program under which my project, the North Carolina Geospatial Data Archiving Project, operates. It’s nice to see that NDIIPP and the digital preservation problem are garnering the attention of broader audiences. To read the article, File Not Found, see the September 2006 issue on-line or see my cached copy.

We’re still awaiting responses on the proposals we’ve made to Library of Congress and the National Science Foundation for continued funding for our current research project and additional partnerships leveraging our extensive work on the preservation of digital spatial data.

 DPP Meeting at LC

Saturday, July 22nd, 2006

This week I was in Washington, DC, at the Digital Preservation Partnership meeting. Highlights include: free beer at the Library of Congress reception in the Thomas Jefferson Building, skinning my knee at same, seeing the sights at dawn, and actually getting some details about what others are doing. Oh, and a Senator addressed us and congratulated us on … well, I guess I don’t know. Something, I guess.

LC Great Hall

See the photos.

 JCDL 2006

Wednesday, June 14th, 2006

Today was the final day of the JCDL 2006 conference. The Joint Conference on Digital Libraries is an international digital library conference and is one of the premier digital library conferences in the world, I’m told. I presented on our NDIIPP project, the North Carolina Geospatial Data Archiving Project.

I had a friend take a couple of pictures of the panel session and they will be available soon. My slides are already available.