When your resources aren’t perfect

dcdotnerd

This post is a bit of a “what would YOU do?” request for input. Let me begin with the background:

On behalf of my office, I am working on a project to digitize (*cough*scan*cough*) all of the D.C. Laws from Council periods 1 through 7. These are unofficial copies, and they are online alongside the unofficial D.C. Code, which, despite being unofficial, is the easiest version to use in most circumstances. The scanning has been done by two interns placed with my office through Urban Alliance (I wrote about Urban Alliance before). The first did a huge amount of work, and the second is going through and picking up where things were missed or scanned from poor copies.

And thus we have the phenomenon that leads me to my question. In some cases I now have two scanned versions of a law, each with its own problem. You know how you can have two of (a) good, (b) fast, and (c) cheap, but not all three? Well, in some cases I can have only two of (a) legible margins (that is, from a flat original), (b) bottom lines of text not cut off, or (c) consistent appearance (that is, all of the pages from the same “original” and not combined from two separate scans).

In an ideal world, we would go and search out the original and scan from there. But that’s not happening here. It isn’t an ideal world, we don’t have perfect resources, and these aren’t intended to be archival quality. (There IS a risk that they could be used as “oh, someone’s already scanned these, yay we don’t have to.”)

Here is an example of this situation. What would you do?

  1. Use the scan with poor margins. (law 5-129)
  2. Use the scan with the bottom of the first page cut off. (L5-129)
  3. Use page 1 of the scan with poor margins and the rest of the pages from the other scan.
  4. Throw it all together in a single PDF because the scan with the problem on the first page also doesn’t have page numbers.

L5-129 law 5-129
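
For anyone who would rather script the page-by-page choice than click through it, here is a minimal sketch of the "page 1 from one scan, the rest from the other" option. It is not how these files were actually assembled. The function names, the file names, and the four-page count are all hypothetical, and the PDF step assumes the third-party pypdf library is installed. Scan A is the preferred copy, so pages come from scan B only where A is bad and B is good, and pages that are bad in both are kept from A and flagged.

```python
from typing import List, Set, Tuple

Plan = List[Tuple[str, int]]  # (scan label, 0-based page index)


def build_page_plan(page_count: int, bad_a: Set[int], bad_b: Set[int]) -> Tuple[Plan, List[int]]:
    """Choose a source scan for each page (hypothetical helper).

    Scan A is preferred so the result looks as consistent as possible.
    A page is taken from scan B only when it is bad in A and good in B.
    Pages that are bad in both scans stay with A and are returned
    separately so they can be considered for rescanning.
    """
    plan: Plan = []
    unresolved: List[int] = []
    for page in range(page_count):
        if page not in bad_a:
            plan.append(("A", page))
        elif page not in bad_b:
            plan.append(("B", page))
        else:
            plan.append(("A", page))
            unresolved.append(page)
    return plan, unresolved


def write_merged(path_a: str, path_b: str, plan: Plan, out_path: str) -> None:
    """Write the planned pages to one PDF. Assumes pypdf is installed."""
    from pypdf import PdfReader, PdfWriter  # imported here so planning works without it

    readers = {"A": PdfReader(path_a), "B": PdfReader(path_b)}
    writer = PdfWriter()
    for label, page in plan:
        writer.add_page(readers[label].pages[page])
    with open(out_path, "wb") as handle:
        writer.write(handle)


# Hypothetical example: scan A has the bottom of page 1 cut off,
# scan B (poor margins) is otherwise usable, and the law has 4 pages.
plan, unresolved = build_page_plan(4, bad_a={0}, bad_b=set())
# write_merged("L5-129.pdf", "law-5-129.pdf", plan, "merged-5-129.pdf")  # assumed file names
```

Both scans need the same page count and page order for the indices to line up. The list of unresolved pages is the part that still needs a human decision.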

 

Comments

4 responses to “When your resources aren’t perfect”

  1. infinitebuffalo

    especially if you’re willing to sacrifice consistent appearance for continuous legibility, if you have access to Acrobat Pro or similar, you might be able to interweave the two PDFs, selecting the better copy of each page…

    [this, of course, could get expensive time-wise, but may yet yield the best (i.e., most usable) result…]

    1. dcdotnerd

      Hi Buffalo, thanks for this suggestion. It’s actually what I’ve been doing in most cases, and my real problem is what to do when a single page has problems in both copies. I’m probably giving this project much too much of my attention!

      1. infinitebuffalo

        Are there enough “good enough” pages between the two that you can just have an intern rescan the remainder from source?

        1. dcdotnerd

          Maybe? Right now I’m leaving the re-scanning for the documents that are missing full pages.
