[plug] Catalogue Database (DVD, etc. titles?)

Mon Apr 10 22:48:16 WST 2006

Eric S <ews.mail at gmail.com> writes:
>Bernd Felsche wrote:
>> Eric S <ews.mail at gmail.com> writes:
>>>Bernd Felsche wrote:

>>>> I'd like to create a database of DVD titles and reference data;
>>>> mainly so that I don't buy the same ones over and over again.

>>>> But I don't know where I'd be able to get a database of EAN-13
>>>> barcode to title, etc. Yes; I have a barcode scanner.

>>>> Any ideas?

>>>http://www.upcdatabase.com/

>> That's great for UPC-A barcodes. (Un)Fortunately we are in Australia
>> where EAN-13 is the standard GTIN system for retail goods. Although
>> some goods may be sold by UPC-A code, the internal representation
>> under EAN-13 typically has a left-padded zero to "convert" the UPC
>> to EAN-13 in Australian systems.
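
To illustrate the zero-padding described above, here is a rough sketch
in Python. The function names are made up for this example. It relies
on the standard EAN-13 check digit rule (weights 1 and 3 alternating
from the leftmost digit), under which a UPC-A code keeps the same check
digit once a zero is put in front of it.

```python
def _all_digits(s: str) -> bool:
    # Accept plain ASCII digits only.
    return s.isascii() and s.isdigit()


def ean13_check_digit(first12: str) -> int:
    """Check digit for the first 12 digits of an EAN-13 code."""
    if len(first12) != 12 or not _all_digits(first12):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting at the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """True if code is 13 digits and its last digit is the check digit."""
    return (len(code) == 13 and _all_digits(code)
            and ean13_check_digit(code[:12]) == int(code[12]))


def upca_to_ean13(upc: str) -> str:
    """Left-pad a 12-digit UPC-A code with a zero to get its EAN-13 form."""
    if len(upc) != 12 or not _all_digits(upc):
        raise ValueError("expected a 12-digit UPC-A code")
    return "0" + upc
```

Going the other way only works for EAN-13 codes that start with a zero,
which is why a UPC-only database is of little use for most codes scanned
here.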

>> upcdatabase.com will not accept EAN-13.

>> The bulk of my DVD library has EAN-13 barcodes.

>Sorry about that (I didn't actually check before I posted)

>That leads me to a question.
>I have a Prop Win based database/POS system with thousands (7 years) of
>manually entered EAN-13 codes and my own shortcodes. When/if I end up
>moving it over to Linux, I will need to extract them. Is there someone
>who could help me with that?

>BTW the software company doesn't exist any more.

A nice challenge.

You'll probably find either an "Access DB" or a flat file with
fixed-length records sitting behind the application.

I've previously decoded similar databases with tens of thousands of
data rows containing several columns of data. It's quite
straightforward with Unix/Linux tools.

Working out the structure of data records is the time-consuming bit.
It's a process of analysis based on the known data and how numbers
are represented on the native platform. The time required will depend
largely on how numbers are encoded and whether you really need them,
as well as on the number of important data fields you want to extract.

I suggest that you have a go at it yourself... print out a few
kilobytes of the data file(s) as an octal/hex dump and see if
there's a recognizable structure. It'll confirm fixed record sizes,
or identify record and field separators.  Most likely, the EAN will
be stored as text. People don't tend to invent new wheels; they'll
give them a fresh lick of paint and sell them as their own.
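
If you'd rather script the dump than use od or hexdump, something like
the following Python sketch may help. It assumes the EANs really are
stored as 13 ASCII digits and that each record holds one of them at the
same position; the function names are invented for this example. The
record-length guess is just the most common distance between successive
13-digit runs, so treat it as a hint to check against the dump.

```python
import re
from collections import Counter

# A run of exactly 13 ASCII digits, not part of a longer digit run.
EAN_RUN = re.compile(rb"(?<!\d)\d{13}(?!\d)")


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns and printable ASCII."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart.ljust(width * 3 - 1)}  {text}")
    return "\n".join(lines)


def guess_record_length(data: bytes):
    """Most common gap between 13-digit runs, or None if fewer than two."""
    offsets = [m.start() for m in EAN_RUN.finditer(data)]
    gaps = Counter(b - a for a, b in zip(offsets, offsets[1:]))
    if not gaps:
        return None
    return gaps.most_common(1)[0][0]
```

Read the first few kilobytes of a copy of the file with
open(name, "rb").read(4096) and pass the result to both functions. If
the guess comes back as None, or the gaps vary a lot, look for
separators in the dump instead.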

Once you know the record structure, a quick hack can be written to
extract the data into something that's human-readable. I suggest
XML, but you may decide to simply dump out CSV. Then load into your
favourite (FOSS) DB.
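
As an idea of how small such a hack can be, here is a Python sketch
that dumps CSV. The record layout is purely hypothetical (13 bytes of
EAN text, a 15-byte title padded with NULs or spaces, and a 32-bit
little-endian price in cents); the real layout has to come from your
own analysis of the dump. The cp1252 text encoding is likewise only a
guess for a Windows application.

```python
import csv
import io
import struct

# Hypothetical layout: EAN as text, padded title, price in cents.
# "<" means little-endian with no alignment padding (32 bytes total).
RECORD = struct.Struct("<13s15sI")


def records_to_csv(data: bytes, out) -> int:
    """Write one CSV row per fixed-length record; return the row count."""
    # Ignore any trailing partial record.
    usable = len(data) - len(data) % RECORD.size
    writer = csv.writer(out)
    writer.writerow(["ean", "title", "price_cents"])
    count = 0
    for ean, title, cents in RECORD.iter_unpack(data[:usable]):
        writer.writerow([
            ean.decode("ascii", "replace"),
            title.rstrip(b"\x00 ").decode("cp1252", "replace"),
            cents,
        ])
        count += 1
    return count


if __name__ == "__main__":
    # Two made-up records, just to show the call.
    sample = (RECORD.pack(b"4006381333931", b"Alien", 1995)
              + RECORD.pack(b"0036000291452", b"Brazil", 2495))
    buf = io.StringIO()
    records_to_csv(sample, buf)
    print(buf.getvalue(), end="")
```

Every mainstream database has a CSV import, so this is usually the
shortest path; the same loop could emit XML if you want the field names
carried along with the data.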

Always save a copy of the original data. Write-protect the file.
Create a dotted sub-directory and place a hard link to the data
file within that directory; then make the sub-directory mode zero.
It'll make the file very hard to "lose" with normal user
permissions. 
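
Those steps can be scripted. A minimal Python sketch, assuming a
Unix-like system and a sub-directory on the same filesystem as the data
file (hard links cannot cross filesystems); the function name and the
".keep" directory name are made up for this example:

```python
import os
import stat


def protect_original(path: str, vault_name: str = ".keep") -> str:
    """Write-protect path and hard-link it inside a mode-zero directory.

    Returns the path of the hard link.
    """
    path = os.path.abspath(path)
    vault = os.path.join(os.path.dirname(path), vault_name)
    link = os.path.join(vault, os.path.basename(path))
    # Read-only for everyone (0444).
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    # Dotted sub-directory, writable while the link is created.
    os.mkdir(vault, 0o700)
    os.link(path, link)
    # Mode zero: no listing, no entering, no changes inside.
    os.chmod(vault, 0)
    return link
```

If the visible name is deleted later, the data survives through the
second link; chmod the sub-directory back to 700 to get at it. None of
this stops root, which is one more reason for the off-line copy.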

Take a copy and put it on CD/tape/floppy.
-- 
/"\ Bernd Felsche - Innovative Reckoning, Perth, Western Australia
\ /  ASCII ribbon campaign | "Laws do not persuade just because
 X   against HTML mail     |  they threaten."
/ \  and postings          | Lucius Annaeus Seneca, c. 4BC - 65AD.