Forum OpenACS Development: primary key in cr_mime_types

Collapse
Posted by Mark Aufflick on
is on mime_type alone. this doesn't seem right, and causes me a problem in file storage when it tries to link, for example, the type text/plain with extension .bar

other examples ate text/html which should be able to be found from .html and .htm

the unique key should be (imho) file_extiension, since it is not possible to reasonably link a single extension with multiple mime types.table

would having possible duplicate mime_type's break anything?

Collapse
Posted by Janine Ohmer on
Mark, I just ran into this yesterday and came to the same conclusion you did.  However, Jeff Davis says

"The problem is that the cr_mime_types table is not (and should not be) a mapping table of extension to mime type.  Extension happens to be in the table but it is really a list of existing mime types.

There needs to be a table cr_extension_mime_type_map which handles this mapping and the mime type creation stuff should not be creating a mime type if one already exists."

(I hope he won't mind my posting his note - but he's asleep at the moment and can't do it himself :)

He said he was going to fix this but I don't know exactly when.

Collapse
Posted by Jeff Davis on
I have created the table and have some data for populating it but the question is should we just quit using ns_guesstype and maintain it all ourselves. I looked at the raw data for ns_guesstype and it is wrong in some cases (like .zip -> application/x-zip-compressed rather than application/zip which iana lists) and is missing a lot of what people really upload (all the MS mime types for example), and has some DCI/AOL internal ones (or ones I have never seen anyway like .ani mapping to application/x-navi-animation). It can of course be overridden but maybe it would be better to maintain our own.

Also, ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/media-types lists about a zillion mime types and I wonder if we should just create most of them by default (which would mean tracking down extensions for most of them).

Another issue is that as it stands I have defined the table as:

create table cr_extension_mime_type_map (
   file_extension       varchar(200) 
                        constraint cr_mime_type_extension_map_pk
                        primary key,
   mime_type            varchar(200) 
                        constraint cr_mime_ext_map_mime_type_ref
                        references cr_mime_types
); 
which would mean a unique entry per extension. I know it's not really valid but I think until we do the work on the UI side to allow for chosing the correct mime type it's probably better to enforce uniqueness.

The final one is that for some things I am not sure what the right generic extension to return would be when more than one is listed (the ext in cr_mime_types). An example being application/quicktime where both .qt and .mov are used.

Anyone want to figure it all out for me?

Collapse
Posted by Janine Ohmer on
I dunno about figuring it all out for you but here's what I think:

I agree that enforcing uniqueness on file extensions is the right thing to do for now.

For expediency's sake I suggest creating a basic list of extensions, just the ones that seem the most common, and let people add more as they need them.  No duplicate mime-types in the list, so you don't have to handle that case.

This leaves the work of fixing it the rest of the way to the next person who really needs it to work "right", which might be kind of unfair, but that *is* the way open source works...

Collapse
Posted by Dave Bauer on
Changing primary key to extension breaks cr_import_content which does not insert an extension when it finds a new mime type.

So don't change the primary key to extension. Also there are many cases where more than one extension is used to identify the same mime-type.

Collapse
Posted by Jeff Davis on
I have pretty much finished this up and will be committing what I have.
  • created cr_extension_mime_type_map ( extension pk, mimetype ref cr_mime_types);
  • added a reasonably complete set of mime types and extension mime type mappings.
  • Added ad_proc cr_filename_to_mime_type { -create:boolean filename} which will return a mime type and create the mime type if needed (including creating the extension -> mime type map)
  • Added ad_proc cr_create_mime_type { -extension -mime_type -description }
  • removed *_maybe_create_mime_type and use the new api.

On thing I did was add a mapping for a lot of the common code extensions so that they map to text/plain. This means when you pull them down from file storage they will be displayed rather than saved to disk. (This might not be right for everyone but I think it's a reasonable thing to do).

the only big issue outstanding is that I am not sure what I should do on upgrade scripts. I guess create the mime types that don't already exist and populate the extension mapping table with a preference for existing extensions, but I also fixed some that were wrong (or out of date) in the original data. I guess we just let those go...

Collapse
Posted by Jun Yamog on
Hi,

In a bit of a related note.  I use [template::util::file::get_property mime_type $upload_file] to get the mime type of the file.  Then if it does not exists yet I create a new mime type.  It seems to work better than ns_guesstype.

Collapse
Posted by Gustaf Neumann on
i had a problem with uploading mp3 files into xowiki. firefox 2.0 uploads .mp3 files with a content_type audio/mpeg (as listed by iana) where the actual cr_mime_types table contains two different entries for mp3 files.


dotlrn-test5=# select * from cr_mime_types where file_extension = 'mp3';
label | mime_type | file_extension
-------------+--------------+----------------
Audio MPEG | audio/x-mpeg | mp3
Audio - MP3 | audio/mp3 | mp3
(2 rows)

In the current code for the content repository inserting an entry with a mime_type of audio/mpeg triggers a foreign key violation on table "cr_revisions" with foreign key constraint "cr_revisions_mime_type_ref".

my quick fix was

insert into cr_mime_types (label, mime_type, file_extension) select 'Audio - MP3','audio/mpg', 'mp3' from dual where not exists (select 1 from cr_mime_types where mime_type = 'audio/mpg');

is there a better way?

Collapse
Posted by Mark Aufflick on
It's been a while (nearly 3 years), but I don't recall anything breaking when I removed that unique constraint and added one on extension.

By leaving the extension column allowing nulls there is no problem with multiple null values even with a unique constraint.

Collapse
Posted by Gustaf Neumann on
there is no unique constraint on cr_mime_types. adding the value does work (the error "goes away"), but the mess increases. A result will be, that there are mp3 files in the cr, which might have one of the three values
  • audio/mp3
  • audio/mpeg
  • audio/x-mpeg
in the database. When OpenACS delivers these files, some of these might have e.g. the first, some other the second mime-type. A browser will most likely behave different on those, the user has no idea, why....

When an e.g. mp3 file is stored in the static area, the mime-type might be resolved via ns_guesstype, maybe coming up with a forth variant...

actually, while iana recommends for mp3 audio/mpeg, mac OS X returns "audio/mpg", what i have used in the statement above. the mess is actually a real word mess.

so, for delivery, a unique mapping would be good. This could be achieved e.g. via

  • a unique constraint in the DB + a mapping table for incoming files that maps different incoming mime-types to a canonical, or
  • using ns_guesstype always for delivery, no matter what the stored mime_type is.

The mess is not yet to big. we have multiple mime_types for .mp3, .cdf and .wav

     4 | mp3
     2 | cdf
     2 | wav

i think they can be standardized:

  • mp3:
    • we have: audio/x-mpeg, audio/mp3, audio/mpg, audio/mpeg
    • IANA: audio/mpeg, Apache: audio/mpeg, mac: file -i audio/mpg
  • wav:
      we have: audio/x-wav, audio/wav
    • Apache: audio/x-wav, mac: file -i audio/x-wav
  • cdf
    • we have: application/x-cdf, application/x-netcdf
    • Apache: application/x-netcdf

The only problem seems to be mp3, where indeed multiple ideas exist, trend towards audio/mpeg. I would say that handling this case in the file upload form via a configured ns_guesstype should be the way to go...

Collapse
Posted by Jon Griffin on
It also appears that some pdf files also have the problem.
I am not sure how firefox determines the mime type, but I am getting a CR error when uploading .pdf, the sql shows:

application/unknown

which isn't in the table.

Collapse
Posted by Alfred Werner on
Import /usr/share/mime/globs ? Might be a different place on debian family, but that is the RH family default location.