
Thread: Importing old messages

  1. #1

    Shawn provided me with a Zip file of the Yahoo! group that somebody scraped.

    The messages are in "email" format, so they look like full emails with all of the headers. vBulletin has an API for creating threads. Writing some code to import each message and add it to a new thread (or to an existing thread if the subject starts with "Re:") should not be too terrible.
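
    A minimal sketch of the subject-based grouping described above, assuming Python (the thread does not say which language the import code would use). The names thread_key and group_into_threads are hypothetical, and matching on the subject line is only a heuristic: if the scraped messages kept their In-Reply-To or References headers, those would likely be a more reliable way to link replies.

```python
import re

# Matches one or more leading "Re:" markers, case-insensitive, e.g. "Re: RE: foo".
_RE_PREFIX = re.compile(r"^(?:\s*re\s*:\s*)+", re.IGNORECASE)


def thread_key(subject):
    """Return the subject without reply prefixes, lowercased, whitespace collapsed."""
    stripped = _RE_PREFIX.sub("", subject or "")
    return " ".join(stripped.split()).lower()


def group_into_threads(messages):
    """Group (msg_id, subject) pairs by normalized subject, keeping input order."""
    threads = {}
    for msg_id, subject in messages:
        threads.setdefault(thread_key(subject), []).append(msg_id)
    return threads
```

    Each key would correspond to one thread, with the first message ID in each list acting as the thread starter and the rest as replies.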

    For now I think I want to write a simple script to just get the date, subject, sender and the body text onto a static web page. That's no more than a few hours of work and it would be searchable, but it would not be threaded. Getting things threaded and/or importing into vBulletin is a longer term project.
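
    As a rough illustration of such a script, here is a sketch using Python's standard email package. It is an assumption, not the code that was actually written; summarize_eml, render_message and the "msg" CSS class are made-up names. It pulls the four fields out of one raw message and escapes them for a static HTML page.

```python
import html
from email import policy
from email.parser import BytesParser


def summarize_eml(raw_bytes):
    """Extract date, sender, subject and plain-text body from one raw message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    # Ask only for a text/plain body; returns None if the message has none.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "date": str(msg["Date"] or ""),
        "from": str(msg["From"] or ""),
        "subject": str(msg["Subject"] or ""),
        "body": body,
    }


def render_message(fields):
    """Return an HTML fragment for one message, with every field escaped."""
    esc = {key: html.escape(value) for key, value in fields.items()}
    return (
        '<div class="msg">\n'
        f'<h3>{esc["subject"]}</h3>\n'
        f'<p>{esc["from"]} | {esc["date"]}</p>\n'
        f'<pre>{esc["body"]}</pre>\n'
        "</div>\n"
    )
```

    Escaping everything and wrapping the body in a pre element keeps the original line breaks and prevents stray markup in a message from affecting the rest of the page.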


    Thoughts?

  2. #2
    What email file formats are these in?

    .DBX, .PST, or .EML?


    Larry

  3. #3

    .EML - nice, human readable text. The filenames are just sequence numbers.

  4. #4

    Originally posted by mbbrutman:
    Shawn provided me with a Zip file of the Yahoo! group that somebody scraped.

    The messages are in "email" format, so they look like full emails with all of the headers. vBulletin has an API for creating threads. Writing some code to import each message and add it to a new thread (or to an existing thread if the subject starts with "Re:") should not be too terrible.

    For now I think I want to write a simple script to just get the date, subject, sender and the body text onto a static web page. That's no more than a few hours of work and it would be searchable, but it would not be threaded. Getting things threaded and/or importing into vBulletin is a longer term project.

    Thoughts?

    This is a great idea, and will be incredibly helpful for future searchers. vBulletin integration will be nice, but at least there will be something to refer to (with a stickied link at the top of the Grid forum?) when digging for old info.

    Has a home been found for the files from the group?

  5. #5

    I've been meaning to get to the message import but I got sidelined by something horrible. Trust me, it's a good excuse. It will happen in the next few weeks.

    Hosting the files here is still possible; I just need to see what kind of copyright risk we would be taking on. One thing that works in our favor is that we are a registered 501(c)(3) with a real museum, so we have more latitude to protect and preserve software than I let on. It's the distribution part that we need to be careful about.

  6. #6

    Some progress:

    http://www.brutman.com/RuGRiD/

    That directory has 8 HTML files which have 500 messages each. The messages have some light formatting on them.
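
    One way the split into 500-message files could be done is sketched below in Python. This is an illustration only: numeric_order and paginate are hypothetical helpers, and the "123.eml" naming pattern is an assumption based on the earlier remark that the filenames are just sequence numbers.

```python
def numeric_order(filenames):
    """Sort names like '12.eml' by their integer stem, so 2 comes before 10."""
    return sorted(filenames, key=lambda name: int(name.rsplit(".", 1)[0]))


def paginate(items, page_size=500):
    """Split a list into consecutive pages of at most page_size items."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]
```

    Sorting numerically rather than alphabetically matters here, since a plain string sort would put "10.eml" ahead of "2.eml".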

    Known problems/limitations:
    • Many of the messages are "multi-part" and include an HTML version and a plaintext version; the HTML version of those messages is suppressed for now until I can properly sanitize the HTML in them. (It includes things like <head> and <body> tags which screw the overall page up.)
    • Attached pictures and files are not included yet.
    • This is a prototype. The final location will be on something owned by VCFed.org.
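
    For messages that only have an HTML part, one cautious option is to reduce the HTML to text instead of embedding it. The sketch below uses Python's standard html.parser module; it is an assumption about one possible approach, not what the prototype does, and TextOnly and html_to_text are hypothetical names. Note that this strips markup rather than sanitizing it: a real sanitizer would keep an allowlist of safe tags.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, skipping anything inside <head>, <script> or <style>."""

    SKIP = {"head", "script", "style"}
    BREAKS = {"br", "p", "div"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BREAKS:
            # Approximate block-level layout with a newline.
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(markup):
    """Return the visible text of an HTML fragment or document."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks).strip()
```

    The resulting text can then be escaped and displayed the same way as a plaintext body, so embedded <head> and <body> tags never reach the output page.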


    Please have a look and let me know about outright bugs. Things like completely missing message bodies might still be happening. The formatting is rough, but it is as it appears in the originals.

    Reading MIME emails has been more challenging than I expected.

  7. #7

    I updated the files again today; they should now be more complete and readable.

    Still to come - attachments.

  8. #8

    Thanks for your work on this. What you have so far looks great!
    -Shawn
