Author Topic: HTML Stack to SMF Database  (Read 251 times)

Stratovarius

  • Forum Host
  • Administrator
  • *****
  • Posts: 174
HTML Stack to SMF Database
« on: June 14, 2017, 09:14:58 PM »
So over on GitP, RedMop has been able to snag the entire Wayback archive for both MMX and BG, in HTML format. What we don't have is a way to transform that into a Simple Machines Forum (SMF) database, in case that's the only way to go about rebuilding the data.

Given there are a number of technically minded people on here, do we have any ideas on how that might be accomplished?
« Last Edit: June 15, 2017, 05:15:25 AM by Stratovarius »

oslecamo

  • Sr. Member
  • ****
  • Posts: 360
  • Creating monsters for my world of darkness
Re: HTML Stack to SMF Database
« Reply #1 on: June 14, 2017, 09:30:50 PM »
Even if we can't find a way to directly transfer everything, having the HTML would save a lot of work. I think I'll wait a day or two to see if that nice guy makes download links available.

EDIT: Oh, wait, I'll still need to clean the code manually. Well, back to work.
« Last Edit: June 14, 2017, 09:37:06 PM by oslecamo »

Stratovarius

  • Forum Host
  • Administrator
  • *****
  • Posts: 174
Re: HTML Stack to SMF Database
« Reply #2 on: June 15, 2017, 05:14:55 AM »

PlzBreakMyCampaign

  • Full Member
  • ***
  • Posts: 120
  • 75%fortified v forum death as a fairness elemental
Re: HTML Stack to SMF Database
« Reply #3 on: June 15, 2017, 12:28:16 PM »
Not to poop the party, but what I found when I finally admitted to myself that MMB might not come back (first day was, meh, a fluke; second day was, let's hope that tomorrow...) was that the Internet Archive was only useful for year-old threads, and even then it didn't go more than two levels deep into the board.

So it was useful for seeing how pages actually looked, and as another snapshot to compare against the Bing and Google caches. But I found myself actually using the Bing and Google caches because they simply held more pages than the Internet Archive. I thought someone had requested that the Wayback Machine crawl the full site (you can do that, right?), but I only found one full-ish mirror of the site and it had holes...
Sigh. Not again. :(

oslecamo

  • Sr. Member
  • ****
  • Posts: 360
  • Creating monsters for my world of darkness
Re: HTML Stack to SMF Database
« Reply #4 on: June 16, 2017, 05:39:29 AM »
GitHub link to the data

I downloaded it, but how do I open it? It's mostly a giant pile of files with no extension like .html.
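
One possible way to make such a pile browsable, sketched under assumptions: if the files are ordinary HTML pages that just lack a .html suffix, a short Python script can sniff each file's first bytes and copy the HTML ones into a separate folder with the suffix added, so a browser will open them. The function names and the marker list are made up for illustration; nothing here comes from the archive itself.

```python
import shutil
from pathlib import Path

# Byte sequences that commonly appear near the top of an HTML page.
# This list is an assumption, not an exhaustive test.
HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<body")


def looks_like_html(path: Path, sniff_bytes: int = 2048) -> bool:
    """Return True if the first bytes of the file contain an HTML marker."""
    with path.open("rb") as handle:
        head = handle.read(sniff_bytes).lower()
    return any(marker in head for marker in HTML_MARKERS)


def copy_with_html_suffix(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Copy HTML files lacking a .html/.htm suffix into dst_dir, adding .html.

    dst_dir is assumed to live outside src_dir. Originals are left untouched.
    """
    copied = []
    for path in sorted(src_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() in (".html", ".htm"):
            continue
        if looks_like_html(path):
            target = dst_dir / path.relative_to(src_dir)
            target = target.with_name(target.name + ".html")
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(path, target)
            copied.append(target)
    return copied
```

Calling `copy_with_html_suffix(Path("dump"), Path("dump_html"))` (both folder names hypothetical) leaves the download as it was and produces a parallel tree you can double-click through.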

awaken_D_M_golem

  • Jr. Member
  • **
  • Posts: 99
  • I told you there was/is two of me.
Re: HTML Stack to SMF Database
« Reply #5 on: June 20, 2017, 07:23:15 PM »
Oh that one looks good.
 :)


This one looks quite old and rather shaky (no slight intended toward whoever tried to do this).
Don't do this particular one.

Someone already made an HTML dump of the Wayback Machine archive of the forums and put it on GitHub (IIRC), and the current problem is how to preserve formatting when transferring content over. As it stands, it's hard manual labor.

Is this it, or an older one?
https://github.com/v-zmiycharov/min-max-board
... seems to be dated to Dec 2014.

Solauren

  • Newbie
  • *
  • Posts: 1
Re: HTML Stack to SMF Database
« Reply #6 on: June 30, 2017, 11:03:20 AM »
I'm not familiar with the SMF database...

However...

Most databases can import formatted data from other sources, e.g. formatted text files (or even HTML in a simple table format).
Since SMF works with standard SQL (MySQL, PostgreSQL, or SQLite), that shouldn't be an issue.
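
A minimal sketch of what that kind of import could look like, using Python's built-in sqlite3 module. The `archived_posts` staging table and its columns are invented for illustration and are NOT SMF's real schema; the rows would still have to be mapped onto SMF's own tables afterwards.

```python
import sqlite3


def load_posts(conn: sqlite3.Connection, posts: list[dict]) -> int:
    """Insert parsed posts into a hypothetical staging table; return its row count."""
    # Staging table only: the name and columns are assumptions for this sketch.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS archived_posts (
            id        INTEGER PRIMARY KEY,
            topic     TEXT NOT NULL,
            author    TEXT NOT NULL,
            posted_at TEXT NOT NULL,
            body      TEXT NOT NULL
        )
        """
    )
    # "with conn" commits on success and rolls back if an insert fails.
    with conn:
        conn.executemany(
            "INSERT INTO archived_posts (topic, author, posted_at, body) "
            "VALUES (:topic, :author, :posted_at, :body)",
            posts,
        )
    return conn.execute("SELECT COUNT(*) FROM archived_posts").fetchone()[0]
```

Each post is a plain dict with `topic`, `author`, `posted_at`, and `body` keys. Parameter binding (the `:name` placeholders) keeps quotes and other odd characters in old posts from breaking the SQL.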

Looking at the archives I sent you, the HTML coding in the threads isn't complicated. It's 'overformatted', but that's pretty much par for the course for web discussion forums. (They pretty much need to be.)
Stripping out the unneeded HTML coding and reformatting everything shouldn't be difficult.
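
As a rough illustration of the stripping step, here is a sketch that uses only Python's standard `html.parser` module to reduce a chunk of post markup to plain text. The class name, the choice of which tags start a new line, and the decision to drop script/style content are all assumptions. It deliberately throws formatting away, so preserving bold, quotes, and so on would need extra handling.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, dropping tags and script/style content."""

    SKIPPED = {"script", "style"}            # content of these is discarded
    BLOCK = {"br", "p", "div", "li", "tr"}   # these start a new line

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & etc.
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def strip_html(markup: str) -> str:
    """Return the plain text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

For real forum pages you would first have to locate the element that holds each post body, which depends on the theme markup in the dump and is not shown here.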

As for the importing, it would really depend on what can talk to SMF databases.

If I get the time in the near future, I'll see if I can get MS Access to use the SMF database as a back end. If it can, then setting something up to import it all into SMF for you to play with shouldn't be that hard.