New website HTML conversion errors

Started by LociOiling

LociOiling Lv 1

Just figured this one out in reference to Josh's newsletters, where all the image links seem to be broken. Something bad happened when pages containing HTML were moved to the new website.

Back on old.fold.it, the newsletter images were clickable links, an img tag inside an a tag, wrapped in a p tag. Simple enough. The old HTML looks like this:

<p><a href="https://fold.it/portal/files/images/newsletter_2129.png"><img src="https://fold.it/portal/files/images/newsletter_2129.png" alt="" title="" class="image image-preview" width="600" /></a></p>

(Scroll right to see the whole line.)

Over on the new website, the img tag gets wrapped in a span tag, and both the img and the span have a class keyword. The p tag is gone.

Unfortunately, the a tag is missing its closing bracket. Also, the slash that closes the img tag is now misplaced, lodged between the class and width keywords, not next to the closing bracket. More complicated, and invalid HTML. The result is no images. The new broken HTML is:

<a href="https://fold.it/portal/files/images/newsletter_2129.png<span class="inline inline-center"><img src="https://fold.it/portal/files/images/newsletter_2129.png" alt="" title=""  class="image image-preview"/ width="600"></span></a>

In this case, the HTML from the old website seems to work just fine on the new system. The span tag and the class keywords aren't needed. Perhaps they might be needed in other contexts.
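If an automated fix is ever attempted, here is a rough Python sketch of what it might look like. It assumes the broken markup always has the same shape as the one example above (a tag, then span, then img), which later checking may not bear out. The names BROKEN_LINK, QUOTED_OR_SLASH, and restore_old_html are made up for illustration and are not part of any Foldit tool.

```python
import re

# Assumed shape of the broken markup, based only on the one example above.
BROKEN_LINK = re.compile(
    r'<a href="(?P<href>[^"<>\s]+)'   # link URL, stopping at the first quote, bracket, or space
    r'"?>?\s*'                         # closing quote and bracket, if they survived
    r'<span class="[^"]*">\s*'         # wrapper span added by the conversion
    r'<img src="(?P<src>[^"]+)"'       # image URL
    r'(?P<attrs>[^>]*)>\s*'            # remaining img attributes, stray slash included
    r'</span>\s*</a>'
)

# Matches either a quoted attribute value (kept) or a bare slash (dropped).
QUOTED_OR_SLASH = re.compile(r'("[^"]*")|/')


def restore_old_html(html):
    """Rewrite each broken image link in the old p > a > img form."""
    def rebuild(match):
        # Drop slashes that sit outside quoted values, e.g. the one after class="...".
        attrs = QUOTED_OR_SLASH.sub(lambda m: m.group(1) or "", match.group("attrs"))
        # Tidy spacing; note this also collapses runs of spaces inside values.
        attrs = " ".join(attrs.split())
        return '<p><a href="{}"><img src="{}" {} /></a></p>'.format(
            match.group("href"), match.group("src"), attrs
        )
    return BROKEN_LINK.sub(rebuild, html)
```

Run on the broken line above, this should give back the old line exactly. Markup that doesn't match the assumed pattern is left untouched, so anything unusual would still need a manual look.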

Here's a corrected post as proof of concept: Newsletter April 15: Titles are Hard. I simply copied the old HTML over the invalid "converted" HTML. Also, the original title was added to the post, since the new system doesn't allow a subject line on replies to a thread.

While it's not difficult to correct the bad HTML, there are still 101 of Josh's newsletters left to review. And of course, there's no way to tell how many other pages are affected. If an automated conversion re-do is possible, it would be nice if it could throw in the old subject line in an h2 tag at the top of the post.
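For the subject line request, a tiny Python sketch, assuming the heading should simply go at the very start of the post body. The function name add_subject is hypothetical.

```python
import html


def add_subject(subject, body):
    """Put the old subject line in an h2 tag at the start of the post body."""
    heading = "<h2>{}</h2>".format(html.escape(subject))
    # Skip posts that already start with this heading, so a re-run is harmless.
    if body.lstrip().startswith(heading):
        return body
    return heading + "\n" + body
```

For example, add_subject("Newsletter April 15: Titles are Hard", post_html) would prepend that title to whatever HTML is in post_html.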

LociOiling Lv 1

The misplaced slashes in the image tag aren't very consistent. They happen on some of the posts, but not others.

The problem with .png<span happens on almost every post. At least one post had .pngspan instead, still invalid, just not consistent.
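Since the problems vary from post to post, a quick audit might help sort out which post has which issue. This is only a sketch in Python. The patterns are guesses based on the cases described so far, and the exact text around .pngspan is an assumption, since only that fragment was noted. The names CHECKS and find_problems are invented for this example.

```python
import re

# Each pattern is a guess at one of the flaws described in this thread.
CHECKS = {
    # href value runs straight into the span tag: no closing bracket on the a tag
    "missing_bracket": re.compile(r'<a href="[^">]*<span'),
    # variant where even the opening bracket of the span was lost
    "pngspan": re.compile(r'<a href="[^">\s]*\.pngspan'),
    # slash lodged right after a quoted value, with another attribute following
    "misplaced_slash": re.compile(r'<img\b[^>]*"/\s+\w+='),
}


def find_problems(html):
    """Return the names of the checks that match somewhere in this post."""
    return [name for name, pattern in CHECKS.items() if pattern.search(html)]
```

A post that comes back with an empty list isn't necessarily clean, it just doesn't have any of these three particular problems.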

All the 2020 posts have been updated to fix the links and most other HTML issues. There may still be some bad links to the node level on the old website, haven't figured those out yet.

As previously noted elsewhere, the new website is very strict about closing tags. If you have li tags without the closing tags, things go badly when you get to the ul or ol closing tag.
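A checker for list items that never get a closing tag could look something like this. It's a sketch using Python's standard html.parser module, which reports tags as written and does not close anything automatically. The class and function names are hypothetical, and how the new website itself parses lists is not known here.

```python
from html.parser import HTMLParser


class UnclosedLiCounter(HTMLParser):
    """Count li tags that are still open when the next li or the list end arrives."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open ul, ol, and li tags
        self.unclosed = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li" and self.stack and self.stack[-1] == "li":
            # A new item starts while the previous one is still open.
            self.unclosed += 1
            self.stack.pop()
        if tag in ("ul", "ol", "li"):
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag == "li":
            if self.stack and self.stack[-1] == "li":
                self.stack.pop()
        elif tag in ("ul", "ol"):
            if self.stack and self.stack[-1] == "li":
                # The list ends while its last item is still open.
                self.unclosed += 1
                self.stack.pop()
            if self.stack and self.stack[-1] == tag:
                self.stack.pop()


def count_unclosed_li(html):
    parser = UnclosedLiCounter()
    parser.feed(html)
    parser.close()
    return parser.unclosed
```

Any post where count_unclosed_li returns more than zero would be a candidate for adding the closing tags by hand.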

LociOiling Lv 1

Another issue I've been seeing a lot is a missing closing quote on the href keyword of the a tag. It didn't happen on the example I happened to pick, but it did happen in lots of other spots.

The new system also doesn't seem to like href without quotes around the value, even though I think that used to be valid HTML.
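For what it's worth, the HTML standard does still allow unquoted attribute values as long as they contain no spaces, quotes, or a few other characters, so this looks like a strictness of the new system rather than a change in HTML. A small Python sketch for adding the quotes follows. The names are hypothetical, and it does not try to repair an href that has an opening quote but no closing one.

```python
import re

# href= followed directly by a value that does not start with a quote
UNQUOTED_HREF = re.compile(r'''\bhref=([^\s"'>]+)''')


def quote_hrefs(html):
    """Wrap bare href values in double quotes; quoted ones are left alone."""
    return UNQUOTED_HREF.sub(r'href="\1"', html)
```

This is a plain text substitution, so it would also change the string href=something if it appeared in the visible text of a post rather than inside a tag.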

The formatting issues seem kind of inconsistent. It's really not clear why one post has one problem, while another similar post has a different one.

LociOiling Lv 1

If anyone has gotten a flood of notifications about the Foldit Newsletters or Josh's How To Foldit posts, I'm sorry. I had neglected to consider that anyone might be following these old threads.

The How To Foldit thread is probably all updated now, there were only 10 posts. (But they had a lot of picky formatting issues.)

The Foldit Newsletters thread is not nearly finished, I still have most of 2021 and a good part of 2022 to fix up.

The threads have a "Stop Following" button, which might be a good idea to save your inbox.