From Blogsphere to a Static Site (Part 2) - Cleaning up the HTML

Blogsphere allows you to create both RichText and plain HTML entries. To export them I need to grab the HTML, either the manually entered markup or the HTML generated from the RichText, clean it up (especially my manually entered HTML) and then replace image sources and internal links with the new URL syntax. To make this happen I created two functions that save images and attachments and build a lookup list, so the HTML cleanup has a mapping table to work with:

    private void saveImage(Document doc) {
        // Extracts all attachments of an image document into a year-based
        // subdirectory and records old/new URLs in the FileEntry list,
        // which later feeds the mapping table for the HTML cleanup
        String sourceDirectory = this.config.sourceDirectory + this.config.imageDirectory;
        try {
            String subject = doc.getItemValueString("ImageName");
            Date created = doc.getCreated().toJavaDate();
            @SuppressWarnings("rawtypes")
            Vector attNames = this.s.evaluate("@AttachmentNames", doc);
            String description = doc.getItemValueString("ImageName");
            String oldURL = this.config.oldImageLocation + doc.getItemValueString("ImageUNID") + "/$File/";
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy");
            String year = sdf.format(created);
            FileEntry fe = this.imgEntries.add(subject, oldURL, description, created);

            for (Object attObj : attNames) {
                try {
                    String attName = attObj.toString();
                    String newURL = this.config.webBlogLocation + this.config.imageDirectory + year + "/" + attName;
                    fe.add(attName, newURL, description, created);
                    String outDir = sourceDirectory + year + "/";
                    this.ensureDirectory(outDir);
                    // Extract the attachment to disk and recycle the Notes object
                    EmbeddedObject att = doc.getAttachment(attName);
                    att.extractFile(outDir + attName);
                    Utils.shred(att);
                } catch (NotesException e) {
                    e.printStackTrace();
                } catch (Exception e2) {
                    e2.printStackTrace();
                }
            }

        } catch (NotesException e) {
            e.printStackTrace();
        }
    }

    private void saveImageFromURL(String href, String targetName) {
        // Fetches an image that is only referenced by URL and hands it to
        // saveIfChanged (which presumably writes it only when the content differs)
        String fetchFromWhere = "https://" + this.config.bloghost + href;
        try {
            byte[] curImg = Request.Get(fetchFromWhere).execute().returnContent().asBytes();
            this.saveIfChanged(curImg, targetName);
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
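
The cleanup step below relies on this.mapperOldNewURLs, the mapping table derived from that lookup list. The code that fills it isn't shown here; a minimal sketch of the idea, assuming hypothetical getOldURL() and getNewURL() accessors on FileEntry and a getAll() method on the entry list (plus java.util.Map/HashMap imports), might look like this:

    private final Map<String, String> mapperOldNewURLs = new HashMap<String, String>();

    private void buildUrlMap() {
        // Hypothetical: walk all collected FileEntry objects and record
        // old URL -> new URL; keys are lower-cased since the lookup later
        // happens case-insensitively
        for (FileEntry fe : this.imgEntries.getAll()) {
            this.mapperOldNewURLs.put(fe.getOldURL().toLowerCase(), fe.getNewURL());
        }
    }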

With the images saved, the HTML cleanup can proceed. As mentioned before, I'm using JSoup to process crappy HTML. It allows for easy extraction of elements and attributes, so processing links and images takes just a few lines:

    private String cleanupHTMLlinksAndImages(String source, String location) {

        org.jsoup.nodes.Document hDoc = Jsoup.parse(source);
        this.cleanupTagUrlAttribute(hDoc, "img", "src", location);
        this.cleanupTagUrlAttribute(hDoc, "a", "href", location);
        return hDoc.body().html();

    }

    private void cleanupTagUrlAttribute(org.jsoup.nodes.Document hDoc, String elementName, String attName, String location) {

        // Select all elements of the given tag that carry the attribute, e.g. img[src] or a[href]
        String query = elementName + "[" + attName + "]";
        Elements elements = hDoc.select(query);

        for (Element element : elements) {
            String attValue = element.attr(attName).trim();
            // mapperOldNewURLs holds the old URL -> new URL pairs, keyed in lower case
            if (this.mapperOldNewURLs.containsKey(attValue.toLowerCase())) {
                String replace = this.mapperOldNewURLs.get(attValue.toLowerCase());
                System.out.println("Replacing:" + attValue + " with " + replace);
                element.attr(attName, replace);
            }
        }
    }
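
To see the rewrite pattern in isolation, here is a minimal, self-contained sketch (class name, sample URLs and map contents are made up for illustration) that swaps one image source against a prepopulated mapping table and lets JSoup normalize the fragment:

    import java.util.HashMap;
    import java.util.Map;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class RewriteDemo {

        public static void main(String[] args) {
            // Hypothetical old URL -> new URL mapping, keyed in lower case
            Map<String, String> mapper = new HashMap<String, String>();
            mapper.put("/blog/d6plinks/shwl-sample/$file/diagram.png", "/blog/images/2016/diagram.png");

            String source = "<p>See <img src=\"/blog/d6plinks/SHWL-SAMPLE/$FILE/diagram.png\"> for details";
            org.jsoup.nodes.Document hDoc = Jsoup.parse(source);

            // Same pattern as cleanupTagUrlAttribute: select img elements with a src
            // attribute and swap the value when the mapping table knows the old URL
            Elements elements = hDoc.select("img[src]");
            for (Element element : elements) {
                String attValue = element.attr("src").trim();
                String replace = mapper.get(attValue.toLowerCase());
                if (replace != null) {
                    element.attr("src", replace);
                }
            }

            // JSoup closes the dangling <p> and returns normalized markup
            System.out.println(hDoc.body().html());
        }
    }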

The returned HTML not only has the attribute values rewritten, it is also valid, clean markup, since JSoup normalizes the document while parsing.
Next stop: rendering output. Stay tuned.
