dslibris: Preparing books for reading

News: EPUB is now supported and is the recommended format. If you use EPUB, you can skip the rest of this article.

dslibris understands books stored as EPUB or XHTML.

XHTML should be in UTF-8 encoding with numeric entities (for example, &#8217; rather than &rsquo;). File names must end with the extension ‘.xhtml’ or ‘.xht’, and the files must be saved to the ‘book’ folder on your media.
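If you want to check a file against these requirements before copying it over, here is a rough Python sketch. The function name check_book is made up for this example, and it is not how dslibris itself validates books. It assumes the extension must match exactly as written above, and it leans on the fact that Python's standard XML parser does not load the XHTML DTD, so named entities such as &nbsp; are reported as errors, which happens to line up with the numeric-entities advice.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath

# Extensions named in the article; exact (lowercase) match is an assumption.
ALLOWED = {".xhtml", ".xht"}

def check_book(filename, data):
    """Return a list of problems; an empty list means these checks passed.

    filename: the name the file will have in the 'book' folder.
    data: the raw bytes of the file.
    """
    problems = []
    if PurePath(filename).suffix not in ALLOWED:
        problems.append("extension is not .xhtml or .xht")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    try:
        # Fails on malformed markup and on named entities like &nbsp;
        ET.fromstring(data)
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
    return problems
```

To try it on a real file, something like check_book("book.xhtml", open("book.xhtml", "rb").read()) should do. Passing these checks does not guarantee the book will display well; it only rules out a few common causes of trouble.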

Converting from HTML to XHTML

Use HTML Tidy to clean up HTML and convert it to XHTML. An online Tidy service at http://infohound.net/tidy lets you upload an HTML file and get XHTML back.

If you’re using command line tidy, here’s an example:

tidy -asxhtml -utf8 -numeric -o book.xhtml book.html
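If you have a whole folder of HTML files, the same command can be run in a loop. This is only a sketch in Python: the helper names tidy_command and convert_all are invented here, and it assumes a command-line tidy is installed and on your PATH. Tidy exits with status 1 when it only has warnings and 2 when it hits errors, so the loop treats 2 and above as a failure.

```python
import subprocess
from pathlib import Path

def tidy_command(src):
    """Build the tidy argument list from the article for one HTML file."""
    src = Path(src)
    out = src.with_suffix(".xhtml")  # book.html -> book.xhtml
    return ["tidy", "-asxhtml", "-utf8", "-numeric", "-o", str(out), str(src)]

def convert_all(folder):
    """Run tidy on every .html file in folder; return names that failed."""
    failed = []
    for src in sorted(Path(folder).glob("*.html")):
        result = subprocess.run(tidy_command(src))
        # Exit status 1 means warnings only; 2 means tidy reported errors.
        if result.returncode >= 2:
            failed.append(src.name)
    return failed
```

Calling convert_all("my_books") would write a .xhtml file next to each .html file in that (hypothetical) folder and return the names of any files tidy could not convert.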

Also, people have used these programs to save as XHTML:

  • Microsoft Word
  • Amaya
  • AbiWord
  • OpenOffice Writer

Converting from PDF to XHTML

This generally doesn’t work: PDF files are preformatted for a fixed page size, so they can’t be reliably converted to a form that will flow properly on the DS. If you’re willing to massage the text in a text editor after copying it out of a PDF, you can sometimes get a reasonable result.
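Much of that massaging is removing the hard line breaks that come along when you copy text out of a PDF. A rough Python sketch of the idea follows. The function name unwrap is made up, and it assumes paragraphs in the copied text are separated by blank lines, which is often not true of PDF output, so expect to fix some paragraphs by hand afterwards.

```python
import re

def unwrap(text):
    """Join hard-wrapped lines into paragraphs.

    Assumes a blank line marks a paragraph break. Form feeds (page
    breaks) are treated as ordinary line breaks.
    """
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\f", "\n")
    blocks = re.split(r"\n\s*\n", text)
    # Collapse every run of whitespace inside a paragraph to one space.
    return [" ".join(block.split()) for block in blocks if block.strip()]
```

The result is a list of paragraphs, one string each, which is a much better starting point for building HTML than the raw copied text.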

Converting from TXT to XHTML

As with PDF, the lack of formatting information in ASCII text files is a problem. If your source is from Project Gutenberg, the GutenMark project provides programs for generating reasonable HTML from its ASCII text files. That HTML can then go through Tidy as above.

Of course, those who can write HTML could rewrite text files into HTML.
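For those who would rather script it than write the HTML by hand, here is a minimal Python sketch. The names text_to_xhtml and numeric are invented for this example. It assumes blank lines separate paragraphs and that dslibris takes the book title from the standard title element; neither is guaranteed for every text file or confirmed here.

```python
import html
import re

TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body>
{body}
</body>
</html>
"""

def numeric(s):
    """Escape &, < and >, then turn non-ASCII characters into numeric entities."""
    escaped = html.escape(s, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

def text_to_xhtml(title, text):
    """Wrap blank-line-separated paragraphs of plain text in a bare XHTML page."""
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    paragraphs = [" ".join(b.split()) for b in blocks if b.strip()]
    body = "\n".join(f"<p>{numeric(p)}</p>" for p in paragraphs)
    return TEMPLATE.format(title=numeric(title), body=body)
```

Write the returned string to a file ending in .xhtml, saved as UTF-8. Because every non-ASCII character becomes a numeric entity, the output is plain ASCII, which is also valid UTF-8.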

What a pain! Is there relief in sight?

There are efforts afoot to provide Gutenberg texts in ePub format, and Feedbooks provides ePub material. ePub support was on my wish list for dslibris when I wrote this and, as noted at the top, it has since been added. Cross-platform tools for generating XHTML and ePub from other formats are also in the works.

If you’re having problems with converting a book or getting it to work, please post in the Help forum on Sourceforge:

https://sourceforge.net/forum/forum.php?forum_id=739965



Comments

40 responses to “dslibris: Preparing books for reading”

  1. Pulstar

    I found that saving the (uncompliant) HTML document from within OpenOffice Writer and then converting via Tidy is much less of a hassle than using Acrobat or WinWord. Just a hint 😉

  2. Stravy

    In recent versions, OpenOffice can export directly to XHTML. The exported file works perfectly well for me in dslibris.

  3. John H.

    Been playing around with .30, but I keep getting weird As with diacritical marks over them sprinkled in places where there are extra spaces. Looking at the source, however, reveals no special HTML in those places, just an extra space. It’s rather mysterious.

  4. Ray Haleblian

    John, I’ve seen that before and my memory will kick in and tell us what it is. Just to be sure, maybe you can send me the file to give me a little jog.

  5. Ray

    http://htmltidy.movb.de/

    is a convenient way to convert a URL to XHTML in UTF-8 encoding. Set the output encoding to UTF-8 and also select “asxml”. Save the document you get back with your browser “Save As…” menu item.

  6. Lance

    Can’t get it to work. It’s just not working well. Oh well.

  7. Ray Haleblian

    Lance, sorry to hear that. Was it books that didn’t work, or simply getting the program to run?

  8. Ray Haleblian

    John, the unprintable chars may be UTF-8 that your text viewer can’t show. If you want to email me the file, I’d like to see what happened.

  9. Kijof

    Using the export function in AbiWord to XHTML worked fine for me with UTF-8.

  10. Lou

    Please help, I am a complete beginner! I tried to run tidy on the required file by typing in the command above and it seemed to run (loads of text scrolled very quickly anyway!) but I cannot find the .xhtml file. Any ideas?

  11. Lou

    Re above comment, I found a newer copy of Tidy and have now got it working.

  12. Russ Stutler

    Is there any MacOS friendly drag and drop application out there that will convert text files into UTF-8 XHTML files without a lot of fiddling?

  13. Kijof

    One tip:

    Get the Amaya editor, create an XHTML file and paste the text into it. Very easy.

  14. Steve

    Kijof, how did you export to xhtml in UTF-8 using Abiword? I have Abiword but I can’t figure it out…
    I have all the plugins.

  15. Ess.uk

    Any tips on converting .txt to .xhtml? I’m struggling to find some good software. I’ve tried “RTF to XHTML Converter” from Sautinsoft, but it doesn’t seem to recognise the .rtf after I changed the .txt format using WordPad. I’ll keep trying, though; reading expands the mind, potentially. . . . 😉

  16. Ray Haleblian

    Ess, folks have used AbiWord and OpenOffice to write out XHTML. I suspect that Microsoft Word would work too. You may find that lines will not break at the best places unless you reformat the document by hand in Word etc. That’s a limitation of the TXT format, and it’s why TXT is not supported directly. Good luck!

  17. Matt

    For a txt file, you might want to try modifying the content yourself.
    Open up the txt file, change it as below, and save as .xht:

    type ur title here

    leave whatever content here

    For example:
    DSLibris

    An ebook reader for Nintendo DS.

    When u open the file with dslibris, on the menu it will detect the file with title “DSLibris”, with the content “An ebook reader for Nintendo DS.”.

  18. Matt

    oops, my fault
    html tag is not view-able here..

  19. Victor

    I recommend copy and paste on the Amaya web editor. Simple, quick and nifty. Moreover, the document looks pretty much the same on the computer and on the DS.

  20. josh

    I’m sorry, this program sucks. A lot. It’s confusing and it isn’t compatible with anything easy. YOU should make a converter that does it in one step. I wasted way too much time trying.

  21. Ray Haleblian

    josh, i hope you have a better day tomorrow.

  22. Robbie

    I can’t get it to work. Whenever I try to boot up dslibris, it loads up 3 different files, and the third one always fails. I’ve been to 2 other sites trying to fix the problem, and nothing seems to help. I’ve heard great things about the app and I’m sad knowing I can’t get it to work. I would blame it on the fact that I have Gn’M, but the most recent versions of DSOrganize and Moonshell work just fine. I’ve patched it, moved it around, re-patched it. I’ve redownloaded dslibris and the books, converted the books, and redone the width of the XHTML in Notepad++. I’ve tried every suggestion given to me and I still cannot get it to run.
    I’m going to go to bed now.

  23. Tom

    I spent ages trying to get this to work, but what with one thing and another, I just couldn’t get my PDFs into a tidy enough HTML to work. Using OpenOffice, I just saved it as HTML, then exported straight to XHTML (File, Export...). It worked like a charm. If you’re having problems with this, then try OpenOffice!

  24. Tom

    Sorry for that last comment – I actually meant to post it on a different page, relating to converting files for use on an NDS eBook reader. Open Office was indeed the best solution for this, but I didn’t mean to claim that it was a suitable replacement for this software in every instance.

  25. Ray Haleblian

    Robbie, send me the file that doesn’t work and I bet we can figure out the problem.

  26. Garrett

    Anyone else getting weird gaps in the text every 2 lines or so? It’s readable, but not very happy looking. I have generally been using the Amaya editor, although I tried the others, and they all turn out about the same.

  27. Danny Chicago & New York City Kid

    I use “Convert Doc”; it works with almost everything I have tried to convert. I’ve converted .txt to .htm and .pdf to .htm, and I then run the result through tidy.

  28. nelson

    Will someone share a program, or website or something, that will convert .lit or .htm files to .xhtml without us having to know all these different programming languages? I have been trying for weeks now to convert my files for this program. Sometimes I get the red and white error screen, and sometimes it will open the book, but there are no pages to read.

    I just want something simple that you click, and it converts it over.

    Any help would be greatly appreciated.

  29. Ray Haleblian

    I’ve put the URL to an online Tidy service in the article.

  30. yulbeast

    This was my solution with a PDF. A bit long, but it worked!!
    First I saved it with Adobe Acrobat as TXT, but that gave me problems with the line endings, because it inserted carriage returns and I didn’t want that.

    So I opened it with UltraEdit (a text editor) and removed all those carriage returns with the menu option: Format, convert CR/LF to line wrap. I also deleted all the new page characters (shown as a horizontal line in UltraEdit).

    To create the HTML, I opened the TXT with Microsoft Word and saved as filtered HTML.

    Last, I executed the tidy command:
    tidy -asxhtml -numeric -o book.xhtml book.html

    Now the book is clean!!

    I have to say that UltraEdit has its own Tidy, but I was not able to make it work as desired: all the CR/LF disappeared, so I got an awful book in one single line. Maybe if I knew how to configure the options … (the meaning of the asxhtml and numeric options)

  31. Curtis007

    I know this has got to be the cheapest cop-out of the whole lot, but it works for me.
    Ya see, when you install the files to the root dir... now, having no real knowledge of HTML or what I’m doing,
    I edited the XHT file it came with.
    Here is the short of it:

    [Easy Tutorial]

    Adding Books:

    Very quick and easy: open in Notepad, paste, and when you Save As, MAKE SURE IT’S IN UTF-8 format, not ANSI!! And to save yourself hassle, change the name from Easy Tutorial to something else.

    Controls:

    A/R/PAD-DOWN – forward one page
    B/L/PAD-UP – backward one page
    X – invert text
    Y – change screen brightness
    SELECT – toggle book browser

    More info and tips:

    PASTE TXT HERE!!

    It may look wrapped diff but keep note of where the txt is contained.

  32. Curtis007

    Crud, just realised... this will be of no help, as you can’t see where the markup is. Wish I could just upload the file for ya. It’s a crude fix, but it’ll serve for reading books on the buses for me. Sorry, peeps.

  33. Ray Haleblian

    To follow up Yulbeast’s comment: I’ve had some luck with PDF if I save it from Acrobat as HTML 3.01 and then remove all of the tags with global search and replace in a text editor. In some cases the paragraphs will still be correct.

    Let me stress again – TXT and PDF formats throw away the formatting information that’s needed to lay a page out on a little DS screen, so there isn’t a robust means to convert these formats to work for mobile devices.

  34. R.Zonde

    Here is what I used for converting from PDFs:

    pdf –> txt –> html –> xhtml

    To get the text from the PDF, open the PDF in Adobe and then go to File > Save as Text.

    From the text, we use GutenMark [http://www.sandroid.org/GutenMark/download.html] to export the text file to HTML. This program also formats the text file to be a suitable Gutenberg etext file, which is usable on basically every PDA-type reader currently out (which is why we use it here). One note to avoid backtracking: the override book title option is recommended, because the title you put here will show up as the title of the book in dslibris. I made the mistake of not using this feature of GutenMark, so instead of having ‘The Clockwise Man’ as the title, GutenMark took the first line in my text as the title (which was Doctor Who).

    Then, lastly, we use HTML Tidy [http://infohound.net/tidy/] to turn the HTML into XHTML. Make sure the export-to-XHTML option is on and the UTF-8 encoding option is on too.
    If you run Windows, just right-click the save tidy file option and save as ‘yourtitle.xhtml’. Make sure you type .xhtml when you save the file (because when you save it, it’ll be recognized as a .txt file).

    I hope this helps people who were having problems exporting PDF files.

  35. R.Zonde

    ah I forgot to say

    I use adobe reader 8 to get the plain text [not acrobat]

  36. R.Zonde

    For people who want a user-friendly HTML Tidy type program on their computer, there’s Tidy UI [http://users.rcn.com/creitzel/tidy.html#tidyui]

  37. Ray Haleblian

    Great news about Adobe Reader 8.

  38. Andrew

    Look at these poor comments. What kind of book reader doesn’t support plain text? It’s trivial to remove carriage returns and keep double returns/tabs as paragraph delimiters; you could do it programmatically on the DS, or it can be done in a text editor by the end user with a couple of search/replaces. Your brain is stuck in this “I need my layout to be embedded in the file” mode, when it’s self-evident that all you need is plain text with no hard returns, and a tab or double returns between paragraphs, for a vanilla book with no images. I’m not sure why you think the DS is a special case. All monitors have a fixed width, and words have to be wrapped when that width runs out. Strange all around to see a polished reader like this crippled by such tunnel vision. Step one should have been displaying Project Gutenberg vanilla ASCII texts.

  39. Ray Haleblian

    @Andrew: thank you for your fine contribution.

    In it, you take the liberty of claiming where my brain is at, specifically “need[ing] layout to be embedded in the file”, and then proceed to describe a file format whose layout is embedded in the file: an ASCII file that uses specific newline semantics to show layout.

    And also thank you for reminding me what my priorities should have been. It should be clear at this point that my priority is having book formats evolve toward standards that will help publications interoperate with many applications.

    Despite my distaste for the tone of your posting, I would still be interested in seeing a document specification from PG that shows their text format is consistent enough to be able to reflow and maintain the intent of all prose, or otherwise a few hundred examples of documents from their library that prove this by induction. Or, if you have all PG TXT documents working in the book reader you’re developing you could send me some code.

    Cheers, ray

  40. Ray Haleblian

    A note to readers – I’ll be closing comments on this post, in favor of posting on the Sourceforge site. Please don’t hesitate to post there, you can use the ‘Help’ section.

    Postings there should be open to the public; please keep it coming. I’m not shutting things off because of recent postings, it takes more than that to truly piss me off these days. It’s really to have this information and commentary centralized with development and everything else on Sourceforge.

    The Help forum is here:

    https://sourceforge.net/forum/forum.php?forum_id=739965