dslibris: Preparing books for reading

News: EPUB is now supported and is the recommended format. If you use EPUB, you can skip the rest of this article.

dslibris understands books stored as EPUB or XHTML.

XHTML should be in UTF-8 encoding with numeric entities. The files must end with the extension ‘.xhtml’ or ‘.xht’ and be saved to the ‘book’ folder on your media.
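The requirements above can be checked before copying a file to the card. The following Python sketch is only an illustration: `check_book` is a hypothetical helper, not part of dslibris, and it assumes that "well-formed XML" is a fair stand-in for what the reader will accept. A plain XML parser does not know named entities such as &nbsp;, so those show up as parse errors, while numeric ones such as &#160; pass.

```python
import xml.etree.ElementTree as ET

ALLOWED_EXTENSIONS = (".xhtml", ".xht")  # the two extensions named in the article


def check_book(path):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    # Exact, case-sensitive match; whether dslibris accepts other casings
    # is not stated in the article.
    if not path.endswith(ALLOWED_EXTENSIONS):
        problems.append("extension is not .xhtml or .xht")
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    try:
        # Undefined named entities and unbalanced tags both fail here.
        ET.fromstring(raw)
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
    return problems
```

Calling `check_book("book.xhtml")` and printing the result gives a quick list of things to fix before trying the file on the DS.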

Converting from HTML to XHTML

Use HTML Tidy to clean up HTML and convert it to XHTML. An online Tidy service at http://infohound.net/tidy lets you upload an HTML file and get XHTML back.

If you’re using Tidy from the command line, here’s an example:

tidy -asxhtml -utf8 -numeric -o book.xhtml book.html
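For a whole folder of HTML files, the same command can be driven from a short script. This Python sketch assumes a `tidy` executable is on your PATH; `tidy_command` and `convert_folder` are hypothetical names chosen for illustration. It relies on Tidy’s documented exit status: 0 for success, 1 for warnings only, 2 for errors.

```python
import subprocess
from pathlib import Path


def tidy_command(html_path):
    """Build the article's Tidy invocation for one HTML file."""
    src = Path(html_path)
    out = src.with_suffix(".xhtml")  # book.html -> book.xhtml
    return ["tidy", "-asxhtml", "-utf8", "-numeric", "-o", str(out), str(src)]


def convert_folder(folder):
    """Run Tidy on every .html file in `folder`; return the names that failed."""
    failed = []
    for src in sorted(Path(folder).glob("*.html")):
        result = subprocess.run(tidy_command(src))
        # Warnings (exit status 1) are common with messy HTML and still
        # produce output; only status 2 and above is treated as a failure.
        if result.returncode >= 2:
            failed.append(src.name)
    return failed
```

Files reported as failed usually need hand repair before Tidy will write any output for them.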

Also, people have used these programs to save as XHTML:

Microsoft Word
Amaya
AbiWord
OpenOffice Writer

Converting from PDF to XHTML

This generally doesn’t work: PDF files are laid out for a fixed page size, so they can’t be reliably converted to a form that reflows properly on the DS. If you’re willing to massage the text in a text editor after copying it out of a PDF, you can sometimes get a reasonable result.

Converting from TXT to XHTML

As with PDF, plain ASCII text files lack the information needed to reflow them reliably. If your source is from Project Gutenberg, the GutenMark project provides programs for generating reasonable HTML from the ASCII text files. That HTML can then go through Tidy as above.

Of course, those who can write HTML could rewrite text files into HTML.
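For simple prose, that rewriting can be scripted. The Python sketch below is a rough illustration, not a dslibris tool: `text_to_xhtml` is a hypothetical helper, and it assumes that blank lines separate paragraphs and that a bare html/head/title/body skeleton is enough for the reader. Line breaks inside a paragraph are discarded, so verse, tables and indented quotations will not survive; that is exactly the missing-information problem described above.

```python
import re
from xml.sax.saxutils import escape

XHTML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body>
{body}
</body>
</html>
"""


def text_to_xhtml(text, title):
    """Wrap plain text in minimal XHTML, one <p> per blank-line-separated block."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        words = block.split()  # also joins hard-wrapped lines
        if words:
            # escape() turns &, < and > into entities the XML parser accepts.
            paragraphs.append("<p>" + escape(" ".join(words)) + "</p>")
    return XHTML_TEMPLATE.format(title=escape(title), body="\n".join(paragraphs))
```

Write the returned string to a file with the .xhtml extension using UTF-8 encoding, for example `open("book.xhtml", "w", encoding="utf-8")`.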

What a pain! Is there relief in sight?

There are efforts afoot to provide Gutenberg texts in ePub format, and Feedbooks provides ePub material. ePub support was on my wish list for dslibris when this was written and has since been added, as noted at the top of this article. Cross-platform tools for generating XHTML and ePub from other formats are also in the works.

If you’re having problems with converting a book or getting it to work, please post in the Help forum on Sourceforge:

https://sourceforge.net/forum/forum.php?forum_id=739965

Posted September 24, 2007

Projects

ray

Tags: convert, dslibris, html, iso-8859, utf-8, xhtml

Comments

40 responses to “dslibris: Preparing books for reading”

Pulstar

October 3, 2007

I found that saving the (uncompliant) HTML document from within OpenOffice Writer and then converting via Tidy is much less of a hassle than using Acrobat or WinWord. Just a hint 😉
Stravy

October 11, 2007

In recent versions, openoffice can export directly in xhtml. The exported file works perfectly well for me in dslibris.
John H.

October 21, 2007

Been playing around with .30, but I keep getting weird As with diacritical marks over them sprinkled in places where there are extra spaces. Looking at the source however reveals no special HTML in those places, just an extra space. It’s rather mysterious.
Ray Haleblian

October 22, 2007

John, I’ve seen that before and my memory will kick in and tell us what it is. Just to be sure, maybe you can send me the file to give me a little jog.
Ray

November 8, 2007

http://htmltidy.movb.de/

is a convenient way to convert a URL to XHTML in UTF-8 encoding. Set the output encoding to UTF-8 and also select “asxml”. Save the document you get back with your browser “Save As…” menu item.
Lance

December 19, 2007

can’t get it to work it just not working well oh well
Ray Haleblian

December 21, 2007

Lance, sorry to hear that. Was it books that didn’t work, or simply getting the program to run?
Ray Haleblian

December 21, 2007

John, the unprintable chars may be UTF-8 that your text viewer can’t show. if you want to email me the file i’d like to see what happened.
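One common cause of stray accented capital A characters next to spaces is a non-breaking space (U+00A0) that was saved as UTF-8 and then shown by something that assumes Latin-1. Nothing in this thread confirms that this is what happened to John’s file, so take the Python sketch below as an illustration of the mechanism only; `find_non_breaking_spaces` is a hypothetical helper name.

```python
NBSP = "\u00a0"  # non-breaking space; looks like an ordinary space in most editors

utf8_bytes = NBSP.encode("utf-8")       # two bytes: 0xC2 0xA0
misread = utf8_bytes.decode("latin-1")  # 0xC2 alone is shown as A-circumflex


def find_non_breaking_spaces(text):
    """Return 1-based (line, column) positions of every U+00A0 in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == NBSP:
                hits.append((line_no, col))
    return hits
```

Replacing the reported characters with ordinary spaces in the source is one way to test whether they are the culprit.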
Kijof

January 9, 2008

Using the export function in AbiWord to XHTML worked fine for me with UTF-8.
Lou

January 14, 2008

Please help, I am a complete beginner! I tried to run tidy on the required file by typing in the command above and it seemed to run (loads of text scrolled very quickly anyway!) but I cannot find the .xhtml file. Any ideas?
Lou

January 14, 2008

Re above comment, I found a newer copy of Tidy and have now got it working.
Russ Stutler

January 16, 2008

Is there any MacOS friendly drag and drop application out there that will convert text files into UTF-8 XHTML files without a lot of fiddling?
Kijof

January 17, 2008

One Tip:

Get Amaya Editor, create a Xhtml file and paste the text on it, very easy.
Steve

January 24, 2008

Kijof, how did you export to xhtml in UTF-8 using Abiword? I have Abiword but I can’t figure it out…
I have all the plugins.
Ess.uk

March 10, 2008

Any tips on converting .txt to .xhtml? I’m struggling to find some good software. I’ve tried “RTF to XHTML Converter” from Sautinsoft but it doesnt seem to recognise the .rtf after I changed the .txt format using wordpad…. I’ll keep trying though reading expands the mind, potentially. . . . 😉
Ray Haleblian

March 12, 2008

Ess, folks have used AbiWord and OpenOffice to write out XHTML. I suspect that Microsoft Word would work too. You may find that lines will not break at the best places unless you reformat the document by hand in Word etc; that’s a limitation of TXT format, why it’s not supported directly. Good luck
Matt

March 18, 2008

For a txt file, you might want to try to modify the content yourself.
Open up the txt file, and change it as below, and save as .xht:

type ur title here

leave whatever content here

For example:
DSLibris

An ebook reader for Nintendo DS.

When u open the file with dslibris, on the menu it will detect the file with title “DSLibris”, with the content “An ebook reader for Nintendo DS.”.
Matt

March 18, 2008

oops, my fault
html tag is not view-able here..
Victor

March 25, 2008

I recommend copy and paste on the Amaya web editor. Simple, quick and nifty. Moreover, the document looks pretty much the same on the computer and on the DS.
josh

April 14, 2008

im sorry this program sucks. alot. its confusing it isnt compatable with anything easy YOU should make a converter that does it in one step and i wasted way to much time trying.
Ray Haleblian

April 15, 2008

josh, i hope you have a better day tomorrow.
Robbie

April 16, 2008

I can’t get it to work. Whenever I try to boot up dslibris, it loads up 3 different files, and the third one always fails. I’ve been to 2 other sites trying to fix the problem, and nothing seems to help. I’ve heard great things about the app and I’m sad knowing I can’t get it to work. I would blame it on the fact that I have Gn’M, but the most recent versions of DSOrganize and Moonshell work just fine. I’ve patched it, moved it around, re-patched it. I’ve redownloaded dslibris and the books, I’ve converted the books, redone the width in the XHTML with Notepad++, I’ve tried every suggestion given to me and I still cannot get it to run.
I’m going to go to bed now.
Tom

April 16, 2008

I spent ages trying to get this to work, but what with one thing and another, I just couldn’t get my PDFs into a tidy enough HTML to work. Using Open Office, I just saved it as HTML, then exported straight to XHTML (File, export..). It worked like a charm. If you’re having problems with this then try Open Office!
Tom

April 16, 2008

Sorry for that last comment – I actually meant to post it on a different page, relating to converting files for use on an NDS eBook reader. Open Office was indeed the best solution for this, but I didn’t mean to claim that it was a suitable replacement for this software in every instance.
Ray Haleblian

April 16, 2008

Robbie, send me the file that doesn’t work and I bet we can figure out the problem.
Garrett

April 17, 2008

Anyone else getting weird gaps in the text every 2 lines or so? readable, but not very happy looking. Anyone else have this problem? I generally have been using amaya editor, although I tried the others, and they all turn out about the same.
Danny Chicago & New York City Kid

May 5, 2008

I use “Convert Doc”; it works with almost everything I have tried to convert. I’ve converted .txt to .htm, .pdf to .htm, and I then run it through tidy.
nelson

May 13, 2008

Will someone share a program, or website or something that will convert .lit or .htm files to .xhtml without us having to know all these different programming languages? I have been trying for weeks now to convert my files for this program. Sometimes I get the red and white error screen, and sometimes it will open the book, but with no pages to read.

I just want something simple that you click, and it converts it over.

Any help would be greatly appreciated.
Ray Haleblian

June 7, 2008

I’ve put the URL to an online Tidy service in the article.
yulbeast

July 9, 2008

This was my solution with a pdf. A bit long but it worked!!:
First I saved it with adobe acrobat as txt, but that gave me problems with the end of line, because it inserted a carriage return and I didn’t want that.

So I opened it with the ultra edit (a text editor) and removed all that carriage returns with the menu option: Format, convert CR/LF to line wrap. I also deleted all the new page characters (shown as a horizontal line in the ultraedit).

To create the HTML, I opened the txt with Microsoft Word and saved as filtered HTML.

Last, I executed the tidy command:
tidy -asxhtml -numeric -o book.xhtml book.html

Now the book is clean!!

I have to say that ultraedit has his own tidy, but I was not able to make it work as desired: All the CR/LF disappeared, so I got an awful book in one single line. Maybe if I knew how to configure the options … (the meaning of asxhtml and numeric options)
Curtis007

July 21, 2008

I know this has got to be the cheapest cop out of the whole lot but it works for me.
Ya see when you install the files to the root dir, now having no real knowledge of html or what im doing…
I edited the the xht it came with..
here is the short of it

[Easy Tutorial]

Adding Books:

Very quick and easy: open in Notepad, paste, and when you Save As, MAKE SURE IT’S IN UTF-8 format, not ANSI!! And to save yourself hassle, change the name from Easy Tutorial to something else.

Controls:

A/R/PAD-DOWN – forward one page
B/L/PAD-UP – backward one page
X – invert text
Y – change screen brightness
SELECT – toggle book browser

More info and tips:

PASTE TXT HERE!!

It may look wrapped diff but keep note of where the txt is contained.
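If a file has already been saved as “ANSI”, it can be re-encoded rather than retyped. On Western-language Windows systems that option usually means Windows-1252; that mapping is an assumption here and depends on the system locale. `reencode_to_utf8` in this Python sketch is a hypothetical helper name.

```python
def reencode_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Read a file in a legacy encoding and write the same text as UTF-8."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

A `UnicodeDecodeError` while reading suggests the file was not Windows-1252 after all, in which case a different `source_encoding` is needed.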
Curtis007

July 21, 2008

crud just realised.. this will be of no help as you cant see where the mark up is. wish i could just upload the file for ya…its a crude fix but it’ll serve for reading books on the busses for me.sorry ppeps.
Ray Haleblian

August 3, 2008

To follow up Yulbeast’s comment: I’ve had some luck with PDF if I save it from Acrobat as HTML 3.01 and then remove all of the tags with global search and replace in a text editor. In some cases the paragraphs will still be correct.

Let me stress again – TXT and PDF formats throw away the formatting information that’s needed to lay a page out on a little DS screen, so there isn’t a robust means to convert these formats to work for mobile devices.
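The tag-stripping step can also be done with a parser instead of search and replace. This Python sketch uses the standard library’s `html.parser`; `TextExtractor`, `html_to_paragraphs` and the choice of which tags count as paragraph boundaries are assumptions made for illustration, and whether the resulting paragraphs are correct still depends on what the PDF export produced.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries (an illustrative choice, not a standard).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li"}


class TextExtractor(HTMLParser):
    """Collect text from HTML, starting a new paragraph at each block tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; arrives as "&"
        self.paragraphs = []
        self._buffer = []

    def flush(self):
        # Collapse runs of whitespace, which carry no meaning in HTML.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._buffer.append(data)


def html_to_paragraphs(html_text):
    """Return the document's text as a list of paragraph strings."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    parser.flush()  # keep any text after the last block tag
    return parser.paragraphs
```

The contents of script and style elements are kept as text by this sketch, so remove those from the input first if the export includes them.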
R.Zonde

September 24, 2008

here is what I used from converting from pdfs

pdf –> txt –> html –> xhtml

to get the text from the pdf, open the pdf in adobe and then go to file>save as text.

from the text, we use GutenMark [http://www.sandroid.org/GutenMark/download.html] to export the text file into html this program also formats the text file to be a suitable gutenberg etext file, which is useable on basically every pda type reader out currently.[Thus why we use it here]. One note to make to avoid backtracking: The override book title option is recommended because the title you put here will show up as the title of the book in dslibris. I made the mistake of not using this feature of GutenMark. so instead of having ‘The Clockwise Man’ as the title, GutenMark took the first line in my text as the title [which was Doctor Who]

Then lastly we use Html Tidy [http://infohound.net/tidy/] to turn the now html text into xhtml. make sure the export to xhtml option is on and also the utf8 encode option is on.
If you run Windows, just right-click the save tidy file option and save as ‘yourtitle.xhtml’. Make sure you type .xhtml when you save the file [because when you save it, it’ll be recognized as a .txt file].

I hope this helps people who were having problems exporting pdf files.
R.Zonde

September 24, 2008

ah I forgot to say

I use adobe reader 8 to get the plain text [not acrobat]
R.Zonde

September 25, 2008

for people who want a html tidy type user friendly program on their computer there’s Tidy UI [http://users.rcn.com/creitzel/tidy.html#tidyui]
Ray Haleblian

September 27, 2008

Great news about Adobe Reader 8.
Andrew

October 16, 2008

Look at these poor comments. What kind of book reader doesn’t support plain text? It’s trivial to remove carriage returns and keep double return/tabs as paragraph delimiters; you could do it programmatically on the DS or it can be done in a text editor by the end user with a couple search/replaces. Your brain is stuck in this “I need my layout to be embedded in the file” mode, when it’s self-evident that all you need is plain text with no hard returns, and a tab or double returns between paragraphs for a vanilla book with no images. I’m not sure why you think the DS is a special case. All monitors have a fixed width, and words have to be wrapped when that width runs out. Strange all around to see a polished reader like this crippled by such tunnel vision. Step one should have been displaying Project Gutenberg vanilla ascii texts.
Ray Haleblian

October 18, 2008

@Andrew: thank you for your fine contribution.

In it, you take the liberty of claiming where my brain is at, specifically “need[ing] layout to be embedded in the file”, and then proceed to describe a file format whose layout is embedded in the file: an ASCII file that uses specific newline semantics to show layout.

And also thank you for reminding me what my priorities should have been. It should be clear at this point that my priority is having book formats evolve toward standards that will help publications interoperate with many applications.

Despite my distaste for the tone of your posting, I would still be interested in seeing a document specification from PG that shows their text format is consistent enough to be able to reflow and maintain the intent of all prose, or otherwise a few hundred examples of documents from their library that prove this by induction. Or, if you have all PG TXT documents working in the book reader you’re developing you could send me some code.

Cheers, ray
Ray Haleblian

October 18, 2008

A note to readers – I’ll be closing comments on this post, in favor of posting on the Sourceforge site. Please don’t hesitate to post there, you can use the ‘Help’ section.

Postings there should be open to the public; please keep it coming. I’m not shutting things off because of recent postings, it takes more than that to truly piss me off these days. It’s really to have this information and commentary centralized with development and everything else on Sourceforge.

The Help forum is here:

https://sourceforge.net/forum/forum.php?forum_id=739965