LiveJournal provides a method to export your posts as XML. However, this has to be done manually for every month of your blog, and comments are exported separately. I wrote this tool to make exporting more convenient.
You will need Python 3 to use it.
This script will do the exporting. You will end up with the full blog contents in several formats. The posts-html folder will contain basic HTML of posts and comments. The posts-markdown folder will contain posts in Markdown format, with HTML comments and the metadata necessary to generate a static blog with Pelican. The posts-json folder will contain posts with nested comments in JSON format, should you want to process them further.
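As an illustration of further processing, nested comments can be flattened with a short recursive walk. This is only a sketch: the key names children, author and body are assumptions made for the example and may not match the fields the script actually writes, so check one of the generated files first.

```python
from typing import Any, Dict, Iterator, List, Tuple


def walk_comments(
    comments: List[Dict[str, Any]], depth: int = 0
) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (depth, comment) pairs in depth-first order."""
    for comment in comments:
        yield depth, comment
        # "children" is an assumed key name for the nested replies.
        yield from walk_comments(comment.get("children", []), depth + 1)


# Made-up sample data in the assumed shape, not real exporter output.
sample = [
    {
        "author": "alice",
        "body": "First!",
        "children": [{"author": "bob", "body": "Reply", "children": []}],
    },
    {"author": "carol", "body": "Nice post"},
]

# Print each comment indented by its nesting level.
for depth, comment in walk_comments(sample):
    print("  " * depth + comment["author"] + ": " + comment["body"])
```

A generator keeps the traversal separate from whatever is done with each comment, so the same walk can feed a counter, a search, or a converter.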
This version of the script does not require you to make any modifications before running it. It will prompt you for the range of years you want to pull, then ask for your LiveJournal username and password. It will use those credentials to acquire the required session cookies. After this, the download process will begin.
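Since LiveJournal's own export works one month at a time, a range of years expands into a list of monthly requests. A minimal sketch of that expansion, assuming the range is inclusive at both ends; the function name months_in_range is hypothetical and is not taken from the script:

```python
from typing import Iterator, Tuple


def months_in_range(start_year: int, end_year: int) -> Iterator[Tuple[int, int]]:
    """Yield (year, month) for every month from start_year to end_year, inclusive."""
    if start_year > end_year:
        raise ValueError("start_year must not be greater than end_year")
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):  # months 1 through 12
            yield year, month
```

For example, list(months_in_range(2010, 2011)) gives 24 pairs, from (2010, 1) to (2011, 12).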
This script will download your posts in XML into the posts-xml folder. It will also create a posts-json/all.json file with the same data in JSON format for convenient processing.
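The combined file can be read back with the standard json module. A minimal sketch, assuming the file is UTF-8 encoded and that its top level is a collection of entries; the helper name load_all is hypothetical:

```python
import json
from pathlib import Path
from typing import Any


def load_all(path: str = "posts-json/all.json") -> Any:
    """Read the combined JSON file and return the parsed data."""
    # UTF-8 is an assumption about how the file was written.
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


# Only runs when the export has already produced the file.
if Path("posts-json/all.json").exists():
    posts = load_all()
    print(f"Loaded {len(posts)} entries")
```

The same helper works for comments-json/all.json if that path is passed in instead.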
This script will download comments from your blog as comments-xml/*.xml files. It will also create comments-json/all.json with all the comments data in JSON format for convenient processing.
html2text
markdown
beautifulsoup4
requests
lxml
Near the end of export.py there is an if True: condition. Change True to False to skip the downloading step and go directly to processing the already downloaded data.