I'm cooking up data to feed to an AI.

I have a handful of epub files, but they can't be fed in as they are, so they all need to be converted to text.


But there was less material on this than I expected.

In particular, the Korean text all came out garbled.


calibre was recommended, so I installed it and tried its convert feature, but I couldn't choose the output folder, so that route was just as tedious. If Python had ultimately failed, my plan was to convert that way anyway and then write code to find all the resulting txt files and gather them in one place. In the end, though, I solved it with Python.

First, install EbookLib.

 

1. pip install EbookLib
https://pypi.org/project/EbookLib/

 


 

- Take a look at the docs too. They are a bit thin, though.

https://docs.sourcefabric.org/projects/ebooklib/en/latest/tutorial.html#introduction

 


2. Here is the code

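import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

book = epub.read_epub('./tear.epub')
# Lets you check which language the book is written in
language = book.get_metadata('DC', 'language')
print(language)

# Walk every document item in the book and print the text of each <p> tag.
# idx is not used here; it only matters if you want specific documents.
for idx, doc in enumerate(book.get_items_of_type(ebooklib.ITEM_DOCUMENT)):
    # Use a separate name so the loaded `book` is not overwritten
    soup = BeautifulSoup(doc.content, 'html.parser')
    for p_tag in soup.select('p'):
        print(p_tag.text)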

3. I struggled a bit to find the right options.

- For Korean-language problems, stackoverflow isn't much help. And it turned out not to be a Korean-language problem anyway.

- I used enumerate in order to have idx, but that makes the code less general-purpose, so I just use doc. If you want to pull out and process only specific parts of a book, using idx could be a good idea.
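If you do want only specific parts of a book, idx can act as a filter. The following is a minimal sketch: `pick_documents` is a hypothetical helper name, and plain strings stand in for the document items that `book.get_items_of_type(ebooklib.ITEM_DOCUMENT)` would yield.

```python
def pick_documents(documents, wanted):
    """Return only the documents whose position (idx) is in `wanted`."""
    wanted = set(wanted)
    return [doc for idx, doc in enumerate(documents) if idx in wanted]


# Plain strings stand in for the EPUB document items (an assumption for illustration).
chapters = ["cover", "preface", "chapter 1", "chapter 2"]
print(pick_documents(chapters, [2, 3]))
```

Which positions hold the cover, the table of contents, or the body differs from book to book, which is probably why relying on idx makes the code less general.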


4. The book title can be fetched with book.get_metadata('DC', 'title').

- If I build a filename from this and save everything into a single folder, I should be able to turn all the epub files into txt in one go.
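That idea could look roughly like the sketch below. The function names (`safe_filename`, `epub_to_text`, `convert_folder`) are hypothetical, and so is the choice to take the first title entry and fall back to the epub's own filename when no title is present. It assumes `get_metadata` returns a list of (value, attributes) tuples, as the EbookLib tutorial shows. The EbookLib and BeautifulSoup imports are kept inside `epub_to_text` so the filename helper can be used on its own.

```python
import re
from pathlib import Path


def safe_filename(title, fallback="untitled"):
    """Turn a book title into a string that is safe to use as a filename."""
    # Replace characters that are not allowed on common filesystems.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]+', " ", title).strip(" .")
    # Collapse runs of whitespace left behind by the replacement.
    cleaned = re.sub(r"\s+", " ", cleaned)
    return cleaned or fallback


def epub_to_text(epub_path):
    """Return (title, text) for one EPUB, joining the text of every <p> tag."""
    # Third-party imports are local so safe_filename works without them.
    import ebooklib
    from ebooklib import epub
    from bs4 import BeautifulSoup

    book = epub.read_epub(str(epub_path))
    # Assumed shape: a list of (value, attributes) tuples, possibly empty.
    titles = book.get_metadata('DC', 'title')
    title = titles[0][0] if titles else Path(epub_path).stem

    lines = []
    for doc in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(doc.content, 'html.parser')
        lines.extend(p_tag.get_text() for p_tag in soup.select('p'))
    return title, "\n".join(lines)


def convert_folder(src_dir, out_dir):
    """Convert every .epub directly inside src_dir into a .txt in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for epub_path in sorted(Path(src_dir).glob("*.epub")):
        title, text = epub_to_text(epub_path)
        target = out / f"{safe_filename(title)}.txt"
        # Write UTF-8 explicitly instead of relying on the platform default.
        target.write_text(text, encoding="utf-8")
```

Calling something like `convert_folder('./epubs', './txt')` (both paths are placeholders) would then process a whole folder. Two books with the same title would overwrite each other in this sketch, so a real run may want to add the original filename or a counter.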
