์ˆ˜๋…„๋งŒ์— ๋งŒ๋‚œ ์นœ๊ตฌ๋Š” ๋ฐ˜๊ฐ€์›Œํ•˜๋Š” ๊ฒƒ ๊ฐ™์•˜์ง€๋งŒ ์กฐ๊ธˆ ๋ฐ”๋น ๋ณด์˜€๋‹ค.

๋‚˜๋Š” ๋ฉ€๋ฆฌ์žˆ๋Š” ์นœ๊ตฌ๋ฅผ ๋งŒ๋‚˜๋Ÿฌ๊ฐ”๋‹ค. ์นœ๊ตฌ๋Š” (๋‚˜๋„ ์ž˜ ์•„๋Š” ์‚ฌ๋žŒ๋“ค์ด ๋งŽ์€) ์–ด๋–ค ๋ชจ์ž„์— ์ฐธ์„ํ•ด์•ผํ•œ๋‹ค๋ฉฐ ๋‚ด๊ฒŒ ์‹œ๊ฐ„์„ ๋‚ด๊ธฐ ์–ด๋ ค์›Œํ–ˆ๋‹ค.

๋‚˜๋Š” ์•Œ๊ฒ ๋‹ค๊ณ  ๋งํ•ด๋†“๊ณค ์นœ๊ตฌ๊ฐ€ ์ฐธ์„ํ•œ๋‹ค๋Š” ๋ชจ์ž„ ์žฅ์†Œ์— ๊นœ๋นก ๋†“๊ณ (?) ๊ฐ„ ๋ฌผ๊ฑด์ด ์žˆ๋‹ค๋ฉฐ ๋ฐฉ๋ฌธํ–ˆ๋‹ค.

๋‚ด๊ฐ€ ๋‹ค ์ž˜์•„๋Š” ์‚ฌ๋žŒ๋“ค(ํ›„๋ฐฐ๋“ค)์ด์—ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์‹ค์ œ๋กœ ๋‚ด๊ฐ€ ๋†“๊ณ  ๊ฐ„ ๋ฌผ๊ฑด์ด ์žˆ์—ˆ๋‹ค.

๋ฌผ๊ฑด์„ ๋“ค๊ณ  ๊ทธ ์žฅ์†Œ๋ฅผ ๋‚˜์™”๋‹ค.

 

The way back home was complicated. I didn't know how to get to the bus terminal.

On one side were lots of elderly women and men in hiking clothes. It felt like the rural outskirts of a town. I asked the old men how to get to the terminal.

The old men told me all sorts of stories I hadn't asked about.

I didn't really follow, but I listened carefully and nodded.

The old women praised me for being such a nice young man.

 

A long while later I was alone again, and I still had no idea where the terminal was.

I can't forget my friend's face: glad to see me, yet annoyed.

 

It hurt to end up alone even though I had a close friend.

The juniors at that gathering had all once looked up to me.

They just looked at me and said nothing. I could only go through the motions of searching for the thing I claimed to have left.

Standing there, I couldn't understand at all how things had come to this.

 

Now that I'm awake, I realize I know why.

Back then I thought of myself as an excellent developer. It was already my second time building something from start to finish. This was back when Papago and Google Translate weren't nearly as good as they are now.

Fed up with development that earned no money, I had, out of nowhere, thrown myself into studying English. I wanted to apply the methodology that had made me good at development to English. Looking back, it was inevitable that both turned out poorly.

Still, I think the idea behind this service isn't bad. People who, like me, struggle to catch numbers by ear, or who need to drill situations specific to them, can listen to those sentences intensively and repeatedly until they stick. Of course, I'm still bad at catching numbers in English. So whether the service actually works is beyond my ability to judge.

 

https://github.com/vEduardovich/sayToRemember

 


https://www.youtube.com/watch?v=MSGr6z-tVTU 

It's a Nexon game, but I started it after hearing the absurd claim that it was fresh and fun.

An hour in, I rage-quit, then checked whether it was just me. Sure enough, it wasn't.

 

I played exactly like the player in that Chimtubu video, died the same way a few times, and quit.

If they wanted a cute, cozy game, they should have put one central gameplay loop at the core, like the old classic '금광을 찾아서' (Finding the Gold Mine), and hung the side activities off the goal of getting rich.

Instead, this game has lots of busy, miscellaneous things to do, and each one plays differently.

 

Several minigames are bolted together, so it feels less like I'm playing the game and more like the game is dragging me around.

A minute of a racing game, a minute of a clicking game, a minute of a dating game, a minute of a shooting game, then a minute of racing again: a wheel that goes round and round.

I feel like a slave on a Sinan salt farm.

Above all, I clearly wasn't the protagonist. I just keep waiting on the game hand and foot.

I really hated that nothing was driven by me.

I didn't have any fun.

An overseas football betting service I started without ever having gambled. Everything was clumsy and hard. The further the business went, the more Ethereum I lost. People who won Ethereum kept coming back; people who lost never returned.

The blockchain scene was crawling with scammers. My server got hacked, and I was scammed all over the place by people using promotion as bait. On top of that, Ethereum gas fees swung wildly every day: 30 won one day, 1,300 won another.

In the six months I ran the service, I barely slept. My parents were always worried. My red eyes and sagging dark circles felt so deeply etched that they would never go away. Then COVID halted every football league.

The service stopped too. A loss of 100 million won. That year was probably the time in my life when I built things for the world at the most unbelievable pace. What I regret is that it was only a gambling site. I will never stake my life on something like that again. In that spirit, I'm making the whole service public.

 

https://github.com/vEduardovich/whitebetting

 


* Installing and running Whisper-WebUI

- Whisper-WebUI is a library that wraps OpenAI's whisper (https://github.com/openai/whisper) in a web UI so it's easy to use.

- It performs remarkably well. For movies without subtitles, Japanese adult videos you were curious about, or foreign YouTube videos, generate subtitles in English or Japanese first, then translate them into Korean. It can generate Korean subtitles directly, but in the few tries I made the quality was worse.

- Because max_length is capped at 200, it can't translate a long text in one pass. The translation model is facebook/nllb-200.

- To translate a large amount at once, you need a separate script that splits the text into pieces within the 200 limit and feeds them in. I plan to write that and share it later.
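The splitting script mentioned above isn't part of the post. As a rough sketch: sentences can be packed greedily into chunks that stay under the limit. This assumes the 200 limit can be approximated by characters; in NLLB it is most likely counted in tokens, so a character budget is only a conservative stand-in, and `split_into_chunks` and `MAX_CHARS` are hypothetical names.

```python
import re
from typing import List

MAX_CHARS = 200  # hypothetical character budget standing in for the 200 limit


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Greedily pack sentences into chunks no longer than max_chars."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single overlong sentence is hard-wrapped into max_chars slices.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be pasted (or sent) to the translator one at a time and the results joined back in order.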

 

1. Install the codec

2. Install

  • git clone https://github.com/jhj0517/Whisper-WebUI.git
  • Go into the cloned folder, paste in the whole codec folder you unzipped above, and add that folder to your PATH

3. Run install.bat (the download takes a while)
4. Run start-webui.bat
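Step 1 doesn't name the codec. openai/whisper's README lists the ffmpeg command-line tool as a requirement, so it is presumably ffmpeg; assuming that, this small check (the helper name `check_tool` is made up) confirms that the PATH setting from step 2 took effect. Run it from a newly opened terminal, since PATH changes only apply to processes started afterwards.

```python
import shutil
import subprocess
from typing import Optional


def check_tool(name: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<name> -version` if the tool is on PATH, else None."""
    path = shutil.which(name)  # searches PATH the same way the shell would
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else path


if __name__ == "__main__":
    info = check_tool()
    print(info or "ffmpeg not found on PATH; recheck the PATH step")
```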

 

* Running it

1. First go to the Youtube tab, paste a video URL, and click generate.

2. As of now (23.06.26), it fails with the following error.

pytube.exceptions.RegexMatchError: get_throttling_function_name: could not find match for multiple

3. Search Google for a fix.

https://github.com/pytube/pytube/issues/1684

 


This fix was posted two days ago. YouTube seems to change these patterns so often that pytube's updates can't keep up.

function_patterns = [
    # https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-865985377
    # https://github.com/yt-dlp/yt-dlp/commit/48416bc4a8f1d5ff07d5977659cb8ece7640dcd8
    # var Bpa = [iha];
    # ...
    # a.C && (b = a.get("n")) && (b = Bpa[0](b), a.set("n", b),
    # Bpa.length || iha("")) }};
    # In the above case, `iha` is the relevant function name
    r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',
    r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])?\([a-z]\)',
]
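To see what the first entry of function_patterns captures, it can be run against the JavaScript quoted in its own comment, joined onto one line (real player code is minified onto a single line, which matters because `.*?` does not cross newlines). This is only an illustration on that toy string, not on a real YouTube player file, and `find_n_function_name` is a hypothetical helper rather than pytube's own function.

```python
import re
from typing import Optional

# First entry of the function_patterns list above.
N_FUNCTION_PATTERN = re.compile(
    r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)'
)

# The snippet from the pattern's comment, joined onto one line as in minified player code.
SAMPLE_JS = 'a.C && (b = a.get("n")) && (b = Bpa[0](b), a.set("n", b), Bpa.length || iha("")) }};'


def find_n_function_name(js: str) -> Optional[str]:
    """Return the name captured by group 1, or None when the pattern does not match."""
    match = N_FUNCTION_PATTERN.search(js)
    return match.group(1) if match else None


print(find_n_function_name(SAMPLE_JS))  # iha
```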

Go to the Whisper-WebUI\venv\Lib\site-packages\pytube folder, open cipher.py,

and replace this existing line

r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&\s*'

with the new line below.

r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',

Now paste the YouTube URL again and click generate; it works without errors.
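Editing the file by hand works, but the same edit can be scripted so it is easy to reapply if the venv is ever recreated. A minimal sketch, assuming the venv layout given above and that the old line appears in cipher.py exactly as quoted; `patch_cipher` is a hypothetical helper that keeps a `.bak` copy before writing and does nothing if the old line is no longer there.

```python
import shutil
from pathlib import Path

# Exact text of the old and new pattern lines, as quoted in the post.
OLD_LINE = r"""r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&\s*'"""
NEW_LINE = r"""r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',"""


def patch_cipher(cipher_path: Path) -> bool:
    """Swap OLD_LINE for NEW_LINE once; return True if the file was changed."""
    source = cipher_path.read_text(encoding="utf-8")
    if OLD_LINE not in source:
        return False  # already patched, or pytube changed the file again
    shutil.copyfile(cipher_path, cipher_path.with_name(cipher_path.name + ".bak"))  # backup
    cipher_path.write_text(source.replace(OLD_LINE, NEW_LINE, 1), encoding="utf-8")
    return True


if __name__ == "__main__":
    # Assumed location, relative to the Whisper-WebUI folder (Windows venv layout).
    target = Path("venv") / "Lib" / "site-packages" / "pytube" / "cipher.py"
    if target.exists():
        print("patched" if patch_cipher(target) else "nothing to patch")
    else:
        print(f"{target} not found; run this from the Whisper-WebUI folder")
```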

 


For a non-designer, making a BI or CI is always an ordeal.

It probably isn't easy even for designers.

 

The biggest problem is that even I don't know what I want, and Stable Diffusion solves that neatly. When I gave it a concept and generated about 300 images while switching checkpoints and prompts, dozens of them were ones I liked.

 

You can make the LoRA on Colab, or with kohya_gui as I did.

https://github.com/bmaltais/kohya_ss

 

1. Install kohya.

2. Collect BI images from Naver and Google with the scripts at https://blog.himion.com/175 or https://blog.himion.com/176.

3. Keep only the images of decent quality. I used about 100.

4. Caption them with kohya's Utilities > Captioning > BLIP Captioning (BLIP is an image-captioning model from Salesforce Research). Once every image has a caption, create a folder inside kohya and put them in it.

5. Train the LoRA with kohya's Dreambooth LoRA. Thanks to my RTX 4090, I could raise the batch size to 8 and cut the epochs to 1 and still get good quality. Training the BI LoRA on about 100 images took 1 hour 15 minutes.

6. Write a prompt like the one below and generate about 300 images.

Model: 2dn_1, Version: v1.2.1
positive
<lora:brand_identity:1>, logo, bi, ci, brand, text, "Wendy", beyond the world, dimly ufo of ghost, (moon:0.3), sky, <lora:weird:0.4>
Negative prompt: easynegative, ng_deepnegative_v1_75t, ((worst quality)), ((low quality)), easynegative,
Steps: 20, Sampler: Euler a, CFG scale: 4.5, Seed: 366227024, Size: 512x512, 
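For a sense of scale behind steps 4 and 5: kohya's training scripts generally count optimizer steps as ceil(images × repeats ÷ batch size) per epoch, times the number of epochs, with the repeat count taken from the numeric prefix of the image folder name (for example `10_brand_identity`). The post doesn't state its repeat count, so the 10 below is a placeholder rather than the author's setting, and both helpers are hypothetical.

```python
import math


def lora_train_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Approximate step count: ceil(images * repeats / batch) per epoch, times epochs.

    Aspect-ratio bucketing can add a few extra steps in practice.
    """
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def repeats_from_folder(folder_name: str) -> int:
    """Parse the repeat count from a kohya-style folder name such as '10_brand_identity'."""
    prefix, _, _ = folder_name.partition("_")
    return int(prefix)


# About 100 images, batch size 8, 1 epoch (from the post); 10 repeats is hypothetical.
repeats = repeats_from_folder("10_brand_identity")
print(lora_train_steps(100, repeats, epochs=1, batch_size=8))  # 125
```

Raising the batch size divides the step count, which is one reason a larger batch and a single epoch can shorten training on a GPU with enough memory.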

> Conclusion

1. It produces BIs in a range of moods I could never have imagined.

2. Switching checkpoints makes the moods even more varied.

3. Simpler samplers work better, since you aren't generating photorealistic images.

4. Placing exact text is hard. Text-centric logos will need more thought.

5. Anyone with basic Photoshop and Illustrator skills could crank these out. It's fantastic.
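Generating a few hundred candidates while switching checkpoints can be scripted instead of clicked. The "Model: … Version: v1.2.1" header suggests AUTOMATIC1111's stable-diffusion-webui; assuming that UI is launched with the `--api` flag at its default local address, its `/sdapi/v1/txt2img` endpoint accepts the same settings as the prompt above, and `override_settings` can switch the checkpoint per request. The URL, the helper names, and any checkpoint other than 2dn_1 are placeholders.

```python
import json
import urllib.request
from typing import Dict, List

API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # assumed default address with --api

POSITIVE = ('<lora:brand_identity:1>, logo, bi, ci, brand, text, "Wendy", beyond the world, '
            'dimly ufo of ghost, (moon:0.3), sky, <lora:weird:0.4>')
NEGATIVE = ('easynegative, ng_deepnegative_v1_75t, ((worst quality)), ((low quality)), '
            'easynegative,')


def build_payloads(checkpoints: List[str], seeds: List[int]) -> List[Dict]:
    """One txt2img request per (checkpoint, seed), using the settings from the post."""
    payloads = []
    for ckpt in checkpoints:
        for seed in seeds:
            payloads.append({
                "prompt": POSITIVE,
                "negative_prompt": NEGATIVE,
                "steps": 20,
                "sampler_name": "Euler a",
                "cfg_scale": 4.5,
                "seed": seed,
                "width": 512,
                "height": 512,
                # Per-request checkpoint; the name must match one the web UI lists.
                "override_settings": {"sd_model_checkpoint": ckpt},
            })
    return payloads


def send(payload: Dict) -> List[str]:
    """POST one payload and return the base64-encoded PNGs from the JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["images"]
```

For example, three checkpoints times 100 seeds gives 300 payloads; each result from `send` can be decoded with `base64.b64decode` and written to a .png file.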

I woke up at 11.

I kept muttering to myself.

My thoughts turned to how a successful founder would have acted.

 

Too late. Far, far too late.

Maybe everyone had sensed it.

Maybe that's why they all acted that way... (they surely did).

 

Washing the dishes piled up in the sink,

I realized everything was neatly ready. I had no reason to be late.

Because I kept being late, even my heart had stopped racing.

 

I thought about how a successful founder would have acted.

They wouldn't have used going to bed at 5 a.m. as an excuse.

I've never succeeded, so I can't be sure, but even I can tell that I won't succeed like this.

 

Being late is understandable.

But my dream is not something merely understandable.

You can't achieve the unreasonable by reasonable means.

 

I need to pick up the pace.

* Crawling with Selenium running only in memory, without opening a browser window

  1. It's almost identical to the earlier code ( https://blog.himion.com/176 ).
  2. A headlessDriver() function has been added.
'''
* Fetch Google images, headless version
'''

import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import WebDriverException
import undetected_chromedriver as uc

import urllib.parse    # urlencode() for the search URL
import urllib.request  # urlretrieve() for saving images
import time, datetime

ITEM_LIST = [ "Keith Thompson", "Zdzislaw Beksinski", "dariusz zawadzki"] # #1
FOLDER = 'google' # #2
IMG_XPATH = '//*[@id="Sva75c"]/div[2]/div/div[2]/div[2]/div[2]/c-wiz/div/div/div/div[3]/div[1]/a/img[1]' # #3
SIGNINURL = 'https://accounts.google.com/signin/v2/identifier?hl=ko&passive=true&continue=https%3A%2F%2Fwww.google.com%2F&ec=GAZAmgQ&flowName=GlifWebSignIn&flowEntry=ServiceLogin'
ID = 'xxxx@gmail.com' # #4
PASSWORD = 'xxxx' # #5

def main():
  start = check_start() # start timing
  driver = headlessDriver() # headless mode: no visible browser window
  driver.get(SIGNINURL)
  googleSignIn(driver) # sign in to Google

  for searchItem in ITEM_LIST:
    saveDir = makeFolder(searchItem)

    url = makeUrl(searchItem) # build the search URL
    driver.get(url) # open the image search results
    maximizeWindow(driver) # maximize the window
    scrollToEnd(driver)

    forbiddenCount = saveImgs(driver, saveDir, start) # click through and save every detail image
    sec = check_time(start)
    print(f'failures: {forbiddenCount}, {sec}, {datetime.datetime.now().time()}')
  time.sleep(10)
  driver.quit()

def headlessDriver():
  options = uc.ChromeOptions()
  options.headless = True                # request headless mode
  options.add_argument('--headless=new') # Chrome's newer headless mode
  driver = uc.Chrome(options=options)
  return driver

# Google sign-in
def googleSignIn(driver):
  idBtn = driver.find_element(By.XPATH,'//*[@id="identifierId"]') # ID input field
  idBtn.send_keys(ID)
  nextBtn = driver.find_element(By.XPATH,'//*[@id="identifierNext"]/div/button')
  nextBtn.click() # click Next

  # The wait below blocks for up to 10 seconds until the password field appears,
  # but for the password field it reports a "not interactable" error. The code still works, so use it when you need a wait.
  try:
    passwordBtn = WebDriverWait(driver, timeout=10).until(EC.presence_of_element_located( (By.XPATH,'//*[@id="password"]/div[1]/div/div[1]/input') ))
    time.sleep(4)
    passwordBtn = driver.find_element(By.XPATH,'//*[@id="password"]/div[1]/div/div[1]/input') # password input field
    passwordBtn.send_keys(PASSWORD)
    passwordNextBtn = driver.find_element(By.XPATH,'//*[@id="passwordNext"]/div/button')
    passwordNextBtn.click() # password Next button
    print('Google sign-in succeeded')
    # driver.implicitly_wait(10)
  except WebDriverException as e: # Selenium errors such as TimeoutException
    print(e)

  time.sleep(20) # leave enough time for phone verification and similar checks


# Build the Google image search URL
def makeUrl(searchItem):
  url = 'https://www.google.com/search'
  params ={ # q and tbm are required
    'q'     : searchItem,
    'tbm'   : 'isch',
  }
  url = url + '?' + urllib.parse.urlencode(params)
  return url


# Create the output folder
def makeFolder(searchItem):
  saveDir = os.path.join(os.getcwd(), 'data', f'{FOLDER}_{searchItem}')
  try:
    if not(os.path.isdir(saveDir)): # if the folder does not exist yet
      os.makedirs(saveDir) # create it
    return saveDir
  except OSError as e:
    print(f'{e}: failed to create folder')

# Maximize the window
def maximizeWindow(driver):
  driver.maximize_window()

# Keep scrolling down the infinite-scroll page until every image is loaded
def scrollToEnd(driver):
  prev_height = driver.execute_script('return document.body.scrollHeight')
  print(f'prev_height: {prev_height}')

  while True:
    time.sleep(1) # Naver gets stuck loading forever if you scroll without a pause
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
    time.sleep(3)

    cur_height = driver.execute_script('return document.body.scrollHeight')
    print(f'cur_height: {cur_height}')
    if cur_height == prev_height:
      print('height unchanged, reached the end')
      break
    prev_height = cur_height
  # once the whole page has loaded, scroll back to the top
  driver.execute_script('window.scrollTo(0, 0)')

# Save every image
def saveImgs(driver, saveDir, start):
  time.sleep(1)
  forbiddenCount = 0
  imgs = driver.find_elements(By.CSS_SELECTOR, '.rg_i.Q4LuWd')
  img_count = len(imgs)
  print(f'total images: {img_count}')
  # click each thumbnail in turn and save the detail image
  for imgNum, img in enumerate(imgs): # imgNum counts from 0
    try:
      img.click()
      time.sleep(3)

      # This XPath seems to change often. The rest looks stable, so this is the one to recheck now and then
      bigImg = driver.find_element(By.XPATH, IMG_XPATH)
      src = bigImg.get_attribute('src')
      urllib.request.urlretrieve(src, os.path.join(saveDir, f'{imgNum}.jpg'))
      sec = check_time(start)
      print(f'{imgNum+1}/{img_count} saved {sec}')

    except Exception as e:
      print(e)
      forbiddenCount += 1 # number of failed saves; 403 Forbidden and file errors are fairly common
      continue
  return forbiddenCount


# Timing
def check_start():
    start_time = time.time()
    print("Start! now.." + str(start_time))
    return start_time
def check_time(start):
    end = time.time()
    during = end - start
    sec = str(datetime.timedelta(seconds=during)).split('.')[0]
    return sec

if __name__ == '__main__':
  main()

* Why sign in before crawling

  1. On Google, the image results you get when signed in often differ from the ones you get when signed out.
  2. Images that require adult verification can only be fetched after signing in.

 

* Usage

  1. Verified that crawling works normally [23.06.20]
  2. Install the modules: pip install undetected_chromedriver selenium
  3. At comment #1, enter the list of search terms you want images for.
  4. At comment #2, enter the folder name. Images are saved under data\google_<search term>\ (for example data\google_Keith Thompson\).
  5. At comment #3, enter the XPath of the detail image. On Google it seems to change often.
  6. At comment #4, enter your Google ID.
  7. At comment #5, enter your Google password. After that, the script waits 20 seconds in case an extra smartphone verification screen appears.
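Steps 6 and 7 put your Google ID and password straight into the script, which is easy to leak if the file is shared or committed. One alternative, sketched under the assumption that you set two environment variables yourself (the names GOOGLE_ID and GOOGLE_PASSWORD, and the helper `load_credentials`, are made up for this example), is to read them at startup and stop early if they are missing:

```python
import os
from typing import Tuple


def load_credentials(id_var: str = "GOOGLE_ID", pw_var: str = "GOOGLE_PASSWORD") -> Tuple[str, str]:
    """Read the login ID and password from environment variables instead of the source file."""
    user_id = os.environ.get(id_var)
    password = os.environ.get(pw_var)
    if not user_id or not password:
        raise RuntimeError(f"Set {id_var} and {pw_var} before running the crawler")
    return user_id, password


# In the crawler, the hard-coded values at comments #4 and #5 would become:
# ID, PASSWORD = load_credentials()
```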
'''
* Fetch Google images (23.06.20)
'''

import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import WebDriverException
import undetected_chromedriver as uc

import urllib.parse    # urlencode() for the search URL
import urllib.request  # urlretrieve() for saving images
import time, datetime

ITEM_LIST = [ "Keith Thompson", "Zdzislaw Beksinski", "dariusz zawadzki"] # #1
FOLDER = 'google' # #2
IMG_XPATH = '//*[@id="Sva75c"]/div[2]/div/div[2]/div[2]/div[2]/c-wiz/div/div/div/div[3]/div[1]/a/img[1]' # #3
SIGNINURL = 'https://accounts.google.com/signin/v2/identifier?hl=ko&passive=true&continue=https%3A%2F%2Fwww.google.com%2F&ec=GAZAmgQ&flowName=GlifWebSignIn&flowEntry=ServiceLogin'
ID = 'xxxx@gmail.com' # #4
PASSWORD = 'xxxx' # #5

def main():
  start = check_start() # start timing
  driver = uc.Chrome() # start undetected_chromedriver, needed for Google sign-in
  driver.get(SIGNINURL)
  googleSignIn(driver) # sign in to Google

  for searchItem in ITEM_LIST:
    saveDir = makeFolder(searchItem)

    url = makeUrl(searchItem) # build the search URL
    driver.get(url) # open the image search results
    maximizeWindow(driver) # maximize the window
    scrollToEnd(driver)

    forbiddenCount = saveImgs(driver, saveDir, start) # click through and save every detail image
    sec = check_time(start)
    print(f'failures: {forbiddenCount}, {sec}, {datetime.datetime.now().time()}')
  time.sleep(10)
  driver.quit()

# Google sign-in
def googleSignIn(driver):
  idBtn = driver.find_element(By.XPATH,'//*[@id="identifierId"]') # ID input field
  idBtn.send_keys(ID)
  nextBtn = driver.find_element(By.XPATH,'//*[@id="identifierNext"]/div/button')
  nextBtn.click() # click Next

  # The wait below blocks for up to 10 seconds until the password field appears,
  # but for the password field it reports a "not interactable" error. The code still works, so use it when you need a wait.
  try:
    passwordBtn = WebDriverWait(driver, timeout=10).until(EC.presence_of_element_located( (By.XPATH,'//*[@id="password"]/div[1]/div/div[1]/input') ))
    time.sleep(4)
    passwordBtn = driver.find_element(By.XPATH,'//*[@id="password"]/div[1]/div/div[1]/input') # password input field
    passwordBtn.send_keys(PASSWORD)
    passwordNextBtn = driver.find_element(By.XPATH,'//*[@id="passwordNext"]/div/button')
    passwordNextBtn.click() # password Next button
    print('Google sign-in succeeded')
    # driver.implicitly_wait(10)
  except WebDriverException as e: # Selenium errors such as TimeoutException
    print(e)

  time.sleep(20) # leave enough time for phone verification and similar checks


# Build the Google image search URL
def makeUrl(searchItem):
  url = 'https://www.google.com/search'
  params ={ # q and tbm are required
    'q'     : searchItem,
    'tbm'   : 'isch',
  }
  url = url + '?' + urllib.parse.urlencode(params)
  return url


# Create the output folder
def makeFolder(searchItem):
  saveDir = os.path.join(os.getcwd(), 'data', f'{FOLDER}_{searchItem}')
  try:
    if not(os.path.isdir(saveDir)): # if the folder does not exist yet
      os.makedirs(saveDir) # create it
    return saveDir
  except OSError as e:
    print(f'{e}: failed to create folder')

# Maximize the window
def maximizeWindow(driver):
  driver.maximize_window()

# Keep scrolling down the infinite-scroll page until every image is loaded
def scrollToEnd(driver):
  prev_height = driver.execute_script('return document.body.scrollHeight')
  print(f'prev_height: {prev_height}')

  while True:
    time.sleep(1) # Naver gets stuck loading forever if you scroll without a pause
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
    time.sleep(3)

    cur_height = driver.execute_script('return document.body.scrollHeight')
    print(f'cur_height: {cur_height}')
    if cur_height == prev_height:
      print('height unchanged, reached the end')
      break
    prev_height = cur_height
  # once the whole page has loaded, scroll back to the top
  driver.execute_script('window.scrollTo(0, 0)')

# Save every image
def saveImgs(driver, saveDir, start):
  time.sleep(1)
  forbiddenCount = 0
  imgs = driver.find_elements(By.CSS_SELECTOR, '.rg_i.Q4LuWd')
  img_count = len(imgs)
  print(f'total images: {img_count}')
  # click each thumbnail in turn and save the detail image
  for imgNum, img in enumerate(imgs): # imgNum counts from 0
    try:
      img.click()
      time.sleep(3)

      # This XPath seems to change often. The rest looks stable, so this is the one to recheck now and then
      bigImg = driver.find_element(By.XPATH, IMG_XPATH)
      src = bigImg.get_attribute('src')
      urllib.request.urlretrieve(src, os.path.join(saveDir, f'{imgNum}.jpg'))
      sec = check_time(start)
      print(f'{imgNum+1}/{img_count} saved {sec}')

    except Exception as e:
      print(e)
      forbiddenCount += 1 # number of failed saves; 403 Forbidden and file errors are fairly common
      continue
  return forbiddenCount


# Timing
def check_start():
    start_time = time.time()
    print("Start! now.." + str(start_time))
    return start_time
def check_time(start):
    end = time.time()
    during = end - start
    sec = str(datetime.timedelta(seconds=during)).split('.')[0]
    return sec

if __name__ == '__main__':
  main()
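saveImgs names every file `<n>.jpg`, even when the server sends PNG or WebP, and lumps all failures into one counter. A sketch of an alternative download helper using only the standard library: it sends a browser-like User-Agent (some hosts refuse urllib's default one; whether that explains any of the "forbidden" failures here is an assumption, not something the post measured), picks the extension from the Content-Type header, and also accepts the base64 `data:` URLs that thumbnails sometimes use. `save_image`, `EXTENSIONS`, and the User-Agent value are hypothetical.

```python
import os
import urllib.request

# Hypothetical mapping from Content-Type to file extension; unknown types fall back to .jpg.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/gif": ".gif", "image/webp": ".webp"}
USER_AGENT = "Mozilla/5.0"  # placeholder browser-like User-Agent


def save_image(src: str, save_dir: str, stem: str, timeout: float = 10.0) -> str:
    """Download src (http(s) or data: URL) into save_dir and return the written path."""
    request = urllib.request.Request(src, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get_content_type()
        body = response.read()
    path = os.path.join(save_dir, stem + EXTENSIONS.get(content_type, ".jpg"))
    with open(path, "wb") as f:
        f.write(body)
    return path
```

Inside saveImgs, the urlretrieve call would become `save_image(src, saveDir, str(imgNum))`, with the existing try/except still counting failures.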

* How to find the XPath of the detail image

- In Chrome, click an image to open the detail view, right-click the large image and choose Inspect, then in DevTools right-click the highlighted element and pick Copy > Copy XPath.
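Why does the copied XPath keep breaking? A path like `/html/body/div[4]/...` encodes the position of every ancestor, so a single extra `div` inserted above the image invalidates it, while a relative path keyed on an attribute survives. The toy example uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath, on made-up markup; the `detail` class is hypothetical. In Selenium the same idea would be a relative XPath or a CSS selector passed to `find_element`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

BEFORE = "<html><body><div/><div><img class='detail' src='a.jpg'/></div></body></html>"
# The same page after the site adds one more <div> before the image container.
AFTER = "<html><body><div/><div/><div><img class='detail' src='a.jpg'/></div></body></html>"

ABSOLUTE = "./body/div[2]/img"        # position-based, like a copied full XPath
RELATIVE = ".//img[@class='detail']"  # attribute-based


def find_src(markup: str, path: str) -> Optional[str]:
    """Return the src of the first element matching path, or None."""
    element = ET.fromstring(markup).find(path)
    return None if element is None else element.get("src")


for name, markup in (("before", BEFORE), ("after", AFTER)):
    print(name, find_src(markup, ABSOLUTE), find_src(markup, RELATIVE))
# before a.jpg a.jpg
# after None a.jpg
```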

 

Next time I'll write up headless crawling.

- Headless crawling runs the browser only in memory, without showing a browser window on screen.

Crawling Naver images

Motivation

1. The more image files you have, the better the LoRA you can make.
2. In my experience, Naver's images were of better quality than Google's (I'll share Google image-crawling code too).
3. Naver is more tolerant of crawling than Google, which keeps the code short. Google requires tools such as undetected_chromedriver.

 

Usage

1. As of June 19, 2023, the code below works. I've added plenty of comments to make it easy to use.
2. First install Selenium (urllib ships with Python's standard library).
pip install selenium
3. When you run it, a Chrome window opens and is maximized.
4. It scrolls automatically and collects the image list. Don't minimize the window while this happens; wait until it has scrolled to the bottom and there are no more images to load. Once the whole list is loaded, it scrolls back to the top on its own, and from then on you can minimize the window.
5. Put the names of the images you want into item_list in the code below (comment #1).
6. Put the folder name at comment #2. With naver, as in the code below, images are saved in data\naver_<search term>\. You don't need to create the folder beforehand; the script creates it.
7. Enter the XPath of the detail image (comment #3). While crawling I found that the XPath changes only very occasionally, so you can probably leave it as is.
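The automatic scrolling in step 4 (scrollToEnd in the code) stops once document.body.scrollHeight no longer grows between two scrolls. The same loop can be written against two callables so it can be tried without a browser. The `max_rounds` cap is an addition of this sketch, not part of the original, guarding against pages that keep growing forever. In the crawler, `get_height` would wrap `driver.execute_script('return document.body.scrollHeight')` and `scroll` would wrap the `window.scrollTo` call.

```python
import time
from typing import Callable


def scroll_until_stable(get_height: Callable[[], int], scroll: Callable[[], None],
                        pause: float = 3.0, max_rounds: int = 100) -> int:
    """Scroll until the page height stops changing; return the number of scrolls made."""
    prev_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll()
        time.sleep(pause)  # give lazy-loaded thumbnails time to arrive
        cur_height = get_height()
        if cur_height == prev_height:
            return rounds
        prev_height = cur_height
    return max_rounds


# Fake page that grows twice and then stops (no browser needed).
heights = iter([1000, 2000, 3000, 3000])
print(scroll_until_stable(lambda: next(heights), lambda: None, pause=0))  # 3
```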

 

Updated to the latest class name and XPath (24.07.26)

'''
* Fetch Naver images (24.07.26)
'''

import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
import urllib.parse    # urlencode() for the search URL
import urllib.request  # urlretrieve() for saving images
import time, datetime

item_list = [ "도다리"] # #1 search terms (도다리 = flounder)
FOLDER = 'naver' # #2
IMG_XPATH = '/html/body/div[4]/div/div/div[1]/div[2]/div[1]/img' # #3

def main():
    start = check_start() # start timing
    driver = webdriver.Chrome()

    for searchItem in item_list:
        saveDir = makeFolder(searchItem)

        url = makeUrl(searchItem) # build the search URL
        driver.get(url) # open the image search results
        maximizeWindow(driver) # maximize the window
        scrollToEnd(driver)

        forbiddenCount = saveImgs(driver, saveDir, start) # click through and save every detail image
        sec = check_time(start)
        print(f'failures: {forbiddenCount}, {sec}, {datetime.datetime.now().time()}')
    time.sleep(10)
    driver.quit()

# Build the image search URL
def makeUrl(searchItem):
    url = 'https://search.naver.com/search.naver'
    params ={
        'where' : 'image',
        'sm'    : 'tab_jum',
        'query' : searchItem
    }
    url = url + '?' + urllib.parse.urlencode(params)
    return url

# Create the output folder
def makeFolder(searchItem):
    saveDir = os.path.join(os.getcwd(), 'data', f'{FOLDER}_{searchItem}')
    try:
        if not(os.path.isdir(saveDir)): # if the folder does not exist yet
            os.makedirs(saveDir) # create it
        return saveDir
    except OSError as e:
        print(f'{e}: failed to create folder')

# Maximize the window
def maximizeWindow(driver):
    driver.maximize_window()

# Keep scrolling down the infinite-scroll page until every image is loaded
def scrollToEnd(driver):
    prev_height = driver.execute_script('return document.body.scrollHeight')
    print(f'prev_height: {prev_height}')

    while True:
        time.sleep(1) # Naver gets stuck loading forever if you scroll without a pause
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
        time.sleep(3)

        cur_height = driver.execute_script('return document.body.scrollHeight')
        print(f'cur_height: {cur_height}')
        if cur_height == prev_height:
            print('height unchanged, reached the end')
            break
        prev_height = cur_height

    # once the whole page has loaded, scroll back to the top
    driver.execute_script('window.scrollTo(0, 0)')

# Save every image
def saveImgs(driver, saveDir, start):
    time.sleep(1)
    forbiddenCount = 0
    imgs = driver.find_elements(By.CSS_SELECTOR, '._fe_image_tab_content_thumbnail_image')
    img_count = len(imgs)
    print(f'total images: {img_count}')
    # click each thumbnail in turn and save the detail image
    for imgNum, img in enumerate(imgs): # imgNum counts from 0
        try:
            img.click()
            time.sleep(3)

            # This XPath seems to change often. The rest looks stable, so this is the one to recheck now and then
            bigImg = driver.find_element(By.XPATH, IMG_XPATH)
            src = bigImg.get_attribute('src')
            urllib.request.urlretrieve(src, os.path.join(saveDir, f'{imgNum}.jpg'))
            sec = check_time(start)
            print(f'{imgNum+1}/{img_count} saved {sec}')

        except Exception as e:
            print(e)
            forbiddenCount += 1 # number of failed saves; 403 Forbidden and file errors are fairly common
            continue
    return forbiddenCount


# Timing
def check_start():
    start_time = time.time()
    print("Start! now.." + str(start_time))
    return start_time
def check_time(start):
    end = time.time()
    during = end - start
    sec = str(datetime.timedelta(seconds=during)).split('.')[0]
    return sec

if __name__ == '__main__':
    main()

 

Now run the code. It should work.

Naver typically returns at most 500 images.
