ํŒŒ์ธํŠœ๋‹ ๋ฐ์ดํ„ฐ๋ฅผ ๋งŒ๋“ค๋˜ ์ค‘ ์•„๋ž˜์ฒ˜๋Ÿผ ์˜๋‹จ์–ด์™€ ๋œป์œผ๋กœ ๊ตฌ์„ฑ๋œ jsonํŒŒ์ผ์ด ํ•„์š”ํ–ˆ๋‹ค. ๋Œ€๋žต 500๊ฐœ์ •๋„ ์žˆ์œผ๋ฉด ์ถฉ๋ถ„ํ•  ๊ฑฐ ๊ฐ™์•˜๋‹ค.

{
    "plain": "์†”์งํ•œ",
    "parallel": "ํ‰ํ–‰ํ•œ",
    "crack": "๊ธˆ์ด ๊ฐ€๋‹ค",
    "Red": "๋นจ๊ฐ„์ƒ‰",
    "deal": "๊ฑฐ๋ž˜",
    "size": "ํฌ๊ธฐ",
}
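For context, once a file like this exists, it is easy to load and reshape into whatever format the fine-tuning pipeline expects. A minimal sketch (the file name translated_words.json and the prompt/completion shape are my own assumptions for illustration):

import json

# Load the word -> meaning dictionary (the file name is an assumption)
with open('translated_words.json', encoding='utf-8') as f:
    word_dict = json.load(f)

# Reshape into prompt/completion pairs; the exact format is hypothetical,
# so adapt it to whatever your fine-tuning pipeline expects
examples = [
    {"prompt": f"Translate the English word '{word}' into Korean.",
     "completion": meaning}
    for word, meaning in word_dict.items()
]

print(examples[0])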


Just in case, I asked chatgpt, and it produced one right away. And it didn't just hand me the result; it also included the Python code it used.

I had never even asked for code. (It probably inferred that from our previous conversation history.)


What's more, it had never occurred to me that this was something I could generate with code in the first place.

Looking at the code, the nltk library ships with an English word list, and the script pulls words from it and translates them with the Google Translate API.
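Those two building blocks are easy to try in isolation. A minimal sketch (the word 'plain' is just an example):

import nltk
from nltk.corpus import words
from googletrans import Translator

nltk.download('words')  # one-time download of the English word list

print(len(words.words()))  # a couple hundred thousand entries

# googletrans 4.0.0-rc1 exposes a synchronous translate() call
print(Translator().translate('plain', src='en', dest='ko').text)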


๋ง‰์ƒ ์ฝ”๋“œ๋ฅผ ๋Œ๋ ค๋ณด๋‹ˆ ์‚ฌ๋žŒ๋“ค์ด ๊ฑฐ์˜ ์‚ฌ์šฉํ•˜์ง€ ์•Š๋Š” ๋‹จ์–ด๋“ค์„ ๋ฌด์ž‘์œ„๋กœ 500๊ฐœ ๊ฐ€์ ธ์˜ค๊ธฐ์—

ํ•œ๋ฒˆ ๋” chatgpt์˜ ํž˜์„ ๋นŒ๋ ค ์ฝ”๋“œ๋ฅผ ๋ณด๊ฐ•ํ–ˆ๋‹ค.


๊ธฐ๋Šฅ์€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

1. ์ƒ์„ฑํ•˜๊ณ ์ž ํ•˜๋Š” ์˜๋‹จ์–ด ์ˆ˜๋ฅผ ์ž…๋ ฅํ•˜๋ฉด(์•„๋ž˜ ์ฃผ์„๋ถ€๋ถ„์„ ์ˆ˜์ •)
2. ์ƒ์œ„๋นˆ๋„ 5000๊ฐœ์˜ ๋‹จ์–ด๋ฅผ ๋จผ์ € ๊ฐ€์ ธ์™€(๋ฌผ๋ก  ์ด๊ฒƒ๋„ ์•„๋ž˜ ์ฝ”๋“œ์—์„œ 5000์„ ์›ํ•˜๋Š” ๊ฐ’์œผ๋กœ ์ˆ˜์ •ํ•˜๋ฉด๋œ๋‹ค)
3. ๊ทธ ์ค‘ ๋ช…์‚ฌ์™€ ๋™์‚ฌ๋งŒ ์ถ”๋ ค๋‚ธ ํ›„
4. ๋žœ๋คํ•˜๊ฒŒ 500๊ฐœ๋งŒ jsonํ˜•์‹์œผ๋กœ ํŒŒ์ผ๋กœ ์ €์žฅํ•˜๋Š” ๊ฒƒ์ด๋‹ค.


* Now let's run it

1. First, there is one thing to install:

pip install nltk googletrans==4.0.0-rc1
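If you want to confirm the install worked before running anything, a quick import check (just a sanity-check suggestion) looks like this:

import googletrans
import nltk

# Both packages expose a __version__ string
print(nltk.__version__, googletrans.__version__)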


2. ๋‹ค์Œ ์•„๋ž˜ ์ฝ”๋“œ๋ฅผ ๊ทธ๋ƒฅ ๋ถ™์—ฌ๋„ฃ๊ธฐํ•˜๊ณ  ๋Œ๋ฆฌ๋ฉด ๋œ๋‹ค.

- ์ฒ˜์Œ์—๋Š” ์‚ฌ์ „์„ ๋‹ค์šด๋กœ๋“œ ํ•˜๋Š”๋ฐ ์‹œ๊ฐ„์ด ์ข€ ๊ฑธ๋ฆฐ๋‹ค

import random, time
import nltk
from nltk.corpus import words, brown
from collections import Counter
from googletrans import Translator
import json

# Download the NLTK data (this only needs to run once)
nltk.download('words')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')

# Number of English words you want to generate
word_count = 500

# Full list of English words
word_list = words.words()

# Word list and frequency counts from the Brown corpus
brown_words = brown.words()
word_freq = Counter(brown_words)

# Keep the 5000 highest-frequency words (change 5000 to taste)
common_words = {word for word, freq in word_freq.most_common(5000)}

# Keep only the high-frequency words from word_list
filtered_word_list = [word for word in word_list if word.lower() in common_words]

# POS-tag the words so nouns and verbs can be filtered out
tagged_words = nltk.pos_tag(filtered_word_list)

# POS tags for nouns and verbs
noun_tags = {'NN', 'NNS', 'NNP', 'NNPS'}
verb_tags = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

# Keep only the nouns and verbs
filtered_nouns_and_verbs = [word for word, tag in tagged_words if tag in noun_tags or tag in verb_tags]

# Randomly sample the requested number of words
random_words = random.sample(filtered_nouns_and_verbs, word_count)

# Translate with the Google Translate API
translator = Translator()
translated_dict = {}

def save():
    with open('translated_words.json', 'w', encoding='utf-8') as f:
        json.dump(translated_dict, f, ensure_ascii=False, indent=4)

for idx, word in enumerate(random_words):
    try:
        translated_word = translator.translate(word, src='en', dest='ko').text
        print(idx + 1, word, translated_word)
        translated_dict[word] = translated_word
        time.sleep(1)  # throttle requests to avoid getting blocked
    except Exception:
        # Save what has been translated so far, then move on to the next word
        print('An error occurred mid-run. Saving the work done so far.')
        save()

# Save the final result as a JSON file
save()

print(f"{len(random_words)} words were translated and saved to a JSON file.")


Running the code above produces a result like the file below.

translated_words.json (0.01MB)
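To spot-check the result, you can load the file back and sample a few entries (a small sketch, assuming the script above finished and wrote translated_words.json):

import json
import random

with open('translated_words.json', encoding='utf-8') as f:
    translated = json.load(f)

print(len(translated), 'entries')
for word in random.sample(list(translated), 5):
    print(word, '->', translated[word])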


๊ฑฐ์˜ 100% chatgpt๊ฐ€ ์ฝ”๋”ฉํ•œ ๊ฒฐ๊ณผ์ด๊ณ  ๋‚˜๋Š” ํŽธ์˜๋ฅผ ์œ„ํ•œ ์‚ฌ์†Œํ•œ ์ˆ˜์ •๊ณผ ์ฝ”๋“œ ์กฐํ•ฉ๋งŒ ํ–ˆ๋‹ค.

Even though I work in this field, it was a day that made me feel anew what an astonishing world we live in.
