● Saving and Loading a Trained Model


> Introduction

Retraining the model every time I run a test is not practical, so saving it is the obvious thing to do.

1. There are two ways to save a trained model (at least, these are the only two I know of so far).

2. One is to serialize it with the pickle module,

3. and the other is the joblib module from sklearn.externals.
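As a minimal sketch of the pickle approach, the snippet below saves an object to a binary file and reads it back. The dictionary is only a stand-in for a trained model, and the file name `model.pkl` is made up for illustration; a fitted scikit-learn estimator can be passed to `pickle.dump` in the same way.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model (made-up values): any picklable object
# is saved and restored the same way.
model = {"classes": ["neg", "pos"], "weights": [0.25, -1.5, 3.0]}

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Save: serialize the object to a binary file.
with open(path, "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load: read the binary file back into an equivalent object.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.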


> Process

1. pickle is said to be the most common choice, so I tried it first. The saved file came out at 42.5MB.

2. Saving the same model with joblib and compress=9 gave a 9.5MB file.

3. Speed was about the same for both.

4. Both work the same way: the trained state is stored as binary and loaded back when needed.

5. So far, joblib is the clear winner.
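The comparison above can be sketched roughly as follows. This is an illustration under assumptions, not the code used for the measurements: the "model" is a made-up repetitive parameter list, the file names are hypothetical, and joblib is imported as the standalone package because the `sklearn.externals.joblib` path was removed from later scikit-learn releases. Note that the comparison is between an uncompressed pickle and a compressed joblib file, so most of the size difference likely comes from compression itself.

```python
import os
import pickle
import tempfile

import joblib  # standalone package (assumption; the post used sklearn.externals)

# Stand-in for a trained model (made-up data): a large, repetitive parameter list.
model = {"weights": [0.0] * 100_000}

workdir = tempfile.mkdtemp()
pickle_path = os.path.join(workdir, "model.pkl")     # hypothetical file names
joblib_path = os.path.join(workdir, "model.joblib")

# Plain pickle: no compression.
with open(pickle_path, "wb") as f:
    pickle.dump(model, f)

# joblib with compress=9: the highest compression level (smallest file, slowest write).
joblib.dump(model, joblib_path, compress=9)

# Loading needs no extra arguments; joblib detects the compression itself.
restored = joblib.load(joblib_path)

pickle_size = os.path.getsize(pickle_path)
joblib_size = os.path.getsize(joblib_path)
print(f"pickle: {pickle_size} bytes, joblib(compress=9): {joblib_size} bytes")
print(restored == model)
```

The ratio you see depends heavily on the data; highly repetitive parameters compress far better than a typical trained model.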


> Execution

1. I ran GridSearchCV() to search for the best parameters.

Fitting 3 folds for each of 288 candidates, totalling 864 fits

[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   24.0s

[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:  2.5min

[Parallel(n_jobs=-1)]: Done 442 tasks      | elapsed:  6.5min

[Parallel(n_jobs=-1)]: Done 792 tasks      | elapsed: 11.5min

[Parallel(n_jobs=-1)]: Done 864 out of 864 | elapsed: 13.4min finished

Best score: 0.60695468914647

Best parameter set:

        clf__alpha: 1.0

        vect__max_features: None

        vect__ngram_range: (1, 2)

        vect__norm: None

        vect__smooth_idf: False

        vect__sublinear_tf: True

        vect__use_idf: True

I then ran MultinomialNB() with these parameters and checked the test results. However many times I run it, I can't see much of an improvement ㅡ,ㅡ

I'll study optimization properly later; for now I'll just get comfortable with saving and loading.
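For reference, a search of the kind logged above might look roughly like the sketch below. The step names `vect` and `clf` come from the parameter names in the log; that `vect` is a `TfidfVectorizer` is an assumption based on those parameter names. The toy corpus and the much smaller grid are made up for illustration and are not the data or the 288-candidate grid from the actual run.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus (made up for illustration; not the post's data).
texts = ["good movie", "great film", "nice plot",
         "bad movie", "awful film", "poor plot"] * 3
labels = [1, 1, 1, 0, 0, 0] * 3

# Step names match the "vect__" / "clf__" prefixes seen in the log.
pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])

# A deliberately small grid; the values here are assumptions.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "vect__sublinear_tf": [True, False],
    "clf__alpha": [0.1, 1.0],
}

# cv=3 mirrors the "3 folds" in the log; n_jobs=1 keeps the sketch simple.
search = GridSearchCV(pipeline, param_grid, cv=3, n_jobs=1)
search.fit(texts, labels)

print("Best score:", search.best_score_)
print("Best parameter set:")
for name in sorted(param_grid):
    print(f"\t{name}: {search.best_params_[name]}")
```

After fitting, `search.best_estimator_` holds the pipeline refit with the best parameters, and that object is what would be saved.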




2. I also trained a model with the SGDClassifier() class, then saved and loaded it. As expected, it works well. However much I tune MultinomialNB(), it is nowhere near SGDClassifier().
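A minimal sketch of that train, save, and load round trip is shown below. The toy corpus, the `TfidfVectorizer` front end, the `random_state` value, and the file name are all assumptions for illustration; joblib is imported as the standalone package rather than from sklearn.externals.

```python
import os
import tempfile

import joblib  # standalone package (assumption; the post used sklearn.externals)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Toy corpus (made up for illustration; not the post's data).
texts = ["good movie", "great film", "nice plot",
         "bad movie", "awful film", "poor plot"] * 3
labels = [1, 1, 1, 0, 0, 0] * 3

# Vectorizer and classifier are saved together so the loaded object
# can take raw text directly.
pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", SGDClassifier(random_state=0)),
])
pipeline.fit(texts, labels)

# Save the fitted pipeline with maximum compression, then load it back.
path = os.path.join(tempfile.mkdtemp(), "sgd_model.joblib")  # hypothetical name
joblib.dump(pipeline, path, compress=9)
restored = joblib.load(path)

# The restored pipeline should predict exactly like the original.
same = list(pipeline.predict(texts)) == list(restored.predict(texts))
print(same)
```

Saving the whole pipeline rather than only the classifier avoids having to persist and re-apply the fitted vocabulary separately.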




