The original of this document was imported from an external wiki.

1. Overview
2. Details
2.1. Purpose and features
2.2. ROCm software platform
3. Product list
3.1. gfx800 / 3rd-generation GCN microarchitecture
3.2. gfx800 / Polaris microarchitecture
3.3. gfx900 / Vega microarchitecture
3.4. gfx900 / CDNA microarchitecture
4. Related documents

1. Overview


AMD์˜ ๋จธ์‹  ๋Ÿฌ๋‹ ์ „์šฉ ๊ทธ๋ž˜ํ”ฝ ์นด๋“œ. AMD Tech Summit 2016์—์„œ ๋ฐœํ‘œํ–ˆ์œผ๋ฉฐ 2016๋…„ 12์›” 12์ผ ์— ๋ฐ”๊ณ ๊ฐ€ ํ•ด์ œ๋˜์–ด ๊ณต๊ฐœ๋˜์—ˆ๋‹ค.

2. ์ƒ์„ธ[ํŽธ์ง‘]

2.1. Purpose and features

Graphics cards are among the hardware most often used for machine learning. Just as a well-educated person knows more, an AI naturally becomes smarter only by processing as many computations as possible through machine learning, learning from and training on the results. That is why graphics cards, which hold an overwhelming advantage in parallel computation (reading the GPGPU article first helps), are used for machine learning. What matters in machine learning, however, is the sheer volume of computation needed to learn from countless situations over and over, deriving thousands of slight variations of similar cases; there is little need to worry about performing complex calculations to obtain a single target value the user wants. Success in machine learning therefore depends on how much simple arithmetic can be pushed through at overwhelming throughput, which ties it to single-precision and lower-precision arithmetic. Radeon Instinct GPUs are specialized for single- and low-precision arithmetic and, unlike regular Radeon or Radeon Pro cards, can be said to be optimized for machine learning. Conversely, double-precision (high-precision) performance is of little use here, so it is naturally not high. That said, the MI25, the top Radeon Instinct model when this passage was written, is so powerful that even though its high-precision performance lags far behind its low-precision performance, it is still higher than that of many other high-end graphics cards.
NVIDIA split its GPU lineup between the Kepler and Maxwell architectures and marketed the Maxwell side, the one better suited to machine learning, as its machine-learning-focused high-end cards[1]. By comparison, AMD went further and launched its machine-learning cards as a separate brand (a sub-brand of Radeon). In truth, for all the talk of specialization above, there is no hardware difference from the GPUs in other product lines that use Vega 10 XT.[2] AMD did bring Vega to the Radeon line, but only in the sense that it was released there; Vega is essentially an architecture designed from the outset with a focus on single precision, so Vega-based Radeon and Radeon Pro cards also deliver high single-precision performance. The difference is that Instinct comes with official software support.

The brand's image color is yellow. Yellow was originally used by S3 Graphics, but since S3 left the graphics business long ago (...), AMD seems to have simply taken it over.

2.2. ROCm software platform

A software platform for machine learning that grew out of the Boltzmann Initiative, announced at SC15 in November 2015.

It is built on a language called HIP (Heterogeneous-Compute Interface for Portability) and uses a compiler called HCC (Heterogeneous Compute Compiler). HIP is portable across vendors: compiled with CUDA (for NVIDIA GPUs) or with the HCC C++ compiler (for AMD GPUs), the same code can run on both.
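HIP's portability rests on its runtime API mirroring CUDA's almost name for name (`cudaMalloc` becomes `hipMalloc`, and so on), which is why CUDA code can be ported largely by renaming. ROCm ships real tools for this (`hipify-perl` and `hipify-clang`); the Python sketch below is only a toy illustration of the idea. The `CUDA_TO_HIP` table and the `toy_hipify` helper are hypothetical and cover just a handful of calls, although the HIP names they map to are the real ones.

```python
import re

# Hypothetical subset of CUDA -> HIP renames (the HIP names are real;
# this table is illustrative only, not the hipify tools' own database).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# \b on both sides means only whole identifiers match, so "cudaMemcpy"
# is never replaced inside "cudaMemcpyHostToDevice".
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, CUDA_TO_HIP)) + r")\b")


def toy_hipify(source: str) -> str:
    """Rewrite the known CUDA runtime names in `source` to their HIP equivalents."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


cuda_src = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);"""

hip_src = toy_hipify(cuda_src)
print(hip_src)
```

A real port also has to deal with kernel launches, vendor libraries such as cuBLAS, and hardware-specific code, which is what the real hipify tools plus manual review are for.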

3. Product list

3.1. gfx800 / 3rd-generation GCN microarchitecture

๋ชจ๋ธ๋ช…
GPU
๊ทธ๋ž˜ํ”ฝ ๋ฉ”๋ชจ๋ฆฌ
TGP
(W)
์ถœ๊ณ 
๊ฐ€๊ฒฉ
($)
์ฝ”๋“œ๋„ค์ž„
(๊ณต์ •)
(๋ฉด์ )
SP
(ACE, SU)
ํด๋Ÿญ
(๋ถ€์ŠคํŠธ)
(MHz)
L2
์บ์‹œ
๋ฉ”๋ชจ๋ฆฌ
(MB)
๋ฒ„์Šค
(bit)
๊ทœ๊ฒฉ
ํด๋Ÿญ
(๋น„ํŠธ๋ ˆ์ดํŠธ)
(MHz)
(Mbps)
์šฉ๋Ÿ‰
(GB)
Instinct MI8
Fiji
(28 ใŽš)
(596 ใŽŸ)
4096
(4, 64)
1000
2
4096
HBM
500
(1000)
4
175
?
ใ€์ด๋ก ์ ์ธ ์„ฑ๋Šฅ ๊ณ„์‚ฐ์‹ ํŽผ์น˜๊ธฐ ยท ์ ‘๊ธฐใ€‘
< ๋ฒ”์šฉ ์—ฐ์‚ฐ ์„ฑ๋Šฅ >
(GPU ํด๋Ÿญ) ร— (SP์˜ ๊ฐœ์ˆ˜) ร— 2 รท 1000 = (FP32 ์—ฐ์‚ฐ ์†๋„) [gflops]
(FP32 ์—ฐ์‚ฐ ์†๋„) รท 16 = (FP64 ์—ฐ์‚ฐ ์†๋„) [gflops]
(FP32 ์—ฐ์‚ฐ ์†๋„) ร— 2 = (FP16 ์—ฐ์‚ฐ ์†๋„) [gflops]
< ๊ทธ๋ž˜ํ”ฝ ๋ฉ”๋ชจ๋ฆฌ ์„ฑ๋Šฅ >
(๋ฉ”๋ชจ๋ฆฌ ๋ฒ„์Šค) รท 8 ร— (๋ฉ”๋ชจ๋ฆฌ ๋น„ํŠธ๋ ˆ์ดํŠธ) รท 1000 = (๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ) [gb/s]
ใ€์šฉ์–ด ์ „์ฒด ์ด๋ฆ„ ํŽผ์น˜๊ธฐ ยท ์ ‘๊ธฐใ€‘
Half-Precision Floating-Point = FP16
24-bit Precision Floating-Point = FP24
Single-Precision Floating-Point = FP32
Double-Precision Floating-Point = FP64
16-bit Integer = INT16
32-bit Integer = INT32
Asynchronous Compute Engine = ACE
Stream Processor = SP
Scalar Unit = SU
Total Graphics Power = TGP
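The formulas above can be checked against the MI8 row with a short Python sketch. The helper names `fp32_gflops` and `bandwidth_gbs` are hypothetical; the inputs are the table's figures (1000 MHz, 4096 SPs, 4096-bit HBM at 1000 Mbps). Since Fiji lacks packed FP16 math, the sketch uses the FP32 rate for FP16 instead of doubling it.

```python
def fp32_gflops(clock_mhz: float, stream_processors: int) -> float:
    """Peak FP32 GFLOPS: each SP completes one FMA (2 FLOPs) per clock."""
    return clock_mhz * stream_processors * 2 / 1000


def bandwidth_gbs(bus_bits: int, bitrate_mbps: float) -> float:
    """Peak bandwidth: bus width in bytes times per-pin bit rate, in GB/s."""
    return bus_bits / 8 * bitrate_mbps / 1000


# Instinct MI8 (Fiji) figures taken from the table.
fp32 = fp32_gflops(1000, 4096)
fp64 = fp32 / 16   # Fiji: FP64 at 1/16 of the FP32 rate
fp16 = fp32        # Fiji: no packed FP16, so same rate as FP32
bw = bandwidth_gbs(4096, 1000)

print(f"FP32 {fp32:.0f} / FP64 {fp64:.0f} / FP16 {fp16:.0f} GFLOPS, {bw:.0f} GB/s")
```

The result, about 8.2 TFLOPS FP32, matches the peak figure AMD quoted for the MI8.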


3.2. gfx800 / Polaris microarchitecture

๋ชจ๋ธ๋ช…
GPU
๊ทธ๋ž˜ํ”ฝ ๋ฉ”๋ชจ๋ฆฌ
TGP
(W)
์ถœ๊ณ 
๊ฐ€๊ฒฉ
($)
์ฝ”๋“œ๋„ค์ž„
(๊ณต์ •)
(๋ฉด์ )
SP
(ACE, SU)
ํด๋Ÿญ
(๋ถ€์ŠคํŠธ)
(MHz)
L2
์บ์‹œ
๋ฉ”๋ชจ๋ฆฌ
(MB)
๋ฒ„์Šค
(bit)
๊ทœ๊ฒฉ
ํด๋Ÿญ
(๋น„ํŠธ๋ ˆ์ดํŠธ)
(MHz)
(Mbps)
์šฉ๋Ÿ‰
(GB)
Instinct MI6
Polaris 10
(14 ใŽš)
(232 ใŽŸ)
2304
(4, 36)
1120
(1233)
2
256
GDDR5
1750
(7000)
8
150
?
ใ€์ด๋ก ์ ์ธ ์„ฑ๋Šฅ ๊ณ„์‚ฐ์‹ ํŽผ์น˜๊ธฐ ยท ์ ‘๊ธฐใ€‘
< ๋ฒ”์šฉ ์—ฐ์‚ฐ ์„ฑ๋Šฅ >
(GPU ํด๋Ÿญ) ร— (SP์˜ ๊ฐœ์ˆ˜) ร— 2 รท 1000 = (FP32 ์—ฐ์‚ฐ ์†๋„) [gflops]
(FP32 ์—ฐ์‚ฐ ์†๋„) รท 16 = (FP64 ์—ฐ์‚ฐ ์†๋„) [gflops]
(FP32 ์—ฐ์‚ฐ ์†๋„) ร— 2 = (FP16 ์—ฐ์‚ฐ ์†๋„) [gflops]
< ๊ทธ๋ž˜ํ”ฝ ๋ฉ”๋ชจ๋ฆฌ ์„ฑ๋Šฅ >
(๋ฉ”๋ชจ๋ฆฌ ๋ฒ„์Šค) รท 8 ร— (๋ฉ”๋ชจ๋ฆฌ ๋น„ํŠธ๋ ˆ์ดํŠธ) รท 1000 = (๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ) [gb/s]
ใ€์šฉ์–ด ์ „์ฒด ์ด๋ฆ„ ํŽผ์น˜๊ธฐ ยท ์ ‘๊ธฐใ€‘
Half-Precision Floating-Point = FP16
24-bit Precision Floating-Point = FP24
Single-Precision Floating-Point = FP32
Double-Precision Floating-Point = FP64
16-bit Integer = INT16
32-bit Integer = INT32
Asynchronous Compute Engine = ACE
Stream Processor = SP
Scalar Unit = SU
Total Board Power = TBP
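Because the MI6 row lists both a base and a boost clock, the FP32 figure depends on which one is fed into the formula. The sketch below evaluates both; the helper `fp32_gflops` and the constant names are hypothetical, and the clock, SP and memory figures come from the table. Polaris, like Fiji, has no packed FP16, so FP16 is taken at the FP32 rate, with FP64 at 1/16.

```python
def fp32_gflops(clock_mhz: float, stream_processors: int) -> float:
    """Peak FP32 GFLOPS: each SP completes one FMA (2 FLOPs) per clock."""
    return clock_mhz * stream_processors * 2 / 1000


MI6_SPS = 2304                               # Polaris 10, from the table
MI6_CLOCKS = {"base": 1120, "boost": 1233}   # MHz, from the table

results = {}
for label, clock in MI6_CLOCKS.items():
    fp32 = fp32_gflops(clock, MI6_SPS)
    # Polaris: FP64 at 1/16 of FP32; FP16 at the FP32 rate (no packed FP16).
    results[label] = {"fp32": fp32, "fp64": fp32 / 16, "fp16": fp32}
    print(f"{label:>5}: FP32 {fp32:7.1f} GFLOPS, FP64 {fp32 / 16:5.1f} GFLOPS")

# 256-bit GDDR5 at 7000 Mbps per pin.
bandwidth = 256 / 8 * 7000 / 1000
print(f"memory bandwidth: {bandwidth:.0f} GB/s")
```

Peak figures are conventionally quoted at the boost clock, and the roughly 5.7 TFLOPS result there matches the figure AMD advertised for the MI6.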


3.3. gfx900 / Vega microarchitecture

๋ชจ๋ธ๋ช…
GPU
๊ทธ๋ž˜ํ”ฝ ๋ฉ”๋ชจ๋ฆฌ
TGP
(W)
์ถœ๊ณ 
๊ฐ€๊ฒฉ
($)
์ฝ”๋“œ๋„ค์ž„
(๊ณต์ •)
(๋ฉด์ )
SP
(ACE, SU)
ํด๋Ÿญ
(๋ถ€์ŠคํŠธ)
(MHz)
L2
์บ์‹œ
๋ฉ”๋ชจ๋ฆฌ
(MB)
๋ฒ„์Šค
(bit)
๊ทœ๊ฒฉ
ํด๋Ÿญ
(๋น„ํŠธ๋ ˆ์ดํŠธ)
(MHz)
(Mbps)
์šฉ๋Ÿ‰
(GB)
Instinct MI60
Vega 20
(7 ใŽš)
(331 ใŽŸ)
4096
(4, 64)
1200
(1800)
4
4096
HBM2
1000
(2000)
32
300
?
Instinct MI50
3840
(4, 60)
1200
(1746)
4
4096
HBM2
1000
(2000)
16
300
?
Instinct MI25
Vega 10
(14 ใŽš)
(495 ใŽŸ)
4096
(4, 64)
1400
(1500)
4
2048
HBM2
852
(1704)
16
300
?
ใ€์ด๋ก ์ ์ธ ์„ฑ๋Šฅ ๊ณ„์‚ฐ์‹ ํŽผ์น˜๊ธฐ ยท ์ ‘๊ธฐใ€‘
< ๋ฒ”์šฉ ์—ฐ์‚ฐ ์„ฑ๋Šฅ >
(GPU ํด๋Ÿญ) ร— (SP์˜ ๊ฐœ์ˆ˜) ร— 2 รท 1000 = (FP32 ์—ฐ์‚ฐ ์†๋„) [gflops]
(FP32 ์—ฐ์‚ฐ ์†๋„) รท 16 = (FP64 ์—ฐ์‚ฐ ์†๋„) [gflops]
(FP32 ์—ฐ์‚ฐ ์†๋„) ร— 2 = (FP16 ์—ฐ์‚ฐ ์†๋„) [gflops]
< ๊ทธ๋ž˜ํ”ฝ ๋ฉ”๋ชจ๋ฆฌ ์„ฑ๋Šฅ >
(๋ฉ”๋ชจ๋ฆฌ ๋ฒ„์Šค) รท 8 ร— (๋ฉ”๋ชจ๋ฆฌ ๋น„ํŠธ๋ ˆ์ดํŠธ) รท 1000 = (๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ) [gb/s]
ใ€์šฉ์–ด ์ „์ฒด ์ด๋ฆ„ ํŽผ์น˜๊ธฐ ยท ์ ‘๊ธฐใ€‘
Half-Precision Floating-Point = FP16
24-bit Precision Floating-Point = FP24
Single-Precision Floating-Point = FP32
Double-Precision Floating-Point = FP64
16-bit Integer = INT16
32-bit Integer = INT32
Asynchronous Compute Engine = ACE
Stream Processor = SP
Scalar Unit = SU
Total Board Power = TBP
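The three Vega rows differ in one way a single shared formula hides: Vega 10 (MI25) runs FP64 at 1/16 of the FP32 rate, while Vega 20 (MI60, MI50) runs it at 1/2. The sketch below therefore gives each card its own FP64 divisor. The `Card` class is hypothetical; the clock, SP and memory figures are from the table, with boost clocks used as the peak.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    sps: int            # stream processors
    boost_mhz: int      # boost clock from the table
    fp64_divisor: int   # FP32 rate / FP64 rate for this GPU
    bus_bits: int
    bitrate_mbps: int

    def fp32_tflops(self) -> float:
        # MHz x SPs x 2 (one FMA per clock) gives MFLOPS; / 1e6 gives TFLOPS.
        return self.boost_mhz * self.sps * 2 / 1e6

    def fp64_tflops(self) -> float:
        return self.fp32_tflops() / self.fp64_divisor

    def fp16_tflops(self) -> float:
        # Vega's packed math issues two FP16 ops per FP32 lane.
        return self.fp32_tflops() * 2

    def bandwidth_gbs(self) -> float:
        return self.bus_bits / 8 * self.bitrate_mbps / 1000


CARDS = [
    Card("MI60", 4096, 1800, 2, 4096, 2000),   # Vega 20: FP64 at 1/2
    Card("MI50", 3840, 1746, 2, 4096, 2000),   # Vega 20: FP64 at 1/2
    Card("MI25", 4096, 1500, 16, 2048, 1704),  # Vega 10: FP64 at 1/16
]

for c in CARDS:
    print(f"{c.name}: FP32 {c.fp32_tflops():.2f} / FP64 {c.fp64_tflops():.2f} / "
          f"FP16 {c.fp16_tflops():.2f} TFLOPS, {c.bandwidth_gbs():.1f} GB/s")
```

Although the MI25 and MI60 have the same SP count, the MI60's FP64 result comes out nearly ten times higher, entirely because of the divisor.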


3.4. gfx900 / CDNA microarchitecture

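Maximum internal configuration per GPU

| GPU | Name | Process (nm) | Die area (mm²) | HWS | ACE | GP | SE | PU | RZ | CU | SP (FP32, INT32) | SU | RA | TFU | LDS (KB) | L1 cache (KB) | L2 cache (MB) | GDS (KB) | RB | ROP | MC (bit × channels) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CDNA | Arcturus | 7 | 750 | 1 | 4 | - | 8 | - | - | 128 | 8192 | 128 | - | - | 64×128 | 16×128 | 8 | 64 | - | - | 1024×4 |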



Features per GPU

| GPU | Name | Graphics acceleration | GPGPU acceleration | Video acceleration | Host interface | Memory type | Display output |
|---|---|---|---|---|---|---|---|
| CDNA | Arcturus | - | OpenCL 2.0 | VCN 2.5 | PCIe 4.0 ×16 | HBM2 | - |



๋ชจ๋ธ๋ช…
GPU
๊ทธ๋ž˜ํ”ฝ ๋ฉ”๋ชจ๋ฆฌ
TGP
(W)
์ถœ๊ณ 
๊ฐ€๊ฒฉ
($)
์ฝ”๋“œ๋„ค์ž„
(๊ณต์ •)
(๋ฉด์ )
SP
(ACE, SU)
ํด๋Ÿญ
(๋ถ€์ŠคํŠธ)
(MHz)
L2
์บ์‹œ
๋ฉ”๋ชจ๋ฆฌ
(MB)
๋ฒ„์Šค
(bit)
๊ทœ๊ฒฉ
ํด๋Ÿญ
(๋น„ํŠธ๋ ˆ์ดํŠธ)
(MHz)
(Mbps)
์šฉ๋Ÿ‰
(GB)
Instinct MI100
Arcturus
(7 ใŽš)
(750 ใŽŸ)
7680
(4, 120)
1000
(1502)
8
4096
HBM2
1200
(2400)
32
300
6400
ใ€์ด๋ก ์ ์ธ ์„ฑ๋Šฅ ๊ณ„์‚ฐ์‹ ํŽผ์น˜๊ธฐ ยท ์ ‘๊ธฐใ€‘
< ๋ฒ”์šฉ ์—ฐ์‚ฐ ์„ฑ๋Šฅ >
(GPU ํด๋Ÿญ) ร— (SP์˜ ๊ฐœ์ˆ˜) ร— 2 รท 1000 = (FP32 ์—ฐ์‚ฐ ์†๋„) [gflops]
(FP32 ์—ฐ์‚ฐ ์†๋„) รท 16 = (FP64 ์—ฐ์‚ฐ ์†๋„) [gflops]
(FP32 ์—ฐ์‚ฐ ์†๋„) ร— 2 = (FP16 ์—ฐ์‚ฐ ์†๋„) [gflops]
< ๊ทธ๋ž˜ํ”ฝ ๋ฉ”๋ชจ๋ฆฌ ์„ฑ๋Šฅ >
(๋ฉ”๋ชจ๋ฆฌ ๋ฒ„์Šค) รท 8 ร— (๋ฉ”๋ชจ๋ฆฌ ๋น„ํŠธ๋ ˆ์ดํŠธ) รท 1000 = (๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ) [gb/s]
ใ€์šฉ์–ด ์ „์ฒด ์ด๋ฆ„ ํŽผ์น˜๊ธฐ ยท ์ ‘๊ธฐใ€‘
Half-Precision Floating-Point = FP16
24-bit Precision Floating-Point = FP24
Single-Precision Floating-Point = FP32
Double-Precision Floating-Point = FP64
16-bit Integer = INT16
32-bit Integer = INT32
Asynchronous Compute Engine = ACE
Stream Processor = SP
Scalar Unit = SU
Total Board Power = TBP
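For the MI100, the correction to the shared formula is again FP64: CDNA runs vector FP64 at half the FP32 rate rather than 1/16. The sketch below applies the formulas to the MI100 row (boost clock 1502 MHz, 7680 SPs, 4096-bit HBM2 at 2400 Mbps). The constant names are hypothetical, and throughput from CDNA's Matrix Cores falls outside this simple per-SP formula.

```python
# Instinct MI100 (Arcturus) figures taken from the table.
SPS = 7680
BOOST_MHZ = 1502
BUS_BITS = 4096
BITRATE_MBPS = 2400

fp32_tflops = BOOST_MHZ * SPS * 2 / 1e6   # one FMA (2 FLOPs) per SP per clock
fp64_tflops = fp32_tflops / 2             # CDNA: vector FP64 at half the FP32 rate
bandwidth_gbs = BUS_BITS / 8 * BITRATE_MBPS / 1000

print(f"FP32 {fp32_tflops:.2f} TFLOPS, FP64 {fp64_tflops:.2f} TFLOPS")
print(f"memory bandwidth {bandwidth_gbs:.1f} GB/s ({bandwidth_gbs / 1000:.2f} TB/s)")
```

These come out to roughly 23.1 TFLOPS FP32, 11.5 TFLOPS FP64 and 1.23 TB/s of memory bandwidth.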



A compute card announced on November 16, 2020. Just as NVIDIA dropped the Tesla brand name starting with the A100, AMD dropped the Radeon brand name from this point on, leaving only "Instinct". Because AMD had promised at its March 2020 Financial Analyst Day to split its architectures into RDNA for gaming and CDNA for compute, all graphics-related functions were removed to suit the card's compute-focused purpose. Only the video decoding and encoding functions were retained, so video acceleration is still available.

FP32 performance is about 1.5 times that of the MI60, the previous generation's flagship, giving it roughly 20% higher FP32 performance than the competitor's A100, which came out earlier. However, memory bandwidth improved only 1.2 times over the MI60, so real-world performance is likely to fall short of 1.5 times because of the memory bottleneck. The problem is that around the same time the competitor released an upgraded A100 whose memory moved from HBM2 to HBM2E, making memory bandwidth about 30% faster.
The MI100's memory bandwidth of about 1.23 TB/s was already more than 20% slower than the original A100's 1.56 TB/s, and with the HBM2E version reaching about 2.04 TB/s, the gap widened to about 40%. In terms of absolute performance, the MI100 can be said to have failed to surpass the A100, and it looks set to compete on price instead, at close to half the A100's cost.
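The percentages in the two paragraphs above can be reproduced with a few ratios. The MI60 and MI100 figures follow from the table formulas, the A100 bandwidths are the 1.56 and 2.04 TB/s quoted in the text, and the A100's 19.5 TFLOPS FP32 is NVIDIA's published non-Tensor-Core peak (not stated in the text). The helper names are hypothetical.

```python
# Peak FP32 in TFLOPS (boost clock x SPs x 2 FLOPs per FMA).
MI60_FP32 = 4096 * 1800 * 2 / 1e6    # about 14.75
MI100_FP32 = 7680 * 1502 * 2 / 1e6   # about 23.07
A100_FP32 = 19.5                     # NVIDIA's published non-Tensor-Core figure

# Peak memory bandwidth in GB/s.
MI60_BW = 1024.0
MI100_BW = 1228.8
A100_HBM2_BW = 1560.0    # 1.56 TB/s, as quoted in the text
A100_HBM2E_BW = 2040.0   # 2.04 TB/s, as quoted in the text


def ratio(a: float, b: float) -> float:
    return a / b


def pct_slower(a: float, b: float) -> float:
    """How many percent `a` falls short of `b`."""
    return (1 - a / b) * 100


report = {
    "MI100 vs MI60 FP32 (x)": ratio(MI100_FP32, MI60_FP32),
    "MI100 vs A100 FP32 (x)": ratio(MI100_FP32, A100_FP32),
    "MI100 vs MI60 bandwidth (x)": ratio(MI100_BW, MI60_BW),
    "A100 HBM2E vs HBM2 bandwidth (x)": ratio(A100_HBM2E_BW, A100_HBM2_BW),
    "MI100 bandwidth deficit vs A100 HBM2 (%)": pct_slower(MI100_BW, A100_HBM2_BW),
    "MI100 bandwidth deficit vs A100 HBM2E (%)": pct_slower(MI100_BW, A100_HBM2E_BW),
}
for label, value in report.items():
    print(f"{label}: {value:.2f}")
```

Note that the last figure comes out just under 40%.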

4. Related documents

[1] For example, the GeForce TITAN X. See the corresponding article.
[2] Following a long ATI tradition, this product also uses the same Vega 10 XT GPU found in other product lines. Given AMD's chronic shortage of cash, even in the ATI days a single GPU was reused across FireCL, FireGL and Radeon by changing only the software and the board, but with GCN the architecture's very goal was to cover compute, gaming and rendering with one design. As a result, while NVIDIA's Quadro, TITAN, GeForce and Tesla products differ in their unit ratios (GeForce, for example, has a higher proportion of ROPs), AMD has no such distinction.