Challenges in AI Infrastructure for Enterprise Foundation Models

Jeffrey L. Burns, Ph.D.

Director, AI Compute and IBM Research AI Hardware Center

IBM Research

August 9, 2023

Foundation Models: An inflection point in generalizable and adaptable representations

Expert Systems (1980s): hand-crafted symbolic representations
Machine Learning (1980s to 2012): task-specific hand-crafted feature representations
Deep Learning: task-specific learnt feature representations; enabled by big data, massive labeled data, and compute
Foundation Models (2018+): generalizable & adaptable learnt representations; enabled by self-supervision at scale, massive unlabeled data, and compute

Incredible opportunities around enterprise applications

In each of these domains there is ample unlabeled data available in enterprises, which can be used to train custom foundation models, potentially opening the doors for solving business problems that were previously considered intractable.

Chemistry & Materials
Sensor Data
Natural Language
Programming Languages (Code)
Structured Business Data
Geospatial Data
Speech
IT Data

Geospatial Foundation Models

Pre-trained on sufficient datasets in partnership with content-rich institutions (e.g., NASA)
Leverage self-supervised learning (i.e., masking imagery or timeseries)
Able to effectively complete multiple downstream tasks while meeting accuracy baselines (e.g., flood mapping, land cover classification, outage prediction)
Note: while the transformer architecture is most prevalent in foundation models, the definition is not restricted by model architecture
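To make the masking idea concrete, here is a minimal Python sketch of self-supervised masking on a toy time series: a random subset of positions is hidden, and the original values at those positions become the reconstruction targets. The function name `mask_series`, the 50% mask ratio, and the sample values are illustrative assumptions, not details of the IBM and NASA model.

```python
import random


def mask_series(values, mask_ratio=0.5, mask_value=None, seed=0):
    """Hide a random subset of positions; return (masked_input, targets)."""
    rng = random.Random(seed)
    n_masked = int(len(values) * mask_ratio)
    masked_idx = set(rng.sample(range(len(values)), n_masked))
    # The model only sees the masked input.
    masked = [mask_value if i in masked_idx else v for i, v in enumerate(values)]
    # The pretraining target is the original value at each hidden position,
    # so no human-provided labels are needed.
    targets = {i: values[i] for i in sorted(masked_idx)}
    return masked, targets


series = [0.1, 0.4, 0.35, 0.8, 0.75, 0.2, 0.15, 0.6]  # hypothetical sensor readings
masked, targets = mask_series(series, mask_ratio=0.5)
print(masked)
print(targets)
```

A real pipeline would mask image patches or multi-band time steps and train a network to reconstruct them; the sketch only shows how the training signal comes from the data itself.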

IBM and NASA have teamed up to apply foundation model AI technology to leverage earth science data for geospatial intelligence.

This work with NASA is part of an effort across IBM Research to pioneer applications of foundation models beyond language.

https://www.earthdata.nasa.gov/news/impact-ibm-hls-foundation-model

The flip side

“So, we think it’s fair to say that, right now, access to compute resources — at the lowest total cost — has become a determining factor for the success of AI companies.”

a16z analysis, April 2023

Optimizing the infrastructure for Foundation Models

Across the whole AI workflow

Data preparation: e.g., remove hate and profanity, deduplicate, etc.
Distributed training and model validation: long-running job on massive infrastructure
Model adaptation: model tuning with custom data set for downstream tasks
Inference: may have sensitivity to latency/throughput, always cost-sensitive

[Figure: Model Size (# of Params), log scale from 1.00E+07 to 1.00E+14, 2018 to 2020, with 1 billion and 1 trillion marked. Models shown: ELMo, BERT Large, GPT-1, GPT-2, RoBERTa Large, Megatron, ALBERT xxl, Microsoft T-NLG, ELECTRA Large, GPT-3, GShard, Baidu RecSys-C, Baidu RecSys-E]
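The data preparation stage named above (remove hate and profanity, deduplicate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the blocklist contains one placeholder token, filtering is by whole-word match, and deduplication is exact-match on normalized text. Production pipelines typically use trained classifiers and near-duplicate detection instead; the names `BLOCKLIST`, `normalize`, and `prepare` are hypothetical.

```python
import hashlib

BLOCKLIST = {"badword"}  # hypothetical placeholder for a real term list


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())


def prepare(documents, blocklist=BLOCKLIST):
    """Drop documents containing blocked words, then drop exact duplicates."""
    seen = set()
    kept = []
    for doc in documents:
        norm = normalize(doc)
        if any(word in blocklist for word in norm.split()):
            continue  # filtered: contains a blocked term
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # filtered: duplicate of an earlier document
        seen.add(digest)
        kept.append(doc)
    return kept


docs = ["Hello  world", "hello world", "this has badword in it", "Another document"]
print(prepare(docs))
```

Hashing the normalized text keeps memory use proportional to the number of unique documents rather than their total size, which matters at the corpus scales this stage handles.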

AWS, Azure, On-Prem

World-class infrastructure for training, tuning and serving foundation models (on-prem and in the cloud)

Platform that delivers portability and abstracts infrastructure complexity

Middleware that simplifies the end-to-end AI workflow and optimizes use of the underlying infrastructure

Building the FM technology stack

AI-optimized infrastructure

Training: Vela
Cloud-native design for large-scale distributed model training
https://research.ibm.com/blog/AI-supercomputer-Vela-GPU-cluster

N. Wang et al, NeurIPS 2022

Inference: IBM AIU
https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu
Designed for energy-efficient AI compute at reduced precision
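Reduced precision means representing values with fewer bits than FP32. As a generic illustration, the following Python sketch applies symmetric per-tensor INT8 quantization to a handful of weights and measures the round-trip error. This is a textbook scheme chosen for clarity; it is an assumption for illustration and not a description of the AIU's arithmetic. The function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]


weights = [0.02, -1.27, 0.5, 0.9999, -0.3]  # hypothetical weight values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The round-trip error is bounded by half the scale, so tensors with a narrow value range lose little accuracy while each value shrinks from 32 bits to 8.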

IBM Research AIU background

Gen-3 AI Core Prototype (2019)
AI accelerator core
IBM z16 Telum Chip (2022 GA): 1 Gen-3 AI Core
AIU (Artificial Intelligence Unit) (2022): 32 Gen-3 AI Cores

AIU overview:
• Complete AI accelerator, plugs into a standard PCIe slot
• 32 Gen-3 AI cores
• Optimized for AI inferencing, supports all operations for fine-tuning and training as well
• Designed to ease cloud integration, enabled in Red Hat stack
• Support for all common neural network types

zAIU overview:
• One Gen-3 AI core, integrated in the z16 processor chip
• Off-loads AI tasks from the 8 CPU cores
• Optimized for in-transaction AI inferencing
• Seamless integration into z software stack

IBM Artificial Intelligence Unit (AIU)

SoC implements IBM’s leadership innovations in low-precision AI arithmetic and algorithms
– Chip architecture optimized for enterprise AI workloads, including foundation models
– Enabled in the Red Hat and Foundation Models software stacks
– Supports multi-precision inference (and some training): FP16, FP8, INT8, INT4, INT2
– Implemented in leading edge 5nm technology
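One reason the multi-precision list matters is storage: the bit width of each format sets how much memory the weights occupy. The Python sketch below computes weight storage for models of 1 billion and 1 trillion parameters at each listed format. It counts weights only and ignores activations, scaling metadata, and any packing overhead, so treat the figures as lower bounds; the function name `weight_bytes` is hypothetical.

```python
# Bits per parameter follow directly from the format names.
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "INT8": 8, "INT4": 4, "INT2": 2}


def weight_bytes(num_params, fmt):
    """Bytes needed to store num_params weights in the given format."""
    return num_params * BITS_PER_PARAM[fmt] // 8


for num_params, label in [(10**9, "1 billion"), (10**12, "1 trillion")]:
    for fmt in BITS_PER_PARAM:
        gb = weight_bytes(num_params, fmt) / 1e9
        print(f"{label} params @ {fmt}: {gb:g} GB")
```

Halving the bit width halves the weight storage, which is why low-precision formats are attractive for cost-sensitive inference.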

IBM AIU inference stack integrated with watsonx

Key challenge: develop the entire AIU software stack in parallel with developing the SoC and PCIe card

[Figure: AIU inference stack. Labels: watsonx Services, AI Framework Integration, Model Optimizer (SenQNN), AIU (DeepTools) Compiler, AIU Runtime, AIU Driver, PCIe, IBM AIU Card. Annotations: User’s view: watsonx services (only); Internal software architecture components; TVM: Tensor Virtual Machine, open-source framework/runtime]

• Emulation systems have been essential for:

• Hardware verification: Uncover functional/performance bugs

• Software development: Provide platform for chip internal/external software development

IBM AIU emulation overview

IBM AIU

Synopsys ZeBu
• 96 Xilinx VU440 FPGAs
• Hardware verification
• Compiler / hardware co-development

Synopsys HAPS
• 4-8 Xilinx VU440 FPGAs
• Device driver development

Full AIU computational emulation

[Figure: full AIU emulation setup. Labels: ZeBu Host PC, Virtual Machine, SenLib, Driver, PCIe-Xtor Software, Virtual PCIe Link, PCIe-Xtor Hardware, IBM AIU]

• Objective: high-fidelity model of all computational elements (cores and interconnect) of the SoC
• Model build:
  • ZeBu system from Synopsys
  • 96 Xilinx VU440 FPGAs
  • Very high fill rate, ~90% LUT utilization
  • 24h model build time (RTL to bitfiles)
  • 1 – 1.5 MHz operating frequency; limited by memory interface
• Impact highlights:
  • Found several high impact hardware bugs
    • Rare, hard to hit scenarios, practically impossible to find in simulation
  • Vital for compiler development
  • Complete cycle-accurate processing of 1 image: 1 min on ZeBu vs. 9 hours in simulation
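The per-image figures above (1 min on ZeBu vs. 9 hours in simulation) imply a large throughput gap. A short Python calculation makes it explicit; it uses only the two quoted numbers and assumes back-to-back runs with no setup time, which is an idealization.

```python
SIM_HOURS_PER_IMAGE = 9      # from the slide: simulation
ZEBU_MINUTES_PER_IMAGE = 1   # from the slide: ZeBu emulation

# Speedup of emulation over simulation for one cycle-accurate image.
speedup = SIM_HOURS_PER_IMAGE * 60 / ZEBU_MINUTES_PER_IMAGE

# Idealized images per 24-hour day on each platform.
images_per_day_zebu = 24 * 60 / ZEBU_MINUTES_PER_IMAGE
images_per_day_sim = 24 / SIM_HOURS_PER_IMAGE

print(f"speedup: {speedup:.0f}x")
print(f"images/day: ZeBu {images_per_day_zebu:.0f}, simulation {images_per_day_sim:.2f}")
```

At fewer than three images per day, simulation cannot support iterative compiler work on full networks, which is consistent with the slide calling emulation vital for compiler development.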

Example:
Number of different NNs exercised
Tests run (32 images/features per run): 100,000
Image/feature inferences completed: 3.2 million
Total emulation run time: 7000 hours
Equivalent SoC run time: 7 hours
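The campaign figures above are internally consistent, and a few lines of Python show how they relate. The inference count and the emulation-to-SoC ratio follow directly from the quoted numbers. The implied SoC clock range is an assumption: it supposes the equivalent SoC run time was derived purely from the clock ratio against the 1 to 1.5 MHz emulation frequency, which the source does not state.

```python
TESTS_RUN = 100_000
ITEMS_PER_RUN = 32          # images/features per run
EMULATION_HOURS = 7000
SOC_HOURS = 7
EMU_CLOCK_HZ = (1.0e6, 1.5e6)  # ZeBu operating frequency range from the slides

inferences = TESTS_RUN * ITEMS_PER_RUN                 # total inferences completed
slowdown = EMULATION_HOURS / SOC_HOURS                 # emulation vs. real silicon
avg_seconds_per_inference = EMULATION_HOURS * 3600 / inferences

# Assumption: same cycle count on both platforms, so clock scales with slowdown.
implied_soc_clock_hz = tuple(f * slowdown for f in EMU_CLOCK_HZ)

print(f"inferences: {inferences:,}")
print(f"slowdown: {slowdown:.0f}x")
print(f"average emulation time per inference: {avg_seconds_per_inference} s")
print(f"implied SoC clock (assumption): {implied_soc_clock_hz[0]/1e9:g} to {implied_soc_clock_hz[1]/1e9:g} GHz")
```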

AIU nest emulation

Why a second emulation platform?

• Develop device driver stack for AIU: requires SoC-like hardware fidelity (e.g., host-PCIe interface)

Platform and model details:
• HAPS system from Synopsys
• 4-8 Xilinx VU440 FPGAs emulate a mini SoC
  – SoC faithful nest + 1 AI core (vs 32 AI cores)
  – Running at MHz speed
• Includes PCIe Gen5 PHY daughter card from Synopsys
• Includes DDR4 DIMMs
• Uniquely suited for AIU driver development
  – Faithfully realizes the host-PCIe interface of the SoC

Runtime by network (sec/image or sec/feature):
ResNet50: HAPS 1.46, ZeBu 10.02
MobileNetV1: HAPS 0.59, ZeBu 3.37
InceptionV4: HAPS 4.35, ZeBu 43.76
BERT-large (seq=384): 292
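To compare the two platforms, the Python sketch below computes the ZeBu-to-HAPS runtime ratio for each network. It assumes the values read in header order (HAPS first, ZeBu second) and leaves out BERT-large because the source gives only a single value for it. Note that the two platforms emulate different models (one AI core on HAPS, the full multi-core SoC on ZeBu), so the ratio reflects platform speed and model scope together.

```python
# (HAPS seconds, ZeBu seconds) per image or feature, from the table above.
RUNTIMES = {
    "ResNet50": (1.46, 10.02),
    "MobileNetV1": (0.59, 3.37),
    "InceptionV4": (4.35, 43.76),
}

# Ratio > 1 means the ZeBu run took longer than the HAPS run.
ratios = {name: zebu / haps for name, (haps, zebu) in RUNTIMES.items()}

for name, ratio in ratios.items():
    print(f"{name}: ZeBu/HAPS = {ratio:.2f}x")
```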

Modeling and emulation impact

• Multiple software and FPGA-based methods have been essential to IBM’s full-stack AIU and AI system development
• Our SoC design process leverages multiple levels of simulation for architecture development, logic and chip design, and design verification
• Our software stack development, accelerator software integration development, and compiler / hardware co-optimization leveraged FPGA-based emulation systems
• Full-chip emulation via ZeBu for full-chip performance & accuracy analyses of AI models on multi-core models, compiler optimizations, architectural modifications and power estimation
• Detailed SoC nest emulation via HAPS for device driver development, low-level software stack development, and evaluation of multi-chip configurations
• These methods enabled us to develop a full system, end-to-end hardware and software stack for Foundation Model inference in parallel to SoC and PCIe card development

Foundation Models are an inflection point for enterprise AI

Sevilla, arXiv ‘22

• Our approach emphasizes:
  • Cloud-native architectures
  • Ease-of-use for developers and clients
  • Hybrid cloud consumption
  • AI accelerator design and technology innovations

[Figure: Innovation: Algorithm + Architecture. Reduced Precision Arithmetic + Augmented w/ Heterogeneous Integration + Augmented w/ Analog Compute]

• FMs enable a proliferation of task-specific models, but with large and escalating compute demands
• Inference, fine-tuning, and distributed training systems differ in their requirements
• Full-system innovation is required