<img height="1" width="1" style="display:none" src="https://www.facebook.com/tr?id=749362956508377&amp;ev=PageView&amp;noscript=1">
Skip to content
 
 

Physical Design Engineering Services

For your advanced node AI chip, our physical design team leverages the latest place-and-route, circuit simulation, and sign-off tools to implement timing- and power-optimized layouts. We have a proven record of achieving PPA goals at 7nm and below.

High-Performance Computing

Supercomputers used for scientific research leverage AI accelerators for complex simulations, modeling, and data analysis.

Cloud Computing

Cloud service providers like AWS, Azure, and Google Cloud offer their customers access to GPUs, FPGAs, and other AI accelerators for AI workloads. The AI chips and modules are designed into their data center hardware.

Data Centers 

The massive compute power needed for training and inference of large AI models relies on specialized AI chips integrated into servers and accelerators used in hyperscale data centers by companies like Google, Amazon, and Microsoft.

Autonomous Vehicles

AI chips power the advanced driver assistance systems (ADAS) and self-driving capabilities in autonomous vehicles across companies like Tesla, Ford, GM, and Waymo. They handle tasks like sensor fusion and driving decision making.

IoT Devices 

Smaller AI chips find their way into smart home devices, wearables, robots, and industrial IoT to add on-device automation and intelligence at the edge. Custom ASICs or tinyML modules are common.

Finance

AI algorithms for fraud detection, algorithmic trading, and similar applications rely on high-speed AI chips to crunch millions of data points and execute trades in real time.

Physical Design Engineering Capabilities

Our engineers have extensive expertise implementing complex AI/ML, HPC, and advanced SoC designs at the latest manufacturing nodes:

  • 7nm/6nm design experience - We are highly skilled in addressing the challenges of 7nm/6nm, from increased variation to complex layout requirements, with a proven track record of successful 7nm tapeouts.

  • 5nm/4nm capabilities - We are already enabling clients to be first to market with designs at the 5nm/4nm node, optimizing PPA through our N5/N4 early access.

  • 3nm early stage development - We are executing targeted 3nm test chip designs and process analysis to prime our skills for the next big node advance.

  • Advanced floorplanning - Our engineers excel at innovative floorplanning techniques like hierarchical methodology, congestion-driven planning, and IP block integration required at advanced nodes.

  • Leading-edge place and route - We leverage the most up-to-date tool flows for placement and routing at 7nm and below to achieve required design goals.

  • Timing closure expertise - Our team has repeatedly closed timing on high-speed, high-complexity designs through signoff-driven optimization techniques (a minimal slack-computation sketch follows this list).

  • Power reduction capabilities - We possess extensive know-how using strategies like multi-voltage domains, power-aware analysis, and power gating to meet tight power budgets.

  • Full signoff competencies - We can deliver complete signoff-ready designs meeting timing, signal integrity, reliability, thermal, and power objectives at the latest process nodes.
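
To make the timing-closure work concrete, here is a minimal sketch of the slack computation at the heart of static timing analysis: propagate worst-case arrival times through a toy combinational netlist and compare against the clock period. The netlist, gate delays, and clock period below are hypothetical illustration values, not data from any client design.

```python
# Minimal static-timing sketch: propagate worst-case arrival times through
# a toy combinational netlist (a DAG) and report slack at the endpoint.
# All delays and the clock period are hypothetical illustration values.

# Hypothetical netlist: gate -> (delay_ns, list of fanin gates)
netlist = {
    "in_a":  (0.00, []),
    "in_b":  (0.00, []),
    "nand1": (0.12, ["in_a", "in_b"]),
    "inv1":  (0.08, ["nand1"]),
    "nand2": (0.12, ["inv1", "in_b"]),
    "out":   (0.05, ["nand2"]),
}

def arrival_times(netlist):
    """Longest-path arrival time at each node (memoized DAG traversal)."""
    memo = {}
    def at(node):
        if node not in memo:
            delay, fanins = netlist[node]
            memo[node] = delay + max((at(f) for f in fanins), default=0.0)
        return memo[node]
    for node in netlist:
        at(node)
    return memo

CLOCK_PERIOD_NS = 0.50  # hypothetical 2 GHz target

arrivals = arrival_times(netlist)
slack = CLOCK_PERIOD_NS - arrivals["out"]
print(f"arrival at 'out': {arrivals['out']:.2f} ns, slack: {slack:+.2f} ns")
# Negative slack would mean this path needs optimization (cell resizing,
# buffering, useful skew) before timing signoff.
```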

Our advanced node physical design expertise enables clients to achieve their most ambitious chip implementations. Let us make your next 7nm, 5nm, or 3nm design a success.

Cutting-Edge Technology

As AI continues permeating every field, massive investment is flowing into specialized AI hardware for emerging workloads like generative AI, recommendation systems, and autonomous driving.

Across practically every industry, we are seeing an infusion of AI capabilities enabled by these cutting-edge semiconductor components. The demand for optimized AI hardware will only continue to increase.


Common challenges we have helped teams alleviate

  • Extremely congested placement - AI chips integrate massive arrays of processing elements, memory, and interconnect, which makes placement extremely congested, especially at 7nm and below, and demands very complex floorplanning.
  • Stringent power constraints - AI workloads running continuously at blazing fast speeds put huge demands on power consumption. Meeting dynamic and leakage power budgets across clock domains is very difficult.
  • High-speed timing closure - Supporting the fast computation speeds needed for AI leads to very aggressive timing requirements. Closing timing while minimizing power is a constant balancing act.
  • Thermal hotspots - The density and power consumption of AI chip architectures create localized hotspots that can degrade performance and lifetime. Thermal-aware design techniques are critical (a toy hotspot screen follows this list).
  • Advanced node complexities - Smaller geometries at 7nm FinFET and below introduce more process variation, RC delay, and other physical effects that complicate design closure.
  • IP integration challenges - Carefully integrating critical IP blocks like HBM2 controllers, SerDes, and embedded processors with the custom AI logic is non-trivial and can impact PPA.
  • Insufficient modeling accuracy - The immaturity of advanced nodes means modeling and analysis tools often lag silicon results in accuracy, so more guardbanding is required.
  • Verification of physical design - Running equivalence checking, DRC, LVS, and generating silicon-accurate power profiles all take substantial compute resources and expert skills.
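
As a toy illustration of the thermal-hotspot screening mentioned above, the sketch below flags floorplan tiles whose power exceeds a per-tile budget. The power map and threshold are hypothetical; production flows rely on full thermal simulation rather than a screen like this.

```python
# Toy hotspot screen: flag floorplan tiles whose power exceeds a budget.
# The 4x4 power map (W per tile) and the threshold are hypothetical.

power_map = [
    [0.8, 0.9, 1.1, 0.7],
    [0.9, 2.4, 2.6, 0.8],   # dense compute tiles in the middle
    [0.7, 2.2, 1.0, 0.6],
    [0.5, 0.6, 0.7, 0.5],
]
HOTSPOT_LIMIT_W = 2.0  # hypothetical per-tile power budget

hotspots = [
    (row, col)
    for row, tiles in enumerate(power_map)
    for col, watts in enumerate(tiles)
    if watts > HOTSPOT_LIMIT_W
]
print("hotspot tiles (row, col):", hotspots)
# A screen like this only tells a floorplanner which regions need power
# spreading, cell-density limits, or extra cooling budget.
```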

Solutions we have implemented to these challenges

  • Hierarchical floorplanning - Break down large matrices into hierarchical blocks with decoupling to isolate and reduce congestion.
  • Power gating and MTCMOS - Strategically cut power to inactive logic blocks and use multi-threshold cell libraries to minimize leakage (a back-of-the-envelope sketch follows this list).
  • Multi-VT libraries - Employ low-VT (fast) cells on timing-critical paths and high-VT (low-leakage) cells on non-critical paths.
  • Advanced node fins - Leverage FinFET devices, whose superior channel control enables lower operating voltage and leakage, for power benefits.
  • Liquid cooling - Partner with thermal engineers to enable microchannel liquid cooling systems for hot spots.
  • EM/IR analysis - Run detailed electromigration and IR drop analysis to identify and fix reliability risks early.
  • DFM techniques - Insert redundant vias, enable rule compliance checking, and apply physical recommended practices.
  • hierarchical P&R - Divide-and-conquer approach to place and route blocks independently then integrate them.
  • Signoff tool expertise - Master advanced STA, formal verification, litho-friendly filling, and other signoff flows.
  • Guardbanding - Allocate extra margins into timing, power, and other constraints to compensate for inaccuracies.
  • IP-friendly floorplans - Floorplan around key integrated blocks early and budget for their requirements.
  • Congestion-driven synthesis - Iterate between implementation and RTL/synthesis to reduce routing hotspots.
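
As a back-of-the-envelope illustration of why power gating and multi-VT swapping pay off, the sketch below models block leakage under hypothetical cell counts, per-cell leakage values, and duty cycles; none of these numbers come from a real cell library.

```python
# Back-of-the-envelope leakage model for power gating plus multi-VT
# swapping. All cell counts, leakage values, and duty cycles are
# hypothetical illustration numbers, not library data.

LEAK_NW = {"lvt": 50.0, "svt": 15.0, "hvt": 4.0}  # nW/cell (hypothetical)

def block_leakage_mw(cells_by_vt, on_fraction=1.0, retention_factor=0.02):
    """Leakage in mW; a gated block leaks only `retention_factor` when off."""
    raw_nw = sum(LEAK_NW[vt] * n for vt, n in cells_by_vt.items())
    gated_nw = raw_nw * (on_fraction + (1 - on_fraction) * retention_factor)
    return gated_nw / 1e6

# Baseline: always-on block, mostly LVT for speed.
baseline = block_leakage_mw({"lvt": 800_000, "svt": 200_000})

# Optimized: HVT on non-critical paths, power-gated 70% of the time.
optimized = block_leakage_mw({"lvt": 200_000, "svt": 300_000, "hvt": 500_000},
                             on_fraction=0.3)

print(f"baseline leakage:  {baseline:.1f} mW")
print(f"optimized leakage: {optimized:.1f} mW "
      f"({100 * (1 - optimized / baseline):.0f}% reduction)")
```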

Semiconductor Components We Help Teams Design

  • AI ASICs - Companies like Graphcore, Cerebras, and SambaNova are developing dedicated AI chips optimized for neural network workloads. These feature ultra-high parallelism and memory bandwidth.
  • AI accelerators - Chips designed to accelerate specific parts of AI workloads, like Google's tensor processing units (TPUs) and Intel's Spring Crest and Habana accelerators. These attach to CPUs or GPUs.
  • High-bandwidth memory - Specialized high-bandwidth memory technologies reduce bottlenecks for AI chips. Examples are HBM from Samsung and MCDRAM from Intel.
  • Neuromorphic chips - These mimic the way biological neurons work through architectures like spiking neural networks. Examples are Intel's Loihi and research chips from IBM.
  • AI-tuned FPGAs - Field-programmable gate arrays tuned for AI workloads by adding blocks for convolution, matrix math, and other operations. Xilinx, Intel, and others offer these.
  • Optical interconnects - Using light instead of electricity for chip connections enables high throughput at low power. Intel and others are researching this.

Successful Projects and Case Studies


Project Background

Our team was tasked with helping implement a 5nm AI training accelerator chip for an emerging hyperscale cloud provider. This chip included 1024 energy-efficient cores optimized for low-precision INT4 training workloads like natural language processing. The design incorporated 4th-gen Tensor Cores and integrated HBM3 memory.

Challenges Faced

  • The 5nm node introduced complexities like increased variation and RC delay effects.
  • We targeted an extremely high 1GHz frequency for the Tensor Cores to provide rapid training.
  • Congestion around the HBM3 tiles required intricate floorplanning techniques.
  • Stringent power constraints were enforced to minimize data center operational costs.

Our Solutions

  • Advanced 5nm multi-VT libraries and innovative EM/IR analysis were leveraged.
  • We utilized advanced thermal simulation and analysis tools for early hotspot detection.
  • An HPC-optimized hierarchical approach divided the chip into segments for P&R.
  • Progressive physical synthesis, congestion-driven mapping, and ECOs were applied.
  • Power-optimized clock gating, MTCMOS, and power-aware place-and-route were employed.

Results

  • The accelerator delivered impressive 1.2GHz operation and 2.5x performance gains over prior gen chips.
  • Signoff was achieved within an aggressive 9-month time frame to meet market windows.
  • Post-silicon, the chip achieved new records for TFLOPS/W energy efficiency.
  • Our team's 5nm expertise was instrumental in taping out this leading-edge design on schedule.

This project exemplified our physical design skills scaling to meet the demands of the latest AI training workloads for our customer. The methodologies we implemented will guide the development of future generations of accelerators. Our partnership with the customer continues to expand.

Project Background

Our team was engaged to design a new AI inference accelerator for a major cloud services provider. This 7nm chip would provide inference-as-a-service to cloud customers across application domains like computer vision, NLP, and recommendation systems. The accelerator had 384 INT8 cores and integrated HBM2E controllers.

Challenges Faced

  • The complex heterogeneous architecture resulted in highly congested placement and routing.
  • Multiple clock domains for the different blocks posed timing closure difficulties.
  • Power consumption had to be minimized to reduce data center TCO for the cloud provider.
  • The chip integrated multiple HBM2E stacks which drove challenging floorplanning.

Our Solutions

  • A hierarchical methodology was used dividing the chip into semi-independent blocks.
  • Congestion maps guided optimization of logic synthesis and placement to reduce hotspots (a congestion-map sketch follows this list).
  • Multi-voltage power domains, MTCMOS techniques, and clock gating optimized power.
  • Progressive STA closure was achieved through ECO fixes guided by timing reports.
  • The HBM2E tiles were strategically floorplanned using soft macros to reduce implementation disruption.
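
As a rough illustration of how a congestion map flags routing hotspots, the sketch below computes demand-over-capacity ratios per global-routing tile; all tile values are hypothetical, not taken from this design.

```python
# Sketch of a congestion map: routing demand over capacity per global-
# routing tile. An overflow ratio > 1.0 marks a hotspot to feed back into
# synthesis and placement. All tile values are hypothetical.

demand = [   # wire tracks requested per tile (hypothetical)
    [12, 18, 35, 14],
    [16, 42, 48, 20],
    [11, 38, 22, 12],
]
CAPACITY = 32  # available routing tracks per tile (hypothetical)

overflow = [[d / CAPACITY for d in row] for row in demand]
hotspots = [(r, c) for r, row in enumerate(overflow)
            for c, ratio in enumerate(row) if ratio > 1.0]

for row in overflow:
    print("  ".join(f"{ratio:4.2f}" for ratio in row))
print("congested tiles (row, col):", hotspots)
# Tiles above 1.0 are overflowed; a congestion-driven flow re-synthesizes
# or re-places the logic mapped into those regions.
```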

Results

  • The chip taped out within the 10-month schedule to hit product launch timelines.
  • Thermal and power goals were achieved, enabling strong data center economics.
  • The accelerator delivered up to 100 TOPS of low latency inference throughput for customers.
  • Our advanced physical design skills were critical in implementing this complex 7nm AI chip on schedule.
  • The engagement led to expanded business opportunities with the cloud provider.

This case study demonstrated our team's expertise in addressing the unique physical design challenges presented by advanced AI accelerator systems-on-chip at 7nm geometries. Our continued partnership with the customer is fueling innovation in cloud-scale AI infrastructure.

Project Background

We were brought on to help implement the physical design for a new 5nm AI inference chip for an autonomous vehicle customer. This chip would perform real-time sensor fusion and perception algorithms for self-driving tasks. Key requirements were high throughput, low latency, and automotive-grade reliability.

Challenges Faced

  • The advanced 5nm node's complexity exacerbated effects like variation, resistive faults, and NBTI aging.
  • Hundreds of unique voltage domains were required to minimize power across operating modes.
  • The design integrated multiple external sensors with high-speed SERDES interfaces.
  • Automotive-grade reliability mandated compliance to standards like ISO 26262.

Our Solutions

  • Rigorous EM/IR, fault simulation, and aging analysis validated reliability.
  • A hierarchical approach divided the large chip into separate power domains for P&R.
  • SerDes logic was optimized through SI-aware floorplanning and routing techniques.
  • We leveraged adaptive voltage scaling, MTCMOS, and multi-mode libraries to optimize power.
  • Auto-grade tool flows for safety, security, traceability, and documentation were used.

Results

  • The design achieved full ISO 26262 compliance and signoff within the 1-year timeline.
  • Post-fabrication, the AI chip demonstrated functional safety across use cases.
  • Thermal and power budgets were met, enabling integration into vehicle ADAS systems.
  • The SERDES interfaces provided the required data throughput from sensors.
  • This automotive project expanded our expertise into a key emerging market.

This case study highlights our team's success in applying physical design techniques to deliver a complex, reliable, and high-performance AI chip design for the automotive industry. Our auto expertise has been instrumental in winning future autonomous vehicle engagements.

Project Background

We were engaged by a supercomputing company to implement a 5nm AI accelerator design optimized for scientific HPC workloads like physics simulations, bioinformatics, and seismic modeling. The chip integrated thousands of low-precision matrix compute engines and HBM3 memory stacks.

Challenges Faced

  • The dense compute arrays required extremely complex hierarchical floorplanning and placement at 5nm.
  • Meeting frequency targets over 1GHz for the fast matrix units posed timing closure issues.
  • Power consumption had to be minimized to reduce supercomputer facility costs.
  • Large amounts of HBM3 memory drove challenges integrating the memory controllers.

Our Solutions

  • We leveraged progressive divide-and-conquer floorplanning and P&R to simplify the large matrix blocks.
  • HBM3 integration was improved through soft macro planning and pin stacking optimizations.
  • Advanced clock meshing, useful skew, and incremental STA optimization achieved timing closure.
  • Power reduction was achieved using multi-voltage domains, power-aware analysis, and gate-level optimizations.

Results

  • Our advanced 5nm expertise allowed taping out the accelerator within the 1-year timeline.
  • The HBM3 integration resulted in highly efficient memory throughput.
  • The compute engines operated at 1.25GHz, delivering record TFLOPS efficiency.
  • Post-silicon, the chip provided the massive speedups needed for HPC breakthroughs.
  • This project positioned us as a key physical design partner for future supercomputing initiatives.

This case study demonstrates our proven physical design skills in implementing high-complexity, high-performance AI chip architectures for the HPC market. Our expertise was instrumental in the success of the customer's next-generation supercomputer.

Project Background

We were brought on to implement the physical design for a new mobile AI accelerator chip for a leading smartphone vendor. Fabricated on a 5nm process, this chip would provide performant neural network inference in a power-constrained smartphone form factor.

Challenges Faced

  • Thermal dissipation was highly challenging given the strict power and size limits.
  • Meeting timing closure at 5nm while minimizing power required complex optimization.
  • The floorplan arrangement had to account for integration with the mobile SoC architecture.
  • High-speed interfaces were needed to connect to imaging sensors and modems.

Our Solutions

  • Liquid cooling techniques were explored along with thermal-aware placement and routing.
  • Progressive synthesis, incremental STA, and ECO optimizations achieved timing.
  • Power reduction was achieved by extensive clock gating and power-aware design.
  • The chip was floorplanned and pin-assigned to seamlessly integrate with the mobile SoC.
  • High-speed SERDES interfaces were optimized through isolation and SI management.

Results

  • Our mobile expertise allowed completing the demanding 5nm physical design on time.
  • Thermal simulation validated safe operation within smartphone power and form factor constraints.
  • The chip delivered leading-edge AI inference performance per watt.
  • Post-launch, the AI accelerator received strong reviews for enabling new camera and AI features.
  • This successful chip increased our engagements across their mobile roadmap.

This case study demonstrates our team’s skills in addressing the tough integration and optimization challenges involved in implementing advanced mobile AI. Our partnership with the customer continues to fuel their smartphone AI capabilities.

"TeamUP has been our preferred firm to work with to fill our specialized engineering needs. TeamUP has consistently provided us with very experienced and highly qualified candidates to complement our experienced full-time staff.  We now use TeamUP as our main agency for our engineering needs and I can highly recommend their service"

Marc, Sr. Design Director, Amazon

"If I look at the world, you’ve got a thousand software engineers, you’ve got a hundred silicon hardware engineers, and then you have one or two CPU development engineers, scale wise, and I've been working with four or five other suppliers specifically trying to find CPU development skills. TeamUP was the only one in the last four months that provided engineers that have actually worked inside a CPU with development experience. They have been able to get me the contractors I need."

Kip
Principal Manager, Logic Design and Verification | Microsoft

"We’ve used dozens of contractors from TeamUP, ranging from physcial design, analog layout, analog design, RTL, HW, DV, DFT to CAD. Our technical bar is high and our needs are specific. TeamUP listens to what we’re looking for and delivers solutions to our needs timely. What makes them stand out from other service providers is that they are assertive; yet, not pushy. They are certainly a valuable business partner.”

Alinna
Recruiting Manager | GOODIX Technology, Inc

"The methodologies they pioneered at 5nm paved the way for our next generation of high-performance AI cores. An invaluable collaborator."

Lead Physical Design Engineer
ASIC Design Services | Marvell Technologies

"Our most difficult mobile AI chip tapeout would not have been possible without their skills optimizing power, performance, and area at 5nm. Top notch team."

Lead Physical Design Engineer
Samsung

"A key reason for the success of our latest AI accelerator. Their physical design team works seamlessly with our overall engineering organization."

Director of Engineering
AI accelerator, Stealth Mode Startup

Don't compromise on your chip's PPA goals. TeamUP's dedicated physical design services for AI chips, SoCs, and more provide the expertise needed for breakthrough products. Let's connect.