USENIX ATC '25 - CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge

Presenter: Dong-Woo Kim - Department of Smart Factory Convergence. 1. Title: CLONE: Customizing LLMs for Efficient Latency ...

Resource Multiplexing in Tuning and Serving Large Language Models. Yongjun He and Haofeng Yang, ETH Zurich; Yao Lu, ...

Multi-agent AI systems are becoming essential for multi-step production applications, but require specialized models that are ...

In this video, you will see a practical example of Agentic AI using a compound workflow of multiple large language models ...

USENIX ATC '25 - CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge

CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge

USENIX Security '25 - ELFuzz: Efficient Input Generation via LLM-driven Synthesis Over Fuzzer Space

USENIX ATC '25 - Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation

ATC '25 and OSDI '25 - Joint Keynote Address: Accelerating Software Development: The LLM (R)evolution

USENIX ATC '25 - Resource Multiplexing in Tuning and Serving Large Language Models

USENIX ATC '25 - Efficient Performance-Aware GPU Sharing with Compatibility and Isolation through...

NSDI '26 - HydraServe: Minimizing Cold Start Latency for Serverless LLM Serving in Public Clouds

AWS re:Invent 2025 - Fine-tuning LLMs for Multi-Agent Orchestration: Cosine AI Case Study (SPS402)

Chaining LLMs for Better Results using Agentic AI

NSDI '26 - Cortex: Achieving Low-Latency, Cost-Efficient Remote Data Access For LLM via

USENIX ATC '24 - FwdLLM: Efficient Federated Finetuning of Large Language Models with Perturbed...

SREcon26 Americas - Observability for LLMs: Understanding What’s Happening Under the Hood
