TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection 11-18
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration 11-18
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning 11-14
Scientific Beta Multi-Beta Multi-Strategy Indices: Implementing Multi-Factor Equity Portfolios with Smart Factor Indices 11-11
RAG4ITOps: A Supervised Fine-Tunable and Comprehensive RAG Framework for IT Operations and Maintenance 11-11
Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning 11-07
BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference 11-06
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices 11-04
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism 11-04
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches 11-01
Paper Summary: Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation (ICLR 2020) 02-12
Brief Paper Summary: Fast and Effective Orchestration of Compiler Optimizations (Zhelong Pan, Rudolf Eigenmann; Purdue University; CGO'06) 02-12