Jaehun's Blog

For Efficient AI



Abseil Tip 146: Default vs. Value Initialization

Posted 2024-12-10 | In cpp, abseil

Written by Dominic Hamon (dominic@google.com)
First posted: April 19, 2018
Last updated: April 6, 2020

Read more »

Abseil Tip 132: Avoid Redundant Map Lookups

Posted 2024-12-10 | In cpp, abseil

Written by Matt Kulukundis (kfm@google.com)
First posted: March 30, 2017
Last updated: November 25, 2019

Read more »

Abseil Tip 108: Avoid std::bind

Posted 2024-12-10 | In cpp, abseil

Written by Roman Perepelitsa (roman.perepelitsa@gmail.com)
First posted: January 7, 2016
Last updated: August 19, 2020

Read more »

Benchmarks as Limits to Arbitrage: Understanding the Low-Volatility Anomaly

Posted 2024-12-10 | In paper-review, with-gpt, finance

Paper link

Read more »

High Idiosyncratic Volatility and Low Returns: International and Further U.S. Evidence

Posted 2024-12-10 | In paper-review, with-gpt, finance

Paper link

Read more »

CHAI: Clustered Head Attention for Efficient LLM Inference

Posted 2024-12-10 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

QAQ: Quality Adaptive Quantization for LLM KV Cache

Posted 2024-12-10 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

Transformers are Multi-State RNNs

Posted 2024-12-10 | In paper-review, with-gpt, LLM

Paper link

Read more »

Compressed Context Memory For Online Language Model Interaction

Posted 2024-12-10 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving

Posted 2024-12-10 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

Posted 2024-12-10 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

Galactica: A Large Language Model for Science

Posted 2024-12-10 | In paper-review, with-gpt, LLM

Paper link

Read more »

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Posted 2024-12-10 | In paper-review, with-gpt, LLM

Paper link

Read more »

Abseil Tip 182: Initialize Your Ints!

Posted 2024-12-09 | In cpp, abseil

Weekly Tip #182: Initialize Your Ints!

Read more »

Abseil Tip 180: Avoiding Dangling References

Posted 2024-12-09 | In cpp, abseil

Weekly Tip #180: Avoiding Dangling References

Read more »

Abseil Tip 158: Abseil Associative Containers and contains()

Posted 2024-12-09 | In cpp, abseil

Weekly Tip #158: Abseil Associative Containers and contains()

Read more »

Abseil Tip 147: Use Exhaustive switch Statements Responsibly

Posted 2024-12-09 | In cpp, abseil

Weekly Tip #147: Use Exhaustive switch Statements Responsibly

Read more »

Momentum Strategies

Posted 2024-12-09 | In paper-review, with-gpt, finance

Paper link

Read more »

Mixed Precision Quantization

Posted 2024-12-09 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

Posted 2024-12-09 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

Posted 2024-12-09 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Posted 2024-12-09 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Posted 2024-12-09 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

DeepCache: Accelerating Diffusion Models for Free

Posted 2024-12-09 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

Abseil Tip 90: Retired Flags

Posted 2024-12-08 | In cpp, abseil

Weekly Tip #90: Retired Flags

Read more »

Abseil Tip 45: Avoid Flags, Especially in Library Code

Posted 2024-12-08 | In cpp, abseil

Weekly Tip #45: Avoid Flags, Especially in Library Code

Read more »

Abseil Tip 103: Flags Are Globals

Posted 2024-12-08 | In cpp, abseil

Weekly Tip #103: Flags Are Globals

Read more »

Improving Language Understanding by Generative Pre-Training

Posted 2024-12-08 | In paper-review, with-gpt, LLM

Paper link

Read more »

QAQ: Quality Adaptive Quantization for LLM KV Cache

Posted 2024-12-08 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

Posted 2024-12-08 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

PaLM 2 Technical Report

Posted 2024-12-08 | In paper-review, with-gpt, LLM

Paper link

Read more »

Abseil Tip 153: Don't Use using-directives

Posted 2024-12-06 | In cpp, abseil

Weekly Tip #153: Don't Use using-directives

Read more »

Abseil Tip 152: AbslHashValue and You

Posted 2024-12-06 | In cpp, abseil

Weekly Tip #152: AbslHashValue and You

Read more »

Abseil Tip 144: Heterogeneous Lookup in Associative Containers

Posted 2024-12-06 | In cpp, abseil

Weekly Tip #144: Heterogeneous Lookup in Associative Containers

Read more »

Abseil Tip 136: Unordered Containers

Posted 2024-12-06 | In cpp, abseil

Weekly Tip #136: Unordered Containers

Read more »

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

Posted 2024-12-06 | In paper-review, with-gpt, LLM-Inference

Paper link

Read more »

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Posted 2024-12-06 | In paper-review, with-gpt, LLMLingua-2, LLM-Inference

Paper link

Read more »

FASTDECODE: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines

Posted 2024-12-06 | In paper-review, with-gpt, FASTDECODE, LLM-Inference

Paper link

Read more »

Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference

Posted 2024-12-06 | In paper-review, with-gpt, Dynamic Memory Compression, LLM-Inference

Paper link

Read more »

Fast Inference from Transformers via Speculative Decoding

Posted 2024-12-06 | In paper-review, with-gpt, LLM-Inference, Speculative Decoding

Paper link

Read more »

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Posted 2024-12-06 | In paper-review, with-gpt, LLM

Paper link

Read more »

Abseil Tip 24: Copies, Abbrv.

Posted 2024-12-05 | In cpp, abseil

Tip of the Week #24: Copies, Abbrv.

Read more »

Abseil Tip 149: Object Lifetimes vs. = delete

Posted 2024-12-05 | In cpp, abseil

Tip of the Week #149: Object Lifetimes vs. = delete

Read more »

Abseil Tip 148: Overload Sets

Posted 2024-12-05 | In cpp, abseil

Originally posted as TotW #148 on May 3, 2018

Read more »

Abseil Tip 117: Copy Elision and Pass-by-Value

Posted 2024-12-05 | In cpp, abseil

Originally posted as TotW #117 on June 8, 2016

Read more »

Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs

Posted 2024-12-05 | In paper-review, with-gpt

Paper link

Read more »

MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving

Posted 2024-12-05 | In paper-review, with-gpt

Paper link

Read more »

Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs

Posted 2024-12-05 | In paper-review, with-gpt

Paper link

Read more »

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching

Posted 2024-12-05 | In paper-review, with-gpt

Paper link

Read more »

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Posted 2024-12-05 | In paper-review, with-gpt

Paper link

Read more »

MELTing point: Mobile Evaluation of Language Transformers

Posted 2024-12-05 | In paper-review, with-gpt

Paper link

Read more »

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Posted 2024-12-05 | In paper-review, with-gpt

Paper link

Read more »

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Posted 2024-12-05 | In paper-review, with-gpt

Paper link

Read more »

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

Posted 2024-12-05 | In paper-review, with-gpt

Paper link

Read more »

Abseil Tip 143: C++11 Deleted Functions (= delete)

Posted 2024-12-04 | In cpp, abseil

Weekly Tip #143: C++11 Deleted Functions (= delete)

Read more »

Abseil Tip 120: Return Values Are Untouchable

Posted 2024-12-04 | In cpp, abseil

Weekly Tip #120: Return Values Are Untouchable

Read more »

Abseil Tip 11: Return Policies

Posted 2024-12-04 | In cpp, abseil

Weekly Tip #11: Return Policies

Read more »

CORM: Cache Optimization with Recent Message for Large Language Model Inference

Posted 2024-12-04 | In paper-review, with-gpt

Paper link

Read more »

Retrieval Head Mechanistically Explains Long-Context Factuality

Posted 2024-12-04 | In paper-review, with-gpt

Paper link

Read more »

SnapKV: LLM Knows What You are Looking for Before Generation

Posted 2024-12-04 | In paper-review, with-gpt

Paper link

Read more »

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Posted 2024-12-04 | In paper-review, with-gpt

Paper link

Read more »

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

Posted 2024-12-04 | In paper-review, with-gpt

Paper link

Read more »

Toward Inference-optimal Mixture-of-Expert Large Language Models

Posted 2024-12-04 | In paper-review, with-gpt

Paper link

Read more »

Mistral 7B

Posted 2024-12-04 | In paper-review, with-gpt

Paper link

Read more »

Llama 2: Open Foundation and Fine-Tuned Chat Models

Posted 2024-12-04 | In paper-review, with-gpt

Paper link

Read more »

Abseil Tip 93: Using absl::Span

Posted 2024-12-03 | In cpp, abseil

Read more »

Abseil Tip 61: Default Member Initializers

Posted 2024-12-03 | In cpp, abseil

Read more »

Abseil Tip 141: Beware of Implicit Conversions to bool

Posted 2024-12-03 | In cpp, abseil

Weekly Tip #141: Beware of Implicit Conversions to bool

Read more »

Abseil Tip 134: make_unique and private Constructors

Posted 2024-12-03 | In cpp, abseil

Weekly Tip #134: make_unique and private Constructors

Read more »

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Posted 2024-12-03 | In paper-review, with-gpt

Paper link

Read more »

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Posted 2024-12-03 | In paper-review, with-gpt

Paper link

Read more »

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Posted 2024-12-03 | In paper-review, with-gpt

Paper link

Read more »

RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

Posted 2024-12-03 | In paper-review, with-gpt

Paper link

Read more »

Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

Posted 2024-12-03 | In paper-review, with-gpt

Paper link

Read more »

Tree-based Speculative Inference and Verification

Posted 2024-12-03 | In paper-review, with-gpt

Paper link

Read more »

Fast Inference from Transformers via Speculative Decoding

Posted 2024-12-03 | In paper-review, with-gpt

Paper link

Read more »

Abseil Tip 88: Initialization: =, (), and {}

Posted 2024-12-02 | In cpp, abseil

Tip of the Week #88: Initialization: =, (), and {}

Read more »

Abseil Tip 59: Joining Tuples

Posted 2024-12-02 | In cpp, abseil

Tip of the Week #59: Joining Tuples

Read more »

Abseil Tip 142: Multi-parameter Constructors and explicit

Posted 2024-12-02 | In cpp, abseil

Tip of the Week #142: Multi-parameter Constructors and explicit

Read more »

KV Cache Compression

Posted 2024-12-02 | In paper-review, with-gpt

Paper link

Read more »

PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference

Posted 2024-12-02 | In paper-review, with-gpt

Paper link

Read more »

Layer-Condensed KV Cache for Efficient Inference of Large Language Models

Posted 2024-12-02 | In paper-review, with-gpt

Paper link

Read more »

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

Posted 2024-12-02 | In paper-review, with-gpt

Paper link

Read more »

You Only Cache Once: Decoder-Decoder Architectures for Language Models

Posted 2024-12-02 | In paper-review, with-gpt

Paper link

Read more »

Abseil Tip 36: New Join API

Posted 2024-11-29 | In cpp, abseil

Weekly Tip #36: New Join API

Read more »

Abseil Tip 3: String Concatenation and operator+ vs. StrCat()

Posted 2024-11-29 | In cpp, abseil

Tip of the Week #3: String Concatenation and operator+ vs. StrCat()

Read more »

Abseil Tip 10: Splitting Strings, Not Hairs

Posted 2024-11-29 | In cpp, abseil

Tip of the Week #10: Splitting Strings, Not Hairs!

Read more »

LoCoCo: Dropping In Convolutions for Long Context Compression

Posted 2024-11-29 | In paper-review, with-gpt

Paper link

Read more »

Loki: Low-rank Keys for Efficient Sparse Attention

Posted 2024-11-29 | In paper-review, with-gpt

Paper link

Read more »

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Posted 2024-11-29 | In paper-review, with-gpt

Paper link

Read more »

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models

Posted 2024-11-29 | In paper-review, with-gpt

Paper link

Read more »

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Posted 2024-11-29 | In paper-review, with-gpt

Paper link

Read more »

Abseil Tip 74: Delegating and Inheriting Constructors

Posted 2024-11-28 | In cpp, abseil

Originally posted as totw/74 on 2014-04-21
Written by Bradley White (bww@google.com)

Read more »

Abseil Tip 42: Prefer Factory Functions to Initializer Methods

Posted 2024-11-28 | In cpp, abseil

Weekly Tip #42: Prefer Factory Functions to Initializer Methods

Read more »

Abseil Tip 131: Special Member Functions and = default

Posted 2024-11-28 | In cpp, abseil

Weekly Tip #131: Special Member Functions and = default

Read more »

Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters

Posted 2024-11-28 | In paper-review, with-gpt

Paper link

Read more »

CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

Posted 2024-11-28 | In paper-review, with-gpt

Paper link

Read more »

A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression

Posted 2024-11-28 | In paper-review, with-gpt

Paper link

Read more »

MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding

Posted 2024-11-28 | In paper-review, with-gpt

Paper link

Read more »

Effectively Compress KV Heads for LLM

Posted 2024-11-28 | In paper-review, with-gpt

Paper link

Read more »