Benchmark of Long Context Capable Approaches
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
Dynamic Discriminative Operations (D2O) for Efficient Generative Inference of Large Language Models
Abseil Tip 130: Namespace Naming
Tip of the Week #130: Namespace Naming
Abseil Tip 123: absl::optional and std::unique_ptr
Tip of the Week #123: absl::optional and std::unique_ptr
Abseil Tip 119: Using Declarations and Namespace Aliases
Tip of the Week #119: Using Declarations and Namespace Aliases
Pruning in Transformer Decoder
Keep the Cost Down: A Review on Methods to Optimize LLM’s KV Cache Consumption.
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
PQCache: Product Quantization-based KVCache for Long Context LLM Inference
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Abseil Tip 99: Nonmember Interface Etiquette
Abseil Tip 126: make_unique is the new new
Abseil Tip 109: Meaningful const in Function Declarations
Korean translation
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Post-Training Sparse Attention with Double Sparsity
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
Palu: Compressing KV-Cache with Low-Rank Projection
ThinK: Thinner Key Cache by Query-Driven Pruning
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
KV-COMPRESS: Paged KV-Cache Compression with Variable Compression Rates per Attention Head
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning
Abseil Tip 65: Putting Things in their Place
Korean translation
Abseil Tip 49: Argument-Dependent Lookup
Korean translation
Abseil Tip 112: emplace vs. push_back
Korean translation
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction
In-context KV-Cache Eviction for LLMs via Attention-Gate
Prompt Compression for Large Language Models: A Survey
Textbooks Are All You Need
Scaling Laws for Neural Language Models
Abseil Tip 135: Test the Contract, not the Implementation
Tip of the Week #135: Test the Contract, not the Implementation
Abseil Tip 107: Reference Lifetime Extension
The following is a Korean translation of "Tip of the Week #107: Reference Lifetime Extension":
Abseil Tip 101: Return Values, References, and Lifetimes
Tip of the Week #101: Return Values, References, and Lifetimes
Squeezed Attention: Accelerating Long Context Length LLM Inference
Recycled Attention: Efficient inference for long-context language models
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
MagicPIG: LSH Sampling for Efficient LLM Generation
Abseil Tip 86: Enumerating with Class (enum class)
Tip of the Week #86: Enumerating with Class (enum class)
Abseil Tip 77: Temporaries, Moves, and Copies
Tip of the Week #77: Temporaries, Moves, and Copies
Abseil Tip 64: Raw String Literals
Tip of the Week #64: Raw String Literals
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper: https://arxiv.org/abs/2201.11903
Learning Transferable Visual Models From Natural Language Supervision
Paper: https://arxiv.org/abs/2103.00020
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Paper: https://arxiv.org/abs/2410.10812
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
Paper: https://arxiv.org/abs/2410.10733v2
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Paper: https://arxiv.org/abs/2305.14045
Abseil Tip 55: Name Counting and unique_ptr
Tip of the Week #55: Name Counting and unique_ptr
Abseil Tip 122: Test Fixtures, Clarity, and Dataflow
Tip of the Week #122: Test Fixtures, Clarity, and Dataflow
VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation
Paper: https://arxiv.org/abs/2409.04429
Condition-Aware Neural Network for Controlled Image Generation
Paper: https://arxiv.org/abs/2404.01143
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Paper: https://arxiv.org/abs/2402.19481
VILA: On Pre-training for Visual Language Models
Paper: https://arxiv.org/abs/2312.07533
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Paper: https://arxiv.org/abs/2305.10431
Abseil Tip 1: How to Use string_view and Its Benefits
Abseil Tip #1: How to Use string_view and Its Benefits
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Paper: https://arxiv.org/abs/2410.21465
Query-Efficient Correlation Clustering with Noisy Oracle
Paper: https://arxiv.org/abs/2402.01400
LiteMoE: Customizing On-device LLM Serving via Proxy Submodel Tuning
Paper: https://dl.acm.org/doi/10.1145/3666025.3699355
LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning
Paper: https://aclanthology.org/2024.findings-emnlp.206/
Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
Paper: https://arxiv.org/abs/2309.17249
Scientific Beta Multi-Beta Multi-Strategy Indices: Implementing Multi-Factor Equity Portfolios with Smart Factor Indices
Paper: https://conferences.pionline.com/uploads/conference_admin/ERI_Scientific_Beta_Publication_Scientific_Beta_Multi-Beta_Multi-Strategy_Indices_Equity_Portfolios.pdf
Foundations of Factor Investing
Paper: https://www.msci.com/documents/1296102/1336482/Foundations_of_Factor_Investing.pdf
RAG4ITOps: A Supervised Fine-Tunable and Comprehensive RAG Framework for IT Operations and Maintenance
Paper: https://arxiv.org/abs/2410.15805v1
MagicPIG: LSH Sampling for Efficient LLM Generation
Paper: https://arxiv.org/abs/2410.16179
EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models
Paper: https://arxiv.org/abs/2410.15332
ELICIT: LLM Augmentation via External In-Context Capability
Paper: https://arxiv.org/abs/2410.09343
COMET: Towards Practical W4A4KV4 LLMs Serving
Paper: https://arxiv.org/abs/2410.12168
The Cross-Section of Expected Stock Returns
Paper: https://www.jstor.org/stable/2329112
Portfolio Selection
Paper: https://www.jstor.org/stable/2975974
Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk
Paper: https://www.jstor.org/stable/2977928
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Paper: https://arxiv.org/abs/2407.02490
HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis
Paper: https://arxiv.org/abs/2405.15880v2
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency
Paper: https://arxiv.org/abs/2408.00741
Can Graph Learning Improve Planning in LLM-based Agents?
Paper: https://arxiv.org/abs/2405.19119
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Paper: https://arxiv.org/abs/2405.09220
Transformers are Multi-State RNNs
Paper: https://arxiv.org/abs/2401.06104
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
Paper: https://arxiv.org/abs/2407.02524
Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning
Paper: https://www.usenix.org/system/files/osdi24-zhai.pdf
Efficient Streaming Language Models with Attention Sinks
Paper: https://arxiv.org/abs/2309.17453
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Paper: https://arxiv.org/abs/2310.01801
BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference
Paper: https://arxiv.org/abs/2410.23079
KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
Paper: https://arxiv.org/abs/2410.18517
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
Paper: https://arxiv.org/abs/2406.06858
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Paper: https://openreview.net/pdf/e7782b237ab632c467717143b2b7ef283d71c282.pdf
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs
Paper: https://i.cs.hku.hk/~cwu/papers/hphu-eurosys24.pdf
Magicoder: Empowering Code Generation with OSS-Instruct
Paper: https://arxiv.org/abs/2312.02120
SpotServe: Serving Generative Large Language Models on Preemptible Instances
Paper: https://arxiv.org/abs/2311.15566
Optimal Kernel Orchestration for Tensor Programs with Korch
Paper: https://arxiv.org/abs/2406.09465
KernelGPT: Enhanced Kernel Fuzzing via Large Language Models
Paper: https://arxiv.org/abs/2401.00563
Efficient Generative LLM Inference Using Phase Splitting
Paper: https://arxiv.org/abs/2311.18677v2
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
Paper: https://arxiv.org/abs/2406.02532