Jaehun's Blog

For Efficient AI



MODEL TELLS YOU WHERE TO MERGE: ADAPTIVE KV CACHE MERGING FOR LLMS ON LONG-CONTEXT TASKS

Posted on 2024-11-27 | In paper-review, with-gpt |

Paper link

Read more »

Efficient Sparse Attention needs Adaptive Token Release

Posted on 2024-11-27 | In paper-review, with-gpt |

Paper link

Read more »

Benchmark of Long Context Capable Approaches

Posted on 2024-11-27 | In paper-review, with-gpt |

Paper link

Read more »

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

Posted on 2024-11-27 | In paper-review, with-gpt |

Paper link

Read more »

Dynamic Discriminative Operations (D2O) for Efficient Generative Inference of Large Language Models

Posted on 2024-11-27 | In paper-review, with-gpt |

Paper link

Read more »

Abseil Tip 130: Namespace Naming

Posted on 2024-11-26 | In cpp, abseil |

Weekly Tip #130: Namespace Naming

Read more »

Abseil Tip 123: absl::optional and std::unique_ptr

Posted on 2024-11-26 | In cpp, abseil |

Weekly Tip #123: absl::optional and std::unique_ptr

Read more »

Abseil Tip 119: Using-declarations and Namespace Aliases

Posted on 2024-11-26 | In cpp, abseil |

Weekly Tip #119: Using-declarations and Namespace Aliases

Read more »

Pruning in Transformer Decoder

Posted on 2024-11-26 | In paper-review, with-gpt |

Paper link

Read more »

Keep the Cost Down: A Review on Methods to Optimize LLM’s KV Cache Consumption.

Posted on 2024-11-26 | In paper-review, with-gpt |

Paper link

Read more »

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Posted on 2024-11-26 | In paper-review, with-gpt |

Paper link

Read more »

PQCache: Product Quantization-based KVCache for Long Context LLM Inference

Posted on 2024-11-26 | In paper-review, with-gpt |

Paper link

Read more »

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

Posted on 2024-11-26 | In paper-review, with-gpt |

Paper link

Read more »

Abseil Tip 99: Nonmember Interface Etiquette

Posted on 2024-11-25 | In cpp, abseil |

Korean translation

Read more »

Abseil Tip 126: make_unique Is the New new

Posted on 2024-11-25 | In cpp, abseil |

Korean translation

Read more »

Abseil Tip 109: Meaningful const in Function Declarations

Posted on 2024-11-25 | In cpp, abseil |

Korean translation

Read more »

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Posted on 2024-11-25 | In paper-review, with-gpt |

Paper link

Read more »

Post-Training Sparse Attention with Double Sparsity

Posted on 2024-11-25 | In paper-review, with-gpt |

Paper link

Read more »

NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

Posted on 2024-11-25 | In paper-review, with-gpt |

Paper link

Read more »

Palu: Compressing KV-Cache with Low-Rank Projection

Posted on 2024-11-25 | In paper-review, with-gpt |

Paper link

Read more »

ThinK: Thinner Key Cache by Query-Driven Pruning

Posted on 2024-11-25 | In paper-review, with-gpt |

Paper link

Read more »

Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads

Posted on 2024-11-21 | In paper-review, with-gpt |

Paper link

Read more »

InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

Posted on 2024-11-21 | In paper-review, with-gpt |

Paper link

Read more »

KV-COMPRESS: Paged KV-Cache Compression with Variable Compression Rates per Attention Head

Posted on 2024-11-21 | In paper-review, with-gpt |

Paper link

Read more »

Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction

Posted on 2024-11-21 | In paper-review, with-gpt |

Paper link

Read more »

TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning

Posted on 2024-11-21 | In paper-review, with-gpt |

Paper link

Read more »

Abseil Tip 65: Putting Things in Their Place

Posted on 2024-11-20 | In cpp, abseil |

Korean translation

Read more »

Abseil Tip 49: Argument-Dependent Lookup

Posted on 2024-11-20 | In cpp, abseil |

Korean translation

Read more »

Abseil Tip 112: emplace vs. push_back

Posted on 2024-11-20 | In cpp, abseil |

Tip of the Week #112: emplace vs. push_back

Read more »

DUOATTENTION: EFFICIENT LONG-CONTEXT LLM INFERENCE WITH RETRIEVAL AND STREAMING HEADS

Posted on 2024-11-20 | In paper-review, with-gpt |

Paper link

Read more »

TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training

Posted on 2024-11-20 | In paper-review, with-gpt |

Paper link

Read more »

TIDALDECODE: FAST AND ACCURATE LLM DECODING WITH POSITION PERSISTENT SPARSE ATTENTION

Posted on 2024-11-20 | In paper-review, with-gpt |

Paper link

Read more »

SPARSEVLM: VISUAL TOKEN SPARSIFICATION FOR EFFICIENT VISION-LANGUAGE MODEL INFERENCE

Posted on 2024-11-20 | In paper-review, with-gpt |

Paper link

Read more »

SWIFTKV: FAST PREFILL-OPTIMIZED INFERENCE WITH KNOWLEDGE-PRESERVING MODEL TRANSFORMATION

Posted on 2024-11-20 | In paper-review, with-gpt |

Paper link

Read more »

LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy

Posted on 2024-11-20 | In paper-review, with-gpt |

Paper link

Read more »

A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference

Posted on 2024-11-19 | In paper-review, with-gpt |

Paper link

Read more »

SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction

Posted on 2024-11-19 | In paper-review, with-gpt |

Paper link

Read more »

In-context KV-Cache Eviction for LLMs via Attention-Gate

Posted on 2024-11-19 | In paper-review, with-gpt |

Paper link

Read more »

Prompt Compression for Large Language Models: A Survey

Posted on 2024-11-19 | In paper-review, with-gpt |

Paper link

Read more »

Textbooks Are All You Need

Posted on 2024-11-19 | In paper-review, with-gpt |

Paper link

Read more »

Scaling Laws for Neural Language Models

Posted on 2024-11-19 | In paper-review, with-gpt |

Paper link

Read more »

Abseil Tip 135: Test the Contract, Not the Implementation

Posted on 2024-11-18 | In cpp, abseil |

Weekly Tip #135: Test the Contract, Not the Implementation

Read more »

Abseil Tip 107: Reference Lifetime Extension

Posted on 2024-11-18 | In cpp, abseil |

Below is a Korean translation of "Tip of the Week #107: Reference Lifetime Extension":

Read more »

Abseil Tip 101: Return Values, References, and Lifetimes

Posted on 2024-11-18 | In cpp, abseil |

Weekly Tip #101: Return Values, References, and Lifetimes

Read more »

Squeezed Attention: Accelerating Long Context Length LLM Inference

Posted on 2024-11-18 | In paper-review, with-gpt |

Paper link

Read more »

Recycled Attention: Efficient inference for long-context language models

Posted on 2024-11-18 | In paper-review, with-gpt |

Paper link

Read more »

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection

Posted on 2024-11-18 | In paper-review, with-gpt |

Paper link

Read more »

VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Posted on 2024-11-18 | In paper-review, with-gpt |

Paper link

Read more »

MagicPIG: LSH Sampling for Efficient LLM Generation

Posted on 2024-11-18 | In paper-review, with-gpt |

Paper link

Read more »

Abseil Tip 86: Enumerating with Class (enum class)

Posted on 2024-11-14 | In cpp, abseil |

Tip of the Week #86: Enumerating with Class (enum class)

Read more »

Abseil Tip 77: Temporaries, Moves, and Copies

Posted on 2024-11-14 | In cpp, abseil |

Tip of the Week #77: Temporaries, Moves, and Copies

Read more »

Abseil Tip 64: Raw String Literals

Posted on 2024-11-14 | In cpp, abseil |

Tip of the Week #64: Raw String Literals

Read more »

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Posted on 2024-11-14 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2201.11903

Read more »

Learning Transferable Visual Models From Natural Language Supervision

Posted on 2024-11-14 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2103.00020

Read more »

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Posted on 2024-11-14 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.10812

Read more »

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Posted on 2024-11-14 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.10733v2

Read more »

The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning

Posted on 2024-11-14 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2305.14045

Read more »

Abseil Tip 55: Name Counting and unique_ptr

Posted on 2024-11-13 | In cpp, abseil |

Tip of the Week #55: Name Counting and unique_ptr

Read more »

Abseil Tip 122: Test Fixtures, Clarity, and Dataflow

Posted on 2024-11-13 | In cpp, abseil |

Tip of the Week #122: Test Fixtures, Clarity, and Dataflow

Read more »

VILA-U: A Unified Foundation Model Integrating Visual Understanding and Generation

Posted on 2024-11-13 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2409.04429

Read more »

Condition-Aware Neural Network for Controlled Image Generation

Posted on 2024-11-13 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2404.01143

Read more »

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Posted on 2024-11-13 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2402.19481

Read more »

VILA: On Pre-training for Visual Language Models

Posted on 2024-11-13 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2312.07533

Read more »

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

Posted on 2024-11-13 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2305.10431

Read more »

Abseil Tip 1: How to Use string_view and Its Benefits

Posted on 2024-11-12 | In cpp, abseil |

Abseil Tip #1: How to Use string_view and Its Benefits

Read more »

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Posted on 2024-11-12 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.21465

Read more »

Query-Efficient Correlation Clustering with Noisy Oracle

Posted on 2024-11-12 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2402.01400

Read more »

LiteMoE: Customizing On-device LLM Serving via Proxy Submodel Tuning

Posted on 2024-11-12 | In paper-review, with-gpt |

Paper: https://dl.acm.org/doi/10.1145/3666025.3699355

Read more »

LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning

Posted on 2024-11-12 | In paper-review, with-gpt |

Paper: https://aclanthology.org/2024.findings-emnlp.206/

Read more »

Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering

Posted on 2024-11-12 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2309.17249

Read more »

Scientific Beta Multi-Beta Multi-Strategy Indices: Implementing Multi-Factor Equity Portfolios with Smart Factor Indices

Posted on 2024-11-11 | In paper-review, with-gpt, finance |

Paper: https://conferences.pionline.com/uploads/conference_admin/ERI_Scientific_Beta_Publication_Scientific_Beta_Multi-Beta_Multi-Strategy_Indices_Equity_Portfolios.pdf

Read more »

Foundations of Factor Investing

Posted on 2024-11-11 | In paper-review, with-gpt, finance |

Paper: https://www.msci.com/documents/1296102/1336482/Foundations_of_Factor_Investing.pdf

Read more »

RAG4ITOps: A Supervised Fine-Tunable and Comprehensive RAG Framework for IT Operations and Maintenance

Posted on 2024-11-11 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.15805v1

Read more »

MagicPIG: LSH Sampling for Efficient LLM Generation

Posted on 2024-11-11 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.16179

Read more »

EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models

Posted on 2024-11-11 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.15332

Read more »

ELICIT: LLM Augmentation via External In-Context Capability

Posted on 2024-11-11 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.09343

Read more »

COMET: Towards Practical W4A4KV4 LLMs Serving

Posted on 2024-11-11 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.12168

Read more »

The Cross-Section of Expected Stock Returns

Posted on 2024-11-10 | In paper-review, with-gpt, finance |

Paper: https://www.jstor.org/stable/2329112

Read more »

Portfolio Selection

Posted on 2024-11-10 | In paper-review, with-gpt, finance |

Paper: https://www.jstor.org/stable/2975974

Read more »

Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk

Posted on 2024-11-10 | In paper-review, with-gpt, finance |

Paper: https://www.jstor.org/stable/2977928

Read more »

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Posted on 2024-11-10 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2407.02490

Read more »

HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis

Posted on 2024-11-10 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2405.15880v2

Read more »

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency

Posted on 2024-11-10 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2408.00741

Read more »

Can Graph Learning Improve Planning in LLM-based Agents?

Posted on 2024-11-10 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2405.19119

Read more »

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Posted on 2024-11-10 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2405.09220

Read more »

Transformers are Multi-State RNNs

Posted on 2024-11-07 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2401.06104

Read more »

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization

Posted on 2024-11-07 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2407.02524

Read more »

Enabling Tensor Language Model to Assist in Generating High-Performance Tensor Programs for Deep Learning

Posted on 2024-11-07 | In paper-review, with-gpt |

Paper: https://www.usenix.org/system/files/osdi24-zhai.pdf

Read more »

Efficient Streaming Language Models with Attention Sinks

Posted on 2024-11-07 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2309.17453

Read more »

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

Posted on 2024-11-07 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2310.01801

Read more »

BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference

Posted on 2024-11-06 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.23079

Read more »

KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing

Posted on 2024-11-06 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2410.18517

Read more »

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

Posted on 2024-11-06 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2406.06858

Read more »

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization

Posted on 2024-11-06 | In paper-review, with-gpt |

Paper: https://openreview.net/pdf/e7782b237ab632c467717143b2b7ef283d71c282.pdf

Read more »

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Posted on 2024-11-06 | In paper-review, with-gpt |

Paper: https://i.cs.hku.hk/~cwu/papers/hphu-eurosys24.pdf

Read more »

Magicoder: Empowering Code Generation with OSS-Instruct

Posted on 2024-11-05 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2312.02120

Read more »

SpotServe: Serving Generative Large Language Models on Preemptible Instances

Posted on 2024-11-05 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2311.15566

Read more »

Optimal Kernel Orchestration for Tensor Programs with Korch

Posted on 2024-11-05 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2406.09465

Read more »

KernelGPT: Enhanced Kernel Fuzzing via Large Language Models

Posted on 2024-11-05 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2401.00563

Read more »

Efficient Generative LLM Inference Using Phase Splitting

Posted on 2024-11-05 | In paper-review, with-gpt |

Paper: https://arxiv.org/abs/2311.18677v2

Read more »
류재훈

444 posts
23 categories
1 tag
RSS
E-mail · LinkedIn
© 2025 류재훈
Powered by Jekyll
Theme - NexT.Mist