To speed up LLM inference and sharpen the model's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
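For example, a minimal usage sketch in Python, assuming the package's documented PromptCompressor interface; the model checkpoint and compression rate are illustrative choices, not defaults:

from llmlingua import PromptCompressor

# Load an LLMLingua-2 compressor (assumed checkpoint name; any published
# llmlingua-2 model should work here).
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

long_prompt = "..."  # the verbose prompt or context to shrink
# Keep roughly one third of the tokens; compress_prompt returns a dict
# that includes the compressed text.
result = compressor.compress_prompt(long_prompt, rate=0.33)
print(result["compressed_prompt"])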
[email protected] · Low health (38/100) — consider alternatives
Get this data programmatically — free, no authentication.
curl https://depscope.dev/api/check/conda/llmlingua

First published · 2024-03-24 16:35:20.976000+00:00
Last updated · 2025-04-22 14:59:01.140000+00:00
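As a sketch, the endpoint above can also be queried from Python with only the standard library; the shape of the JSON response is not documented here, so the fields are whatever the service returns:

import json
import urllib.request

# No authentication needed, per the note above.
url = "https://depscope.dev/api/check/conda/llmlingua"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# Pretty-print the raw response to inspect the available fields.
print(json.dumps(data, indent=2))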