CompressX
Compress LLMs. Keep the originals. Hardware-aware quantization for Ollama and any GGUF-compatible tool.
Quick Start
$ npm install -g compressx
$ compressx compress llama3.2
68% size reduction
54% speed improvement
0 cloud uploads
MIT license
Features
What CompressX does
A focused CLI tool that does one thing well: compress LLMs locally with zero friction.
One-Command Install
Install globally with npm and start compressing immediately. No configuration files or setup required.
Hardware-Aware Quantization
Auto-detects your GPU and VRAM to select the optimal compression level for your hardware.
100% Local Processing
All compression happens on your machine. No cloud uploads, no data transfer, no accounts required.
Side-by-Side Benchmarking
Compare original vs compressed models with speed, perplexity, and quality assessments.
Live Progress Tracking
Real-time per-tensor progress bars during compression so you always know the status.
Post-Compression Validation
Automatic sanity checks catch broken quantizations before you use a compressed model.
Multi-Platform Support
Works with Ollama, LM Studio, llama.cpp, Jan, GPT4All, and any GGUF-compatible tool.
Self-Installing Dependencies
Downloads llama.cpp binaries automatically on first run. No manual dependency management.
Example
Real compression results
Model: llama3.2:latest (4B parameters)
Original: 8.10 GB
Compressed: 2.60 GB
Savings: 5.50 GB (68% reduction)
Speed: +54% faster generation
Perplexity: +6.3% (minimal quality impact)
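The savings line follows directly from the two sizes; as a quick sanity check with awk:

```shell
# Verify the figures above: savings and percent reduction follow from the two sizes
awk 'BEGIN {
  orig = 8.10; comp = 2.60   # sizes in GB from the run above
  printf "Savings: %.2f GB (%.0f%% reduction)\n", orig - comp, (1 - comp / orig) * 100
}'
```

This prints "Savings: 5.50 GB (68% reduction)", matching the numbers reported above.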
Compatibility
Works with your tools
FAQ
Frequently Asked Questions
What models does CompressX support?
CompressX works with any model available through Ollama. Because it outputs standard GGUF files, the compressed models also work in LM Studio, llama.cpp, Jan, and GPT4All.
Does compression affect model quality?
Quantization involves a tradeoff between size and quality. CompressX provides benchmarking so you can measure the exact impact. Typical results show minimal perplexity increase (around 6%) with significant size savings.
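The roughly 6% figure is a relative change in perplexity. As an illustration with hypothetical scores (6.20 for the original model, 6.59 for the compressed one; these are not CompressX measurements):

```shell
# Hypothetical perplexity scores, used only to show how the delta is computed
awk 'BEGIN {
  orig = 6.20; comp = 6.59   # assumed scores for illustration
  printf "Perplexity: +%.1f%%\n", (comp - orig) / orig * 100
}'
```

Lower perplexity is better, so a small positive delta like this means the compressed model is only slightly worse at predicting text.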
Do I need a GPU?
No. CompressX works on CPU-only machines. However, if a GPU is detected, it will auto-select optimal compression settings for your hardware.
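As a sketch of the kind of probe such auto-detection can rely on (this illustrates the idea with nvidia-smi; it is not CompressX's actual implementation):

```shell
# Probe for an NVIDIA GPU and its VRAM; fall back to CPU if none is found.
# Illustration only -- CompressX's own detection logic may differ.
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null \
  || echo "No NVIDIA GPU detected; using CPU-only settings"
```

On a machine without NVIDIA drivers the command fails silently and the fallback message is printed instead.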
Is my data sent anywhere?
No. All processing happens 100% locally on your machine. CompressX never uploads models, telemetry, or any data to external servers.
What are the system requirements?
Node.js 18 or higher. CompressX automatically downloads the llama.cpp binaries it needs on first run.
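A quick way to confirm the Node.js requirement before installing (a minimal sketch; the parsing assumes node's standard vMAJOR.MINOR.PATCH version string):

```shell
# Check that the installed Node.js meets the 18+ requirement
ver=$(node --version 2>/dev/null)   # e.g. "v20.11.0"; empty if node is absent
major=${ver#v}; major=${major%%.*}  # strip the leading "v" and everything after the major version
if [ -n "$ver" ] && [ "$major" -ge 18 ]; then
  echo "Node.js $ver is recent enough"
else
  echo "Install Node.js 18 or later first"
fi
```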
Is it really free?
Yes. CompressX is MIT-licensed open source software. Free forever, no accounts, no credits, no rate limits.