Open Source · MIT License

CompressX

Compress LLMs. Keep the originals. Hardware-aware quantization for Ollama and any GGUF-compatible tool.

Quick Start

$ npm install -g compressx

$ compressx compress llama3.2

68% Size Reduction
54% Speed Improvement
0 Cloud Uploads
MIT License

Features

What CompressX does

A focused CLI tool that does one thing well — compress LLMs locally with zero friction.

One-Command Install

Install globally with npm and start compressing immediately. No configuration files or setup required.

Hardware-Aware Quantization

Auto-detects your GPU and VRAM to select the optimal compression level for your hardware.

100% Local Processing

All compression happens on your machine. No cloud uploads, no data transfer, no accounts required.

Side-by-Side Benchmarking

Compare original vs compressed models with speed, perplexity, and quality assessments.

Live Progress Tracking

Real-time per-tensor progress bars during compression so you always know the status.

Post-Compression Validation

Automatic sanity checks catch broken quantizations before you use a compressed model.

Multi-Platform Support

Works with Ollama, LM Studio, llama.cpp, Jan, GPT4All, and any GGUF-compatible tool.

Self-Installing Dependencies

Downloads llama.cpp binaries automatically on first run. No manual dependency management.

Example

Real compression results

Model: llama3.2:latest (4B parameters)

Original: 8.10 GB

Compressed: 2.60 GB

Savings: 5.50 GB (68% reduction)

Speed: +54% faster generation

Perplexity: +6.3% (minimal quality impact)
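The savings figures above follow directly from the two sizes. A quick shell check (illustrative, reusing the numbers reported above) reproduces them:

```shell
# Derive savings and percent reduction from the original and
# compressed sizes reported above (in GB)
orig=8.10
comp=2.60
awk -v o="$orig" -v c="$comp" \
  'BEGIN { printf "Savings: %.2f GB (%.0f%% reduction)\n", o - c, (o - c) / o * 100 }'
# prints: Savings: 5.50 GB (68% reduction)
```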

Compatibility

Works with your tools

Ollama
LM Studio
llama.cpp
Jan
GPT4All
Any GGUF tool

FAQ

Frequently Asked Questions

What models does CompressX support?

CompressX works with any model available through Ollama, and its GGUF output can also be used with LM Studio, llama.cpp, Jan, GPT4All, and any other GGUF-compatible tool.

Does compression affect model quality?

Quantization involves a tradeoff between size and quality. CompressX provides benchmarking so you can measure the exact impact. Typical results show minimal perplexity increase (around 6%) with significant size savings.
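Perplexity is the exponential of the mean negative log-likelihood per token, so a small per-token loss gap translates into a small relative perplexity increase. A sketch with illustrative NLL values (assumptions, not CompressX measurements):

```shell
# Perplexity = exp(mean negative log-likelihood per token).
# The NLL values below are illustrative, not CompressX output.
awk 'BEGIN {
  nll_orig = 2.00                          # original model, mean NLL per token
  nll_comp = 2.06                          # compressed model, mean NLL per token
  p0 = exp(nll_orig); p1 = exp(nll_comp)   # corresponding perplexities
  printf "perplexity change: +%.1f%%\n", (p1 - p0) / p0 * 100
}'
```

Here a 3% rise in per-token NLL yields roughly a 6% perplexity increase, the same order as the benchmark figure above.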

Do I need a GPU?

No. CompressX works on CPU-only machines. However, if a GPU is detected, it will auto-select optimal compression settings for your hardware.

Is my data sent anywhere?

No. All processing happens 100% locally on your machine. CompressX never uploads models, telemetry, or any data to external servers.

What are the system requirements?

Node.js 18 or higher. CompressX automatically downloads the llama.cpp binaries it needs on first run.
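A quick way to confirm the Node.js requirement before installing. The hard-coded version string is a stand-in for the output of `node --version`:

```shell
# Check the Node.js major version against the minimum (18).
# The hard-coded string stands in for: version=$(node --version)
version="v18.19.0"
major=${version#v}       # strip the leading "v"
major=${major%%.*}       # keep only the major component
if [ "$major" -ge 18 ]; then
  echo "Node.js $version meets the requirement"
else
  echo "Node.js $version is too old; install 18 or newer"
fi
```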

Is it really free?

Yes. CompressX is MIT-licensed open source software. Free forever, no accounts, no credits, no rate limits.

Ready to compress your models?

Install CompressX with a single command and start saving disk space today. Free and open source.