VCF Manipulation Toolkit

A comprehensive set of 60 C/C++ command-line tools for manipulating and analysing Variant Call Format (VCF) files. Each tool does one job well, and tools can be chained together using standard streams.

License: MIT

Built for Genomics Workflows

Designed following the Unix philosophy: each tool does one thing well, and tools can be composed into powerful pipelines.

60 Specialized Tools

Comprehensive toolkit covering filtering, transformation, analysis, quality control, and annotation of genomic variants.

Pipeline Ready

All tools read from stdin and write to stdout, enabling seamless Unix pipe composition for complex workflows.
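A minimal sketch of the stdin-to-stdout filter pattern, assuming tab-separated VCF input. `filter_lines`, `main`, and the QUAL threshold are hypothetical and written for illustration. This is not the code of any VCFX tool. Header lines pass through untouched so that the next stage in a pipe still receives a valid VCF.

```python
import sys
from typing import Iterable, Iterator

QUAL_COLUMN = 5  # 0-based index of QUAL in a VCF data line


def filter_lines(lines: Iterable[str], min_qual: float = 30.0) -> Iterator[str]:
    """Yield header lines unchanged and data lines whose QUAL is >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            # '##' meta-information lines and the '#CHROM' header pass through
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[QUAL_COLUMN]
        # In this sketch a missing QUAL ('.') fails the threshold
        if qual != "." and float(qual) >= min_qual:
            yield line


def main() -> None:
    """Stream stdin to stdout one line at a time, so memory use stays flat."""
    sys.stdout.writelines(filter_lines(sys.stdin))
```

Adding a `main()` call under an `if __name__ == "__main__":` guard turns this into a pipeline stage, e.g. `python qual_filter.py < input.vcf > filtered.vcf` (the file names are hypothetical). Because it is a generator, only one line is held in memory at a time.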

Fast & Efficient

Optimized C++ implementation designed for processing large genomic datasets with a minimal memory footprint.

Cross Platform

Full support for Linux and macOS, including native optimization for Apple Silicon M1/M2 chips.

Python Bindings

Programmatic access with structured data types, type hints, and seamless integration with Python workflows.
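As a purely hypothetical illustration of what "structured data types" can mean for VCF data, the sketch below parses the eight fixed VCF columns into a typed dataclass. `VcfRecord`, `parse_info`, and `parse_record` are invented names. They are **not** the vcfx Python API, which is described in the project documentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VcfRecord:
    """Hypothetical typed view of the eight fixed VCF columns (not the vcfx API)."""
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alts: List[str]
    qual: Optional[float]  # None when QUAL is '.'
    filter: str
    info: Dict[str, str] = field(default_factory=dict)


def parse_info(text: str) -> Dict[str, str]:
    """Split 'KEY=VALUE;FLAG' into a dict; flags map to an empty string."""
    if text == ".":
        return {}
    info: Dict[str, str] = {}
    for entry in text.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    return info


def parse_record(line: str) -> VcfRecord:
    """Convert one tab-separated VCF data line into a VcfRecord."""
    cols = line.rstrip("\n").split("\t")
    return VcfRecord(
        chrom=cols[0],
        pos=int(cols[1]),
        variant_id=cols[2],
        ref=cols[3],
        alts=cols[4].split(","),  # multi-allelic ALT becomes a list
        qual=None if cols[5] == "." else float(cols[5]),
        filter=cols[6],
        info=parse_info(cols[7]),
    )
```

Typed fields let a type checker catch mistakes such as comparing a string QUAL against a number.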

Easy Installation

Install via PyPI, Bioconda, or Docker, or build from source. Get started in seconds.

60 Tools Across 7 Categories

Category (number of tools): example tools
Data Analysis (12): allele_freq_calc, variant_classifier, hwe_tester, ld_calculator
Data Filtering (11): phred_filter, record_filter, impact_filter, population_filter
Data Transformation (11): multiallelic_splitter, format_converter, sorter, indel_normalizer
Quality Control (6): validator, concordance_checker, outlier_detector, missing_detector
File Management (7): indexer, file_splitter, merger, compressor
Annotation & Reporting (9): custom_annotator, info_summarizer, field_extractor, header_parser
Data Processing (4): missing_data_handler, quality_adjuster, haplotype_phaser, haplotype_extractor
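The SNP pipeline on this page depends on variant_classifier to label records by type (for example `VCF_CLASS=SNP`). The sketch below shows a common length-based classification scheme, assuming uppercase nucleotide alleles. The label set (SNP, MNV, INDEL, SV, OTHER, MIXED) and the rules are assumptions for illustration and may not match the categories VCFX_variant_classifier actually emits.

```python
def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair using simple length rules (an assumed convention)."""
    ref, alt = ref.upper(), alt.upper()
    if alt in {".", "*"}:
        return "OTHER"  # no ALT allele, or a spanning-deletion placeholder
    # Symbolic alleles such as <DEL> and breakend notation are not sequence-resolved
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNV"
    return "INDEL"  # REF and ALT differ in length


def classify_record(ref: str, alt_field: str) -> str:
    """Classify a possibly multi-allelic record; mixed types become 'MIXED'."""
    classes = {classify_allele(ref, alt) for alt in alt_field.split(",")}
    return classes.pop() if len(classes) == 1 else "MIXED"
```

Classifying per ALT allele and then collapsing keeps multi-allelic records honest. A site with one SNP allele and one indel allele is not silently called a SNP.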

Get Started in Seconds

Choose your preferred installation method.

PyPI (recommended for Python users): includes the Python bindings with structured data types.

bash
pip install vcfx

After installing, use the Python API:

python
import vcfx
vcfx.run_tool("variant_classifier", "--help")

Bioconda: the complete toolkit with all dependencies.

bash
conda install -c bioconda vcfx

Docker: no compilation needed. Pull the image and run tools directly.

bash
# Pull the latest image
docker pull ghcr.io/jorgemfs/vcfx:latest

# Run a tool
docker run --rm ghcr.io/jorgemfs/vcfx:latest VCFX_variant_classifier --help

From source: build with CMake. Requires a C++17 compiler.

bash
git clone https://github.com/ieeta-pt/VCFX.git
cd VCFX
mkdir build && cd build
cmake .. -DPYTHON_BINDINGS=ON
make
make install  # Optional: installs to ~/.local/bin

Build Powerful Pipelines

SNP Frequency Analysis Pipeline

Chain tools together to classify variants, filter for SNPs, apply quality thresholds, and calculate allele frequencies.

input.vcf → variant_classifier → grep SNP → phred_filter → allele_freq_calc → output.tsv
bash
cat input.vcf | \
  VCFX_variant_classifier --append-info | \
  grep -E '^#|VCF_CLASS=SNP' | \
  VCFX_phred_filter --phred-filter 30 | \
  VCFX_allele_freq_calc > high_quality_snp_frequencies.tsv
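The final stage computes allele frequencies. The sketch below shows one way to do this from the GT field of the sample columns, assuming haploid or diploid calls written with `/` or `|`. `alt_allele_freqs` is a hypothetical name. The column layout and output format of VCFX_allele_freq_calc may differ.

```python
import re
from typing import List, Optional


def alt_allele_freqs(line: str) -> List[Optional[float]]:
    """Return one frequency per ALT allele, computed from called GT alleles."""
    cols = line.rstrip("\n").split("\t")
    n_alts = len(cols[4].split(","))
    if len(cols) < 10 or "GT" not in cols[8].split(":"):
        return [None] * n_alts  # no samples or no genotypes to count
    gt_index = cols[8].split(":").index("GT")
    counts = [0] * (n_alts + 1)  # index 0 is REF, 1..n are ALT alleles
    called = 0
    for sample in cols[9:]:
        values = sample.split(":")
        if gt_index >= len(values):
            continue  # sample has fewer fields than FORMAT declares
        # Genotypes use '/' (unphased) or '|' (phased) between alleles
        for allele in re.split(r"[/|]", values[gt_index]):
            if allele == ".":
                continue  # skip missing calls
            counts[int(allele)] += 1
            called += 1
    if called == 0:
        return [None] * n_alts
    return [counts[i] / called for i in range(1, n_alts + 1)]
```

The denominator counts only called alleles, so samples with `./.` lower the sample size instead of counting as reference.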

Ready to Get Started?

Explore the full documentation for detailed guides, API references, and more examples.

Cite VCFX

If you use VCFX in your research, please cite:

Silva, J.M., Oliveira, J.L. (2025). "VCFX: A Minimalist, Modular Toolkit for Streamlined Variant Analysis." 12th International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO 2025), Springer.

bibtex
@inproceedings{silva2025vcfx,
  title={VCFX: A Minimalist, Modular Toolkit for Streamlined Variant Analysis},
  author={Silva, Jorge Miguel and Oliveira, Jos{\'e} Lu{\'i}s},
  booktitle={12th International Work-Conference on Bioinformatics and
             Biomedical Engineering (IWBBIO 2025)},
  year={2025},
  organization={Springer}
}
