MergeKit Config Generator

Build a ready-to-run mergekit YAML config without remembering the schema. Pick a merge method (SLERP, TIES, DARE, linear, task-arithmetic, passthrough), set the base model and the models to merge with their weights and density, and copy the config to run with mergekit-yaml config.yaml ./output. The config is assembled in your browser.

How to use the MergeKit Config Generator

Choose a merge method, set the base model, and list the models to merge, one per line. Each line holds the model name, optionally followed by a weight and a density, separated by commas: org/Model, 0.5, 0.5. The YAML updates live. Save it as config.yaml and run mergekit-yaml config.yaml ./output --cuda.
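As a rough illustration of the one-model-per-line format, the following Python sketch parses lines such as org/Model, 0.5, 0.5. It is not the generator's own code; the names ModelEntry, parse_model_line and parse_models are hypothetical, and treating a missing weight or density as None is an assumption.

```python
from typing import List, NamedTuple, Optional


class ModelEntry(NamedTuple):
    name: str
    weight: Optional[float]   # None when the line omits a weight
    density: Optional[float]  # None when the line omits a density


def parse_model_line(line: str) -> ModelEntry:
    """Parse 'org/Model, 0.5, 0.5' into name, optional weight, optional density."""
    parts = [p.strip() for p in line.split(",")]
    if not parts[0]:
        raise ValueError(f"missing model name in line: {line!r}")
    if len(parts) > 3:
        raise ValueError(f"expected at most 3 comma-separated fields: {line!r}")
    weight = float(parts[1]) if len(parts) > 1 and parts[1] else None
    density = float(parts[2]) if len(parts) > 2 and parts[2] else None
    return ModelEntry(parts[0], weight, density)


def parse_models(text: str) -> List[ModelEntry]:
    """Parse a multi-line text box, skipping blank lines."""
    return [parse_model_line(l) for l in text.splitlines() if l.strip()]
```

For example, parse_models("org/A, 0.6, 0.5\norg/B, 0.4") yields one entry with both values and one with only a weight.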

Weight and density only apply to methods that use them — TIES and DARE use both, linear and task-arithmetic use weight, SLERP uses the single t value instead, and passthrough uses neither. The generator emits only the fields the chosen method needs, so the config stays valid.
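The field rules above can be sketched as a small lookup table plus a renderer. This is a minimal sketch, not the generator's implementation: PER_MODEL_FIELDS and render_config are hypothetical names, the bfloat16 default is an assumption, and the method identifiers are the merge_method values mergekit uses as far as I know. Passthrough is left out because it is configured with layer slices, which the inputs described here do not cover.

```python
# Per-model fields each method reads, following the paragraph above.
PER_MODEL_FIELDS = {
    "ties": ("weight", "density"),
    "dare_ties": ("weight", "density"),
    "dare_linear": ("weight", "density"),
    "linear": ("weight",),
    "task_arithmetic": ("weight",),
    "slerp": (),        # uses a single global t instead
    "passthrough": (),  # uses neither
}


def render_config(method, base_model, models, t=0.5, dtype="bfloat16"):
    """Return YAML text; `models` is a list of dicts with name/weight/density."""
    if method == "passthrough":
        raise ValueError("passthrough needs layer slices; not covered by this sketch")
    fields = PER_MODEL_FIELDS[method]
    lines = ["models:"]
    for m in models:
        lines.append(f"  - model: {m['name']}")
        # Emit only the fields this method actually uses.
        params = [(k, m[k]) for k in fields if m.get(k) is not None]
        if params:
            lines.append("    parameters:")
            lines.extend(f"      {k}: {v}" for k, v in params)
    lines.append(f"merge_method: {method}")
    if method != "linear":
        # Assumption: a plain linear average is written without a base model.
        lines.append(f"base_model: {base_model}")
    if method == "slerp":
        lines += ["parameters:", f"  t: {t}"]
    lines.append(f"dtype: {dtype}")
    return "\n".join(lines) + "\n"
```

Calling render_config("slerp", ...) with entries that carry weights and densities drops both and writes only t, which is the behaviour the paragraph describes.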

The merge methods, briefly

SLERP spherically interpolates between exactly two models and tends to preserve capabilities better than a plain average; t is the blend (0 = base, 1 = the other model). Linear is a straight weighted average of any number of models. TIES trims each model's parameter changes to the largest ones (controlled by density), resolves sign conflicts, then merges, which helps when combining many finetunes that would otherwise cancel each other out. DARE randomly drops deltas and rescales the survivors before merging (DARE-TIES adds the TIES sign step). Task arithmetic adds each model's weighted task vector (its difference from the base) onto the base. Passthrough stacks layers from different models (a 'frankenmerge') to make a larger model.
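To make the difference between SLERP and a plain average concrete, here is a minimal pure-Python sketch of spherical interpolation on two flat weight vectors. It illustrates the textbook formula only; mergekit works on full tensors and its implementation may differ in detail. The function name slerp and the eps threshold are choices made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length lists of floats.

    t = 0 returns v0 (the base), t = 1 returns v1 (the other model).
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        # No direction to rotate between: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Angle between the two vectors, clamped against rounding error.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    theta = math.acos(max(-1.0, min(1.0, dot)))
    if math.sin(theta) < eps:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

For the orthogonal unit vectors [1, 0] and [0, 1], slerp(0.5, ...) gives roughly [0.707, 0.707], which still has length 1, whereas a linear average gives [0.5, 0.5] with length of about 0.707. Keeping the magnitude is the property the paragraph refers to.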

There is no single best method. SLERP is the safe default for two models; TIES or DARE-TIES when merging several; passthrough when you want to grow depth. Density around 0.5 and weights summing to roughly 1 are common starting points.

Common use cases

  • Combine finetunes — merge a coding finetune with a chat finetune into one model.
  • Recover a base skill — SLERP a specialised model back toward its base to undo over-finetuning.
  • Frankenmerges — passthrough-stack layers to build a larger model from smaller ones.
  • Reproducible experiments — keep the exact YAML alongside the merged weights.

Frequently asked questions

Do all models need the same architecture?

For weight-merging methods (SLERP, TIES, DARE, linear, task arithmetic), yes: the models need the same architecture and size. Passthrough is more flexible because it stacks layers rather than averaging weights.

What density should I use for TIES/DARE?

Density is the fraction of parameter deltas kept; ~0.5 is a common starting point. Lower density keeps fewer, larger changes per model, which helps when merging many models.
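The two ways density is applied can be sketched on a flat list of deltas (finetuned weight minus base weight). This is an illustrative sketch under simplifying assumptions, not mergekit's code: ties_trim and dare_drop are hypothetical names, the TIES sign-resolution step is omitted, and real merges operate on tensors.

```python
import random


def ties_trim(delta, density):
    """TIES-style trimming: keep the largest-magnitude fraction, zero the rest."""
    k = int(density * len(delta))  # number of entries to keep (rounded down)
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(order[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def dare_drop(delta, density, rng):
    """DARE-style pruning: keep each entry with probability `density`,
    then rescale survivors by 1/density so the expected delta is unchanged."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]
```

With density 0.5, ties_trim([0.1, -0.9, 0.3, 0.05], 0.5) keeps only -0.9 and 0.3, while dare_drop(delta, 0.5, random.Random(0)) zeroes a random subset and doubles whatever survives.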

How do I actually run the merge?

Install mergekit (pip install mergekit), save this as config.yaml, and run mergekit-yaml config.yaml ./output --cuda. Then load ./output like any Hugging Face model.
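Loading the result might look like the sketch below, which uses the Hugging Face transformers library. It assumes the merged model is a causal language model and that ./output contains tokenizer files alongside the weights; load_merged and generate_sample are hypothetical helper names, and the prompt is only an example.

```python
def load_merged(path="./output"):
    """Load tokenizer and model from the merge output directory."""
    # Imported lazily so the helper can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path)
    return tokenizer, model


def generate_sample(path="./output", prompt="Hello", max_new_tokens=20):
    """Run a short generation as a smoke test of the merged model."""
    tokenizer, model = load_merged(path)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A quick call to generate_sample() after the merge finishes is a cheap way to check that the output directory loads and produces text before running a fuller evaluation.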

Is anything sent to a server?

No. The YAML is assembled entirely in your browser from your inputs.