BitNet is Microsoft's open-source inference framework for ultra-low-bit large language models, optimized for CPU-based local inference and extreme compression (1-bit/1.58-bit quantization). It delivers efficient, low-power execution for models such as BitNet, Llama3-8B-1.58, and Falcon3 without requiring a GPU. Released under the MIT license, it provides both C++ and Python interfaces and is backed by an active community and ongoing updates, making it well suited to embedded, mobile, and edge AI deployments.
Official demo: https://bitnet-demo.azurewebsites.net/
Source code: https://github.com/microsoft/BitNet
Model Details
- Architecture: Transformer-based, modified with `BitLinear` layers (BitNet framework); see the sketch below.
- Uses Rotary Position Embeddings (RoPE).
- Uses squared ReLU (ReLU²) activation in FFN layers.
- Employs `subln` normalization.
- No bias terms in linear or normalization layers.
- Quantization: Native 1.58-bit weights and 8-bit activations (W1.58A8); see the sketch below.
- Weights are quantized to ternary values {-1, 0, +1} using absmean quantization during the forward pass.
- Activations are quantized to 8-bit integers using absmax quantization (per-token).
- Crucially, the model was trained from scratch with this quantization scheme, not post-training quantized.
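A minimal NumPy sketch of the two quantizers described above. The formulas follow the absmean (weights) and per-token absmax (activations) descriptions; the epsilon values, integer ranges, and dtypes are illustrative choices, not the exact training configuration.

```python
import numpy as np

def quantize_weights_absmean(w: np.ndarray):
    # Ternary {-1, 0, +1} weights: scale by the mean absolute value, then round and clip.
    gamma = np.abs(w).mean() + 1e-8              # absmean scale
    w_q = np.clip(np.rint(w / gamma), -1, 1)     # ternary values
    return w_q.astype(np.int8), gamma            # dequantize with w_q * gamma

def quantize_activations_absmax(x: np.ndarray):
    # Per-token (per-row) int8 activations: each token's max |value| maps to 127.
    scale = 127.0 / (np.abs(x).max(axis=-1, keepdims=True) + 1e-8)
    x_q = np.clip(np.rint(x * scale), -128, 127)
    return x_q.astype(np.int8), scale            # dequantize with x_q / scale

w = np.random.randn(4, 8).astype(np.float32)     # toy weight matrix
x = np.random.randn(2, 8).astype(np.float32)     # two toy activation "tokens"
w_q, gamma = quantize_weights_absmean(w)
x_q, s = quantize_activations_absmax(x)

# A W1.58A8 matmul operates on the ternary weights and int8 activations;
# rescaling by gamma / s restores floating-point outputs.
y = (x_q.astype(np.int32) @ w_q.T.astype(np.int32)) * gamma / s
print(y.shape)  # (2, 4)
```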
- Parameters: ~2 Billion
- Training Tokens: 4 Trillion
- Context Length: Maximum sequence length of 4096 tokens.
- Recommendation: For optimal performance on tasks requiring very long contexts (beyond the pre-training length or for specialized long-reasoning tasks), we recommend performing intermediate long-sequence adaptation/training before the final fine-tuning stage.
- Training Stages:
- Pre-training: Large-scale training on public text/code and synthetic math data using a two-stage learning rate and weight decay schedule.
- Supervised Fine-tuning (SFT): Fine-tuned on instruction-following and conversational datasets using sum loss aggregation (see the sketch below) and specific hyperparameter tuning.
- Direct Preference Optimization (DPO): Aligned with human preferences using preference pairs.
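Regarding the "sum loss aggregation" noted for the SFT stage, the sketch below contrasts it with the more common mean reduction of the token-level cross-entropy loss. The tensor shapes, toy labels, and absence of prompt masking are illustrative assumptions, not the actual training setup.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 5, 128256)          # (batch, sequence, vocab) -- toy values
labels = torch.randint(0, 128256, (2, 5))   # toy target token ids

flat_logits = logits.view(-1, logits.size(-1))
flat_labels = labels.view(-1)

# Common default: average the per-token losses across the batch.
mean_loss = F.cross_entropy(flat_logits, flat_labels, reduction="mean")

# Sum aggregation: add the per-token losses instead of averaging,
# which changes the effective gradient scale per batch.
sum_loss = F.cross_entropy(flat_logits, flat_labels, reduction="sum")

print(mean_loss.item(), sum_loss.item())
```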
- Tokenizer: LLaMA 3 Tokenizer (vocab size: 128,256).