Releases · microsoft/DeepSpeed
DeepSpeed v0.13.0
What's Changed
- Update version.txt after 0.12.6 release by @mrwyattii in #4850
- doc corrections by @goodship1 in #4861
- Fix exception handling in get_all_ranks_from_group() function by @HeyangQin in #4862
- deepspeed engine: fp16 support validation on init by @nelyahu in #4843
- Remove hooks on gradient accumulation on engine/optimizer destroy by @chiragjn in #4858
- optimize grad_norm calculation in stage3.py by @mmhab in #4436
- Fix f-string messages by @li-plus in #4865
- [NPU] Fix npu offload bug by @CurryRice233 in #4883
- Partition parameters: Minor refactoring of use_secondary_tensor condition by @deepcharm in #4868
- Pipeline: Add support to eval micro bs configuration by @nelyahu in #4859
- zero_to_fp32.py: Handle a case where shape doesn't have numel attr by @nelyahu in #4842
- Add support of Microsoft Phi-2 model to DeepSpeed-FastGen by @arashb in #4812
- Support cpu tensors without direct device invocation by @abhilash1910 in #3842
- add sharded loading for safetensors in AutoTP by @sywangyi in #4854
- [XPU] XPU accelerator support for Intel GPU device by @delock in #4547
- enable starcoder (kv_head=1) autotp by @Yejing-Lai in #4896
- Release overlap_comm & contiguous_gradients restrictions for ZeRO 1 by @li-plus in #4887
- [NPU]Add ZeRO-Infinity feature for NPU by @misstek in #4809
- fix num_kv_heads sharding in uneven autoTP for Falcon-40b by @Yejing-Lai in #4712
- NVMe offload checkpoint by @eisene in #4707
- Add WarmupCosineLR to Read the Docs by @dwyatte in #4916
- Add Habana Labs HPU accelerator support by @deepcharm in #4912
- Unit tests for MiCS by @zarzen in #4792
- Fix SD workflow to work with latest diffusers version by @lekurile in #4918
- [Fix] Fix cpu inference UT failure by @delock in #4430
- Add paths to run SD tests by @loadams in #4919
- Change PR/schedule triggers for CPU-inference by @loadams in #4924
- fix falcon-40b accuracy issue by @Yejing-Lai in #4895
- Refactor the positional embedding config code by @arashb in #4920
- Pin to triton 2.1.0 to fix issues with nv-inference by @loadams in #4929
- Add support of Qwen models (7b, 14b, 72b) to DeepSpeed-FastGen by @ZonePG in #4913
- DeepSpeedZeroOptimizer: refactor bit16 flattening to support more accelerators by @nelyahu in #4833
- Fix confusing width in simd_load by @yzhblind in #4714
- Specify permissions for secrets.GITHUB_TOKEN by @mrwyattii in #4927
- Enable quantizer op on ROCm by @rraminen in #4114
- autoTP for Qwen by @inkcherry in #4902
- Allow specifying mii branch for nv-a6000 workflow by @mrwyattii in #4936
- Only run MII CI for inference changes by @mrwyattii in #4939
- InfV2 - remove generation config requirement by @mrwyattii in #4938
- Cache HF model list for inference tests by @mrwyattii in #4940
- Fix docs inconsistency on default value for `ignore_unused_parameters` by @loadams in #4949
- Fix bug in CI model caching by @mrwyattii in #4951
- fix uneven issue & add balance autotp by @Yejing-Lai in #4697
- Optimize preprocess for ragged batching by @tohtana in #4942
- Fix bug where ZeRO2 never uses the reduce method. by @CurryRice233 in #4946
- [docs] Add new autotp supported model in tutorial by @delock in #4960
- Add missing op_builder.hpu component for HPU accelerator by @nelyahu in #4963
- Stage_1_and_2.py: fix assert for reduce_scatter configurations combinations by @nelyahu in #4964
- [MiCS]Add the path to support sequence_data_parallel on MiCS by @ys950902 in #4926
- Update the DeepSpeed Phi-2 impl. to work with the HF latest changes by @arashb in #4950
- Prevent infinite recursion when DS_ACCELERATOR is set to cuda by @ShukantPal in #4962
- Fixes for training models with bf16 + freshly initialized optimizer via `load_module_only` by @haileyschoelkopf in #4141
- params partition for skip_init by @inkcherry in #4722
- Enhance query APIs for text generation by @tohtana in #4965
- Add API to set a module as a leaf node when recursively setting Z3 hooks by @tohtana in #4966 (see the sketch after this list)
- Fix T5 and mistral model meta data error by @Yejing-Lai in #4958
- FastGen Jan 2024 blog by @mrwyattii in #4980
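For #4966 above, a minimal sketch of the new leaf-module API; only `set_z3_leaf_modules` comes from this release, and the `ExpertBlock` module is a hypothetical stand-in for a sparse-MoE block:

```python
import torch.nn as nn
from deepspeed.utils import set_z3_leaf_modules

class ExpertBlock(nn.Module):  # hypothetical stand-in for a sparse-MoE block
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16, 16)

    def forward(self, x):
        return self.proj(x)

model = nn.Sequential(ExpertBlock(), nn.Linear(16, 4))

# Mark ExpertBlock as a ZeRO-3 leaf: its parameters are gathered as one
# unit instead of hooking every submodule, which helps modules (such as
# MoE layers) that do not run all of their children in every forward pass.
set_z3_leaf_modules(model, [ExpertBlock])
```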
New Contributors
- @chiragjn made their first contribution in #4858
- @li-plus made their first contribution in #4865
- @misstek made their first contribution in #4809
- @dwyatte made their first contribution in #4916
- @ZonePG made their first contribution in #4913
- @yzhblind made their first contribution in #4714
- @ShukantPal made their first contribution in #4962
- @haileyschoelkopf made their first contribution in #4141
Full Changelog: v0.12.6...v0.13.0
v0.12.6: Patch release
What's Changed
- Update version.txt after 0.12.5 release by @mrwyattii in #4826
- Cache metadata for TP activations and grads by @BacharL in #4360
- Inference changes for incorporating meta loading checkpoint by @oelayan7 in #4692
- Update CODEOWNERS by @mrwyattii in #4838
- support baichuan model by @baodii in #4721
- inference engine: check if accelerator supports FP16 by @nelyahu in #4832 (see the sketch after this list)
- Update zeropp.md by @goodship1 in #4835
- [NPU] load EXPORT_ENV based on different accelerators to support multi-node training on other devices by @minchao-sun in #4830
- Add cuda_accelerator.py to triggers for A6000 test by @mrwyattii in #4848
- Capture short kernel sequences to graph by @inkcherry in #4318
- Checkpointing: Avoid assigning tensor storage with different device by @deepcharm in #4836
- engine.py: remove unused _curr_save_path by @nelyahu in #4844
- Mixtral FastGen Support by @cmikeh2 in #4828
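A minimal sketch of the capability check behind #4832; the query methods belong to DeepSpeed's accelerator abstraction, and the printout is illustrative:

```python
from deepspeed.accelerator import get_accelerator

acc = get_accelerator()
# The inference engine can now validate dtype support up front instead of
# failing later inside a kernel.
print(acc.device_name(),
      "fp16 supported:", acc.is_fp16_supported(),
      "bf16 supported:", acc.is_bf16_supported())
```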
New Contributors
- @minchao-sun made their first contribution in #4830
Full Changelog: v0.12.5...v0.12.6
v0.12.5: Patch release
What's Changed
- Fix DS Stable Diffusion for latest diffusers version by @lekurile in #4770
- Resolve any '..' in the file paths using os.path.abspath() by @rraminen in #4709
- Update dockerfile with updated versions by @loadams in #4780
- Run workflows when they are edited by @loadams in #4779
- BF16_Optimizer: add support for bf16 grad acc by @nelyahu in #4713 (see the sketch after this list)
- fix autoTP issue for mpt (trust_remote_code=True) by @sywangyi in #4787
- Fix Hybrid Engine metrics printing by @lekurile in #4789
- [BUG] partition_balanced returns wrong result by @zjjMaiMai in #4312
- improve the way to determine whether a variable is None by @RUAN-ZX in #4782
- [NPU] Add HcclBackend for 1-bit adam, 1-bit lamb, 0/1 adam by @RUAN-ZX in #4733
- Fix for stage3 when setting different communication data type by @BacharL in #4540
- Add support of Falcon models (7b, 40b, 180b) to DeepSpeed-FastGen by @arashb in #4790
- Switch paths-ignore to single quotes, update paths-ignore on nv-pre-compile-ops by @loadams in #4805
- fix for tests using torch<2.1 by @mrwyattii in #4818
- Universal Checkpoint for Sequence Parallelism by @samadejacobs in #4752
- Accelerate CI fix by @mrwyattii in #4819
- fix [BUG] 'DeepSpeedGPTInference' object has no attribute 'dtype' for… by @jxysoft in #4814
- Update broken link in docs by @mrwyattii in #4822
- Update imports from Transformers by @loadams in #4817
- Minor updates to CI workflows by @mrwyattii in #4823
- fix falcon model load from_config meta_data error by @baodii in #4783
- mv DeepSpeedEngine param_names dict init post _configure_distributed_model by @nelyahu in #4803
- Refactor launcher user arg parsing by @mrwyattii in #4824
- Fix 4649 by @Alienfeel in #4650
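For #4713 above, a minimal sketch of a config that accumulates gradients in bf16; the `data_types.grad_accum_dtype` knob is my assumption about the relevant setting, and the batch-size values are illustrative:

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    # Assumed knob: keep the accumulation buffer in bf16 rather than fp32.
    "data_types": {"grad_accum_dtype": "bf16"},
}
```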
New Contributors
- @zjjMaiMai made their first contribution in #4312
- @jxysoft made their first contribution in #4814
- @baodii made their first contribution in #4783
- @Alienfeel made their first contribution in #4650
Full Changelog: v0.12.4...v0.12.5
v0.12.4: Patch release
What's Changed
- Update version.txt after 0.12.3 release by @mrwyattii in #4673
- [MII] catch error wrt HF version and Mistral by @jeffra in #4634
- [NPU] Add NPU support for unit test by @RUAN-ZX in #4569
- [op-builder] use unique exceptions for cuda issues by @jeffra in #4653
- Add stable diffusion unit test by @mrwyattii in #2496
- [CANN] Support cpu offload optimizer for Ascend NPU by @hipudding in #4568
- Inference Checkpoints in V2 by @cmikeh2 in #4664
- KV Cache Improved Flexibility by @cmikeh2 in #4668
- Fix for when prompt contains an odd num of apostrophes by @oelayan7 in #4660
- universal-ckp: support megatron-deepspeed llama model by @mosheisland in #4666
- Add new MII unit tests by @mrwyattii in #4693
- [Bug fix] WarmupCosineLR issues by @sbwww in #4688
- infV2 fix for OPT size variants by @mrwyattii in #4694
- Add get and set APIs for the ZeRO-3 partitioned parameters by @yiliu30 in #4681 (see the sketch after this list)
- Remove unneeded dict reinit (fix for #4565) by @eisene in #4702
- Update flops profiler to recurse by @loadams in #4374
- Communication Optimization for Large-Scale Training by @RezaYazdaniAminabadi in #4695
- [docs] Intel inference blog by @jeffra in #4734
- use all_gather_into_tensor instead of all_gather by @taozhiwei in #4705
- Install `deepspeed-kernels` only on Linux by @aphedges in #4739
- Add nv-sd badge to README by @loadams in #4747
- Re-organize `.gitignore` file to be parsed properly by @aphedges in #4740
- fix mics run with offload++ by @GuanhuaWang in #4749
- Fix logger formatting for partitioning flags by @OAfzal in #4728
- fix: to solve #4726 by @RUAN-ZX in #4727
- Add safetensors support by @jihnenglin in #4659
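For #4681 above, a minimal sketch of round-tripping partitioned master weights; `engine` stands for an already-initialized DeepSpeed engine, and the 0.5 scaling is purely illustrative:

```python
from deepspeed.utils import (safe_get_full_fp32_param,
                             safe_set_full_fp32_param)

def halve_all_weights(engine):
    """Scale every fp32 master weight of a DeepSpeed engine by 0.5."""
    for name, lp_param in engine.module.named_parameters():
        # Gather the full fp32 master value regardless of partitioning.
        full = safe_get_full_fp32_param(lp_param)
        # Write the modified value back; DeepSpeed re-scatters it to the
        # owning ranks.
        safe_set_full_fp32_param(lp_param, full * 0.5)
```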
New Contributors
- @RUAN-ZX made their first contribution in #4569
- @oelayan7 made their first contribution in #4660
- @sbwww made their first contribution in #4688
- @yiliu30 made their first contribution in #4681
- @eisene made their first contribution in #4702
- @taozhiwei made their first contribution in #4705
- @OAfzal made their first contribution in #4728
- @jihnenglin made their first contribution in #4659
Full Changelog: v0.12.3...v0.12.4
v0.12.3: Patch release
New Bug Fixes
- Stable Diffusion now supported with the latest Torch, diffusers, and Triton versions (sketched below).
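A minimal sketch of the now-working Stable Diffusion path, following the pattern in DeepSpeed's SD examples; the model id is illustrative and kernel injection is assumed to be available for this pipeline:

```python
import torch
import deepspeed
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.half).to("cuda")

# Inject DeepSpeed's fused inference kernels into the pipeline modules.
pipe = deepspeed.init_inference(pipe, dtype=torch.half,
                                replace_with_kernel_inject=True)

image = pipe("an astronaut riding a horse").images[0]
image.save("astronaut.png")
```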
What's Changed
- Update version.txt after 0.12.2 release by @mrwyattii in #4617
- Fix figure in FlexGen blog by @tohtana in #4624
- Fix figure of llama2 13B in DS-FlexGen blog by @tohtana in #4625
- Fix config format by @xu-song in #4594
- Guanhua/partial offload rebase v2 (#590) by @GuanhuaWang in #4636
- offload++ blog (#623) by @GuanhuaWang in #4637
- Update README in offloadpp blog by @GuanhuaWang in #4641
- [docs] update news items by @jeffra in #4640
- DeepSpeed-FastGen Chinese Blog by @HeyangQin in #4642
- Fix issues with torch cpu builds by @loadams in #4639
- Isolate src code and testing for DeepSpeed-FastGen by @cmikeh2 in #4610
- Add Japanese blog for DeepSpeed-FastGen by @tohtana in #4651
- Fix for MII unit tests by @mrwyattii in #4652
- Enhance the robustness of `module_state_dict` by @LZHgrla in #4587
- Enable ZeRO3 allgather for multiple dtypes by @tohtana in #4647
- add option to disable pipeline partitioning by @nelyahu in #4322
- Added HIP_PLATFORM_AMD=1 for non JIT build by @rraminen in #4585
- Fix rope_theta arg for diffusers_attention by @lekurile in #4656
- tl.dot(a, b, trans_b=True) is not supported by Triton 2.0+, updating this API by @bmedishe in #4541
- Update ds-chat workflow to work w/ deepspeed-chat install by @lekurile in #4598
- Diffusers attention script update triton2.1 by @bmedishe in #4573
- Fix the openfold training. by @cctry in #4657
- Universal ckp fixes by @mosheisland in #4588
- Update .gitignore [Adding comments, Improved documentation] by @Nadav23AnT in #4631
- Update lr_schedules.py by @CoinCheung in #4563
- Fix UNET and VAE implementations for new diffusers version by @lekurile in #4663
- fix num_kv_heads sharding in autoTP for the new in-repo Falcon-40B by @dc3671 in #4654
New Contributors
- @xu-song made their first contribution in #4594
- @LZHgrla made their first contribution in #4587
- @mosheisland made their first contribution in #4588
- @Nadav23AnT made their first contribution in #4631
- @CoinCheung made their first contribution in #4563
Full Changelog: v0.12.2...v0.12.3
v0.12.2
What's Changed
- Quick bug fix direct to `master` to ensure mismatched CUDA environments are shown to the user (4f7dd72)
- Update version.txt after 0.12.1 release by @mrwyattii in #4615
Full Changelog: v0.12.1...v0.12.2
v0.12.1: Patch release
What's Changed
- Update version.txt after 0.12.0 release by @mrwyattii in #4611
- Add number for latency comparison by @tohtana in #4612
- Update minor CUDA version compatibility. by @cmikeh2 in #4613
Full Changelog: v0.12.0...v0.12.1
DeepSpeed v0.12.0
What's Changed
- Update version.txt after 0.11.2 release by @mrwyattii in #4609
- Pin transformers in nv-inference by @loadams in #4606
- DeepSpeed-FastGen by @cmikeh2 in #4604 (see the sketch after this list)
- DeepSpeed-FastGen blog by @jeffra in #4607
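DeepSpeed-FastGen (#4604) is consumed through DeepSpeed-MII; a minimal sketch assuming a recent `deepspeed-mii` install, with model id and prompt chosen purely for illustration:

```python
import mii

# Non-persistent pipeline: loads the model in-process, with FastGen's
# continuous batching handling the generation loop.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
responses = pipe(["DeepSpeed is"], max_new_tokens=64)
print(responses[0])
```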
Full Changelog: v0.11.2...v0.12.0
v0.11.2: Patch release
What's Changed
- Update version.txt after 0.11.1 release by @mrwyattii in #4484
- Update DS_BUILD_* references. by @loadams in #4485
- Introduce pydantic_v1 compatibility module for pydantic>=2.0.0 support by @ringohoffman in #4407
- Enable control over timeout with environment variable by @BramVanroy in #4405
- Update ROCm version by @loadams in #4486
- adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference by @stephen-youn in #4450
- Fix scale factor on flops profiler by @loadams in #4500
- add DeepSpeed4Science white paper by @conglongli in #4502
- [CCLBackend] update API by @Liangliang-Ma in #4378
- Ulysses: add col-ai evaluation by @samadejacobs in #4517
- Ulysses: Update README.md by @samadejacobs in #4518
- add available memory check to accelerators by @jeffra in #4508
- clear redundant parameters in zero3 bwd hook by @inkcherry in #4520
- Add NPU FusedAdam support by @CurryRice233 in #4343
- fix error type issue in deepspeed/comm/ccl.py by @Liangliang-Ma in #4521
- Fixed deepspeed.comm.monitored_barrier call by @Quentin-Anthony in #4496
- [Bug fix] Add rope_theta for llama config by @cupertank in #4480
- [ROCm] Add rocblas header by @rraminen in #4538
- [docs] ZeRO infinity slides and blog by @jeffra in #4542
- Switch from HIP_PLATFORM_HCC to HIP_PLATFORM_AMD by @loadams in #4539
- Turn off I_MPI_PIN for impi launcher by @delock in #4531
- [docs] paper updates by @jeffra in #4543
- ROCm 6.0 prep changes by @loadams in #4537
- Fix RTD builds by @mrwyattii in #4558
- pipe engine _aggregate_total_loss: more efficient loss concatenation by @nelyahu in #4327
- Add missing rocblas include by @loadams in #4557
- Enable universal checkpoint for ZeRO stage 1 by @tjruwase in #4516 (see the sketch after this list)
- [AutoTP] Make AutoTP work when num_heads not divisible by number of workers by @delock in #4011
- Fix the sequence-parallelism for the dense model architecture by @RezaYazdaniAminabadi in #4530
- engine.py - save_checkpoint: only rank-0 should create the save dir by @nelyahu in #4536
- Remove PP Grad Tail Check by @Quentin-Anthony in #2538
- Added HIP_PLATFORM_AMD=1 by @rraminen in #4570
- fix multiple definition while building evoformer by @fecet in #4556
- Don't check overflow for bf16 data type by @hablb in #4512
- Public update by @yaozhewei in #4583
- [docs] paper updates by @jeffra in #4584
- Disable CPU inference on PRs by @loadams in #4590
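For #4516 above, a minimal sketch of resuming ZeRO stage 1 training from a universal checkpoint; the `checkpoint.load_universal` flag follows DeepSpeed's universal-checkpoint docs, and the surrounding values are illustrative:

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "zero_optimization": {"stage": 1},
    # Assumed flag: tell the engine the checkpoint directory holds a
    # converted universal checkpoint rather than a regular ZeRO one.
    "checkpoint": {"load_universal": True},
}
```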
New Contributors
- @ringohoffman made their first contribution in #4407
- @BramVanroy made their first contribution in #4405
- @cupertank made their first contribution in #4480
Full Changelog: v0.11.1...v0.11.2
v0.11.1: Patch release
What's Changed
- Fix bug in bfloat16 optimizer related to checkpointing by @okoge-kaz in #4434
- Move tensors to device if mp is not enabled by @deepcharm in #4461
- Fix torch import causing release build failure by @mrwyattii in #4468
- add lm_head and embed_out tensor parallel by @Yejing-Lai in #3962
- Fix release workflow by @mrwyattii in #4483
New Contributors
- @okoge-kaz made their first contribution in #4434
- @deepcharm made their first contribution in #4461
Full Changelog: v0.11.0...v0.11.1