Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity S Cao, C Zhang, Z Yao, W Xiao, L Nie, D Zhan, Y Liu, M Wu, L Zhang Proceedings of the 2019 ACM/SIGDA International Symposium on Field …, 2019 | 190 | 2019 |
Balanced sparsity for efficient DNN inference on GPU Z Yao, S Cao, W Xiao, C Zhang, L Nie Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 5676-5683, 2019 | 118 | 2019 |
SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization S Cao, L Ma, W Xiao, C Zhang, Y Liu, L Zhang, L Nie, Z Yang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 80 | 2019 |
Dense-to-sparse gate for mixture-of-experts X Nie, S Cao, X Miao, L Ma, J Xue, Y Miao, Z Yang, Z Yang, B Cui | 22 | 2021 |
EvoMoE: An evolutional mixture-of-experts training framework via dense-to-sparse gate X Nie, X Miao, S Cao, L Ma, Q Liu, J Xue, Y Miao, Y Liu, Z Yang, B Cui arXiv preprint arXiv:2112.14397, 2021 | 15 | 2021 |
Integer or floating point? new outlooks for low-bit quantization on large language models Y Zhang, L Zhao, S Cao, W Wang, T Cao, F Yang, M Yang, S Zhang, N Xu arXiv preprint arXiv:2305.12356, 2023 | 9 | 2023 |
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference R Hwang, J Wei, S Cao, C Hwang, X Tang, T Cao, M Yang, M Rhu arXiv preprint arXiv:2308.12066, 2023 | 3 | 2023 |
NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors J Wei, T Cao, S Cao, S Jiang, S Fu, M Yang, Y Zhang, Y Liu Proceedings of the 21st Annual International Conference on Mobile Systems …, 2023 | 2 | 2023 |
Efficient GPU kernels for N:M-sparse weights in deep learning B Lin, N Zheng, L Wang, S Cao, L Ma, Q Zhang, Y Zhu, T Cao, J Xue, ... Proceedings of Machine Learning and Systems 5, 2023 | 2 | 2023 |
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation D Du, Y Zhang, S Cao, J Guo, T Cao, X Chu, N Xu arXiv preprint arXiv:2402.10631, 2024 | 1 | 2024 |
AFPQ: Asymmetric Floating Point Quantization for LLMs Y Zhang, S Zhang, S Cao, D Du, J Wei, T Cao, N Xu arXiv preprint arXiv:2311.01792, 2023 | 1 | 2023 |
Adam accumulation to reduce memory footprints of both activations and gradients for large-scale DNN training Y Zhang, Y Han, S Cao, G Dai, Y Miao, T Cao, F Yang, N Xu arXiv preprint arXiv:2305.19982, 2023 | 1 | 2023 |
Accurate and structured pruning for efficient automatic speech recognition H Jiang, LL Zhang, Y Li, Y Wu, S Cao, T Cao, Y Yang, J Li, M Yang, L Qiu arXiv preprint arXiv:2305.19549, 2023 | 1 | 2023 |
FlexSaaS: A Reconfigurable Accelerator for Web Search Selection S Cao, L Nie, D Zhan, W Wang, N Xu, R Das, M Wu, L Zhang, D Chiou ACM Transactions on Reconfigurable Technology and Systems (TRETS) 12 (1), 1-20, 2019 | | 2019 |
The Case for Learning Machine Language G Liu, CJM Liang, S Cao, S Lu, L van Doorn | | |