Distributed Training

This post dives into the challenges and innovations in distributed training for deep learning models. We explore why GPU and TPU accelerators are crucial for training large neural networks and how gradient descent is used to optimize a model. Key discussions include the data parallel and distributed data parallel strategies, with insights into implementations such as Horovod’s ring-allreduce and PyTorch’s process-based approach. The overview gives a foundation for understanding how modern computational resources can be used to make deep learning training more efficient.

Online Softmax

One of the most important tasks in deep learning is classification: predicting the class to which a given input belongs. Models such as convolutional neural networks (CNNs) and large language models use classification layers. These models produce an output score for every possible class, but the scores are not immediately usable because they can be any floating-point number. The softmax function converts these raw model outputs, known as logits, into probabilities that sum to one, making them interpretable.
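As a rough illustration of that conversion, here is a minimal Python sketch of softmax. The function name `softmax` and the example logits are assumptions made for illustration, not taken from the post. Subtracting the maximum logit before exponentiating is a common numerical-stability trick; this is the plain formulation that makes separate passes over the input, not the online variant the title refers to.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to one."""
    # Subtracting the largest logit leaves the result unchanged but keeps
    # math.exp from overflowing on large inputs.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw model outputs for three classes.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs)
print(sum(probs))
```

The largest logit always receives the largest probability, so the predicted class is the same before and after the transformation; softmax only rescales the scores into a distribution.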

Sentencepiece Tokenizer

In the rapidly evolving field of natural language processing (NLP), the efficiency and accuracy of language models depend heavily on how text data is prepared and processed before training. At the heart of this preparation is tokenization, the step in which raw text is transformed into a structured format that models can interpret. Among the many tokenization methods, the SentencePiece tokenizer stands out as a versatile tool that handles diverse languages efficiently without relying on predefined word boundaries.

Why Audio Files Are Really Hard to Compress

Audio data is inherently complex and dense. Unlike text, where redundancy is common (think repeated words or phrases), audio signals are continuous streams of varying frequencies and amplitudes. These signals can include everything from human speech and music to ambient noise and complex soundscapes. The richness and variety of audio data make it hard to identify and eliminate redundancy without losing essential information. Human ears are sensitive to a wide range of frequencies, from about 20 Hz to 20 kHz.