<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Haoyang Ma Blog</title><description>Personal website of Haoyang Ma, a research engineer working on LLM agents and systems.</description><link>https://haoyang9804.github.io/</link><item><title>A Modern Kong Yiji - N Ways to Write softmax in CUDA</title><link>https://haoyang9804.github.io/blog/%E7%8E%B0%E4%BB%A3%E5%AD%94%E4%B9%99%E5%B7%B1---softmax%E7%9A%84n%E7%A7%8Dcuda%E5%86%99%E6%B3%95/</link><guid isPermaLink="true">https://haoyang9804.github.io/blog/%E7%8E%B0%E4%BB%A3%E5%AD%94%E4%B9%99%E5%B7%B1---softmax%E7%9A%84n%E7%A7%8Dcuda%E5%86%99%E6%B3%95/</guid><description>Notes from a CUDA kernel performance experiment, going from a naive softmax to a multi-kernel reduce.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate></item></channel></rss>