<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>SRE on 黄文卓 | DevOps Engineer</title>
    <link>https://socake.github.io/categories/sre/</link>
    <description>Recent content in SRE on 黄文卓 | DevOps Engineer</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>zh-CN</language>
    <managingEditor>17691281867@163.com (Wenzhuo Huang)</managingEditor>
    <webMaster>17691281867@163.com (Wenzhuo Huang)</webMaster>
    <copyright>© 2026 Wenzhuo Huang</copyright>
    <lastBuildDate>Wed, 24 Sep 2025 10:00:00 +0800</lastBuildDate>
    <atom:link href="https://socake.github.io/categories/sre/index.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>On-Call Rotation Management in Practice: From Alert Fatigue to Sustainable On-Call</title>
      <link>https://socake.github.io/posts/oncall-rotation-management/</link>
      <pubDate>Wed, 24 Sep 2025 10:00:00 +0800</pubDate>
      <author>17691281867@163.com (Wenzhuo Huang)</author>
      <guid>https://socake.github.io/posts/oncall-rotation-management/</guid>
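      <description>On-call is neither a perk nor a punishment; it is a responsibility. Turning it into a sustainable engineering practice matters more than any advanced monitoring tool.</description>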
      <media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/oncall-rotation-management/featured.jpg" />
    </item>
    
    <item>
      <title>Incident Response and Blameless Postmortems: Turning Every Incident into an Organizational Asset</title>
      <link>https://socake.github.io/posts/incident-response-postmortem/</link>
      <pubDate>Wed, 10 Sep 2025 10:00:00 +0800</pubDate>
      <author>17691281867@163.com (Wenzhuo Huang)</author>
      <guid>https://socake.github.io/posts/incident-response-postmortem/</guid>
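      <description>Incident response is not heroism; it is a repeatable process. Spell out the process, the templates, and the culture so that every incident can be distilled into an organizational asset.</description>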
      <media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/incident-response-postmortem/featured.jpg" />
    </item>
    
    <item>
      <title>A Practical Guide to Chaos Engineering GameDays: From the First Drill to Routine Fault Injection</title>
      <link>https://socake.github.io/posts/chaos-engineering-gameday/</link>
      <pubDate>Wed, 27 Aug 2025 10:00:00 +0800</pubDate>
      <author>17691281867@163.com (Wenzhuo Huang)</author>
      <guid>https://socake.github.io/posts/chaos-engineering-gameday/</guid>
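      <description>Don't mistake chaos engineering for casually killing pods. The real value lies in a hypothesis-driven drill methodology: write down the hypothesis before the drill, verify it during the drill, and improve the system and processes after the review.</description>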
      <media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/chaos-engineering-gameday/featured.jpg" />
    </item>
    
    <item>
      <title>On-Call Engineering Practice: From Alert Response to Runbook Design</title>
      <link>https://socake.github.io/posts/on-call-engineering-practice/</link>
      <pubDate>Tue, 08 Jul 2025 11:26:00 +0800</pubDate>
      <author>17691281867@163.com (Wenzhuo Huang)</author>
      <guid>https://socake.github.io/posts/on-call-engineering-practice/</guid>
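      <description>A good On-Call system does not keep people staring at a screen 24 hours a day; it makes every wake-up call worth it. From alert quality to Runbook design, from rotation policy to data-driven improvement, this article summarizes 3 years of practice our team has refined in production.</description>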
      <media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/on-call-engineering-practice/featured.jpg" />
    </item>
    
    <item>
      <title>The Full Lifecycle of SRE Incident Management: From Response to Postmortem</title>
      <link>https://socake.github.io/posts/sre-incident-management/</link>
      <pubDate>Sat, 05 Jul 2025 09:30:00 +0800</pubDate>
      <author>17691281867@163.com (Wenzhuo Huang)</author>
      <guid>https://socake.github.io/posts/sre-incident-management/</guid>
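      <description>Incident handling is not only a technical problem; it is also a problem of collaboration and information flow. This article walks through every stage from incident trigger to Post-Mortem archival, including the purpose of the IC role, a 15-minute fault-scoping framework, and how to make a Post-Mortem actually drive improvement instead of going through the motions.</description>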
      <media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://socake.github.io/posts/sre-incident-management/featured.jpg" />
    </item>
    
  </channel>
</rss>
