Some Research on Filtering Scripts with Regular Expressions

2010-08-28 10:49:09  Source: 西部e网

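When building websites (BBS-style forums in particular), a common requirement is to let users enter HTML formatting markup while preventing any script from running, so that pages can be richly styled without executing malicious code.
HtmlEncode and HtmlDecode are of course not an option here, because encoding disables even the basic HTML markup that users are supposed to be allowed.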
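
To make the point about encoding concrete, here is a minimal sketch. It assumes System.Net.WebUtility.HtmlEncode as the encoder (the article does not say which HtmlEncode it means; ASP.NET also offers one on Server and HttpUtility), and the class name, helper name and sample string are made up for illustration.

```csharp
using System;
using System.Net;

public static class EncodeDemo
{
    // Hypothetical helper: blanket-encodes user input.
    public static string Encode(string input)
    {
        return WebUtility.HtmlEncode(input);
    }

    public static void Main()
    {
        // Harmless formatting and a script block in the same input.
        string input = "<b>bold</b><script>alert(1)</script>";

        // Every angle bracket is escaped, so the <b> tag is lost along with the script.
        Console.WriteLine(Encode(input));
        // prints: &lt;b&gt;bold&lt;/b&gt;&lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```

The script is neutralised, but so is the bold tag, which is why a more selective filter is needed.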

I searched online and did not find a good solution, but I did collect some examples of how script attacks are delivered:

1. Code contained in <script> tags
2. Code in <a href=javascript:...
3. Code in the on... event attributes of other basic elements
4. Attacks caused by iframe and frameset loading other pages

With this list in hand the task becomes much simpler: write a small method that uses regular expressions to strip or replace any markup matching the cases above:
  // Requires: using System.Text.RegularExpressions;
  public string wipeScript(string html)
  {
       // Lazy quantifiers (+?) stop each match at the first closing tag instead of the last one in the input.
       Regex regex1 = new Regex(@"<script[\s\S]+?</script *>", RegexOptions.IgnoreCase);
       // An optional quote, then any scheme ending in "script" (javascript:, vbscript:).
       Regex regex2 = new Regex(@" href *= *[""']? *[a-z]*script *:", RegexOptions.IgnoreCase);
       // An attribute name that starts with "on", followed by "=".
       Regex regex3 = new Regex(@" on\w+ *=", RegexOptions.IgnoreCase);
       Regex regex4 = new Regex(@"<iframe[\s\S]+?</iframe *>", RegexOptions.IgnoreCase);
       Regex regex5 = new Regex(@"<frameset[\s\S]+?</frameset *>", RegexOptions.IgnoreCase);
       html = regex1.Replace(html, ""); // strip <script></script> blocks
       html = regex2.Replace(html, ""); // strip href=javascript: from <a> elements
       html = regex3.Replace(html, " _disibledevent="); // rename on... event attributes so they no longer fire
       html = regex4.Replace(html, ""); // strip iframe
       html = regex5.Replace(html, ""); // strip frameset
       return html;
  }

The method takes HTML that may contain script and returns the same HTML with those constructs removed.
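
Quantifier greediness matters a great deal in tag-stripping patterns of this kind. The following is a small sketch, not part of the original method, that compares a greedy `[\s\S]+` with a lazy `[\s\S]+?` on input holding two script blocks with ordinary markup between them. The class name, method names and sample string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Greedy: [\s\S]+ runs on to the LAST </script> in the input.
    public static string StripGreedy(string html)
    {
        return Regex.Replace(html, @"<script[\s\S]+</script *>", "", RegexOptions.IgnoreCase);
    }

    // Lazy: [\s\S]+? stops at the FIRST </script> after each opening tag.
    public static string StripLazy(string html)
    {
        return Regex.Replace(html, @"<script[\s\S]+?</script *>", "", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<script>a()</script><p>keep me</p><script>b()</script>";

        Console.WriteLine("greedy: [" + StripGreedy(html) + "]");
        // prints: greedy: []
        Console.WriteLine("lazy:   [" + StripLazy(html) + "]");
        // prints: lazy:   [<p>keep me</p>]
    }
}
```

With the greedy form, everything between the first opening tag and the last closing tag disappears, including legitimate content; the lazy form removes each block separately.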
In my simple tests it met the requirements, but a few questions remain:
Are the cases considered above reasonably complete, or are there other script-attack techniques?
Is there a better solution?
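
On the first question, a quick experiment suggests the four cases are not exhaustive. The sketch below is an illustration under stated assumptions: `Filter` is a compact stand-in aimed at the same five targets (its exact regexes are this sketch's own choice), and the sample payloads, class name and helper names are made up for the test.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BypassDemo
{
    // Compact stand-in for the filter; the patterns are an assumption of this sketch.
    public static string Filter(string html)
    {
        RegexOptions o = RegexOptions.IgnoreCase;
        html = Regex.Replace(html, @"<script[\s\S]+?</script *>", "", o);
        html = Regex.Replace(html, @" href *= *[""']? *[a-z]*script *:", "", o);
        html = Regex.Replace(html, @" on\w+ *=", " _disibledevent=", o);
        html = Regex.Replace(html, @"<iframe[\s\S]+?</iframe *>", "", o);
        html = Regex.Replace(html, @"<frameset[\s\S]+?</frameset *>", "", o);
        return html;
    }

    // True when the filter leaves the input untouched, i.e. the payload slips through.
    public static bool SlipsThrough(string html)
    {
        return Filter(html) == html;
    }

    public static void Main()
    {
        string[] samples =
        {
            // Space before the handler name: the on... pattern catches it.
            "<img src=x onerror=alert(1)>",
            // '/' instead of a space before the handler name.
            "<svg/onload=alert(1)>",
            // The 's' is written as a character reference, which browsers decode.
            "<a href=\"java&#115;cript:alert(1)\">x</a>",
            // Another element that can load an external page.
            "<object data=\"http://example.com/page.html\"></object>",
        };

        foreach (string s in samples)
        {
            Console.WriteLine((SlipsThrough(s) ? "passes:  " : "changed: ") + s);
        }
    }
}
```

Only the first sample is altered; the other three pass through unchanged. This is one reason a commonly suggested alternative is to parse the HTML and keep only an explicit whitelist of tags and attributes, rather than trying to enumerate everything that must be removed.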

Author's blog: http://blog.csdn.net/yolle/

Keywords: regular expressions, ASP.NET
