Eclipse's Partitioned String-Sharing Optimization

2010-08-28 10:50:01 Source: 西部e網

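In languages such as Java and C#, which handle strings with reference semantics, strings are immutable objects, so strings with identical content can be reused through some sharing mechanism. For these languages, holding references to two strings that sit at different memory locations but have the same content is no different, as far as the program can tell from the content, from holding two references to a single string. In scenarios that use very large numbers of strings, such as XML parsing, this optimization can greatly reduce a program's memory footprint. The SAX parser standard, for example, defines a dedicated feature, http://xml.org/sax/features/string-interning, for string reuse.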
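
As a small illustration of the SAX feature named above, the following sketch asks the JDK's default SAX parser how it treats the string-interning feature. The class name SaxInterningProbe is made up for this example, and the answer depends on the parser implementation found on the classpath, so no particular output is claimed.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

final class SaxInterningProbe {
    static final String FEATURE = "http://xml.org/sax/features/string-interning";

    /** Reports how the default SAX parser treats the string-interning feature. */
    static String probe() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return reader.getFeature(FEATURE) ? "enabled" : "disabled";
        } catch (SAXNotRecognizedException e) {
            return "not recognized";   // the parser does not know this feature name
        } catch (SAXNotSupportedException e) {
            return "not supported";    // known, but cannot be queried right now
        } catch (Exception e) {
            return "parser unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println(FEATURE + ": " + probe());
    }
}
```

When the feature is enabled, the parser is expected to hand out interned strings for element and attribute names, so handlers can rely on shared instances.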

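At the language level, both Java and C# support this directly through String.intern (String.Intern in C#), and the two implementations are very similar. In Java, the String.intern method places the current string in a global hash table, with the string's content as the key and the object reference as the value.

Code: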

//
// java/lang/String.java
//

public final class String
{
  //...
  public native String intern(); // implemented as a JNI function for efficiency
}

//
// hotspot/src/share/vm/prims/jvm.cpp
//

JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  JVMWrapper("JVM_InternString");
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str); // resolve the reference into the internal handle
  oop result = StringTable::intern(string, CHECK_0); // perform the actual string intern operation
  return (jstring) JNIHandles::make_local(env, result); // obtain a reference for the internal handle
JVM_END

//
// hotspot/src/share/vm/memory/symbolTable.cpp
//

oop StringTable::intern(oop string, TRAPS)
{
  if (string == NULL) return NULL;
  ResourceMark rm(THREAD); // protect the thread's resource area
  int length;
  Handle h_string (THREAD, string);
  jchar* chars = java_lang_String::as_unicode_string(string, length); // get the actual string contents
  oop result = intern(h_string, chars, length, CHECK_0); // complete the string intern operation
  return result;
}

oop StringTable::intern(Handle string_or_null, jchar* name, int len, TRAPS)
{
  int hashValue = hash_string(name, len); // first compute the hash from the string contents
  stringTableBucket* bucket = bucketFor(hashValue); // locate the target bucket from the hash
  oop string = bucket->lookup(name, len); // then check whether the string already exists
  // Found
  if (string != NULL) return string;
  // Otherwise, add to symbol to table
  return basic_add(string_or_null, name, len, hashValue, CHECK_0); // put the string into the hash table
}
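
The effect of the global table can be observed from plain Java. The sketch below is only an illustration (the class name InternDemo and the helper fresh are made up for it): it builds two separate String objects with the same content and shows that intern maps both to one shared object.

```java
final class InternDemo {
    /** Builds a new String object at run time, so it is not a compile-time constant. */
    static String fresh(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = fresh("element");
        String b = fresh("element");
        System.out.println(a.equals(b));               // true: same content
        System.out.println(a == b);                    // false: two distinct objects
        System.out.println(a.intern() == b.intern());  // true: one canonical object
    }
}
```

Only the references returned by intern are shared; a and b themselves remain separate objects until the program stops referring to them.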

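Strings in the global string table cannot be removed explicitly. Only once a string is no longer in use can the garbage collector detect this while marking unreachable objects, and eventually call StringTable::unlink to walk the table and remove the entry.

Code: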

//
// hotspot/src/share/vm/memory/genMarkSweep.cpp
//

void GenMarkSweep::mark_sweep_phase1(...)
{
　//...
　StringTable::unlink();
}

//
// hotspot/src/share/vm/memory/symbolTable.cpp
//

void StringTable::unlink() {
  // Readers of the string table are unlocked, so we should only be
  // removing entries at a safepoint.
  assert(SafepointSynchronize::is_at_safepoint(), "must be at safepoint");
  for (stringTableBucket* bucket = firstBucket(); bucket <= lastBucket(); bucket++) {
    for (stringTableEntry** p = bucket->entry_addr(); *p != NULL;) {
      stringTableEntry* entry = *p;
      assert(entry->literal_string() != NULL, "just checking");
      if (entry->literal_string()->is_gc_marked()) { // is the string object reachable?
        // Is this one of calls those necessary only for verification? (DLD)
        entry->oops_do(&MarkSweep::follow_root_closure);
        p = entry->next_addr();
      } else { // if unreachable, recycle the entry's memory block into the pool
        *p = entry->next();
        entry->set_next(free_list);
        free_list = entry;
      }
    }
  }
}
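
The sweep over one bucket chain can be modelled in Java. This is a simplified sketch, not HotSpot code: the class BucketSweepSketch, its Entry type, and the boolean marked flag (standing in for is_gc_marked()) are all assumptions made for illustration. Java has no pointer-to-pointer, so the sketch tracks the previous kept entry instead.

```java
import java.util.ArrayList;
import java.util.List;

final class BucketSweepSketch {
    /** One chained hash-table entry; 'marked' stands in for is_gc_marked(). */
    static final class Entry {
        final String literal;
        final boolean marked;
        Entry next;
        Entry(String literal, boolean marked, Entry next) {
            this.literal = literal;
            this.marked = marked;
            this.next = next;
        }
    }

    Entry head;      // start of one bucket's chain
    Entry freeList;  // recycled entries, playing the role of free_list

    void unlink() {
        Entry prev = null;
        Entry cur = head;
        while (cur != null) {
            Entry next = cur.next;
            if (cur.marked) {
                prev = cur;                      // still reachable: keep it
            } else {
                if (prev == null) head = next;   // unreachable: splice it out
                else prev.next = next;
                cur.next = freeList;             // and push it onto the free list
                freeList = cur;
            }
            cur = next;
        }
    }

    /** Lists the literals of a chain in order, for inspection. */
    static List<String> literals(Entry e) {
        List<String> out = new ArrayList<>();
        for (; e != null; e = e.next) out.add(e.literal);
        return out;
    }
}
```

Entries are reused rather than freed, which is why the removed ones end up on a free list in both the original and the sketch.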

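The code above shows that, for the JVM (Sun JDK 1.4.2), String.intern provides global, hash-table-based sharing. This implementation is simple and shares strings to the greatest possible extent, but it also has drawbacks: the sharing granularity is too coarse, the benefit cannot be measured, and a large number of strings may degrade the performance of the global string table.

For this reason Eclipse sets aside the JVM-level sharing mechanism and instead provides a fine-grained, fully controllable, measurable mechanism that shares strings per partition, which mitigates these problems to some extent. The IStringPoolParticipant interface of the Eclipse core is implemented explicitly by its users, who submit the strings to be shared in its shareStrings method.

Code: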

//
// org.eclipse.core.runtime.IStringPoolParticipant
//

public interface IStringPoolParticipant {
　/**
　* Instructs this participant to share its strings in the provided
　* pool.
　*/
　public void shareStrings(StringPool pool);
}

For example, MarkerInfo implements IStringPoolParticipant. In its shareStrings method it submits the string it wants shared, type, and tells its child nodes to submit theirs.

Code:

//
// org.eclipse.core.internal.resources.MarkerInfo
//

public class MarkerInfo implements ..., IStringPoolParticipant
{
  public void shareStrings(StringPool set) {
    type = set.add(type);
    Map map = attributes;
    if (map instanceof IStringPoolParticipant)
      ((IStringPoolParticipant) map).shareStrings(set);
  }
}

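In this way, as long as the nodes at each level of an object tree selectively implement IStringPoolParticipant, all strings that need sharing can be submitted recursively, in a single pass, to one string pool for reuse. Workspace is one such root entry point for string sharing: once its open method has finished opening the workspace, it adds the cache-management object whose strings should be shared to the global list of string pool partitions to be optimized.

Code: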

//
// org.eclipse.core.internal.resources
//

public class Workspace ...
{
  protected SaveManager saveManager;
  public IStatus open(IProgressMonitor monitor) throws CoreException
  {
    // open the workspace
    // finally, register a new string pool partition
    InternalPlatform.getDefault().addStringPoolParticipant(saveManager, getRoot());
    return Status.OK_STATUS;
  }
}

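SaveManager, the type being optimized, only needs to implement IStringPoolParticipant and, when called, submit its own strings and those of its children. The children do not even need to implement IStringPoolParticipant; they only have to pass the submission down level by level, for example:

Code: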

//
// org.eclipse.core.internal.resources.SaveManager
//

public class SaveManager implements ..., IStringPoolParticipant
{
　protected ElementTree lastSnap;
　public void shareStrings(StringPool pool)
　{
　　lastSnap.shareStrings(pool);
　}
}

//
// org.eclipse.core.internal.watson.ElementTree
//
public class ElementTree
{
　protected DeltaDataTree tree;
　public void shareStrings(StringPool set) {
　　tree.storeStrings(set);
　}
}

//
// org.eclipse.core.internal.dtree.DeltaDataTree
//
public class DeltaDataTree extends AbstractDataTree
{
　private AbstractDataTreeNode rootNode;
　private DeltaDataTree parent;
　public void storeStrings(StringPool set) {
　　//copy field to protect against concurrent changes
　　AbstractDataTreeNode root = rootNode;
　　DeltaDataTree dad = parent;
　　if (root != null)
　　　root.storeStrings(set);
　　if (dad != null)
　　　dad.storeStrings(set);
　}
}
//
// org.eclipse.core.internal.dtree.AbstractDataTreeNode
//
public abstract class AbstractDataTreeNode
{
　protected AbstractDataTreeNode children[];
　protected String name;
　public void storeStrings(StringPool set) {
　　name = set.add(name);
　　//copy children pointer in case of concurrent modification
　　AbstractDataTreeNode[] nodes = children;
　　if (nodes != null)
　　　for (int i = nodes.length; --i >= 0;)
　　　　nodes[i].storeStrings(set);
　}
}
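
The recursive submission can be tried out without Eclipse on the classpath. The sketch below is a self-contained miniature under stated assumptions: MiniPool is a stand-in for StringPool, Node is a hypothetical tree node loosely modelled on AbstractDataTreeNode, and distinctObjects counts String objects by identity to make the effect visible.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class TreeSharingSketch {
    /** Minimal stand-in for StringPool: one canonical instance per content. */
    static final class MiniPool {
        private final Map<String, String> map = new HashMap<>();
        String add(String s) {
            if (s == null) return null;
            String canonical = map.putIfAbsent(s, s);
            return canonical != null ? canonical : s;
        }
    }

    /** Hypothetical tree node that shares its name, then recurses. */
    static final class Node {
        String name;
        final Node[] children;
        Node(String name, Node... children) {
            this.name = name;
            this.children = children;
        }
        void storeStrings(MiniPool pool) {
            name = pool.add(name);
            for (Node child : children) child.storeStrings(pool);
        }
        void collect(Map<String, Boolean> seen) {
            seen.put(name, Boolean.TRUE);
            for (Node child : children) child.collect(seen);
        }
    }

    /** Counts the distinct String objects (by identity) held in the tree. */
    static int distinctObjects(Node root) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        root.collect(seen);
        return seen.size();
    }

    static Node sampleTree() {
        // new String(...) forces separate objects with equal content
        return new Node(new String("src"),
                new Node(new String("Main.java")),
                new Node(new String("src"), new Node(new String("Main.java"))));
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(distinctObjects(root)); // 4
        root.storeStrings(new MiniPool());
        System.out.println(distinctObjects(root)); // 2
    }
}
```

The pool is created for the pass and dropped afterwards; the tree keeps only the canonical instances.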

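All strings to be optimized are submitted to a single string pool through StringPool.add. The role of this pool differs somewhat from that of the JVM-level string table: it only consolidates strings during one optimization pass over the partitions and does not itself serve as the entry point for string references. Its implementation is therefore just a thin wrapper around HashMap that also roughly estimates the space the optimization saves, which gives a measure of its effect.

Code: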

//
// org.eclipse.core.runtime.StringPool
//

public final class StringPool {
　private int savings;
　private final HashMap map = new HashMap();
　public StringPool() {
　　super();
　}
　public String add(String string) {
　　if (string == null)
　　　return string;
　　Object result = map.get(string);
　　if (result != null) {
　　　if (result != string)
　　　　savings += 44 + 2 * string.length();
　　　return (String) result;
　　}
　　map.put(string, string);
　　return string;
　}
  // returns a rough estimate of how much space the optimization saves
　public int getSavedStringCount() {
　　return savings;
　}
}

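The estimate can be inaccurate in some cases, however. Suppose the pool already contains a string S1 and a string S2 with the same content but a different physical location is submitted. If S2 is submitted more than once, the saving is counted each time and the benefit is overestimated. If an exact value is needed, the class could be refactored to track each optimized string in a Set, at some cost in efficiency.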
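
The overestimate and the Set-based refinement can be compared side by side. The following is a sketch, not the Eclipse class: SavingsEstimateSketch and its two counters are hypothetical, the 44 + 2 * length figure is taken from the listing above, and the identity-based set is one possible way to make sure each duplicate object is counted only once.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

final class SavingsEstimateSketch {
    private final Map<String, String> map = new HashMap<>();
    // Duplicate objects already counted, compared by identity rather than content.
    private final Set<String> counted =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
    int naiveSavings;  // same counting rule as StringPool.add
    int exactSavings;  // counts each duplicate object only once

    String add(String s) {
        if (s == null) return null;
        String canonical = map.get(s);
        if (canonical == null) {
            map.put(s, s);
            return s;
        }
        if (canonical != s) {
            int bytes = 44 + 2 * s.length();
            naiveSavings += bytes;
            if (counted.add(s)) exactSavings += bytes;
        }
        return canonical;
    }

    public static void main(String[] args) {
        SavingsEstimateSketch pool = new SavingsEstimateSketch();
        String s1 = new String("marker");
        String s2 = new String("marker");
        pool.add(s1);
        pool.add(s2);
        pool.add(s2); // the same duplicate object, submitted again
        System.out.println(pool.naiveSavings); // 112
        System.out.println(pool.exactSavings); // 56
    }
}
```

The extra set lookup on every duplicate is the efficiency cost mentioned above, and the set itself holds references to the duplicates until the pool is discarded.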

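Having seen how strings are submitted and how they are optimized once submitted, let us look at how the Eclipse core ties the two together.

As mentioned earlier, Workspace.open calls InternalPlatform.addStringPoolParticipant, which adds the root node of a string pool partition to a global queue of optimization tasks.

Code: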

//
// org.eclipse.core.internal.runtime.InternalPlatform
//

public final class InternalPlatform {
  private StringPoolJob stringPoolJob;
  public void addStringPoolParticipant(IStringPoolParticipant participant, ISchedulingRule rule) {
    if (stringPoolJob == null)
      stringPoolJob = new StringPoolJob(); // Singleton pattern
    stringPoolJob.addStringPoolParticipant(participant, rule);
  }
}

//
// org.eclipse.core.internal.runtime.StringPoolJob
//

public class StringPoolJob extends Job
{
  private static final long INITIAL_DELAY = 10000;//10000 ms, i.e. ten seconds
  private Map participants = Collections.synchronizedMap(new HashMap(10));
  public void addStringPoolParticipant(IStringPoolParticipant participant, ISchedulingRule rule) {
    participants.put(participant, rule);
    if (sleep())
      wakeUp(INITIAL_DELAY);
  }
  public void removeStringPoolParticipant(IStringPoolParticipant participant) {
    participants.remove(participant);
  }
}

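At a suitable time, this job performs the sharing optimization for every registered partition.

StringPoolJob is where the code for the partition job lives; underneath, it relies on Eclipse's job scheduling mechanism. Readers interested in Eclipse job scheduling can refer to the article On the Job: The Eclipse Jobs API by Michael Valenta (IBM).

What matters here is that a Job in Eclipse is scheduled as an asynchronous background task and is executed, once its time has come and its resources are available, through a call to its Job.run method. A Job is much like a thread, except that it is scheduled on conditions and can be run on a pool of background threads. Those conditions are, on the one hand, the job's own scheduling time and, on the other, the resource dependencies expressed through the ISchedulingRule interface. If a job conflicts with one that is currently running, it is suspended until the conflict is resolved. ISchedulingRule instances can themselves be combined using the Composite pattern to describe complex dependencies between jobs.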
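
The idea of conflicting and composite rules can be sketched in a few lines. This is a simplified model, not the Eclipse API: the Rule interface below keeps only an isConflicting check, and PathRule and CompositeRule are hypothetical classes invented for the example.

```java
import java.util.Arrays;
import java.util.List;

final class RuleSketch {
    /** Simplified stand-in for a scheduling rule. */
    interface Rule {
        boolean isConflicting(Rule other);
    }

    /** Hypothetical rule that locks one path and everything beneath it. */
    static final class PathRule implements Rule {
        final String path;
        PathRule(String path) { this.path = path; }

        public boolean isConflicting(Rule other) {
            if (other instanceof CompositeRule) return other.isConflicting(this);
            if (!(other instanceof PathRule)) return false;
            String p = ((PathRule) other).path;
            return covers(path, p) || covers(p, path);
        }

        private static boolean covers(String parent, String child) {
            return child.equals(parent) || child.startsWith(parent + "/");
        }
    }

    /** Composite pattern: conflicts if any child rule conflicts. */
    static final class CompositeRule implements Rule {
        final List<Rule> children;
        CompositeRule(Rule... children) { this.children = Arrays.asList(children); }

        public boolean isConflicting(Rule other) {
            for (Rule child : children)
                if (child.isConflicting(other)) return true;
            return false;
        }
    }

    public static void main(String[] args) {
        Rule combined = new CompositeRule(new PathRule("/a"), new PathRule("/b"));
        System.out.println(combined.isConflicting(new PathRule("/b/x"))); // true
        System.out.println(combined.isConflicting(new PathRule("/c")));   // false
    }
}
```

A job holding the combined rule would have to wait for any running job that touches /a or /b, but not for one working only under /c.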

The StringPoolJob.run method, which does the actual work, merges the scheduling rules of all string pool partitions so that, once those rules permit, it can call StringPoolJob.shareStrings.

Code:

//
// org.eclipse.core.internal.runtime.StringPoolJob
//

public class StringPoolJob extends Job
{
  private static final long RESCHEDULE_DELAY = 300000;//five minutes
  protected IStatus run(IProgressMonitor monitor)
  {
    //copy current participants to handle concurrent additions and removals to map
    Map.Entry[] entries = (Map.Entry[]) participants.entrySet().toArray(new Map.Entry[0]);
    ISchedulingRule[] rules = new ISchedulingRule[entries.length];
    IStringPoolParticipant[] toRun = new IStringPoolParticipant[entries.length];
    for (int i = 0; i < toRun.length; i++) {
      toRun[i] = (IStringPoolParticipant) entries[i].getKey();
      rules[i] = (ISchedulingRule) entries[i].getValue();
    }
    // merge the scheduling rules of all string pool partitions
    final ISchedulingRule rule = MultiRule.combine(rules);
    // once the rules permit, call shareStrings to perform the optimization
    try {
      Platform.getJobManager().beginRule(rule, monitor); // blocks until the rule can be acquired
      shareStrings(toRun, monitor);
    } finally {
      Platform.getJobManager().endRule(rule);
    }
    // reschedule this job so that the next optimization pass can run
    long scheduleDelay = Math.max(RESCHEDULE_DELAY, lastDuration*100);
    schedule(scheduleDelay);
    return Status.OK_STATUS;
  }
}
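
The rescheduling expression at the end of run is easy to check with a couple of numbers. The sketch below assumes that lastDuration holds the duration of the previous pass in milliseconds, which the excerpt does not show; the class and method names are made up for the example.

```java
final class RescheduleDelaySketch {
    static final long RESCHEDULE_DELAY = 300000; // five minutes, in milliseconds

    /** Mirrors the expression in run(): wait at least 100 times the last duration. */
    static long nextDelay(long lastDurationMillis) {
        return Math.max(RESCHEDULE_DELAY, lastDurationMillis * 100);
    }

    public static void main(String[] args) {
        System.out.println(nextDelay(200));   // 300000: short pass, the five-minute floor applies
        System.out.println(nextDelay(5000));  // 500000: long pass, so the next one is pushed back
    }
}
```

Under that assumption, a pass that takes longer than three seconds is followed by a pause one hundred times as long, so the job spends roughly one percent of the time at most on string sharing.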

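StringPoolJob.shareStrings simply iterates over all partitions, calls IStringPoolParticipant.shareStrings on each root node to perform the optimization described above, and finally returns the savings achieved for the partitions. The pool itself is only a tool for the optimization and is discarded as soon as the pass completes.

Code: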

//
// org.eclipse.core.internal.runtime.StringPoolJob
//

public class StringPoolJob extends Job
{
  private int shareStrings(IStringPoolParticipant[] toRun, IProgressMonitor monitor) {
    final StringPool pool = new StringPool();
    for (int i = 0; i < toRun.length; i++) {
      if (monitor.isCanceled()) // has the operation been cancelled?
        break;
      final IStringPoolParticipant current = toRun[i];
      Platform.run(new ISafeRunnable() { // run safely
        public void handleException(Throwable exception) {
          //exceptions are already logged, so nothing to do
        }
        public void run() {
          current.shareStrings(pool); // perform the string reuse optimization
        }
      });
    }
    return pool.getSavedStringCount(); // return the estimated savings
  }
}
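
The two properties of this loop, a throwaway pool and isolation between participants, can be shown in a reduced form. This is a sketch under assumptions: Participant is a hypothetical stand-in for IStringPoolParticipant, a plain HashMap plays the pool, and a try/catch replaces the safe-runner mechanism used by the original.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PoolPassSketch {
    /** Hypothetical stand-in for IStringPoolParticipant. */
    interface Participant {
        void shareStrings(Map<String, String> pool);
    }

    /** Runs one pass and returns how many participants completed normally. */
    static int runPass(List<Participant> participants) {
        // A fresh pool per pass; it becomes garbage as soon as this method returns.
        Map<String, String> pool = new HashMap<>();
        int completed = 0;
        for (Participant p : participants) {
            try {
                p.shareStrings(pool);
                completed++;
            } catch (RuntimeException e) {
                // a failing participant must not abort the remaining ones
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Participant> list = new ArrayList<>();
        list.add(pool -> pool.put("a", "a"));
        list.add(pool -> { throw new IllegalStateException("broken participant"); });
        list.add(pool -> pool.put("b", "b"));
        System.out.println(runPass(list)); // 2
    }
}
```

Because nothing outside runPass keeps a reference to the pool, only the participants' own fields decide how long the shared strings stay alive.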

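As the analysis above shows, compared with the JVM's String.intern(), Eclipse's optimization based on string pool partitions:

1. Offers finer-grained control: you can specify which objects to optimize;

2. Has a measurable effect: the space saved can be roughly estimated;

3. Has no performance bottleneck: there is no central string pool, so a large number of strings does not cause performance fluctuations;

4. Does not hold memory long-term: the pool exists only while the optimization runs, and the intermediate results are discarded afterwards;

5. Allows a choice of strategy: by defining scheduling rules, different optimization strategies can be applied selectively.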
