和SLR(1)分析表)
用Python自动化构建LR(0)和SLR(1)分析表从理论到代码实现编译原理中的语法分析阶段常常让学习者望而生畏尤其是构造LR分析表的过程。传统的手工绘制状态机和填写表格不仅耗时耗力还容易出错。本文将带你用Python实现LR(0)和SLR(1)分析表的自动构造通过代码理解算法本质告别死记硬背的学习方式。1. 基础准备文法表示与数据结构在开始编码前我们需要设计合适的数据结构来表示文法和分析表。Python的类和字典结构非常适合这种场景。class Production: def __init__(self, head, body): self.head head # 产生式左部 self.body body # 产生式右部列表形式 def __str__(self): return f{self.head} - { .join(self.body)} class Grammar: def __init__(self): self.productions [] # 产生式集合 self.terminals set() # 终结符集合 self.non_terminals set() # 非终结符集合 self.start_symbol None # 开始符号关键数据结构说明Production类封装了文法的单个产生式Grammar类管理整个文法的元信息使用集合存储终结符和非终结符便于快速查找2. 拓广文法与项目表示拓广文法是构造LR分析表的第一步我们需要在原始文法基础上添加一个新的开始符号。def augment_grammar(grammar): 拓广文法添加新的开始符号S augmented Grammar() augmented.start_symbol grammar.start_symbol augmented.productions.append(Production( augmented.start_symbol, [grammar.start_symbol] )) # 复制原始文法的产生式 for prod in grammar.productions: augmented.productions.append(prod) return augmentedLR项目是带有点标记的产生式我们可以扩展Production类来表示class LRItem: def __init__(self, production, dot_pos0): self.production production self.dot_pos dot_pos # 点的位置 def next_symbol(self): 获取点后面的符号 if self.dot_pos len(self.production.body): return self.production.body[self.dot_pos] return None def is_reduce_item(self): 判断是否为归约项目 return self.dot_pos len(self.production.body)3. 构造项目集规范簇项目集规范簇(Canonical Collection)是LR分析的核心我们需要实现闭包(closure)和转移(goto)操作。def closure(items, grammar): 计算项目集的闭包 closure_set set(items) changed True while changed: changed False for item in list(closure_set): next_sym item.next_symbol() if next_sym in grammar.non_terminals: # 添加该非终结符的所有产生式 for prod in grammar.productions: if prod.head next_sym: new_item LRItem(prod, 0) if new_item not in closure_set: closure_set.add(new_item) changed True return frozenset(closure_set) def goto(items, symbol, grammar): 计算转移函数 new_items set() for item in items: if item.next_symbol() symbol: new_items.add(LRItem(item.production, item.dot_pos 1)) return closure(new_items, grammar)算法要点闭包操作会递归添加所有可能的项目转移函数模拟DFA的状态转移使用frozenset保证项目集的不可变性4. 构建LR(0)分析表有了项目集规范簇后我们可以构造LR(0)分析表。分析表通常包含ACTION和GOTO两部分。def build_lr0_table(grammar): 构造LR(0)分析表 augmented augment_grammar(grammar) start_item LRItem(augmented.productions[0], 0) c0 closure({start_item}, augmented) # 初始化分析表 action_table {} goto_table {} # 状态编号到项目集的映射 state_id {c0: 0} queue [c0] next_id 1 while queue: current queue.pop(0) current_id state_id[current] # 处理归约项目 for item in current: if item.is_reduce_item(): if item.production.head augmented.start_symbol: # 接受动作 action_table[(current_id, #)] (acc,) else: # 归约动作 prod_num augmented.productions.index(item.production) for term in augmented.terminals.union({#}): action_table[(current_id, term)] (r, prod_num) # 处理转移 symbols set() for item in current: sym item.next_symbol() if sym: symbols.add(sym) for sym in symbols: new_items goto(current, sym, augmented) if new_items not in state_id: state_id[new_items] next_id queue.append(new_items) next_id 1 new_state state_id[new_items] if sym in augmented.terminals: action_table[(current_id, sym)] (s, new_state) else: goto_table[(current_id, sym)] new_state return action_table, goto_table5. 扩展为SLR(1)分析表SLR(1)在LR(0)基础上增加了FOLLOW集的处理能解决部分冲突。我们需要先实现FOLLOW集的计算。def compute_follow(grammar): 计算非终结符的FOLLOW集 follow {nt: set() for nt in grammar.non_terminals} follow[grammar.start_symbol].add(#) changed True while changed: changed False for prod in grammar.productions: for i, sym in enumerate(prod.body): if sym in grammar.non_terminals: # 规则1A - αBβ将FIRST(β)-{ε}加入FOLLOW(B) first_beta set() for next_sym in prod.body[i1:]: if next_sym in grammar.terminals: first_beta.add(next_sym) break else: first_beta.update( first.get(next_sym, set()) - {None} ) if None not in first.get(next_sym, set()): break else: first_beta.add(None) if None in first_beta: first_beta.remove(None) old_size len(follow[sym]) follow[sym].update(follow[prod.head]) if len(follow[sym]) old_size: changed True old_size len(follow[sym]) follow[sym].update(first_beta) if len(follow[sym]) old_size: changed True return follow基于FOLLOW集我们可以修改分析表构造逻辑def build_slr1_table(grammar): 构造SLR(1)分析表 action, goto build_lr0_table(grammar) follow compute_follow(grammar) # 解决冲突检查每个状态中的归约项目 for state in range(len(set(s for s, _ in action.keys()))): reduce_items [] for (s, sym), act in list(action.items()): if s state and act[0] r: reduce_items.append((sym, act[1])) # 检查移进-归约冲突 for (s, sym), act in list(action.items()): if s state and act[0] s: for (rsym, rprod) in reduce_items: if sym rsym: # 检查FOLLOW集是否相交 prod_head grammar.productions[rprod].head if sym in follow[prod_head]: raise ValueError( fSLR(1)冲突状态{state}符号{sym} ) return action, goto6. 实际应用与冲突处理在实际编码中我们经常会遇到各种冲突情况。让我们通过一个具体例子来演示如何处理。示例文法E - E T | T T - T * F | F F - ( E ) | id首先定义文法grammar Grammar() grammar.start_symbol E grammar.non_terminals {E, T, F} grammar.terminals {, *, (, ), id} grammar.productions [ Production(E, [E, , T]), Production(E, [T]), Production(T, [T, *, F]), Production(T, [F]), Production(F, [(, E, )]), Production(F, [id]) ]构造分析表并测试# 构造LR(0)分析表 lr0_action, lr0_goto build_lr0_table(grammar) # 构造SLR(1)分析表 slr1_action, slr1_goto build_slr1_table(grammar) # 比较两者的差异 print(LR(0) ACTION表大小:, len(lr0_action)) print(SLR(1) ACTION表大小:, len(slr1_action))常见问题处理移进-归约冲突当同一单元格需要同时填移进和归约时SLR(1)会检查FOLLOW集归约-归约冲突当同一单元格有多个归约项时通常需要更强大的分析器如LR(1)或LALR(1)表项缺失可能意味着文法不是LR(0)或SLR(1)的需要考虑错误恢复机制7. 可视化与分析表调试为了更好理解分析表的构造过程我们可以添加可视化功能def print_table(action, goto, grammar): 打印分析表 terminals sorted(grammar.terminals.union({#})) non_terminals sorted(grammar.non_terminals) print(状态\t \t.join(terminals) \t \t.join(non_terminals)) max_state max(s for s, _ in action.keys()) for state in range(max_state 1): row [str(state)] for term in terminals: key (state, term) if key in action: act action[key] if act[0] s: row.append(fs{act[1]}) elif act[0] r: row.append(fr{act[1]}) else: row.append(act[0]) else: row.append() for nt in non_terminals: key (state, nt) if key in goto: row.append(str(goto[key])) else: row.append() print(\t.join(row))这个可视化函数可以帮助我们直观地比较LR(0)和SLR(1)分析表的差异特别是在冲突处理上的不同。