A First Look at the TypeScript Parser

bendan520 · August 26, 2023 · typescript

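Preface

A while ago I read the source code of the open-source project Stryker and became interested in the TypeScript parser. Stryker measures the quality of unit tests: it identifies pieces of the source code, automatically modifies them one at a time, and then checks whether the unit tests notice the change. The TypeScript parser handles the key step of identifying the source code.

So I spent some time learning the TypeScript parser, and it felt like opening a new door to a room full of interesting things to play with.

See also: Stryker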

The basics: generating an AST

Looking through Stryker's source, I found that the key code that uses the TypeScript parser is:

export function parseFile(file: File, target: ts.ScriptTarget | undefined) {
  return ts.createSourceFile(file.name, file.textContent, target || ts.ScriptTarget.ES5, /*setParentNodes*/ true);
}

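The ts module used above is imported like this:

import * as ts from 'typescript';

The first argument of createSourceFile, the file name, can be anything you like. The second argument is the TypeScript source text, and the last two arguments can be left as in the snippet above. The result is an abstract syntax tree (AST).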

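You can inspect the content of every node in this tree with Node.js breakpoint debugging. After some searching online, though, I found a snippet that prints the tree structure, which is more convenient:

// Print the TS syntax tree
const printAllChildren = (node: ts.Node, depth = 0) => {
  console.log(new Array(depth + 1).join('----'), ts.SyntaxKind[node.kind], node.pos, node.end);
  node.getChildren().forEach((c) => printAllChildren(c, depth + 1));
};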

I tested it with the following source:

export const test = (a: number) => a + 2;
export const test2 = 0;

The output is shown in the figure below:

The tree structure is easy to read.
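Since the figure is not reproduced here, the following is a minimal sketch that combines the two snippets above into one runnable script. The file name `demo.ts` and the helper name `describeTree` are made up for illustration; the helper collects the lines into an array instead of logging them directly, which is a small departure from the original snippet.

```typescript
import * as ts from 'typescript';

// The same two-line sample used in the article.
const source = 'export const test = (a: number) => a + 2;\nexport const test2 = 0;\n';

// 'demo.ts' is an arbitrary name; the parser does not read it from disk.
const sourceFile = ts.createSourceFile('demo.ts', source, ts.ScriptTarget.ES5, /* setParentNodes */ true);

// Hypothetical helper: one output line per node (indent, kind name, pos, end).
const describeTree = (node: ts.Node, depth = 0, lines: string[] = []): string[] => {
  const indent = new Array(depth + 1).join('----');
  lines.push(`${indent} ${ts.SyntaxKind[node.kind]} ${node.pos} ${node.end}`);
  node.getChildren(sourceFile).forEach((child) => describeTree(child, depth + 1, lines));
  return lines;
};

const treeLines = describeTree(sourceFile);
console.log(treeLines.join('\n'));
```

The exact kind names printed depend on the installed TypeScript version, so treat the output as something to explore rather than a fixed reference.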

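Stryker works by generating the AST, traversing every node depth-first, modifying specific nodes, and then regenerating the source code (printing an AST back to source is probably done with something like ts.createPrinter).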
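To make the "modify a node, then print the source again" idea concrete, here is a rough sketch. It is not Stryker's actual implementation: the mutator `plusToMinus` is a made-up example, and the sketch assumes a TypeScript version that provides `ts.factory` (4.0 or later).

```typescript
import * as ts from 'typescript';

const original = 'export const test = (a: number) => a + 2;';
const file = ts.createSourceFile('demo.ts', original, ts.ScriptTarget.ES5, true);

// Hypothetical mutator: replace the operator of every `+` binary expression with `-`.
// Operands of a replaced expression are not visited again, which is enough for a sketch.
const plusToMinus: ts.TransformerFactory<ts.SourceFile> = (context) => (root) => {
  const visit = (node: ts.Node): ts.Node => {
    if (ts.isBinaryExpression(node) && node.operatorToken.kind === ts.SyntaxKind.PlusToken) {
      return ts.factory.updateBinaryExpression(
        node,
        node.left,
        ts.factory.createToken(ts.SyntaxKind.MinusToken),
        node.right,
      );
    }
    return ts.visitEachChild(node, visit, context);
  };
  return ts.visitEachChild(root, visit, context);
};

// Run the transformer, then print the resulting tree back to source text.
const result = ts.transform(file, [plusToMinus]);
const mutated = ts.createPrinter().printFile(result.transformed[0]);
result.dispose();

console.log(mutated);
```

A mutation-testing tool would then run the unit tests against the mutated source and report whether any test fails.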

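Now that we know how to generate and traverse an AST, we can start having some fun.

Application: who wrote the most unit test cases?

Suppose the project uses Jest for its unit tests.

Walking the directory to find the test files and reading their contents needs no explanation. Once we have the file content, we generate the TypeScript AST and start traversing it.

How do we tell whether a node is a describe or a test/it call? Like this: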

// Type guards narrow `node`, so `expression` and `text` are type-safe to access
if (ts.isCallExpression(node) && ts.isIdentifier(node.expression)) {
    const funcName = node.expression.text;
    if (funcName === 'describe') {
        // TODO
    } else if (funcName === 'it' || funcName === 'test') {
        // TODO
    }
}
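As a fuller illustration of the check above, here is a small sketch that walks a test file and records each test title together with its enclosing describe. The names `collectTests` and `FoundTest` are invented for this example, and it only handles plain `describe(...)`, `it(...)` and `test(...)` calls with string titles (not variants such as `it.each`).

```typescript
import * as ts from 'typescript';

interface FoundTest {
  describeName: string;
  testName: string;
}

// Hypothetical helper: collect Jest test titles from a source string.
const collectTests = (text: string): FoundTest[] => {
  const sf = ts.createSourceFile('demo.test.ts', text, ts.ScriptTarget.ES5, true);
  const found: FoundTest[] = [];

  const visit = (node: ts.Node, currDescribe: string): void => {
    let nextDescribe = currDescribe;
    if (ts.isCallExpression(node) && ts.isIdentifier(node.expression)) {
      const funcName = node.expression.text;
      const firstArg = node.arguments[0];
      const title = firstArg && ts.isStringLiteralLike(firstArg) ? firstArg.text : '';
      if (funcName === 'describe') {
        nextDescribe = title; // children of this call belong to this describe
      } else if (funcName === 'it' || funcName === 'test') {
        found.push({ describeName: currDescribe, testName: title });
      }
    }
    ts.forEachChild(node, (child) => visit(child, nextDescribe));
  };

  visit(sf, '');
  return found;
};

const testSource = [
  "describe('math', () => {",
  "  it('adds', () => { expect(1 + 1).toBe(2); });",
  "  test('subtracts', () => { expect(2 - 1).toBe(1); });",
  '});',
].join('\n');

const foundTests = collectTests(testSource);
console.log(foundTests);
```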

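Next question: how do we know which lines make up a given test case? Every AST node carries the start and end positions of its source text, so we can work out the lines from those (the AST printout in the previous section already shows them). One caveat when using pos and end directly: for a node at the beginning of a line, the range can include the preceding line breaks and comments.

Method 1: use the position of a child node to skip the leading line breaks and comments when computing line numbers:

else if (funcName === 'it' || funcName === 'test') {
    const testKeywordNode = node.getChildren()[0];
    parseInfoList.push({
        describeName: currDescribe,
        testName: node.arguments[0].text,
        lineBegin: getLineNumByPos(fileContent, testKeywordNode.end), // use end instead of pos to avoid including comments
        lineEnd: getLineNumByPos(fileContent, node.end),
        expectLines: getExpectLines(fileContent, node) || [], // the lines of all expect calls
    });
    return;
}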

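To find the line of a given position in the file content, just count the line breaks in the text before that position:

const getLineNumByPos = (fileContent, pos) => {
    const contentBefore = fileContent.slice(0, pos);
    // Count the line breaks before pos; line numbers start at 1
    return (contentBefore.match(/\n/g) || []).length + 1;
};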
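A quick way to sanity-check this kind of helper is to run it on a short string with known line breaks. The sketch below restates the same counting logic under a different name (`countLineAt`, chosen here to avoid clashing with the snippet above) with explicit types.

```typescript
// Same idea as getLineNumByPos: 1-based line number of a character offset.
const countLineAt = (fileContent: string, pos: number): number => {
  const contentBefore = fileContent.slice(0, pos);
  return (contentBefore.match(/\n/g) || []).length + 1;
};

const sample = 'line one\nline two\nline three';

console.log(countLineAt(sample, 0)); // offset 0 is on the first line
console.log(countLineAt(sample, sample.indexOf('two'))); // inside the second line
console.log(countLineAt(sample, sample.indexOf('three'))); // inside the third line
```

Note that the regular expression must be `/\n/g`; without the backslash it would count the letter "n" instead of line breaks.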

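Method 2: use the APIs that TypeScript already provides. TS offers getStart, getFullStart and similar methods. The difference is that getFullStart includes the preceding line breaks and comments (if any), while getStart does not. There is also an API for getting the line number (why didn't I just use it? Because I only found out about it afterwards (╯‵′）╯︵┻‐┻):

const { line, character } = sourceFile.getLineAndCharacterOfPosition(node.getStart());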
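The difference between the two start positions is easiest to see on a statement that has a comment above it. The sketch below uses a made-up two-line source; note that the `line` returned by `getLineAndCharacterOfPosition` is zero-based, so add 1 if you need editor-style line numbers.

```typescript
import * as ts from 'typescript';

const commented = '// leading comment\nconst answer = 42;\n';
const commentedFile = ts.createSourceFile('demo.ts', commented, ts.ScriptTarget.ES5, true);
const statement = commentedFile.statements[0];

// getFullStart() includes the leading comment and line break; getStart() skips them.
const fullStart = statement.getFullStart();
const start = statement.getStart(commentedFile);

// Both lines are zero-based.
const fullStartLine = commentedFile.getLineAndCharacterOfPosition(fullStart).line;
const startLine = commentedFile.getLineAndCharacterOfPosition(start).line;

console.log({ fullStart, start, fullStartLine, startLine });
```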

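With that done, the hard part is over. What remains is simply running git blame to find the authors of the lines each test case occupies, which is not the focus of this article.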

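Advanced: using TypeScript's higher-level APIs

createSourceFile is very handy, and a lot can be done by analyzing the AST, but its drawbacks are also obvious:

As a low-level API, it is concerned with the kind of each token inside a single file and cannot capture the relationships between them. TypeScript's type inference, for example, cannot be derived by looking at a single token, and some inference even spans files. The AST alone cannot provide this;

You can only traverse level by level, following the structure of the code. When the same thing can be written in several ways, each variant has to be handled separately, and there is no unified higher-level interface on top of the AST, which makes writing such features rather inconvenient;

For this, TypeScript's compiler already provides the overall architecture, so extracting information from code does not have to be slash-and-burn work on the raw AST. Here is the architecture diagram (from the TypeScript GitHub wiki)

The wiki is in English, but it is easy to follow and worth reading.

It describes the overall compilation process, in which generating the AST is only the first step. It also introduces a number of concepts. The ones I personally find most interesting are the following: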

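Program

A SourceFile is the structure for a single file, and multiple related SourceFiles make up a Program. A Program can be created from a set of source files or from a single one. Much like webpack resolving files from an entry point, TypeScript pulls every file referenced by the source files into the Program and parses them:

this.program = ts.createProgram([this.srcFile], {
    target: ts.ScriptTarget.ES5,
    module: ts.ModuleKind.CommonJS,
});

Because all related files are loaded, the relationships between files and code can be resolved, which is why many of the higher-level features are built on Program.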

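TypeChecker

As the name suggests, this is used for type checking, and it also provides type inference. It is created from a Program:

this.checker = this.program.getTypeChecker();

With it you can do all sorts of type-related things, such as getting the parameter types and return type of a function: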

// Shape of the returned info (declared here so the snippet type-checks)
interface IFunctionTypeInfo {
  paramTypes: string[];
  returnType: string;
}

const getFunctionTypeInfoFromSignature = (signature: ts.Signature, checker: ts.TypeChecker): IFunctionTypeInfo => {
  // Parameter types
  const paramTypeStrList = signature.parameters.map((parameter) => {
    return checker.typeToString(checker.getTypeOfSymbolAtLocation(parameter, parameter.valueDeclaration!));
  });

  // Return type
  const returnType = signature.getReturnType();
  const returnTypeStr = checker.typeToString(returnType);

  return {
    paramTypes: paramTypeStrList,
    returnType: returnTypeStr,
  };
};

export const getFunctionTypeInfoByNode = (
  node: ts.ArrowFunction | ts.FunctionDeclaration | ts.MethodDeclaration,
  checker: ts.TypeChecker,
): IFunctionTypeInfo => {
  const tsType = checker.getTypeAtLocation(node);
  return getFunctionTypeInfoFromSignature(tsType.getCallSignatures()[0], checker);
};

Notice that getCallSignatures returns an array. That is because TypeScript supports function overloading.
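The following sketch shows the array in action on an overloaded function. To stay self-contained it feeds the source to the Program from memory by overriding `getSourceFile` on the default compiler host; the file name `overloads.ts`, the helper `isVirtualFile` and the sample `parse` function are all invented for this example, and the default library files are still read from the installed typescript package.

```typescript
import * as ts from 'typescript';

const fileName = 'overloads.ts';
const code = [
  'export function parse(input: string): number;',
  'export function parse(input: number): string;',
  'export function parse(input: any): any { return input; }',
].join('\n');

const options: ts.CompilerOptions = {
  target: ts.ScriptTarget.ES5,
  module: ts.ModuleKind.CommonJS,
  types: [], // do not pull in ambient @types packages
};

// Hypothetical check: is this the in-memory file (possibly with a directory prefix)?
const isVirtualFile = (name: string): boolean =>
  name === fileName || name.slice(-(fileName.length + 1)) === '/' + fileName;

// Default host, except that our virtual file is served from the `code` string.
const host = ts.createCompilerHost(options);
const readFromDisk = host.getSourceFile;
host.getSourceFile = (name, languageVersion, onError, shouldCreate) =>
  isVirtualFile(name)
    ? ts.createSourceFile(name, code, languageVersion, true)
    : readFromDisk.call(host, name, languageVersion, onError, shouldCreate);

const program = ts.createProgram([fileName], options, host);
const checker = program.getTypeChecker();
const overloadFile = program.getSourceFile(fileName)!;

// Any declaration of `parse` leads to the same function type; take the first one.
const firstDeclaration = overloadFile.statements[0];
const signatures = checker.getTypeAtLocation(firstDeclaration).getCallSignatures();

const described = signatures.map((sig) => ({
  paramTypes: sig.parameters.map((p) =>
    checker.typeToString(checker.getTypeOfSymbolAtLocation(p, p.valueDeclaration!)),
  ),
  returnType: checker.typeToString(sig.getReturnType()),
}));

console.log(described);
```

Taking only index 0, as the article's helper does, is fine for functions without overloads; with overloads you would probably want to iterate over all signatures as above.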

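While using TypeChecker you will run into another important concept:

Symbol

Every node in the AST is a Node, so what is the difference between a Symbol and a Node?

Put simply, a Node is a piece of the code's syntax: it may be a variable name, a keyword such as function, or a code block. A Symbol is what its name suggests: each Symbol is like a variable name you would type into the console while debugging. Two functions may each define a local variable with the same name, but those belong to different Symbols; a variable a exported from A.ts and used in B.ts corresponds to the same Symbol.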
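The "same name, different Symbol" case can be checked directly. This is a minimal sketch using an in-memory Program; the file name `symbols.ts`, the helper `isVirtualFile` and the sample functions are made up, and the cross-file case (which involves import aliases) is not covered here.

```typescript
import * as ts from 'typescript';

const fileName = 'symbols.ts';
const code = [
  'function first() { const value = 1; return value; }',
  "function second() { const value = 'a'; return value; }",
].join('\n');

const options: ts.CompilerOptions = { target: ts.ScriptTarget.ES5, types: [] };

// Hypothetical check: is this the in-memory file (possibly with a directory prefix)?
const isVirtualFile = (name: string): boolean =>
  name === fileName || name.slice(-(fileName.length + 1)) === '/' + fileName;

// Default host, except that the virtual file is served from the `code` string.
const host = ts.createCompilerHost(options);
const readFromDisk = host.getSourceFile;
host.getSourceFile = (name, languageVersion, onError, shouldCreate) =>
  isVirtualFile(name)
    ? ts.createSourceFile(name, code, languageVersion, true)
    : readFromDisk.call(host, name, languageVersion, onError, shouldCreate);

const program = ts.createProgram([fileName], options, host);
const checker = program.getTypeChecker();
const symbolFile = program.getSourceFile(fileName)!;

// Collect the Symbol behind every identifier spelled `value`, in source order:
// [declaration in first, use in first, declaration in second, use in second].
const valueSymbols: (ts.Symbol | undefined)[] = [];
const visit = (node: ts.Node): void => {
  if (ts.isIdentifier(node) && node.text === 'value') {
    valueSymbols.push(checker.getSymbolAtLocation(node));
  }
  ts.forEachChild(node, visit);
};
visit(symbolFile);

console.log(valueSymbols[0] === valueSymbols[1]); // declaration and use inside first()
console.log(valueSymbols[0] === valueSymbols[2]); // first() versus second()
```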

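In my view, Symbol is mainly useful for two things:

Higher-level analysis. Take a variable declaration: the Node data structure is essentially a pile of raw data. To get the variable name, you have to get the name node, read the text from it, and then trim it:

Converting to the corresponding Symbol is like building an intermediate structure, after which a single higher-level API call is enough, for example:

const symbol = this.checker.getSymbolAtLocation(declaration.name)!;
const name = symbol.getName();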

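The same goes for classes: once you have the class's Symbol, an API call easily gets you the constructor information, however deeply it is buried.

const symbol = checker.getSymbolAtLocation(node.name);
if (!symbol) {
  return null;
}

const tsType = checker.getTypeOfSymbolAtLocation(symbol, symbol.valueDeclaration);
const signature = tsType.getConstructSignatures()[0];

Type association. Even when the same Symbol is used in two different files, its type is the same. That said, I found that a Type can be obtained from a Symbol and also from a Node, so this particular capability of Symbol does not strike me as essential.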

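For more about the TypeScript compiler, read the TypeScript wiki in depth. If you would rather learn directly from code, the wiki also has several examples:

Once these concepts are understood, we no longer need to write a pile of AST-based code for a single feature and can implement it in a more elegant way.

Before you start writing, get to know this website and these APIs

At first I just used the code from the first section to print the AST, until I found the following website: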

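Its features are roughly as follows:

It is very handy, especially the rightmost panel, where you can see which properties and methods each AST node has (they differ between node kinds) and easily get the relevant data instead of traversing level by level.

As for the panel at the lower left, look at it together with this demo: #creating-and-printing-a-typescript-ast

Now for the compiler API that TypeScript provides. My main complaint is that I could not find a list of the compiler APIs on GitHub, so I had to rely on examples and on reading the TypeScript source. Here are some that I have used and found useful:

For every node kind shown in the AST website above there is a corresponding check function, such as ts.isClassDeclaration and ts.isArrowFunction.

Of course, you can also follow the demo in the first section and check with ts.SyntaxKind[node.kind] === 'VariableStatement', but the standard check functions are friendlier to TS because they narrow the node's type.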
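Here is a small sketch of the type-guard style. Inside the `if`, the statement is narrowed to `ts.VariableStatement`, so `declarationList` is available without a cast. The sample source and the variable names are made up for illustration.

```typescript
import * as ts from 'typescript';

const guardSource = 'export const test2 = 0;\nfunction helper() {}\n';
const guardFile = ts.createSourceFile('demo.ts', guardSource, ts.ScriptTarget.ES5, true);

const declaredVariables: string[] = [];
guardFile.statements.forEach((stmt) => {
  if (ts.isVariableStatement(stmt)) {
    // `stmt` is a ts.VariableStatement here, so its typed fields can be used directly.
    stmt.declarationList.declarations.forEach((decl) => {
      if (ts.isIdentifier(decl.name)) {
        declaredVariables.push(decl.name.text);
      }
    });
  }
});

console.log(declaredVariables);
```

One more reason to prefer the guards, as far as I know: several `SyntaxKind` members share a numeric value with marker aliases, so the reverse lookup `ts.SyntaxKind[node.kind]` may return an alias name (for example `FirstStatement`) rather than the name you compare against. This can vary by TypeScript version, so it is worth checking in your own setup.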

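ModifierFlags can be understood simply as flags for modifiers such as public, private and async. They can be seen in the AST viewer website.

To check whether a node has a particular flag, you can write something like this:

export const isNodeExported = (node: ts.Node): boolean => {
  return (ts.getCombinedModifierFlags(node as ts.Declaration) & ts.ModifierFlags.Export) !== 0;
};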
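As a quick usage sketch, the same flag check can be applied to the top-level functions of a small file. The sample source and the helper name `hasExportFlag` are invented here; the source file is created with `setParentNodes` set to true, since the combined-flags lookup may walk up to parent nodes.

```typescript
import * as ts from 'typescript';

// Same check as isNodeExported above, restated so this block is self-contained.
const hasExportFlag = (node: ts.Declaration): boolean =>
  (ts.getCombinedModifierFlags(node) & ts.ModifierFlags.Export) !== 0;

const exportSource = 'export function visible() {}\nfunction hidden() {}\n';
const exportFile = ts.createSourceFile('demo.ts', exportSource, ts.ScriptTarget.ES5, true);

const exportedNames: string[] = [];
exportFile.statements.forEach((stmt) => {
  if (ts.isFunctionDeclaration(stmt) && stmt.name && hasExportFlag(stmt)) {
    exportedNames.push(stmt.name.text);
  }
});

console.log(exportedNames);
```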

I am still looking for more useful APIs.

Application: getting all members of a class and their type definitions

The flow:

private analyseExportNodeForClass(node: ts.ClassDeclaration) {
  const className = node.name?.getFullText().trim() || '';
  const classMemberInfoList: IClassMemberInfo[] = [];

  node.members.forEach((member) => {
    if (!ts.isPropertyDeclaration(member) && !ts.isMethodDeclaration(member)) {
      return;
    }

    const { name, type, funcArgsType } = ts.isPropertyDeclaration(member)
      ? this.getBasicInfoFromVarDeclaration(member)
      : this.getBasicInfoFromFuncDeclaration(member);
    const accessibility = getClassAccessibility(member);

    console.log('name', name);
    console.log('type', type);
    console.log('funcArgsType', funcArgsType);
    console.log('');

    classMemberInfoList.push({
      name,
      type,
      funcArgsType,
      accessibility,
    });
  });

  // The constructor is handled separately
  const constructorParamType = this.getConstructorParamType(node);

  // TODO: output the collected information
  console.log(className, classMemberInfoList, constructorParamType);
}

Getting the definition and types of a member function looks like this:

private getBasicInfoFromFuncDeclaration(declaration: ts.FunctionDeclaration | ts.MethodDeclaration) {
  const symbol = this.checker.getSymbolAtLocation(declaration.name!)!;
  const name = symbol.getName();
  const typeInfo = getFunctionTypeInfoByNode(declaration, this.checker);
  const type = typeInfo.returnType;
  const funcArgsType = typeInfo.paramTypes;

  return {
    name,
    type,
    funcArgsType,
  };
}

// utils.ts
export const getFunctionTypeInfoByNode = (
  node: ts.ArrowFunction | ts.FunctionDeclaration | ts.MethodDeclaration,
  checker: ts.TypeChecker,
): IFunctionTypeInfo => {
  const tsType = checker.getTypeAtLocation(node);
  return getFunctionTypeInfoFromSignature(tsType.getCallSignatures()[0], checker);
};

const getFunctionTypeInfoFromSignature = (signature: ts.Signature, checker: ts.TypeChecker): IFunctionTypeInfo => {
  // Parameter types
  const paramTypeStrList = signature.parameters.map((parameter) => {
    return checker.typeToString(checker.getTypeOfSymbolAtLocation(parameter, parameter.valueDeclaration));
  });

  // Return type
  const returnType = signature.getReturnType();
  const returnTypeStr = checker.typeToString(returnType);

  return {
    paramTypes: paramTypeStrList,
    returnType: returnTypeStr,
  };
};

Getting member properties is simpler, so I will not go into it.

Getting the constructor's types:

private getConstructorParamType(node: ts.ClassDeclaration) {
  const constructorInfo = getConstructorInfo(node, this.checker);
  const constructorParamType: string[] = constructorInfo
    ? constructorInfo.paramTypes
    : [];

  return constructorParamType;
}

// utils.ts
export const getConstructorInfo = (node: ts.ClassDeclaration, checker: ts.TypeChecker): IFunctionTypeInfo | null => {
  if (!node.name) {
    return null;
  }

  const symbol = checker.getSymbolAtLocation(node.name);
  if (!symbol) {
    return null;
  }

  const tsType = checker.getTypeOfSymbolAtLocation(symbol, symbol.valueDeclaration);
  const signature = tsType.getConstructSignatures()[0];

  if (!signature) {
    return null;
  }

  return getFunctionTypeInfoFromSignature(signature, checker);
};

And if you want to know the accessibility of each member, you can do this:

export const getClassAccessibility = (node: ts.PropertyDeclaration | ts.MethodDeclaration) => {
  // const hasPublic = (ts.getCombinedModifierFlags(node) & ts.ModifierFlags.Public) !== 0;
  const hasPrivate = (ts.getCombinedModifierFlags(node) & ts.ModifierFlags.Private) !== 0;
  const hasProtect = (ts.getCombinedModifierFlags(node) & ts.ModifierFlags.Protected) !== 0;

  // Members without an explicit modifier are treated as public
  return hasProtect ? ts.ModifierFlags.Protected : hasPrivate ? ts.ModifierFlags.Private : ts.ModifierFlags.Public;
};
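To see the accessibility check on a concrete class, here is a small AST-only sketch (no Program or TypeChecker needed for this part). The `Account` class, the helper `accessibilityOf` and the string labels are made up for illustration; static members, accessors and `#private` names are not handled.

```typescript
import * as ts from 'typescript';

const classSource = [
  'class Account {',
  '  id = 1;',
  '  private balance = 0;',
  '  protected audit(): void {}',
  '  public deposit(amount: number): void {}',
  '}',
].join('\n');

const classFile = ts.createSourceFile('demo.ts', classSource, ts.ScriptTarget.ES5, true);

// Hypothetical helper: map modifier flags to a readable label (default is public).
const accessibilityOf = (member: ts.PropertyDeclaration | ts.MethodDeclaration): string => {
  const flags = ts.getCombinedModifierFlags(member);
  if ((flags & ts.ModifierFlags.Protected) !== 0) { return 'protected'; }
  if ((flags & ts.ModifierFlags.Private) !== 0) { return 'private'; }
  return 'public';
};

const memberSummary: string[] = [];
classFile.statements.forEach((stmt) => {
  if (!ts.isClassDeclaration(stmt)) { return; }
  stmt.members.forEach((member) => {
    if (!ts.isPropertyDeclaration(member) && !ts.isMethodDeclaration(member)) { return; }
    if (!ts.isIdentifier(member.name)) { return; }
    memberSummary.push(member.name.text + ':' + accessibilityOf(member));
  });
});

console.log(memberSummary.join(','));
```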

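At this point we can get all the information a class should have. In practice there are some special cases, such as static members and getters; you can look at the structure in the AST viewer website and work out roughly which APIs to use.

With the class information in hand there is a lot we can do. Automatically generate interface documentation for each class? Automatically mock class instances? There is plenty of room for imagination.

Summary

After all this, I have probably only touched the tip of the iceberg of the TypeScript compiler. As TypeScript is used more and more widely, understanding its compiler and writing tools on top of it will be a meaningful and somewhat challenging thing to do. I look forward to seeing more such tools.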
