0x01 Introduction

During a Go code audit I ran into the following situation:

package main

import (
"bytes"
"text/template"
)

type Callback struct {
URL string `json:"url"` // callback endpoint URL
Body string `json:"body"` // request body
}

type Test struct {
*Callback
Attack string
}

func main() {
test := Test{
Callback: &Callback{
URL: "http://",
Body: "xxxxxxx",
},
Attack: "xxxxxx",
}

urlTemplate, err := template.New("url").Parse(test.URL)
if err != nil { panic(err) }

var urlBuffer bytes.Buffer
err = urlTemplate.Execute(&urlBuffer, test)
if err != nil { panic(err) }

bodyTemplate, err := template.New("body").Parse(test.Body)
if err != nil { panic(err) }
var bodyBuffer bytes.Buffer
err = bodyTemplate.Execute(&bodyBuffer, test)
if err != nil { panic(err) }
}

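As the simplified code shows, the strings handed to the template parser, including any {{ }} actions inside them, are externally controllable. At first glance that looks exploitable, so let's dig deeper.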
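To make the concern concrete, here is a minimal sketch of what a controllable template string can already do without any function call: read other fields of the data passed to Execute. The render helper, the example.invalid host and the field values are my own assumptions for illustration, not part of the audited code.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Callback and Test mirror the structs from the audited snippet.
type Callback struct {
	URL  string
	Body string
}

type Test struct {
	*Callback
	Attack string
}

// render parses userText as a template and executes it against data.
// It is a hypothetical helper, not part of the original code.
func render(userText string, data interface{}) (string, error) {
	tmpl, err := template.New("url").Parse(userText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := Test{
		// The attacker-controlled URL references another field of the data.
		Callback: &Callback{URL: "http://example.invalid/?leak={{.Attack}}"},
		Attack:   "secret-value",
	}
	out, err := render(data.URL, data)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Under these assumptions the rendered URL carries the value of Attack, so whatever is reachable from the data object is readable by whoever controls the template text.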

> $ go version
go version go1.13.1 darwin/amd64

0x02 Tag

When I first looked at tags, I did not really understand what they were for. The official Go specification explains them as follows:

A field declaration may be followed by an optional string literal tag, which becomes an attribute for all the fields in the corresponding field declaration. An empty tag string is equivalent to an absent tag. The tags are made visible through a reflection interface and take part in type identity for structs but are otherwise ignored.

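Tags are visible through the reflection interface and take part in type identity. The example below shows tags that reflection-based code uses to map fields automatically: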

// A struct corresponding to a TimeStamp protocol buffer.
// The tag strings define the protocol buffer field numbers;
// they follow the convention outlined by the reflect package.
struct {
microsec uint64 `protobuf:"1"`
serverIP6 uint64 `protobuf:"2"`
}

A tag has the type StructTag, declared as type StructTag string:

package main

import (
"fmt"
"reflect"
)

func main() {
type S struct {
F string `species:"gopher" color:"blue"`
}

s := S{}
st := reflect.TypeOf(s)
field := st.Field(0)
fmt.Println(field.Tag.Get("color"), field.Tag.Get("species"))

}

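This makes it easy to understand: a tag is metadata that code reads through reflection and uses to map fields automatically.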
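As a rough illustration of how a library could turn a json tag into a key name, here is a simplified sketch built only on reflect.StructTag. It is not the actual encoding/json implementation; jsonKeys and the struct are hypothetical, and the handling of the tag options is my own simplification.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Callback is a hypothetical struct used only for this illustration.
type Callback struct {
	URL  string `json:"url"`
	Body string `json:"body,omitempty"`
	note string // unexported, so it is skipped below
}

// jsonKeys returns the key name each exported field would get if the
// name part of its `json` tag were used as the key.
func jsonKeys(v interface{}) []string {
	t := reflect.TypeOf(v)
	var keys []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" { // non-empty PkgPath means unexported
			continue
		}
		name := f.Name
		if tag, ok := f.Tag.Lookup("json"); ok {
			if tag == "-" {
				continue
			}
			// The key name is everything before the first comma.
			if n := strings.Split(tag, ",")[0]; n != "" {
				name = n
			}
		}
		keys = append(keys, name)
	}
	return keys
}

func main() {
	fmt.Println(jsonKeys(Callback{}))
}
```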

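I skimmed the implementation of JSON serialization and deserialization, but did not see the source explicitly reading the tag value to write the JSON key, and could not find where exactly this happens. I won't go into it here; the next post will cover it.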

Main references:

https://pkg.go.dev/encoding/json#Marshal

https://cs.opensource.google/go/go/+/refs/tags/go1.16.4:src/encoding/json/encode.go;l=158

https://draveness.me/golang/docs/part2-foundation/ch04-basic/golang-reflect/

https://zhuanlan.zhihu.com/p/37165706

0x03 text/template source analysis - parsing logic

Some descriptions in the documentation are rather vague, so let's look at the actual implementation in the source.

template.New()


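This initializes the Template struct and Template.common. The common struct defines two maps, parseFuncs and execFuncs. According to the comment, the split keeps the API clean and avoids exposing reflection to the client. A read-write lock, muFuncs, guards both maps. Reflection is famously flexible, so I decided to look at how separating the two maps keeps reflection from being exposed.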

func (t *Template) Parse(text string) (*Template, error)

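Parse takes the read lock (RLock) and then calls parse.Parse, passing t.parseFuncs and builtins, that is, the custom functions added through Template.Funcs plus the built-in functions. builtins is defined as follows: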

var builtins = FuncMap{
"and": and,
"call": call,
"html": HTMLEscaper,
"index": index,
"slice": slice,
"js": JSEscaper,
"len": length,
"not": not,
"or": or,
"print": fmt.Sprint,
"printf": fmt.Sprintf,
"println": fmt.Sprintln,
"urlquery": URLQueryEscaper,

// Comparisons
"eq": eq, // ==
"ge": ge, // >=
"gt": gt, // >
"le": le, // <=
"lt": lt, // <
"ne": ne, // !=
}

parse.Parse creates the initial treeSet along with the top-level Tree.

The Tree struct looks like this:

// Tree is the representation of a single parsed template.
type Tree struct {
Name string // name of the template represented by the tree.
ParseName string // name of the top-level template during parsing, for error messages.
Root *ListNode // top-level root of the tree.
text string // text parsed to create the template (or its parent)
// Parsing only; cleared after parse.
funcs []map[string]interface{}
lex *lexer
token [3]item // three-token lookahead for parser.
peekCount int
vars []string // variables defined at the moment.
treeSet map[string]*Tree
}

The lexer struct inside it is defined as follows:

// lexer holds the state of the scanner.
type lexer struct {
name string // the name of the input; used only for error reports
input string // the string being scanned
leftDelim string // start of action
rightDelim string // end of action
trimRightDelim string // end of action with trim marker
pos Pos // current position in the input
start Pos // start position of this item
width Pos // width of last rune read from input
items chan item // channel of scanned items
parenDepth int // nesting depth of ( ) exprs
line int // 1+number of newlines seen
startLine int // start line of this item
}

Next, Tree.Parse() is called, which kicks off the lexing.

// Parse parses the template definition string to construct a representation of
// the template for execution. If either action delimiter string is empty, the
// default ("{{" or "}}") is used. Embedded template definitions are added to
// the treeSet map.
func (t *Tree) Parse(text, leftDelim, rightDelim string, treeSet map[string]*Tree, funcs ...map[string]interface{}) (tree *Tree, err error) {
defer t.recover(&err)
t.ParseName = t.Name
t.startParse(funcs, lex(t.Name, text, leftDelim, rightDelim), treeSet)
t.text = text
t.parse()
t.add()
t.stopParse()
return t, nil
}

lex()

The lexer is quite interesting, and its scanning rules are worth a look.

// lex creates a new scanner for the input string.
func lex(name, input, left, right string) *lexer {
if left == "" {
left = leftDelim
}
if right == "" {
right = rightDelim
}
l := &lexer{
name: name,
input: input,
leftDelim: left,
rightDelim: right,
trimRightDelim: rightTrimMarker + right,
items: make(chan item),
line: 1,
startLine: 1,
}
go l.run()
return l
}

It first creates a lexer and then starts l.run() in a goroutine. run loops, calling one state function after another to scan the input, starting with lexText.

// lexText scans until an opening action delimiter, "{{".
func lexText(l *lexer) stateFn {
l.width = 0
if x := strings.Index(l.input[l.pos:], l.leftDelim); x >= 0 {
ldn := Pos(len(l.leftDelim))
l.pos += Pos(x)
trimLength := Pos(0)
if strings.HasPrefix(l.input[l.pos+ldn:], leftTrimMarker) {
trimLength = rightTrimLength(l.input[l.start:l.pos])
}
l.pos -= trimLength
if l.pos > l.start {
l.line += strings.Count(l.input[l.start:l.pos], "\n")
l.emit(itemText)
}
l.pos += trimLength
l.ignore()
return lexLeftDelim
}
l.pos = Pos(len(l.input))
// Correctly reached EOF.
if l.pos > l.start {
l.line += strings.Count(l.input[l.start:l.pos], "\n")
l.emit(itemText)
}
l.emit(itemEOF)
return nil
}

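The first check looks for {{ and, once it is found, hands over to lexLeftDelim, which checks for the trim marker (whitespace trimming), comments and so on. Scanning of the action itself starts in lexInsideAction: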

// lexInsideAction scans the elements inside action delimiters.
func lexInsideAction(l *lexer) stateFn {
// Either number, quoted string, or identifier.
// Spaces separate arguments; runs of spaces turn into itemSpace.
// Pipe symbols separate and are emitted.
delim, _ := l.atRightDelim()
if delim {
if l.parenDepth == 0 {
return lexRightDelim
}
return l.errorf("unclosed left paren")
}
switch r := l.next(); {
case r == eof || isEndOfLine(r):
return l.errorf("unclosed action")
case isSpace(r):
l.backup() // Put space back in case we have " -}}".
return lexSpace
case r == '=':
l.emit(itemAssign)
case r == ':':
if l.next() != '=' {
return l.errorf("expected :=")
}
l.emit(itemDeclare)
case r == '|':
l.emit(itemPipe)
case r == '"':
return lexQuote
case r == '`':
return lexRawQuote
case r == '$':
return lexVariable
case r == '\'':
return lexChar
case r == '.':
// special look-ahead for ".field" so we don't break l.backup().
if l.pos < Pos(len(l.input)) {
r := l.input[l.pos]
if r < '0' || '9' < r {
return lexField
}
}
fallthrough // '.' can start a number.
case r == '+' || r == '-' || ('0' <= r && r <= '9'):
l.backup()
return lexNumber
case isAlphaNumeric(r):
l.backup()
return lexIdentifier
case r == '(':
l.emit(itemLeftParen)
l.parenDepth++
case r == ')':
l.emit(itemRightParen)
l.parenDepth--
if l.parenDepth < 0 {
return l.errorf("unexpected right paren %#U", r)
}
case r <= unicode.MaxASCII && unicode.IsPrint(r):
l.emit(itemChar)
return lexInsideAction
default:
return l.errorf("unrecognized character in action: %#U", r)
}
return lexInsideAction
}

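The logic above handles special characters and the other cases, and every recognized token is emitted onto lexer.items. The template text therefore ends up as a sequence of typed tokens.

Suppose the content of the action is a plain word. Control then goes to lexIdentifier, which cuts the word off at the first non-alphanumeric character and checks whether it is a keyword.

lexIdentifier looks like this: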

// lexIdentifier scans an alphanumeric.
func lexIdentifier(l *lexer) stateFn {
Loop:
for {
switch r := l.next(); {
case isAlphaNumeric(r):
// absorb.
default:
l.backup()
word := l.input[l.start:l.pos]
if !l.atTerminator() {
return l.errorf("bad character %#U", r)
}
switch {
case key[word] > itemKeyword:
l.emit(key[word])
case word[0] == '.':
l.emit(itemField)
case word == "true", word == "false":
l.emit(itemBool)
default:
l.emit(itemIdentifier)
}
break Loop
}
}
return lexInsideAction
}

The keywords are:

var key = map[string]itemType{
".": itemDot,
"block": itemBlock,
"define": itemDefine,
"else": itemElse,
"end": itemEnd,
"if": itemIf,
"range": itemRange,
"nil": itemNil,
"template": itemTemplate,
"with": itemWith,
}

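Note that it returns lexInsideAction again. This is not recursion in the usual sense: each state function returns the next state, and run keeps invoking the returned state until the input is exhausted. The tokens split out of the template end up on lexer.items. Neat.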
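The "state function returns the next state" pattern is easier to see in a stripped-down form. The sketch below is my own toy lexer, not the text/template/parse code: it only separates text from actions, it collects tokens in a slice instead of a channel, and all names (miniLexer, token, run) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a simplified stand-in for the lexer's item type.
type token struct {
	kind string // "text", "action" or "error"
	val  string
}

type miniLexer struct {
	input  string
	pos    int
	tokens []token
}

// stateFn has the same shape as in the real lexer: a state returns the next state.
type stateFn func(*miniLexer) stateFn

// lexText scans plain text up to the next "{{".
func lexText(l *miniLexer) stateFn {
	i := strings.Index(l.input[l.pos:], "{{")
	if i < 0 {
		if l.pos < len(l.input) {
			l.tokens = append(l.tokens, token{"text", l.input[l.pos:]})
		}
		l.pos = len(l.input)
		return nil // EOF: no next state
	}
	if i > 0 {
		l.tokens = append(l.tokens, token{"text", l.input[l.pos : l.pos+i]})
	}
	l.pos += i + len("{{")
	return lexAction
}

// lexAction scans the body of an action up to the closing "}}".
func lexAction(l *miniLexer) stateFn {
	i := strings.Index(l.input[l.pos:], "}}")
	if i < 0 {
		l.tokens = append(l.tokens, token{"error", "unclosed action"})
		return nil
	}
	body := strings.TrimSpace(l.input[l.pos : l.pos+i])
	l.tokens = append(l.tokens, token{"action", body})
	l.pos += i + len("}}")
	return lexText
}

// run drives the state machine with a plain loop; no state calls another state.
func run(input string) []token {
	l := &miniLexer{input: input}
	for state := stateFn(lexText); state != nil; {
		state = state(l)
	}
	return l.tokens
}

func main() {
	for _, t := range run("a{{ .X }}b") {
		fmt.Printf("%s %q\n", t.kind, t.val)
	}
}
```

The call stack never grows with the input, because each state returns to the loop before the next one starts.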

Back to func (t *Tree) Parse()

// Parse parses the template definition string to construct a representation of
// the template for execution. If either action delimiter string is empty, the
// default ("{{" or "}}") is used. Embedded template definitions are added to
// the treeSet map.
func (t *Tree) Parse(text, leftDelim, rightDelim string, treeSet map[string]*Tree, funcs ...map[string]interface{}) (tree *Tree, err error) {
defer t.recover(&err)
t.ParseName = t.Name
t.startParse(funcs, lex(t.Name, text, leftDelim, rightDelim), treeSet)
t.text = text
t.parse()
t.add()
t.stopParse()
return t, nil
}

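Back at the statement that calls lex(), t.startParse(funcs, lex(t.Name, text, leftDelim, rightDelim), treeSet): lex splits the input into tokens, and parse now has to assemble those tokens into a tree. The parsing logic lives in parse. I mainly care about how dynamic function calls are checked, so I focus on the code paths that can trigger such a call.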

// parse is the top-level parser for a template, essentially the same
// as itemList except it also parses {{define}} actions.
// It runs to EOF.
func (t *Tree) parse() {
t.Root = t.newList(t.peek().pos)
for t.peek().typ != itemEOF {
if t.peek().typ == itemLeftDelim {
delim := t.next()
if t.nextNonSpace().typ == itemDefine {
newT := New("definition") // name will be updated once we know it.
newT.text = t.text
newT.ParseName = t.ParseName
newT.startParse(t.funcs, t.lex, t.treeSet)
newT.parseDefinition()
continue
}
t.backup2(delim)
}
switch n := t.textOrAction(); n.Type() {
case nodeEnd, nodeElse:
t.errorf("unexpected %s", n)
default:
t.Root.append(n)
}
}
}

This code matches against the tokens the lexer produced. The token we care about is itemIdentifier, which goes into textOrAction().

The call chain is Tree.textOrAction ⇒ Tree.action:

// Action:
// control
// command ("|" command)*
// Left delim is past. Now get actions.
// First word could be a keyword such as range.
func (t *Tree) action() (n Node) {
switch token := t.nextNonSpace(); token.typ {
case itemBlock:
return t.blockControl()
case itemElse:
return t.elseControl()
case itemEnd:
return t.endControl()
case itemIf:
return t.ifControl()
case itemRange:
return t.rangeControl()
case itemTemplate:
return t.templateControl()
case itemWith:
return t.withControl()
}
t.backup()
token := t.peek()
// Do not pop variables; they persist until "end".
return t.newAction(token.pos, token.line, t.pipeline("command"))
}

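itemIdentifier matches none of the cases above, so the function returns an action node. In other words, whatever sits inside {{ }} and is not a keyword with special meaning becomes an action. The ActionNode struct is: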

// ActionNode holds an action (something bounded by delimiters).
// Control actions have their own nodes; ActionNode represents simple
// ones such as field evaluations and parenthesized pipelines.
type ActionNode struct {
NodeType
Pos
tr *Tree
Line int // The line number in the input. Deprecated: Kept for compatibility.
Pipe *PipeNode // The pipeline in the action.
}

In template, a pipeline is a sequence of commands chained with |, not a control statement. The PipeNode here is produced by Tree.pipeline("command").

// Pipeline:
// declarations? command ('|' command)*
func (t *Tree) pipeline(context string) (pipe *PipeNode) {
token := t.peekNonSpace()
pipe = t.newPipeline(token.pos, token.line, nil)
// Are there declarations or assignments?
decls:
if v := t.peekNonSpace(); v.typ == itemVariable {...}
for {
switch token := t.nextNonSpace(); token.typ {
case itemRightDelim, itemRightParen:
// At this point, the pipeline is complete
t.checkPipeline(pipe, context)
if token.typ == itemRightParen {
t.backup()
}
return
case itemBool, itemCharConstant, itemComplex, itemDot, itemField, itemIdentifier,
itemNumber, itemNil, itemRawString, itemString, itemVariable, itemLeftParen:
t.backup()
pipe.append(t.command())
default:
t.unexpected(token, context)
}
}
}

itemIdentifier and a bunch of other token types all end up in Tree.command(), including bool, nil, string and so on.

// command:
// operand (space operand)*
// space-separated arguments up to a pipeline character or right delimiter.
// we consume the pipe character but leave the right delim to terminate the action.
func (t *Tree) command() *CommandNode {
cmd := t.newCommand(t.peekNonSpace().pos)
for {
t.peekNonSpace() // skip leading spaces.
operand := t.operand()
if operand != nil {
cmd.append(operand)
}
switch token := t.next(); token.typ {
case itemSpace:
continue
case itemError:
t.errorf("%s", token.val)
case itemRightDelim, itemRightParen:
t.backup()
case itemPipe:
default:
t.errorf("unexpected %s in operand", token)
}
break
}
if len(cmd.Args) == 0 {
t.errorf("empty command")
}
return cmd
}

Tree.newCommand creates yet another node, a CommandNode, and records the position where the command starts. The struct is:

// CommandNode holds a command (a pipeline inside an evaluating action).
type CommandNode struct {
NodeType
Pos
tr *Tree
Args []Node // Arguments in lexical order: Identifier, field, or constant.
}

t.peekNonSpace() skips the leading spaces, then operand() is called for each word, which in turn calls term().

// term:
// literal (number, string, nil, boolean)
// function (identifier)
// .
// .Field
// $
// '(' pipeline ')'
// A term is a simple "expression".
// A nil return means the next item is not a term.
func (t *Tree) term() Node {
switch token := t.nextNonSpace(); token.typ {
case itemError:
t.errorf("%s", token.val)
case itemIdentifier:
if !t.hasFunction(token.val) {
t.errorf("function %q not defined", token.val)
}
return NewIdentifier(token.val).SetTree(t).SetPos(token.pos)
case itemDot:
return t.newDot(token.pos)
case itemNil:
return t.newNil(token.pos)
case itemVariable:
return t.useVar(token.pos, token.val)
case itemField:
return t.newField(token.pos, token.val)
case itemBool:
return t.newBool(token.pos, token.val == "true")
case itemCharConstant, itemComplex, itemNumber:
number, err := t.newNumber(token.pos, token.val, token.typ)
if err != nil {
t.error(err)
}
return number
case itemLeftParen:
pipe := t.pipeline("parenthesized pipeline")
if token := t.next(); token.typ != itemRightParen {
t.errorf("unclosed right paren: unexpected %s", token)
}
return pipe
case itemString, itemRawString:
s, err := strconv.Unquote(token.val)
if err != nil {
t.error(err)
}
return t.newString(token.pos, token.val, s)
}
t.backup()
return nil
}

In this logic, itemIdentifier goes through the hasFunction check, and if the check passes an IdentifierNode is created.

// hasFunction reports if a function name exists in the Tree's maps.
func (t *Tree) hasFunction(name string) bool {
for _, funcMap := range t.funcs {
if funcMap == nil {
continue
}
if funcMap[name] != nil {
return true
}
}
return false
}

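So only functions present in a funcMap can be called. Anything else makes errorf panic, and the deferred t.recover in Tree.Parse turns that panic into a parse error.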

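To summarize: the tree parser takes the tokens from the lexer, parses them further into different kinds of Node (NodeType identifies the kind), and attaches them to the tree.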
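The resulting tree can be inspected directly with the exported text/template/parse package. The sketch below lists the arguments of every command in the top-level actions. describe is a hypothetical helper, and passing fmt.Sprintf under the name printf is only there to satisfy the hasFunction check, since parse.Parse knows nothing about the template builtins.

```go
package main

import (
	"fmt"
	"text/template/parse"
)

// describe parses text and returns "<Go type> <source form>" for every
// argument of every command found in top-level actions.
func describe(text string) ([]string, error) {
	// The parser only checks that the name exists in one of the maps.
	funcs := map[string]interface{}{"printf": fmt.Sprintf}
	trees, err := parse.Parse("demo", text, "", "", funcs)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range trees["demo"].Root.Nodes {
		action, ok := n.(*parse.ActionNode)
		if !ok {
			continue // plain text between actions
		}
		for _, cmd := range action.Pipe.Cmds {
			for _, arg := range cmd.Args {
				out = append(out, fmt.Sprintf("%T %s", arg, arg))
			}
		}
	}
	return out, nil
}

func main() {
	nodes, err := describe(`hi {{.Name | printf "%q"}}`)
	if err != nil {
		panic(err)
	}
	for _, n := range nodes {
		fmt.Println(n)
	}
}
```

For this input the pipeline holds two commands: one whose only argument is a FieldNode, and one that starts with an IdentifierNode followed by a StringNode.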

0x04 text/template source analysis - execute logic

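After the steps above, the input has been fully broken down into a tree. Template.Execute then walks that tree through a chain of calls. Let's go straight to the handling of ActionNode, which calls evalPipeline.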

// Eval functions evaluate pipelines, commands, and their elements and extract
// values from the data structure by examining fields, calling methods, and so on.
// The printing of those values happens only through walk functions.

// evalPipeline returns the value acquired by evaluating a pipeline. If the
// pipeline has a variable declaration, the variable will be pushed on the
// stack. Callers should therefore pop the stack after they are finished
// executing commands depending on the pipeline value.
func (s *state) evalPipeline(dot reflect.Value, pipe *parse.PipeNode) (value reflect.Value) {
if pipe == nil {
return
}
s.at(pipe)
value = missingVal
for _, cmd := range pipe.Cmds {
value = s.evalCommand(dot, cmd, value) // previous value is this one's final arg.
// If the object has type interface{}, dig down one level to the thing inside.
if value.Kind() == reflect.Interface && value.Type().NumMethod() == 0 {
value = reflect.ValueOf(value.Interface()) // lovely!
}
}
for _, variable := range pipe.Decl {
if pipe.IsAssign {
s.setVar(variable.Ident[0], value)
} else {
s.push(variable.Ident[0], value)
}
}
return value
}

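It iterates over pipe.Cmds and runs evalCommand() on each one, passing the previous command's result along as the final argument.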
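The "previous value is this one's final arg" comment in evalPipeline can be checked with a one-line template. This is a small sketch in which render is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render parses and executes text with no data (dot is nil).
func render(text string) (string, error) {
	tmpl, err := template.New("pipe").Parse(text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, nil); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// The result of the first command ("a") is appended as the last
	// argument of printf, so printf receives "%s-%s", "b", "a".
	out, err := render(`{{"a" | printf "%s-%s" "b"}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "b-a"
}
```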

func (s *state) evalCommand(dot reflect.Value, cmd *parse.CommandNode, final reflect.Value) reflect.Value {
firstWord := cmd.Args[0]
switch n := firstWord.(type) {
case *parse.FieldNode:
return s.evalFieldNode(dot, n, cmd.Args, final)
case *parse.ChainNode:
return s.evalChainNode(dot, n, cmd.Args, final)
case *parse.IdentifierNode:
// Must be a function.
return s.evalFunction(dot, n, cmd, cmd.Args, final)
case *parse.PipeNode:
// Parenthesized pipeline. The arguments are all inside the pipeline; final must be absent.
s.notAFunction(cmd.Args, final)
return s.evalPipeline(dot, n)
case *parse.VariableNode:
return s.evalVariableNode(dot, n, cmd.Args, final)
}
s.at(firstWord)
s.notAFunction(cmd.Args, final)
switch word := firstWord.(type) {
case *parse.BoolNode:
return reflect.ValueOf(word.True)
case *parse.DotNode:
return dot
case *parse.NilNode:
s.errorf("nil is not a command")
case *parse.NumberNode:
return s.idealConstant(word)
case *parse.StringNode:
return reflect.ValueOf(word.Text)
}
s.errorf("can't evaluate command %q", firstWord)
panic("not reached")
}

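evalCommand distinguishes FieldNode (.A or .A.B.C), ChainNode (a term followed by field accesses, such as (.A).B), IdentifierNode (call/html/js), PipeNode and so on. evalFunction checks the function once more: findFunction looks the name up in execFuncs and builtinFuncs, and the call itself is then carried out by evalCall, which adds no further whitelist check of its own.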

func (s *state) evalFunction(dot reflect.Value, node *parse.IdentifierNode, cmd parse.Node, args []parse.Node, final reflect.Value) reflect.Value {
s.at(node)
name := node.Ident
function, ok := findFunction(name, s.tmpl)
if !ok {
s.errorf("%q is not a defined function", name)
}
return s.evalCall(dot, function, cmd, name, args, final)
}

// findFunction looks for a function in the template, and global map.
func findFunction(name string, tmpl *Template) (reflect.Value, bool) {
if tmpl != nil && tmpl.common != nil {
tmpl.muFuncs.RLock()
defer tmpl.muFuncs.RUnlock()
if fn := tmpl.execFuncs[name]; fn.IsValid() {
return fn, true
}
}
if fn := builtinFuncs[name]; fn.IsValid() {
return fn, true
}
return reflect.Value{}, false
}

Besides evalFunction, evalCall is also reached from evalField, and indirectly from evalFieldChain and evalFieldNode. I won't dig into the details of that logic here.

0x05 Feasibility analysis

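The analysis above shows that function calls are restricted mainly through the funcMap, and that the restriction is applied in both the parse phase and the exec phase. The final evalCall, however, is quite flexible, and it is also reached from evalField and friends, so the following bypass ideas may be possible.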

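  1. The parser and the executor interpret something differently, which leads to a bypass: the parser treats a node as type A while execute treats it as type B. Logic bugs of this kind take a lot of effort to analyze.
  2. Higher-level paths such as Field, FieldChain and $ variables may let the attacker control the final receiver and point it at an arbitrary one.
  3. Rewrite the funcMap.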
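As a starting point for the first idea, the claim that the whitelist is enforced in both phases can be tested by deliberately making the two phases disagree. In this sketch, a tree is parsed with a function name the parser accepts and then attached to a template whose exec map does not contain it. The name secret and the helper execWithoutFunc are hypothetical, and this only probes one specific mismatch, not the general case.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
	"text/template/parse"
)

// execWithoutFunc reports the stage ("parse", "add" or "exec") at which an
// error occurred, or "" if the template rendered successfully.
func execWithoutFunc() (string, error) {
	// The parser is told that "secret" exists...
	parseFuncs := map[string]interface{}{"secret": func() string { return "s3cr3t" }}
	trees, err := parse.Parse("demo", `{{secret}}`, "", "", parseFuncs)
	if err != nil {
		return "parse", err
	}
	// ...but the template that executes the tree never had it registered.
	tmpl, err := template.New("demo").AddParseTree("demo", trees["demo"])
	if err != nil {
		return "add", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, nil); err != nil {
		return "exec", err
	}
	return "", nil
}

func main() {
	stage, err := execWithoutFunc()
	fmt.Println(stage, err)
}
```

The failure shows up at the exec stage, which matches the findFunction check in evalFunction: passing the parser alone is not enough to get a function called.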