
Learning Web Through Source Code: How Forms Are Parsed (Part 1) #5

gamebody opened this issue Feb 15, 2019 · 0 comments
Some HTTP headers

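Content-Type • The type of object carried in the entity.
Content-Length • The length or size of the entity body being sent.
Content-Language • The human language that best matches the object being sent.
Content-Encoding • Any transformation (such as compression) applied to the object data.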

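An HTTP request is made up of headers and an entity (the body). Each header line ends with CRLF (carriage return, CR, ASCII 13, \r, followed by line feed, LF, ASCII 10, \n), and the entity starts after the empty line (a bare CRLF) that follows the last header.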
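
To make the header/entity split concrete, here is a minimal sketch that cuts a raw request string at the first empty line. The sample request and the `splitRequest` helper are made up for illustration; a real server should rely on Node's built-in HTTP parser instead.

```javascript
// A raw request as it might appear on the wire (hypothetical sample data).
const rawRequest =
  'POST /foo HTTP/1.1\r\n' +
  'Host: 127.0.0.1:3000\r\n' +
  'Content-Type: text/plain\r\n' +
  'Content-Length: 9\r\n' +
  '\r\n' +
  'some text';

// splitRequest is a hypothetical helper, not part of Node or Koa.
function splitRequest(raw) {
  // The first CRLF CRLF is the empty line that ends the headers.
  const separator = raw.indexOf('\r\n\r\n');
  const head = raw.slice(0, separator);
  const body = raw.slice(separator + 4);

  const [requestLine, ...headerLines] = head.split('\r\n');
  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(':');
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return { requestLine, headers, body };
}

const parsed = splitRequest(rawRequest);
console.log(parsed.requestLine);
console.log(parsed.headers);
console.log(parsed.body);
```

Header names are lowercased here because HTTP header names are case-insensitive, which is also how Node exposes them on `req.headers`.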

Character encoding of text (to be discussed later)
Content-Type: text/html; charset=iso-8859-4

Multipart form submission

multipart/form-data

<form action="/" method="post" enctype="multipart/form-data">
  <input type="text" name="description" value="some text">
  <input type="file" name="myFile">
  <button type="submit">Submit</button>
</form>

Take a look at the request. In the body, every part begins with a delimiter line made of two hyphens followed by the boundary value from the Content-Type header, and the final delimiter has two more hyphens appended:

POST /foo HTTP/1.1
Content-Length: 68137
Content-Type: multipart/form-data; boundary=---------------------------974767299852498929531610575

-----------------------------974767299852498929531610575
Content-Disposition: form-data; name="description"

some text
-----------------------------974767299852498929531610575
Content-Disposition: form-data; name="myFile"; filename="foo.txt"
Content-Type: text/plain

(content of the uploaded file foo.txt)
-----------------------------974767299852498929531610575--
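
As a rough illustration of how such a body can be taken apart, the sketch below splits a multipart string on its delimiter lines. The boundary value, the sample body and the `parseMultipart` helper are all invented for this example. It only handles small text bodies; real parsers work on streams and binary data.

```javascript
// Hypothetical boundary and body, built the way a browser would send them.
const boundary = 'demo-boundary-123';
const body = [
  `--${boundary}`,
  'Content-Disposition: form-data; name="description"',
  '',
  'some text',
  `--${boundary}`,
  'Content-Disposition: form-data; name="myFile"; filename="foo.txt"',
  'Content-Type: text/plain',
  '',
  'hello from foo.txt',
  `--${boundary}--`,
  '',
].join('\r\n');

// parseMultipart is a hypothetical helper for illustration only.
function parseMultipart(text, boundaryValue) {
  const delimiter = `--${boundaryValue}`;
  return text
    .split(delimiter)
    .slice(1) // drop whatever precedes the first delimiter
    .filter((part) => !part.startsWith('--')) // drop the closing "--" marker
    .map((part) => {
      // Each part is: headers, empty line, content, trailing CRLF.
      const [rawHeaders, ...rest] = part.replace(/^\r\n/, '').split('\r\n\r\n');
      const content = rest.join('\r\n\r\n').replace(/\r\n$/, '');
      const name = /\bname="([^"]*)"/.exec(rawHeaders)[1];
      const file = /\bfilename="([^"]*)"/.exec(rawHeaders);
      return { name, filename: file ? file[1] : null, content };
    });
}

const parts = parseMultipart(body, boundary);
console.log(parts);
```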

application/x-www-form-urlencoded

<form action="http://127.0.0.1:3000" method="post" enctype="application/x-www-form-urlencoded">
    user: <input type="text" name="user" value="111"><br>
    password: <input type="text" name="pass" value="222"><br>
    <button type="submit">Submit</button>
</form>

Take a look at the request (the body follows the empty line after the headers):

Content-Type: application/x-www-form-urlencoded

user=111&pass=222
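
A body in this format can be decoded with Node's built-in `URLSearchParams`. This is only a small sketch of the format itself; co-body, discussed later, uses the qs package instead. The `note` field is an invented example.

```javascript
// Decode the body from the example above.
const formBody = 'user=111&pass=222';
const fields = Object.fromEntries(new URLSearchParams(formBody));
console.log(fields);

// Reserved characters are percent-encoded and spaces become "+".
// The "note" field is made up to show the escaping.
const encoded = new URLSearchParams({ note: 'a b&c=d' }).toString();
console.log(encoded);

// Decoding reverses the escaping.
const decoded = new URLSearchParams(encoded).get('note');
console.log(decoded);
```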

text/plain

<form action="http://127.0.0.1:3000" method="post" enctype="text/plain">
    user: <input type="text" name="user" value="111"><br>
    password: <input type="text" name="pass" value="222"><br>
    <button type="submit">Submit</button>
</form>

Take a look at the request (the body follows the empty line after the headers):

Content-Type: text/plain

user=111
pass=222

We can receive it with Node:

const Koa = require('koa');
const app = new Koa();

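app.use(async ctx => {
    // Collect the raw request body chunks as they arrive
    const chunks = []
    ctx.req.on('data', (chunk) => {
        chunks.push(chunk)
    })
    // 'end' fires with no arguments once the whole body has been read
    ctx.req.on('end', () => {
        console.log(Buffer.concat(chunks))
    })
    ctx.body = 'asdads'
});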

app.listen(3000);

// <Buffer 75 73 65 72 3d 31 31 31 0d 0a 70 61 73 73 3d 32 32 32 0d 0a>
// Decoded as text:
// user=111
// pass=222

Looking closely, you can see two 0d 0a sequences (CRLF), one after each field.
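
The same bytes can be decoded by hand, which shows that each field ends with CRLF. `parseTextPlainForm` is a hypothetical helper written for this post. Note that text/plain form bodies are meant for humans: a value that itself contains a line break cannot be told apart from a new field, so this format should not be relied on for machine parsing.

```javascript
// The exact bytes logged by the Koa example above.
const rawBody = Buffer.from('757365723d3131310d0a706173733d3232320d0a', 'hex');

// parseTextPlainForm is a hypothetical helper, not a library function.
function parseTextPlainForm(buffer) {
  const fields = {};
  for (const line of buffer.toString('utf8').split('\r\n')) {
    if (line === '') continue; // the trailing CRLF leaves one empty item
    const eq = line.indexOf('=');
    fields[line.slice(0, eq)] = line.slice(eq + 1);
  }
  return fields;
}

const textFields = parseTextPlainForm(rawBody);
console.log(textFields);
```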

Types of content encoding

The Content-Encoding header uses these standardized tokens to state which algorithm was used to encode the entity.

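Content-Encoding value • Description
gzip • The entity was encoded with GNU zip.
compress • The entity was compressed with the Unix file compression program (compress).
deflate • The entity was compressed in the zlib format.
identity • No encoding was applied to the entity. This is the default when no Content-Encoding header is present.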
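
A small sketch of how a receiver might undo these encodings with Node's built-in zlib module. `decodeEntity` is a hypothetical helper. Node's zlib has no decoder for `compress`, so this sketch rejects it along with any unknown token.

```javascript
const zlib = require('zlib');

// decodeEntity is a hypothetical helper for illustration.
function decodeEntity(buffer, contentEncoding = 'identity') {
  switch (contentEncoding) {
    case 'gzip':
      return zlib.gunzipSync(buffer);
    case 'deflate':
      return zlib.inflateSync(buffer); // deflate data in the zlib wrapper
    case 'identity':
      return buffer; // nothing to undo
    default:
      throw new Error('Unsupported Content-Encoding: ' + contentEncoding);
  }
}

const original = Buffer.from('user=111&pass=222');
const fromGzip = decodeEntity(zlib.gzipSync(original), 'gzip').toString();
const fromDeflate = decodeEntity(zlib.deflateSync(original), 'deflate').toString();
const fromIdentity = decodeEntity(original).toString();
console.log(fromGzip, fromDeflate, fromIdentity);
```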

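With these basics in place, we can start learning the web by reading source code: let's see how the data gets parsed so that these values become convenient to use.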

Analyzing the koa-body middleware
Core code:

// Decide how to parse based on the Content-Type
if (opts.json && ctx.is('json')) {
    bodyPromise = buddy.json(ctx, {
        encoding: opts.encoding,
        limit: opts.jsonLimit,
        strict: opts.jsonStrict,
        returnRawBody: opts.includeUnparsed
    });
} else if (opts.urlencoded && ctx.is('urlencoded')) {
    bodyPromise = buddy.form(ctx, {
        encoding: opts.encoding,
        limit: opts.formLimit,
        queryString: opts.queryString,
        returnRawBody: opts.includeUnparsed
    });
} else if (opts.text && ctx.is('text')) {
    bodyPromise = buddy.text(ctx, {
        encoding: opts.encoding,
        limit: opts.textLimit,
        returnRawBody: opts.includeUnparsed
    });
} else if (opts.multipart && ctx.is('multipart')) {
    bodyPromise = formy(ctx, opts.formidable);
}


// Store the result differently depending on the body type
bodyPromise = bodyPromise || Promise.resolve({});
return bodyPromise.catch(function(parsingError) {
    if (typeof opts.onError === 'function') {
        opts.onError(parsingError, ctx);
    } else {
        throw parsingError;
    }
    return next();
})
.then(function(body) {
    if (opts.patchNode) {
        if (isMultiPart(ctx, opts)) {
            ctx.req.body = body.fields;
            ctx.req.files = body.files;
        } else if (opts.includeUnparsed) {
            ctx.req.body = body.parsed || {};
            if (! ctx.is('text')) {
                ctx.req.body[symbolUnparsed] = body.raw;  
            }
        } else {
            ctx.req.body = body;
        }
    }
    if (opts.patchKoa) {
        if (isMultiPart(ctx, opts)) {
            ctx.request.body = body.fields;
            ctx.request.files = body.files;
        } else if (opts.includeUnparsed) {
            ctx.request.body = body.parsed || {};
            if (! ctx.is('text')) {
                ctx.request.body[symbolUnparsed] = body.raw;
            }
        } else {
            ctx.request.body = body;
        }
    }
    return next();
})
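
The branching at the top of this code can be pictured with a standalone function. `pickParser` below is a hypothetical, simplified stand-in for the `ctx.is(...)` checks. Koa's real matching rules are more involved and may differ from this sketch.

```javascript
// pickParser is a hypothetical helper that mimics the if/else chain above.
function pickParser(contentTypeHeader) {
  if (!contentTypeHeader) return null;
  // Ignore parameters such as "; charset=utf-8" or "; boundary=...".
  const mediaType = contentTypeHeader.split(';')[0].trim().toLowerCase();

  if (mediaType === 'application/json') return 'json';
  if (mediaType === 'application/x-www-form-urlencoded') return 'urlencoded';
  if (mediaType === 'text/plain') return 'text';
  if (mediaType.startsWith('multipart/')) return 'multipart';
  return null; // no parser: the body ends up as an empty object
}

console.log(pickParser('application/json; charset=utf-8'));
console.log(pickParser('multipart/form-data; boundary=demo-boundary-123'));
console.log(pickParser('image/png'));
```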

The core parsing logic, in turn, lives in the co-body package.
Let's look at the co-body code:

// index.js


exports = module.exports = require('./lib/any');
exports.json = require('./lib/json');
exports.form = require('./lib/form');
exports.text = require('./lib/text');

Core parsing code

// form.js: parses with the qs package
var parsed = opts.qs.parse(str, queryString);
return opts.returnRawBody ? { parsed: parsed, raw: str } : parsed;
// json.js: uses JSON.parse directly
var parsed = parse(str);
return opts.returnRawBody ? { parsed: parsed, raw: str } : parsed;

function parse(str){
    if (!strict) return str ? JSON.parse(str) : str;
    // strict mode always return object
    if (!str) return {};
    // strict JSON test
    if (!strictJSONReg.test(str)) {
      throw new Error('invalid JSON, only supports object and array');
    }
    return JSON.parse(str);
}
// text.js
return raw(inflate(req), opts);
// index.js (of the inflation package, which provides inflate)
function inflate(stream, options) {
    if (!stream) {
        throw new TypeError('argument stream is required')
    }

    options = options || {}

    var encoding = options.encoding
        || (stream.headers && stream.headers['content-encoding'])
        || 'identity'

    switch (encoding) {
    case 'gzip':
    case 'deflate':
        break
    case 'identity':
        return stream
    default:
        var err = new Error('Unsupported Content-Encoding: ' + encoding)
        err.status = 415
        throw err
    }

    // do not pass the encoding option through to zlib
    delete options.encoding

    return stream.pipe(zlib.Unzip(options))
}
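
The strict-mode branch of the `parse` function in json.js above is easy to try on its own. In the sketch below the `strictJSONReg` pattern is an assumption, since its definition is not shown in the excerpt: it only checks that the first non-whitespace character is `{` or `[`. `parseJson` is a hypothetical wrapper that takes `strict` as a parameter.

```javascript
// Assumed pattern: optional whitespace, then "[" or "{".
const strictJSONReg = /^[\x20\x09\x0a\x0d]*(\[|\{)/;

// parseJson is a hypothetical standalone version of the parse function above.
function parseJson(str, strict = true) {
  if (!strict) return str ? JSON.parse(str) : str;
  // Strict mode always returns an object or array.
  if (!str) return {};
  if (!strictJSONReg.test(str)) {
    throw new Error('invalid JSON, only supports object and array');
  }
  return JSON.parse(str);
}

console.log(parseJson('{"user":"111"}'));
console.log(parseJson(''));
console.log(parseJson('"just a string"', false));
```

In strict mode a bare string or number such as `"just a string"` is rejected, even though it is valid JSON.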

inflate simply returns a stream that is decompressed according to Content-Encoding, and raw is the raw-body package (hehe).
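
To see the two steps working together, here is a sketch using only Node built-ins. `readAll` is a hypothetical helper that roughly plays the role of raw-body: it collects a stream into a string and enforces a size limit. The request is simulated with `Readable.from`, and `zlib.createUnzip()` stands in for the `zlib.Unzip(options)` call above.

```javascript
const zlib = require('zlib');
const { Readable } = require('stream');

// readAll is a hypothetical helper, not the raw-body API.
function readAll(stream, limit = 1024 * 1024) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    let received = 0;
    stream.on('data', (chunk) => {
      received += chunk.length;
      if (received > limit) {
        const err = new Error('request entity too large');
        err.status = 413;
        stream.destroy();
        reject(err);
        return;
      }
      chunks.push(chunk);
    });
    stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    stream.on('error', reject);
  });
}

// Simulate a gzip-compressed request body.
const compressed = zlib.gzipSync('user=111&pass=222');
const fakeReq = Readable.from([compressed]);

// Unzip detects gzip or deflate from the stream header.
const result = readAll(fakeReq.pipe(zlib.createUnzip()));
result.then((text) => console.log(text));
```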

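That's about it. I haven't written about multipart or character encodings yet, both of which this library also touches on. Study with a clear target in mind. 👋 Bye.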
