Diving into C++ internals of io.js

by Fedor Indutny / @indutny

whoami

Fedor Indutny

io.js TC member and...

(node.js core team member)

http://jsconfbp.indutny.com

Diving into C++ internals of io.js

Alternative title

A History Of GIT BLAME

C++ talk on

JS conference

Like anyone cares...

...or writes C++ a lot

...in fact

you wrote

C++ code

Ways to optimize JS

Avoid "hidden classes"


function Point(x, y, z) {
  this.x = x;
  this.y = y;
  this.z = z;
}
            
This is JS ^

Compare to:


class Point {
 public:
  double x;
  double y;
  double z;
};
            
This is C++ ^

Avoid "polymorphism"


function add(x, y) {
  return x + y;
}

add(0, 1); // <- good
add('foo', 'bar'); // <- polymorphism!
            

Manual allocation


function Parser() {
}

Parser.freelist = [];

Parser.get = function() {
  if (this.freelist.length)
    return this.freelist.pop();
  return new Parser();
};

Parser.prototype.release = ...;
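
A minimal sketch of the same free-list idea in generic form. `FreeList`, its `reset` callback, and the `max` cap are hypothetical names for illustration; the `release` shown on the slide is elided and may differ.

```javascript
'use strict';

// Hypothetical helper, not part of io.js: a bounded pool of reusable objects.
function FreeList(create, reset, max) {
  this.create = create;  // makes a fresh object when the pool is empty
  this.reset = reset;    // clears per-use state before an object is pooled
  this.max = max;        // cap, so the pool cannot grow without bound
  this.items = [];
}

FreeList.prototype.get = function() {
  if (this.items.length)
    return this.items.pop();
  return this.create();
};

FreeList.prototype.release = function(obj) {
  this.reset(obj);
  if (this.items.length < this.max)
    this.items.push(obj);
};

// Usage: the second get() hands back the object that was released.
var created = 0;
var pool = new FreeList(function() {
  created++;
  return { data: null };
}, function(obj) {
  obj.data = null;
}, 8);

var first = pool.get();
first.data = 'chunk';
pool.release(first);
var second = pool.get();
```

Only one object is ever allocated here; whether this beats the garbage collector depends on the workload.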
            

Fast JS

==

C++

(io|node).js

use C/C++

internally

if you didn't know about it

V8 is in C++


also

has only ECMA API

Which is cool

Unless you need timers

...or File System API

...or Networking

Node.js C++ layer:

Lives on top of the event-loop

Provides

  • net sockets
  • dns queries
  • file system
  • zlib bindings
  • other stuff...

Back to the

alternative title

A History Of GIT BLAME

Start with history of the subject

to ease the understanding of

C++ layer

Node.js has used git since the beginning


The history is in

git log & git blame

`git log deps/v8` - v8 fighting us

`git log src/` - us fighting v8

node.js begins

61890720 commit

Commit log:

“add readme and initial code”


git checkout 61890720

Two C/C++ dependencies besides V8:

  • libebb - for HTTP
  • liboi - for TCP/libev

./server script.js


function Process(request) {
  if (options.verbose) {
    log("Processing " + request.host +
        request.path +
        " from " + request.referrer +
        "@" + request.userAgent);
  }
  if (!output[request.host])
    output[request.host] = 1;
  else
    output[request.host]++;
}
            

How was it organized internally?

  • server.cc - command-line args, load JS
  • js_http_request_processor.cc - http request handler

Almost nothing was working

at that point

(Just a Proof-of-Concept)

Summary:

  • One file to set up V8 and CLI args
  • HTTP server is in C/C++, without any networking events
  • One C++ instance for every request

    (Mapping uri, headers, method to JS object)

Quickly jumping to

064c8f02 commit

Commit log:

“Use ObjectWrap base class for File, Socket, Server.”

One API to wrap all objects


class File : public ObjectWrap {
 public:
  File(Handle<Object> handle)
      : ObjectWrap(handle) {
  }
};
            
  • net.Server
  • net.Socket
  • File

are ObjectWrap instances

File structure:

  • src/node.cc - init C++ libs, invoke src/main.js
  • src/http.cc - HTTP server API, Connection, HttpRequest
  • src/file.cc, src/file.js - future FS module
  • src/process.cc - .exit(), future process object
  • src/timers.cc - setTimeout/setInterval

Side note:

HTTP server still provided by liboi, node.js is using libev

version 0.2

grown

matured (a bit)

Separating JS from C++

CommonJS

tons of core modules

File structure:

  • lib/ - all JavaScript core modules
  • src/ - their C++ counterparts
  • deps/ - all C/C++ dependencies
    (v8, http-parser, c-ares, libeio, libev)

ObjectWrap is a public API now

(polished for and by the community)

Previously all C++ classes were

global objects

In v0.2 they are provided by process.binding()

Example


> process.binding('fs');
{ access: [Function: access],
  close: [Function: close],
  open: [Function: open],
  ...lots of stuff...
            

...and similar stuff for other modules too

version 0.6

Short note

  • libev/libeio were replaced by libuv
    (Lots of work by Ben Noordhuis, Bert Belder, Ryan Dahl, and others)
  • Windows support

version 0.10

boring

version 0.12

io.js

Lots of new stuff!

(We are interested only in C++)

Important thing:

Outgrown ObjectWrap
(src/node_object_wrap.h)

Now using AsyncWrap
(src/async-wrap.h)
(two fields: parent and providerType)

We have arrived

at the present point

of node.js/io.js

Time to stop the

Software Archeology

Time to get into the

C++ internals...

Interoperation

Handles

Wraps

Unicorns

Two Folders - Two Worlds

lib - src

require('fs')

Just loads lib/fs.js and executes it.

no magic

JS is not capable of FS operations.

No networking either.

it is for the best!

Because of

Sandboxing

But... I need:

fs.writeFileSync()

http.request()

Lots of low-level C++ stuff outside of the JS-land

Let's learn by example

Though, require('fs') is boring

let's move to

require('net')

  • creates sockets
  • `connect` events
  • `.write()` callbacks

All powered by C++ machinery!

Provided by bindings:

process.binding()
  • tcp_wrap - (src/tcp_wrap.cc)
  • stream_wrap - (src/stream_wrap.cc)

Bindings provide JS classes:

  • TCP
  • TCPConnectWrap
  • WriteWrap
  • ShutdownWrap

Purpose of these classes:

  • TCP - holds TCP socket, read/write
  • *Wrap - is what you pass in for async actions on TCP

Example

`net.connect()` workflow:

  1. tcp = new TCP()
  2. Store tcp in _handle
  3. Parse args to net.connect
  4. req = new TCPConnectWrap()
  5. tcp.connect(req, port, host)
  6. async req.oncomplete()

Conclusion

C++ classes are either:

  • Handle
    (not to be confused with v8 handles)
  • Wrap
    (async request wrap)
    (lifetime <= handle's)

File structure

  • src/tcp_wrap.cc
    • TCPWrap (TCP in JS)
    • TCPConnectWrap
  • src/stream_base.cc (io.js only)
    • WriteWrap
    • ShutdownWrap

Structure of C++ side

How does process.binding() work?


NODE_MODULE_CONTEXT_AWARE_BUILTIN(
    fs,
    node::InitFs)
            

Has roughly the same effect as:


cppModules['fs'] = {
  initialized: false,
  initFn: InitFs
};
            

So what does process.binding() do?


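process.binding = function(moduleName) {
  var module = cppModules[moduleName];
  // Already initialized: return the cached exports
  if (module.initialized)
    return module.exports;

  // First call: run the C++ init function once and remember that we did
  module.exports = {};
  module.initFn(module.exports);
  module.initialized = true;
  return module.exports;
};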
            

What does initFn do?

Initializes module, like in CommonJS

Thanks, captain!

Exports functions and classes

Each exported JS class has

C++ counterpart

Most of them inherit from AsyncWrap

Small reminder. Two kinds of classes:

Handle and Wrap

Garbage Collector destroys only:

Handles

Requests are manually destroyed
after async action completion
(in other words, after the callback)
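
A rough model of the request lifetime, assuming a `liveRequests` list that stands for whatever the C++ side keeps alive and a `queue` that stands for the event loop. All names are hypothetical; real destruction happens in C++, which plain JS cannot show directly.

```javascript
'use strict';

var queue = [];         // stand-in for the event loop
var liveRequests = [];  // requests the "C++ side" currently keeps alive

// Mock request wrap: carries the completion callback.
function WriteWrap() {
  this.oncomplete = null;
}

// Mock handle: starts an async write on behalf of a request.
function Handle() {}

Handle.prototype.write = function(req, data) {
  liveRequests.push(req);  // kept alive while the action is in flight
  queue.push(function() {
    req.oncomplete(0);     // the callback runs first...
    liveRequests.splice(liveRequests.indexOf(req), 1);  // ...then cleanup
  });
};

var handle = new Handle();
var req = new WriteWrap();
var events = [];
req.oncomplete = function(status) {
  events.push('oncomplete, live requests: ' + liveRequests.length);
};

handle.write(req, 'hello');
events.push('in flight, live requests: ' + liveRequests.length);
while (queue.length)
  queue.shift()();
events.push('done, live requests: ' + liveRequests.length);
```

The request stays alive through its callback and is dropped right after, independent of the handle.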

Exam time

Time to test your skills!

Situation:

You are debugging an io.js issue and find out that it

is crashing on:


var FooBar = process.binding('foo_bar').FooBar;

// Crashes here:
var f = new FooBar();
            

Where will you search for FooBar?

Answer:

Somewhere in src/, searching by:


NODE_MODULE_CONTEXT_AWARE_BUILTIN(foo_bar, ..)
            

This will most likely be in src/foo_bar.cc

C++ Streams

Renovation through making stuff public

We did it for ObjectWrap

We should do it for StreamWrap

StreamWrap wasn't that bad in v0.10

Was modified a lot for v0.12

Because we needed TLS streams in C++

Looks terrible in v0.12

StreamWrap

  • Supports 1-depth nesting
    via callbacks instances
  • May skip JS callbacks
  • Does some black magic inside
  • Performance is fantastic!
  • Source is rigid

But JS streams work so well...

They support multi-level piping

and are transparent to the user!

What if...

StreamBase in io.js

To the Rescue!

StreamBase (previously StreamWrap)

  • No spooky callbacks instances
  • Multi-level stream consumption
  • Almost no black magic
  • Performance is the same!

Still needs to be done:

  • Way to unconsume/unpipe
  • Clean up the APIs

Let's make it public!

Because:

  • It is cool
  • Great performance
  • HTTP2
  • Users know better

Homework

  1. Clone the io.js repo
  2. Open src/
  3. Go through files
  4. Check what you learned
  5. Fix if we are wrong
  6. Send a PR
  7. ...have fun?

Thank you!