Papermill: C0DE

Max F. Albrecht

2013-08-09T06:03:38

1 mill-cli.js

                _   _  
            o  | | | | 
  _  _  _      | | | | 
 / |/ |/ |  |  |/  |/  
   |  |  |_/|_/|__/|__/

This is the application running as the mill command line interface. It is require()-d and start()-ed directly by the ./bin/mill stub.

The stub is the only 'real' shell script in this project. That means it is the only file directly called by the operating system. That is also the reason it needs a "hashbang" as its first line.

Condensed to its main functionality, the stub looks like this (see note 1 at the end of this document):

#!/usr/bin/env node

var mill = require('../mill-cli.js');

mill.start(function (err) {
  process.exit(err ? 1 : 0);
});

1.1 SETUP

The setup process is idiomatic for every nodejs file in this project, so it is annotated in detail only here.

Since we start from nothing, we need to require some modules.

Some of them come from the node.js core; they can be required without installing them.

  • we need path for working with paths.
var path = require('path'),

A note about external modules: after running npm install --save something in the app dir, they are installed in the sub-folder 'node_modules' and are thus found automagically. The --save flag also instructs npm to write this 'dependency' to the package.json file.

Now, we require() our external modules:

  • the flatiron anti-framework

    flatiron = require('flatiron');

1.2 {APP}

We start out with an 'app' object, which we get from the flatiron module.



var app = mill = module.exports = flatiron.app;

This is an "injection container", provided by the 'broadway' module. This means plugins can modify the app object directly. Where do these plugins come from?

  • some are built-in and already activated by flatiron (log, conf, router)
  • some are built-in, but need to be activated (http, cli, …)
  • your own plugins need to be activated as well

From flatiron, we get a bunch of stuff (but not too much).

1.3 Configuration

Before doing anything with our app, we load the configuration. app.config is another built-in part of flatiron: the nconf module. In short, it allows us to use 3 pre-configured sources where our configuration could come from:

  • a JSON file, e.g. ./config.json -> { "foo": 1337 }
  • the environment variables of the process, e.g. $ export foo=1337 && mill
  • command line flags, e.g. mill --foo 1337

The order in which we load the config sources specifies their priority: the first one loaded is more important than the second, etc. You can think of the config file as default settings, the environment variables as per-system settings and the command line arguments as per-run settings. That is why we load the arguments first and the config file last.


app.config.argv(); // conf source: arguments is most important
app.config.env();  // then env vars
app.config.file('file', path.join(__dirname, 'config', 'config.json')); // lastly, our config.json file
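
The precedence rule can be illustrated without nconf. The following is only a minimal sketch: three plain objects stand in for the parsed arguments, the environment and the config file, and `lookup` is a hypothetical helper, not part of nconf or flatiron.

```javascript
// Hypothetical stand-ins for the three configuration sources,
// listed in the same order in which they are loaded above.
var sources = [
  { name: 'argv', values: { foo: 'from-argv' } },
  { name: 'env',  values: { foo: 'from-env', bar: 'from-env' } },
  { name: 'file', values: { foo: 'from-file', bar: 'from-file', baz: 'from-file' } }
];

// Return the value from the first source that defines the key.
function lookup(key) {
  for (var i = 0; i < sources.length; i += 1) {
    if (Object.prototype.hasOwnProperty.call(sources[i].values, key)) {
      return sources[i].values[key];
    }
  }
  return undefined;
}

console.log(lookup('foo')); // from-argv
console.log(lookup('bar')); // from-env
console.log(lookup('baz')); // from-file
```

A value given on the command line hides the same key from the environment and the file, while keys that only exist in the file still act as defaults.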

FIXME: set the dir manually

app.config.set('cwd', process.cwd());
  
  • also use the "cli" plugin (enables lazy-loading commands and color output)
app.use(flatiron.plugins.cli, {
  source: path.join(__dirname, 'lib', 'commands'),
  

1.4 Usage

The 'usage' information will be shown when no (valid) command was given.

  "notFoundUsage": true,
  

It is an array of strings, which will be separated by line breaks. We start by getting our ASCII logo (if found), and append the text to that.

  usage: (Array.isArray(app.config.get('banner')) ? app.config.get('banner') : []).concat([
    'Commands:',
    'mill new "Project Title" [-s paper|simple]     Setup a new project',
    'mill print [/path/to/project]                  Output project to PDF',
    'mill web [/path/to/project]                    Output project to HTML',
    'mill help <command>                            Show more help',
    ''
  ]),
});

We also use our own modules:

  • Utility functions
app.use(require('./lib/utils'));
  • Command Shortcuts
require('./lib/alias');

The CLI can be run in debug mode. We detect if the user wants it and set a variable for it to use throughout the program.

if (app.config.get('debug') || app.config.get('DEBUG:on')) {
  app.DEBUG = true;
}

Turn CLI colors off on request (--no-colors, { "colors": false }) - from jitsu

if (!app.config.get('colors')) {
  
  app._NOCOLORS = true;
  

The app needs to be initialized before we can set up the log.

  app.init(function (err, res) {
    
    app.log.get('default').stripColors = true;
    app.log.get('default').transports.console.colorize = false;
  });
}

This finishes the mill CLI.

2 output.js

/*jslint node: true, regexp: true, nomen: true, sloppy: true, vars: true, white: true */

2.1 mill output

Render a papermill project to PDF (print) and/or HTML (web).


(Required modules)

var fs = require('graceful-fs'),
    path = require('path'),
    async = require('async'),
    f = require('underscore'),
    build = require('../build/mill-build'),
    readProjectConfig = require('../readProjectConfig');

2.1.1 Usage Information

This is displayed when the user requests help on this command.

var usage= [
  'Renders a papermill project according to settings.',
  '- Can export to PDF (print) and HTML (web)',
  '- Uses auto-mode when no settings are found',
  '',
  'Usage: mill output <project> [--format]',
  '',
  'Example usages:',
  'mill print',
  'mill output --print </path/to/project>',
  'mill web',
  'mill output --web .' 
];

2.1.2 Workflow: output

This function gets called by the CLI app when the user runs the output command.

function output (dir, callback) {
  

Our context is set to the application, we save it in a variable for convenience.

  var app = this;
  

TODO: loop over all args

TODO: see if arg is file or path

set path from supplied argument or use the current working directory

  dir = dir || '.';
  

make the path absolute from the current working directory

  dir = path.resolve(process.cwd(), dir);
  
  app.dbug("output dir:", dir);
  

For control flow, we use an async chain of events, where "each function's callback value becomes the next function's first argument":

  async.waterfall([
    
  1. Get the user configuration
      function getUserConf(callback) {
      
  • try to read config file from project path
        readProjectConfig(dir, app.config.get(), callback);
      
      },
    
  2. Build with the configuration
      function (config, callback) {
        build(config, callback);
      }
    ],
    
  3. Done
    function finishedChain(err, result) {
      
      if (err) { return app.fail(err, callback); }
      
  • build answer
      var res = result;
      
  • callback to the CLI
      app.dbug("build res:", res);
      
      callback(null, res);
    
  });
  
}

The whole command workflow lives in one main function, which we also export as the module:

output.usage = usage;
module.exports = output;
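
To make the waterfall idea used in this command concrete, here is a small sketch. The `waterfall` function below is a simplified stand-in written for illustration, not the code of the async module, and the config object is made up.

```javascript
// Simplified stand-in for async.waterfall (an illustration, not the library code).
function waterfall(tasks, done) {
  var index = 0;

  function next(err) {
    // Stop at the first error or when all tasks have run.
    if (err || index === tasks.length) {
      return done.apply(null, arguments);
    }
    // Pass the previous results on, followed by the next callback.
    var args = Array.prototype.slice.call(arguments, 1);
    var task = tasks[index];
    index += 1;
    task.apply(null, args.concat(next));
  }

  next(null);
}

var steps = [];

waterfall([
  function getUserConf(callback) {
    steps.push('read config');
    callback(null, { title: 'Example' }); // hypothetical config object
  },
  function buildIt(config, callback) {
    steps.push('build ' + config.title);
    callback(null, 'built ' + config.title);
  }
], function finishedChain(err, result) {
  steps.push(err ? 'failed' : result);
});

console.log(steps); // [ 'read config', 'build Example', 'built Example' ]
```

The value handed to one callback shows up as the first argument of the next function, which is exactly how the config travels from getUserConf to the build step.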

3 mill-build.js

A module to build output from papermill project config

3.1 Setup

var fs = require('fs-extra'), // uses 'graceful-fs' if available
    path = require('path'),
    temp = require('temp'),
    async = require('async'),
    f = require('underscore'),
    pandoc = require('pandoc-api'),
    mill = require('../../mill-cli');

3.2 Module

module.exports = function build(project, callback) {
    

Here lies just the workflow connecting the worker functions. Each function's callback value becomes the first argument of the next function.

  
  async.waterfall(
    
    [
      function start(callback) {
        inflateConfig(project, callback)
      },
      deriveBuildJobs,
      render
    ],
    
    callback
    
  );
  
};

3.3 Worker functions

3.3.1 inflateConfig

function inflateConfig(config, callback) {
  
  • apply defaults where nothing is given
  • read all lazy-configurable properties (input, output)
  • see if it is string, array or object
  • rewrite lazy values to full values

Handle errors - TODO: spec validation…

  if (!config) {
    return callback(new Error("No config given, can't inflate!"));
  }
  
  • project config just with "known" properties
  var project = f.pick(config, mill.config.get('papermill:known_props'));
  

3.3.1.1 input -- NOT optional for now

  
  if (typeof config.input === 'undefined') {
    return callback(new Error("No config.input!"));
  }

  ['input'].forEach(function (prop) {
    
    var item = config[prop];
    
  • is it a string?
    if (typeof item === 'string') {

use it as path

      config[prop] = [{ 'path': item }];

return null;

    }
  • is object (or array)?
    if (typeof item === 'object') {
  • is really array?
      if (Array.isArray(item)) {

nothing to do, TODO: recursion return null;

      }
  • is really an object?
      else {
        
  • is there a non-empty list in it? (Configure list property name to 'list')
        var lst = 'list';
        if (item[lst] && item[lst].length) {
        
  • prepare result
          var res = [],
  • save the base config, minus the list
          base = f.extend({}, f.omit(item, lst));
          
  • loop over the list TODO: recursion
          item[lst].forEach(function (li) {
            
  • assume it is an object
            var i = li;
            
  • if there is a string inside, read as path to new obj
            if (typeof li === 'string') {
              i = { 'path': li };
            }
            

extend with base (without .path)

            i = f.extend({}, f.omit(base, 'path'), i);
            
  • set path by joining those from base and i, if they exist
            i.path = path.join(base.path || '', i.path || '');
            
            res.push(i);
            
          });
          
  • link the result
          config[prop] = res;
          
        }
        else {
  • just put it into array
          config[prop] = [item];
        }
        
      }
    }
    
  • extend every input in list with project config
    config[prop].forEach(function (obj, i) {
     config[prop][i] = f.extend({}, project, obj);
    });

  });

3.3.1.2 output (optional)

  
  ['output'].forEach(function (prop) {
    
    var defaultTargets = mill.config.get('papermill:targets');
  
  • is there anything?
    if (config[prop] === undefined) {
      

if not, load default config (string)!

      config[prop] = mill.config.get('papermill:output_dir');
      
    }

    var item = config[prop];
  
  • is it a string?
    if (typeof item === 'string') {
      
  • use it as project.output_dir!
      config.output_dir = item;
      
  • item has no sub-folder
      item = { 'path': '' };
  • apply default targets
      defaultTargets.forEach(function (target) {
        item[target] = true;
      });
      
    }
  
  • is object/array?
    if (typeof item === 'object') {
  • is array?
      if (Array.isArray(item)) {

nothing to do!

      }
    
  • is object?
      else {
      
  • make a new list
        var list = [];
        

try to read path to output_dir

        if (item.path && typeof item.path === 'string') {
          config.output_dir = item.path;
        }
      
  • read base config, minus defaultTargets and path (for usage further down the tree)
      
        var base = f.extend({}, f.omit(item, 'path', defaultTargets));
        
  • loop each default target
        defaultTargets.forEach(function (target) {
                  
  • make empty result
          var res = {};
        
  • is there something?
          if (item[target] !== undefined) {
            
  • is it a boolean (true/false)?
            if (typeof item[target] === 'boolean') {
            
  • is it true?
              if (item[target] === true) {
              
  • use it, extended with the base settings
                res = f.extend({}, base, { 'target': target });
              }
              else {
  • it is false, abort!
                return null;
              }
            
            }
          
  • is it an object (and not an array)?
            if (typeof item[target] === 'object' && !Array.isArray(item[target])) {
  • try reading output_dir
              if (item.path) {
                config.output_dir = item.path;
              }
              
            }
            
          }
          
  • extend res with base and target
          res = f.extend({}, base, item[target], { 'target': target });
          
  • add obj to list (if target was disabled we have already aborted)
          list.push(res);
        
        });
        

if there is no output_dir found in config, apply default

        if (!config.output_dir) {
          config.output_dir = mill.config.get('papermill:output_dir');
        }
  • add new list to config
        config[prop] = list;
        
      }
    }
  });
  
  • callback with the inflated config
  callback(null, config);

}
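
The handling of the lazy input value can be summarised in a standalone sketch. `normalizeInput` is a hypothetical helper that mirrors the three cases above (string, array, object with a list); it uses Object.assign instead of underscore and path.posix so that the printed paths look the same on every platform. It leaves out the final step of extending each input with the project config.

```javascript
var path = require('path');

// Expand a "lazy" input value into a list of { path: ... } objects.
function normalizeInput(input) {
  if (typeof input === 'string') {
    return [{ path: input }];
  }
  if (Array.isArray(input)) {
    return input;
  }
  if (input && Array.isArray(input.list) && input.list.length) {
    // Settings shared by every list entry (everything except list and path).
    var shared = Object.assign({}, input);
    delete shared.list;
    delete shared.path;

    return input.list.map(function (entry) {
      var item = typeof entry === 'string' ? { path: entry } : entry;
      var result = Object.assign({}, shared, item);
      // Join the base path with the path of the entry.
      result.path = path.posix.join(input.path || '', item.path || '');
      return result;
    });
  }
  // A single object without a list is simply wrapped in an array.
  return [input];
}

console.log(normalizeInput('paper.md'));
console.log(normalizeInput({
  path: 'chapters',
  toc: true,
  list: ['one.md', { path: 'two.md', toc: false }]
}));
```

In the second call both entries inherit `toc: true` from the base object, and the second entry overrides it with its own value.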

3.3.2 deriveBuildJobs


function deriveBuildJobs(config, callback) {
  • multiply inputs with outputs
  • TODO: ??? read metadata from input.output?

  mill.dbug('inflated config:', config);
  • make empty result
  config.jobs = [];
  
  • For each item in input
  config.input.forEach(function (item) {
  • For each target in output
    config.output.forEach(function (target) {
      
  • make job by combining config in order
      var job = f.extend(
        {},
        f.omit(target, 'path'),
        f.omit(item, 'path'),
        f.omit(item.output, 'path')
      );
      
  • handle paths
      job.input = item.path;
      job.output = target.path || ''; // can be empty if no sub-dir
      
  • load default pandoc config
      job = f.extend(
        {},
        mill.config.get('papermill:pandoc'),
        job
      );
      
  • load default config for targets 'print' and 'web'
      mill.config.get('papermill:targets').forEach(function (target) {
        
        if (job.target === target) {
          job = f.extend(
            {},
            mill.config.get('papermill:pandoc_targets:' + target),
            job
          );
        
        }
      
      });
      
  • internal option mapping (list of ['old','new'] items)
      mill.config.get('papermill:internal_config_mapping')
        .forEach(function (opt) {

If there is a value for 'old'

          if (job[opt[0]]) {

set 'new' = 'old'

            job[opt[1]] = job[opt[0]];

delete 'old'

            delete job[opt[0]];
          }
        });
        

3.3.3 handle remaining unknown options

We need to save an already configured variable, whether it is a string or an object.

      var vario = {};
      if (typeof job.variable === 'string') {
        vario[job.variable] = true;
      }      
      else if (typeof job.variable === 'object') {
        vario = f.extend({}, vario, job.variable);
      }      
      job.variable = vario;
      

unknownOptions is an array of all the job keys, minus the options known by pandoc.

      var unknownOptions = f.difference(f.keys(job), pandoc.OPTIONS);
      

Now loop over unknownOptions and rewrite them to pandoc-variables

      unknownOptions.forEach(function (v) {

(if there is something at all).

        if (typeof job[v] !== 'undefined') {
          job.variable[v] = job[v] || true;
          delete job[v];          
        }
      });
      

process.exit()

Finally, we add the job to the list

      config.jobs.push(job);

    });
    
  });
    

callback with result

  callback(null, config);

}
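
As a rough illustration of the two central ideas in deriveBuildJobs, the sketch below multiplies inputs with outputs and then moves unknown options into `variable`. Both helpers and the list of known options are assumptions made for this example; the real code reads its defaults and the option list from the mill config and from pandoc.OPTIONS, and it stores `true` for falsy values, which this sketch does not do.

```javascript
// Combine every input with every output target (hypothetical data shapes).
function deriveJobs(inputs, outputs) {
  var jobs = [];
  inputs.forEach(function (input) {
    outputs.forEach(function (output) {
      // Later sources win: input settings override target settings.
      var job = Object.assign({}, output, input);
      job.input = input.path;
      job.output = output.path || ''; // can be empty if no sub-dir
      delete job.path;
      jobs.push(job);
    });
  });
  return jobs;
}

// Move every option that is not in the known list into job.variable.
function moveUnknownOptions(job, knownOptions) {
  var variable = Object.assign({}, job.variable);
  Object.keys(job).forEach(function (key) {
    if (knownOptions.indexOf(key) === -1 && job[key] !== undefined) {
      variable[key] = job[key];
      delete job[key];
    }
  });
  job.variable = variable;
  return job;
}

var jobs = deriveJobs(
  [{ path: 'a.md' }, { path: 'b.md', toc: true }],
  [{ target: 'print' }, { target: 'web', path: 'html' }]
);
console.log(jobs.length); // 4

var known = ['input', 'output', 'toc', 'variable']; // assumed list of known options
console.log(moveUnknownOptions(jobs[3], known));
```

Two inputs and two targets give four jobs. In the last job, `target` is not a known option, so it ends up as `variable.target`, which is where the render step looks for it.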

3.3.4 Render

function render(build, callback) {
  
  mill.dbug('BUILD:', build);
  
  • make a tmp working directory
  temp.mkdir('mill', function(err, workingdir) {

    if (err) { return callback(err); }

Debug: don't use the temp dir

    if (mill.config.get('DEBUG:on')) {
      workingdir = mill.config.get('DEBUG:workingdir') || workingdir;
      fs.mkdirsSync(workingdir);
    }
    
    mill.dbug("Working directory:", workingdir)
    
  • build each job
    async.eachSeries(
      build.jobs,

      function(job, callback) {
        
  • construct path to original doc location
        var jwd = job.input;
        
  • has the job input a file extension?
        if (path.extname(jwd) !== '') {

then take just the dirname of it

          jwd = path.dirname(jwd)
        }
  • make full path
        jwd = path.join(build.path, jwd);

        mill.dbug('Job working directory', jwd);
        job.dataDir = build.path;
        

go there, for correct handling of relative paths

        try {
          process.chdir(jwd);
          mill.dbug('New job working directory', process.cwd());
        }
        catch (err) {
          return callback(err);
        }
        

3.3.5 copy assets

var assets = ["template", "css" ];

        var assets = [];
        
  • helper function to copy an asset
        function copyAsset(asset, basedir, workingdir, callback) {
          
          if (typeof asset === 'string') {
            

target: where to copy to

            var target = path.join(workingdir, path.basename(asset));
            
  • try to copy relative from workingdir…
            fs.copy(asset, target,
              function (err) {
                                
                if (!err) {
                  callback(null, target);
                }
                
  • if not found,
                else {
                  

try from basedir

                  asset = path.join(basedir, asset);
                  
                  fs.copy(asset, target, function (err) {
                    callback(err || null, target);
                  });
                  
                }
                
              }
          );
          }

If asset is not a string,

          else {

there is nothing to do.

            callback(null, null);
          }
        }
        
        async.each(
          assets,
          function(asset, callback) {
                                    
  • 'css' can be an Array, loop it
            if (asset === 'css' && Array.isArray(job[asset])) {
              
              async.each(job[asset], function(item, callback) {
                
                return copyAsset(item, build.path, workingdir, callback);
            
              }, callback);
            
  • normal items
            } else {
              
              return copyAsset(job[asset], build.path, workingdir, 
                function (err, res) {
                  
                  if (err) { return callback(err); }
                  
  • rewrite path to res job[asset] = res;
                  callback(null);
                  
              });
            }
            
          },
          function end(err) {

            if (err) { return callback(err); }
  • handle paths (make 'em full!)
            var filename = path.basename(job.input)
              .replace(path.extname(job.input), ''); // "/path/foo.bar" > "foo"

            job.input = path.join(build.path, job.input);
            job.output = path.join(workingdir, build.output_dir, job.output, filename);
            

make full paths for assets

            [ "bibliography", "csl", "template" ].forEach(function (item) {
              
              if (job[item]) {
                job[item] = path.join(build.path, job[item]);
              }
            
            });
            
            if (job.variable.target === 'print') {
              job.output = job.output + '.pdf';
            }
            if (job.variable.target === 'web') {
              job.output = job.output + '.html';
            }

            handleInputFiles(job, workingdir, function(err, res) {
              if (err) { return callback(err); }

              handleOutputFiles(job, function(err, res) {
                if (err) { return callback(err); }
  • finally, build it!
                mill.dbug('build.job:', job);
                pandoc(job, callback);

              });

            });

          }
        );

      }, function finishedBuilds(err) {

        if (err) { return callback(err); }

        var resultpath = path.join(workingdir, build.output_dir),
          outputpath = path.join(build.path, build.output_dir);

        fs.copy(
        resultpath,
        outputpath,
        callback);

      }
    );

  });

}
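
How the output file name is derived from the input path can be shown in isolation. This is a sketch under assumptions: `outputPathFor` and the `EXTENSIONS` table are hypothetical names, and path.posix is used only so the result looks the same on every platform.

```javascript
var path = require('path');

// Assumed mapping from target name to file extension.
var EXTENSIONS = { print: '.pdf', web: '.html' };

// Build the output file path for one job (hypothetical helper).
function outputPathFor(inputPath, outputDir, subDir, target) {
  // "/path/foo.bar" -> "foo"
  var name = path.posix.basename(inputPath, path.posix.extname(inputPath));
  return path.posix.join(outputDir, subDir || '', name + (EXTENSIONS[target] || ''));
}

console.log(outputPathFor('text/paper.md', '_output', '', 'print'));   // _output/paper.pdf
console.log(outputPathFor('text/paper.md', '_output', 'html', 'web')); // _output/html/paper.html
```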

3.4 Helper functions

3.4.1 handle input files/directories

jandocs interprets directories as a list of files, but we want to combine them!

function handleInputFiles(job, workingdir, callback) {
    
  • basic checking: does anything exist?

  fs.exists(job.input, function (exists) {
  
    if (!exists) {
      return callback(new Error("Input does not exist: " + job.input));
    }
  

ok, but is it a directory?

    fs.stat(job.input, function (err, stats) {
      if (err) { return callback(err); }
      
      if (!stats.isDirectory()) {

It is NOT a directory, nothing to do!

        return callback(null, job);
      }
      else {

It IS a directory, we need to combine the files in order:

  • prepare the combinedInput and -File
        var combinedInput = '',
            combinedInputFile = path.join(workingdir, path.basename(job.input) + '.md');
  • get list of files
        fs.readdir(job.input, function (err, files) {
          
          if (err) { return callback(err); }
          

sort files

          files = files.sort();
          
  • read and combine all the files, one after another so the sorted order is kept
          async.eachSeries(
            files,
            function loop(file, callback) {
              fs.readFile(
                path.join(job.input, file),
                { encoding:'utf8' },
                function (err, data) {
                  if (err) { return callback(err); }
                  

add file content to combinedInput, with 2 extra empty lines to protect headings and other block elements

                  combinedInput = combinedInput + data + '\n\n';
                  callback(null);
                  
                }
              );
            },
            function end(err) {
              
              if (err) { return callback(err); }
              

save the combinedInput to file

              fs.writeFile(combinedInputFile, combinedInput, function (err) {
                
                if (err) { return callback(err); }
                

set new input and return job

                job.input = combinedInputFile;
                callback(err || null, job);
                
              });
              
            }
          );
        
        });
      
      
      }
      
    });
  
  });
  
}
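
The combining step is easier to follow in a synchronous form. The sketch below uses only node core modules; `combineDirectory` and the file names are made up for the example, and it reads the files strictly in sorted order.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Concatenate all files of a directory in sorted order (synchronous sketch).
function combineDirectory(dir) {
  return fs.readdirSync(dir)
    .sort()
    .map(function (file) {
      // Two newlines keep block elements of adjacent files apart.
      return fs.readFileSync(path.join(dir, file), 'utf8') + '\n\n';
    })
    .join('');
}

// Usage with a throwaway directory and hypothetical file names.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'mill-example-'));
fs.writeFileSync(path.join(dir, '02-body.md'), 'Body text.');
fs.writeFileSync(path.join(dir, '01-intro.md'), '# Intro');

var combined = combineDirectory(dir);
console.log(JSON.stringify(combined)); // "# Intro\n\nBody text.\n\n"
```

Although the body file is written first, the intro comes first in the result because the file names are sorted before reading.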

3.4.2 handle output files/directories

We just need to make sure that all paths we want to write to exist in the working directory.

function handleOutputFiles(job, callback) {
  fs.createFile(job.output, callback);
}
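
For readers without fs-extra at hand, roughly the same effect can be sketched with the node core modules. `ensureFile` is a hypothetical helper, not the fs-extra implementation, and it assumes a Node version that supports the recursive option of fs.mkdirSync.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Create an empty file and all missing parent directories (synchronous sketch).
function ensureFile(file) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  // Opening in append mode creates the file but keeps existing content.
  fs.closeSync(fs.openSync(file, 'a'));
}

// Usage with a throwaway directory and a hypothetical output path.
var root = fs.mkdtempSync(path.join(os.tmpdir(), 'mill-example-'));
var outputFile = path.join(root, '_output', 'html', 'paper.html');

ensureFile(outputFile);
console.log(fs.existsSync(outputFile)); // true
```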

4 readProjectConfig.js

A module to read a papermill project's configuration

# Setup

var fs = require('fs-extra'),
    path = require('path'),
    f = require('underscore'),
    mill = require('../mill-cli'),
    tool = require('./utils');

# Module

module.exports = function readConfig(directory, millconf, callback) {
  
  var dir = directory || process.cwd();
  
  1. Get the config files with a function which gives back an array of existing config files.
  getConfigFiles(millconf, dir, 
    
    function (err, configs) {
    
    if (err) { return callback(err); }
    
  2. Read the first found config and callback with it.
    readConfigs(configs, dir, callback);

  });

}

4.1 Worker functions

4.1.1 getConfigFiles

A function to search for configuration files:

function getConfigFiles(millconf, directory, callback) {
  

Search for all (configured) configuration file names.

  tool.searchFs("{" + millconf.CONFIGFILES + "}", directory, 
    
    function searchResults(err, files) {
      
      mill.dbug("Found config files: ", files);
            

If there was an error, or we got back an empty array, we callback with an error.

      if (err || !files.length) {
        return callback(err || new Error("No config file found!"), null);
      }
      

Otherwise, we convert all the list entries to full paths

      files.forEach(function (file, i) {
        files[i] = path.join(directory, file);
      });
      

and callback with the list of existing config files.

      return callback(null, files);
      
    }
    
  );
  
}
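
The search pattern built above relies on a detail of JavaScript string concatenation. The sketch below assumes that CONFIGFILES is an array of file names; the names used here are invented, the real list comes from the application configuration.

```javascript
// Assumed value; the real list comes from the application configuration.
var CONFIGFILES = ['papermill.json', 'papermill.yml'];

// Concatenating an array with a string joins its items with commas,
// which yields a brace pattern that matches any of the names.
var pattern = '{' + CONFIGFILES + '}';

console.log(pattern); // {papermill.json,papermill.yml}
```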

4.1.2 readConfigs

Function to read config file(s).

function readConfigs(list, directory, callback) {
    
  • prepare result objects
  var JSONfile,
      result;
  
  • clean the list: remove directories
  tool.cleanDirectories( list, 
    
    function (result) {
      

return error if the list is empty after cleaning

      if (!result.length) {
        return callback(new Error("No config file found!"));
      }
      else {
      
  • pick the first config file from the remaining list
        JSONfile = f.first(result);
        mill.dbug("config file:", JSONfile);
      
  • safely read and parse the JSON file
        fs.readJson(JSONfile, function (err, data) {
          
  • check for error or missing data
          if (err || !data) {
            err = err || new Error('Could not read config!');
            return callback(err);
          }
          
  • add the path and callback
          data.path = directory;
          
          mill.dbug("data", data);
          callback(null, data);
          
        });
        
      }

    }
  );
}

5 alias.js

var mill = require('../mill-cli');

Alias the appropriate commands for simpler CLI usage

mill.alias('o',   { command: 'output' });

6 utils.js

A module containing several utilities

# Setup

var fs = require('graceful-fs'),
    async = require('async'),
    glob = require('glob'),
    eyes = require('eyes'),
    tool = {};

6.1 Utilities

6.1.1 searchFs

function to search the filesystem (using glob module)

tool.searchFs = function (string, directory, callback) {
  
  glob(string, 
  
    {

"The current working directory in which to search.

      'cwd': directory,

no dotfiles

      'dot': false,

Add a / character to directory matches.

      'mark': true,

Perform a case-insensitive match. (…)"

      'nocase': true,
      'nonegate': true
    }, 
    
    function globResult(err, res) {
      callback(err || null, res || null);
  });
  
};

6.1.2 isNotDir

function to check if a path is not a directory

tool.isNotDir = function (path, callback) {
  fs.stat(path, function (err, stats) {
    if (err) {
      return callback(err);
    }
    callback(null, !stats.isDirectory());
  });
};

tool.isDir = function (path, callback) {
  dirCheck(path, true, callback)
};

tool.isNotDir = function (path, callback) {
  dirCheck(path, false, callback)
};

6.1.3 dirCheck

function dirCheck(path, bool, callback) {
  fs.stat(path, function (err, stats) {
    if (err) { return callback(err); }
    
    var res = stats.isDirectory();
    if (!bool) { res = !res; }
    
    callback(null, res);
  });
}

6.2 readJSONfile

function to safely read JSON file from path in file system

tool.readJSONfile = function (path, callback) {
  
  fs.readFile(path, 'utf8', function (err, data) {
    

catch file read error and callback with it

    if (err) { return callback(err); }
 

carefully parse the JSON data

    var result;
    try {
      result = JSON.parse(data);
    } catch (err) {

catch any parsing error and callback with it

      return callback(err);
    }

callback with no error and the result

    callback(null, result);

  });
  
}
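
The parsing part of this function can be tried on its own. `parseJson` is a hypothetical helper that keeps the same pattern: JSON.parse is wrapped in try/catch and any problem is reported through the callback instead of being thrown.

```javascript
// Parse JSON text and report problems through the callback instead of throwing.
function parseJson(text, callback) {
  var result;
  try {
    result = JSON.parse(text);
  } catch (err) {
    return callback(err);
  }
  callback(null, result);
}

parseJson('{ "input": "paper.md" }', function (err, data) {
  console.log(err, data); // null { input: 'paper.md' }
});

parseJson('{ broken', function (err) {
  console.log(err instanceof SyntaxError); // true
});
```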

6.3 cleanDirectories

function to clean directories from a list of paths

tool.cleanDirectories = function (list, callback) {

run isNotDir for every item in list, then callback with result.

  async.filter( 
    list,
    function (item, callback) {
      tool.isNotDir(item, function (err, res) {
        callback(res);
      })
    },
    callback );
};
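
The same filtering can be written synchronously with the core fs module, which may help when reading the async version. `withoutDirectories` is a hypothetical helper for illustration; like the filter above, it drops paths that cannot be read.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Keep only the paths that exist and are not directories (synchronous sketch).
function withoutDirectories(list) {
  return list.filter(function (item) {
    try {
      return !fs.statSync(item).isDirectory();
    } catch (err) {
      return false; // unreadable or missing paths are dropped as well
    }
  });
}

// Usage with a throwaway directory and a hypothetical file name.
var base = fs.mkdtempSync(path.join(os.tmpdir(), 'mill-example-'));
var file = path.join(base, 'papermill.json');
fs.writeFileSync(file, '{}');

console.log(withoutDirectories([base, file, path.join(base, 'missing')]));
```

Only the path of the file remains in the printed list; the directory and the missing path are removed.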

6.4 Attaching

  • exports.attach gets called by broadway on app.use
tool.attach = function (options) {
  
  var app = this;
  

Here, we attach some functions directly to the app object.

fail() an app with err, can be 'string' or 'Error':

  app.fail = function (error, data, cb) {
    
    var meta, callback;
    

no data? use as callback.

    if (typeof data === 'function') {
      callback = data;
    } else {
      meta = data;
      callback = cb;
    }  
    
    
    if (!(error instanceof Error)) {
      
      if (typeof error === 'string') {
        error = new Error(error);
      }
      else {
        meta = error;
        error = new Error("");
      }
    }
    

no meta? use as callback.

    if (typeof meta === 'function') {
      callback = meta;
      app.log.error(error);    
    } else {
      app.log.error(error, meta);
    }  
    return callback(error);
    
  };

  app.dbug = function (string, data) {
    
    if (app.DEBUG) {
      app.log.debug(string);
      if (app._NOCOLORS) {
        console.dir(data || null);        
      }
      else {
        eyes.inspect(data || null);        
      }
    }
    
  };

};
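
The way fail() treats its first argument can be sketched separately. `normalizeFailure` is a hypothetical helper, not part of the plugin; it assumes that every Error subclass (TypeError, SyntaxError, ...) should be passed through unchanged, which is why it checks with instanceof.

```javascript
// Turn whatever was passed as "error" into an Error plus optional metadata.
function normalizeFailure(error) {
  if (error instanceof Error) {
    return { error: error, meta: undefined };
  }
  if (typeof error === 'string') {
    return { error: new Error(error), meta: undefined };
  }
  // Anything else is kept as metadata next to an empty Error.
  return { error: new Error(''), meta: error };
}

console.log(normalizeFailure('No config file found!').error.message); // No config file found!
console.log(normalizeFailure(new TypeError('bad type')).error.name);   // TypeError
console.log(normalizeFailure({ code: 42 }).meta);                      // { code: 42 }
```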

exports.init gets called by broadway on app.init,

tool.init = function (done) {

but this plugin doesn't require any initialization steps.

  return done();
};

export tool as module

module.exports = tool;

  1. And btw, the stub is just copied from jitsu's CLI.