On the Web


Easier Decision-Making: Conduct Experiments By Leo Babauta

The science of time perception: stop it slipping away by doing new things

Buffer : A blog about productivity, life hacks, writing, user experience, customer happiness and business.

series of articles about the challenges of growing an organization

Record Your Terminal Share it with no fuss

The Importance of Scheduling Nothing by Jeff Weiner CEO at LinkedIn

The tech industry is a meritocracy. We hire people based on their skills alone.

Your life is too short and too valuable to fritter away in work.

how to increase productivity with evernote

You have to earn the right to be heard about what you do and what you want to accomplish. People really don’t care about what you do until they know that you care about what they do.

  • Secret #1: Assume the burden of other people’s discomfort
  • Secret #2: Give and expect nothing in return
  • Secret #3: Be proud of who you are
  • Secret #4: Compliment early and often
  • Secret #5: Look for common ground immediately
  • Secret #6: Tap your sphere of influence cautiously
  • Secret #7: Do not keep your personal and professional lives separate
  • Secret #8: Pull, never push
  • Secret #9: Include social media into your networking
  • Secret #10: Lose control of your marketing

Tips and tricks for conference attendees

If a hallway conversation stalls, ask what they're working on. Discover the project they're passionate about. @chrishouser

  1. take notes in Emacs/Vim/etc.
  2. meet people and go to the evening events and such to have good conversations.


It helps tremendously to read up on the topics before a presentation, so that you aren't entering cold. @AustinTHaas

take notes. revisit the notes. read them again. then write about them. @darevay

My two rules: leave the laptop at hotel and introduce yourself to everyone. It opens so many doors to learning. @jackdanger

the talks will be great, but the hallway conversations are better. be aggressive about meeting people. Take advantage of the face time. @jimduey

Talk to people about the presentations during the breaks. Meeting people is the most rewarding part. @ericnormand


The easiest way to teach yourself C++ in 21 days

Estimates in Software Development. New Frontiers.

mumble Low-latency, high-quality voice communication for gamers

property-based testing

Writing tests first forces you to think about the problem you're solving. Writing property-based tests forces you to think way harder.
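As a rough illustration of that point, here is a hand-rolled property check in Python. It is only a sketch: the run-length codec, the helper name check_property and the generator random_text are all made up for this example, and real property-based testing libraries add things such as shrinking of counterexamples that are not shown here.

```python
import random


def run_length_encode(text):
    """Compress 'aaab' into [('a', 3), ('b', 1)]."""
    encoded = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded


def run_length_decode(pairs):
    """Inverse of run_length_encode."""
    return "".join(ch * count for ch, count in pairs)


def check_property(prop, generate, trials=200, seed=0):
    """Run prop on random inputs; return the first counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def random_text(rng):
    # Small alphabet so that repeated characters actually occur.
    return "".join(rng.choice("ab") for _ in range(rng.randint(0, 12)))


def round_trips(text):
    # Property 1: decoding an encoded text gives the text back.
    return run_length_decode(run_length_encode(text)) == text


def no_adjacent_duplicates(text):
    # Property 2: neighbouring runs never share a character.
    runs = run_length_encode(text)
    return all(a[0] != b[0] for a, b in zip(runs, runs[1:]))
```

Stating the two properties is the "think way harder" part: an example-based test only needs one input and one expected output, while a property has to hold for every generated input.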

Gerrit web based code review system


Marelle logic programming (prolog) for devops

openstack thoughts by Alex Gaynor

cloudmonkey : command line interface for Apache CloudStack

The Cloudcast: From DevOps to Private PaaS

Ansible is the easiest way to deploy, manage, and orchestrate computer systems you've ever seen

Monitoring setup of amara (riemann , graphite better than nagios)

configuration with clj

Two ways: one is to use .clj data files on the classpath and take advantage of the fact that different profiles put different resources directories on the classpath. This is the approach taken by Carica, and it works great if you have complex config with nested values.

The other approach is to use environment variables; the best tool for that is Environ.
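The environment-variable approach is not specific to Clojure. The sketch below shows the same idea in Python; it is not Environ's API, and the APP_ prefix, the DEFAULTS table and the function name load_config are assumptions made up for the example.

```python
import os

# Hypothetical defaults used when no environment variable overrides them.
DEFAULTS = {"database_url": "sqlite:///dev.db", "port": "8080"}


def load_config(environ=None, prefix="APP_"):
    """Build a config dict from defaults plus PREFIX_* environment variables."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    for name, value in environ.items():
        if name.startswith(prefix):
            # APP_DATABASE_URL becomes the key "database_url".
            config[name[len(prefix):].lower()] = value
    return config
```

Passing the environment as an argument keeps the function easy to test: load_config({"APP_PORT": "9000"}) overrides only the port.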

Bug tracking with Squash: better than Splunk? Send data to Graphite, Librato, Ganglia or Graylog?


Big Data : nouvelle étape de l’informatisation du monde

conversation with Alan Kay



[Kultpfunzel: Kult=cult. Funzel=dim light.] gives light & is hackable

school for poetic computation

A Hardware Accelerated Regular Expression Matcher

How-to build your own GPS Receiver

ASM Embedded CPU FORTH Verilog Spartan 3 FPGA C++ Raspberry Pi

Dual Boot Windows/Android 2.2 Tablet Straight Out Of Shenzhen

Tech price comparison


delete from line 3 up to and including the first blank line: sed '3,/^$/d' filename
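To make the address range explicit, here is a Python sketch of what that sed command does. The function name is made up. It assumes the usual sed behaviour where the end pattern is only tested on lines after the start line, so a blank line 3 does not close the range by itself.

```python
def delete_from_line_to_blank(lines, start=3):
    """Drop lines from `start` (1-based) through the next empty line."""
    kept = []
    deleting = False
    for number, line in enumerate(lines, start=1):
        if deleting:
            # The end pattern ^$ is only tested on lines after the start line.
            if line.rstrip("\n") == "":
                deleting = False
            continue
        if number == start:
            deleting = True
            continue
        kept.append(line)
    return kept
```

If no empty line follows, everything from line 3 to the end of the input is dropped.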

Major Linux Vs UNIX Kernel Differences

Read-only Guest tmux Sessions

Festival: build your own voice

  • grep . *.txt
  • more * | cat
  • "fmt -1" (split lines into individual words)
  • gnt-job list | egrep --color=always 'running|waiting'


Source Multiplayer Networking for multi-players games

Programming Distributed Computing Systems A Foundational Approach By Carlos A. Varela

The network is reliable by aphyr

network partition

Amazon Dynamo Paper. It has some very interesting concepts, but ultimately fails to provide a good balance of reliability, performance and cost.

Web Dev

Static sites are fast, secure, easy to deploy, and manageable using version control

check elm-lang presentation

erlang web framework : ezwebframe

aloha webserver on top of netty

slides in HTML5

complete web site example clojars

headless web testing with Gecko


distributed programming

Call me maybe series by Aphyr : ZooKeeper, Kafka, Cassandra, NuoDB


Learn [clojure|elixir|go|haskell|…] in Y minutes

Free books on Computer Science by @okalotieno

The Original 'Lambda Papers' by Guy Steele and Gerald Sussman

The Anti-Human Consequences of Static Typing by Jay McCarthy

Teach Yourself Programming in Ten Years

Name your arguments by Jamie Wong

Learnable Programming : Designing a programming system for understanding programs, by Bret Victor

Part of a series exploring Concepts, Techniques, and Models of Computer Programming.

The Definitive Reference To Why Maybe Is Better Than Null (error handling)

Bertrand Meyer's blog

tern : editor-independent static analysis engine in javascript

Go and Rust — objects without class

Bloom language Testing distributed systems by Neil Conway

Design patterns -> theorems -> language & tool support

"When I see patterns in my programs, I consider it a sign of trouble… a sign that I'm using abstractions that aren't powerful enough" (Paul Graham)

Consistency As Logical Monotonicity

Brian McKenna blog

Turing complete

The notion of Turing-completeness does not apply to languages such as XML, JSON, YAML and S-expressions, because they are typically used to represent structured data, not describe computation.

The fruits of misunderstanding by prof.dr.Edsger W.Dijkstra

How anthropomorphism and analogies make concepts in computer programming harder to understand:

Erlang has very cheap threads, so you can use concurrency as a control structure, which is very close to object-oriented programming and dynamic dispatch. As for what the Reactive framework is: it’s just the continuation monad; observer/observable is the dual of enumerable/enumerator.

Joe Armstrong on languages

What would I recommend learning?

  • C
  • Prolog
  • Erlang (I'm biased)
  • Smalltalk
  • Javascript
  • Haskell / ML / OCaml
  • LISP/Scheme/Clojure

A couple of years should be enough (PER LANGUAGE).

Notice there is no quick fix here - if you want a quick fix go buy "learn PHP in ten minutes" and spend the next twenty years googling for "how do I compute the length of a string"

The crazy thing is we still are extremely bad at fitting things together - still the best way of fitting things together is the unix pipe

find … | grep | uniq | sort | …

and the fundamental reason for this is that components should be separated by well-defined protocols in a universal intermediate language.

Fitting things together by message passing is the way to go - this is basis of OO programming - but done badly in most programming languages.

If ALL applications in the world were interfaced by (say) sockets + lisp S expressions and had the semantics of the protocol written down in a formal notation - then we could reuse things (more) easily.

Today there is an unhealthy concentration on language and efficiency and NOT on how things fit together and protocols - teach protocols and not languages.



Rust is a curly-brace, block-structured expression language. It visually resembles the C language family, but differs significantly in syntactic and semantic details. Its design is oriented toward concerns of “programming in the large”, that is, of creating and maintaining boundaries – both abstract and operational – that preserve large-system integrity, availability and concurrency. It supports a mixture of imperative procedural, concurrent actor, object-oriented and pure functional styles. Rust also supports generic programming and metaprogramming, in both static and dynamic styles.

Learning How To Learn Programming

from Van Roy and Haridi's book

data parallelization : incremental datalog computation

Naiad is an investigation of data-parallel dataflow computation in the spirit of Dryad and DryadLINQ, but with a focus on incremental computation. Naiad introduces a new computational model, differential dataflow, operating over collections of differences rather than collections of records, and resulting in very efficient implementations of programming patterns that are expensive in existing systems.

text.SelectMany(x => x.Split(' '))
    .Count(y => y, (k, c) => k + " : " + c)
    .Subscribe(l => { foreach (var element in l) Console.WriteLine(element); });
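The phrase "collections of differences rather than collections of records" can be illustrated with a toy word count in Python. This is not Naiad's API or its differential dataflow algorithm, only a sketch of the idea: the input arrives as (line, delta) pairs, and the function name apply_differences is made up.

```python
from collections import Counter


def apply_differences(counts, differences):
    """Update word counts in place from (line, delta) pairs.

    delta is +1 when a line is added to the input and -1 when it is removed.
    Returns only the words whose count actually changed, with their change.
    """
    changed = Counter()
    for line, delta in differences:
        for word in line.split(" "):
            changed[word] += delta
    for word, delta in changed.items():
        counts[word] += delta
        if counts[word] == 0:
            del counts[word]  # forget words that no longer occur
    return {word: delta for word, delta in changed.items() if delta != 0}
```

Each call touches only the words mentioned in the new differences, so the cost follows the size of the change and not the size of the whole input.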

concurrency / parallelism

Concurrency is a property of the algorithm that you are designing. It determines which parts of your data-processing logic are intrinsically independent (under all inputs and circumstances).

Parallelism is a property of the realization of your algorithm. This is not your source code, but the final executable or — even more abstractly — the behavior of your program when executed.

creative coding

Julia Buntaine‘s artwork provides conceptual footholds for issues in neuroscience

phenomenon of creative computing

creator of Processing : Casey Reas


hardware : kinect (detect human motion, windows-based), arduino, touchOSC, Monome, Leap Motion

Generative Art Matt Pearson. / Learning Processing: A Beginner's Guide to Programming Images, Animation, and Interaction Daniel Shiffman.

Algorithms for Visual Design Using the Processing Language Kostas Terzidis


Adopting Ideas from Erlang and Clojure for a Highly Concurrent, Simple and Maintainable Application

RiconEast distributed system

great explanation of concurrency concepts in clojure at the End

  • CAS semantics : Atom
  • Coordinated change inside a transaction : ref
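For readers who do not know Clojure, the CAS semantics of an atom can be sketched in Python. The Atom class below is a made-up illustration, not Clojure's implementation: a lock stands in for the hardware compare-and-swap instruction, and refs with their transactions are not covered.

```python
import threading


class Atom:
    """Holds one value that is only ever replaced via compare-and-set."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for a hardware CAS

    def deref(self):
        return self._value

    def compare_and_set(self, expected, new):
        """Install `new` only if the current value is still `expected`."""
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

    def swap(self, fn, *args):
        """Apply fn to the current value, retrying if another thread won."""
        while True:
            old = self.deref()
            new = fn(old, *args)
            if self.compare_and_set(old, new):
                return new
```

Because swap may call fn more than once, fn should be free of side effects, which is the same constraint Clojure puts on functions passed to swap!.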

probabilistic programming

probabilistic programming in clojure by Nils Bertschinger


Haskell from C: Where are the for Loops?

School of Haskell

anatomy of programming language

Programming in Haskell, Graham Hutton



Go on App Engine: tools, tests, and concurrency by The Go Blog

The examples from Tony Hoare's seminal 1978 paper "Communicating sequential processes" implemented in Go.


Learn Python The Hard Way

recognizing numbers

>>> from sympy import *
>>> nsimplify(4.242640687119286)
3*sqrt(2)

redo: a top-down software build system

Writing clean, testable, high quality code in Python


Applicatives are too restrictive, breaking Applicatives and introducing Functional Builders

Designing scala libraries (slides)

Ztream is POC P2P-assisted Web music streaming built with WebRTC, Media Source API, AngularJS, Play, ReactiveMongo

easy to write MapReduce jobs in Hadoop on top of cascading

Gabbler, a Reactive Chat App – part 2 by hseeberger

Abstract Algebra for Scala

approximate set size (in much less memory with HyperLogLog), approximate item counting (using CountMinSketch)

Jscala blog

Programmer Fast Track in Atomic Scala book


Roundup of HTML-Based Slide Deck Toolkits

json editor

JavaScript Library for Mobile-Friendly Interactive Maps


Dijkstra's Algorithm as a Sequence (clojure implementation)

Create perfect maze : Eller's Algorithm

Implementations of Monoids for interesting approximation algorithms, such as Bloom filter, HyperLogLog and CountMinSketch
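What "monoid" means for such a structure can be shown with a small Bloom filter in Python. This is a sketch, not the linked library's implementation: the class, its parameters and the SHA-256 based hashing are assumptions chosen to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Immutable Bloom filter; `bits` is an int used as a bit set."""

    def __init__(self, size=256, hashes=3, bits=0):
        self.size = size
        self.hashes = hashes
        self.bits = bits

    def _positions(self, item):
        # Derive `hashes` bit positions from seeded SHA-256 digests.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        bits = self.bits
        for pos in self._positions(item):
            bits |= 1 << pos
        return BloomFilter(self.size, self.hashes, bits)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def combine(self, other):
        """Monoid operation: bitwise OR of two compatible filters."""
        assert (self.size, self.hashes) == (other.size, other.hashes)
        return BloomFilter(self.size, self.hashes, self.bits | other.bits)
```

The empty filter is the identity element and combine is associative, so partial filters built on separate machines can be merged in any grouping.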

Multivariate Change of Variables in Integration Theorem (MCVIT, that’s a mouthful)

Math ∩ Programming A place for elegant solutions @cgrand implementation

Exponential decay of history is a pattern that competes with ring-buffers, least-recently-used heuristics, and other techniques that represent historical information in a limited space.


Data Driven: The New Big Science

Topological Data Analysis, NBA example (Ayasdi)

Probability (Theory) Tutorials by Noel Vaillant

Classical Mechanics: A Computational Approach by Jack Wisdom Gerald Jay Sussman

Counting selections with replacement ((n k))
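The double-parenthesis notation is the multiset coefficient: the number of ways to choose k items from n kinds when repeats are allowed, which equals the binomial coefficient C(n + k - 1, k). A quick Python check (the function name is made up, and n is assumed to be at least 1):

```python
from itertools import combinations_with_replacement
from math import comb


def multiset_coefficient(n, k):
    """Number of size-k selections from n kinds, repetition allowed (n >= 1)."""
    return comb(n + k - 1, k)


def brute_force(n, k):
    """Count the same selections by listing them."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))
```

For example, choosing 2 scoops from 3 flavours gives 6 possibilities, and both functions agree on that.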

The theorems of Frobenius and Suzuki on finite groups by Terence Tao

The Probabilistic Method : How many lights can you turn on?

Blog I wasnt prepared to work

Math Primer for programmers

Math with Bad Drawings : blog

Graph Partitioning and Expanders

algorithms for graph partitioning and clustering, constructions of expander graphs, and analysis of random walks

blog Norman Wildberger

The Life and Times of the Central Limit Theorem (History of Mathematics) William J. Adams

divine proportion

The new form of trigonometry developed here is called rational trigonometry, to distinguish it from classical trigonometry, the latter involving cos θ, sin θ and the many trigonometric relations currently taught to students. An essential point of rational trigonometry is that quadrance and spread, not distance and angle, are the right concepts for metrical geometry (i.e. a geometry in which measurement is involved).
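A small Python sketch shows why quadrance and spread are convenient: with rational inputs every result stays an exact fraction, with no square roots or transcendental functions. The function names are made up; the formulas used are the squared distance for quadrance and, for two direction vectors in the plane, the squared cross product divided by the product of the squared lengths for spread.

```python
from fractions import Fraction


def quadrance(p, q):
    """Squared distance between points p and q."""
    return sum((Fraction(a) - Fraction(b)) ** 2 for a, b in zip(p, q))


def spread(u, v):
    """Spread between two lines in the plane with direction vectors u and v."""
    u0, u1 = Fraction(u[0]), Fraction(u[1])
    v0, v1 = Fraction(v[0]), Fraction(v[1])
    cross = u0 * v1 - u1 * v0
    return cross ** 2 / ((u0 ** 2 + u1 ** 2) * (v0 ** 2 + v1 ** 2))
```

Perpendicular lines have spread 1, parallel lines have spread 0, and the lines along (1, 0) and (1, 1) have spread 1/2, which classical trigonometry would express as the square of the sine of 45 degrees.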


OSCON 2013: Carin Meier "The Joy of Flying Robots with Clojure" with roomba, drone


APIs: The Future Is Now

Category theory in practice

Of Algebirds, Monoids, Monads, and Other Bestiary for Large-Scale Data Analytics

Algebra for Analytics by P. Oscar Boykin

category theoretic approach to optimizing MapReduce-like pipelines

You Could Have Invented Monads! (And Maybe You Already Have.)

Distributed System

The Raft Consensus Algorithm

Distributed Systems Archaeology: Works Cited by Michael R. Bernste


event sourcing for functional programmers

RabbitMQ on the cloud AWS

Rabbit farms is a standalone service for publishing RabbitMQ messages

Rabbitmq vs. kafka

  • "but clearly large amounts of persistent messages sitting in the broker was not the main design case for AMQP in general."
  • It's contrasted with Kafka, which is "designed for holding and distributing large volumes of messages".
  • longer-lived work queues are really more of a Hadoop thing, not an in-memory queue thing

Use Kafka if you have a fire hose of events (100k+/sec) you need delivered in partitioned order 'at least once' with a mix of online and batch consumers, you want to be able to re-read messages, you can deal with current limitations around node-level HA (or can use trunk code), and/or you don't mind supporting incubator-level software yourself via forums/IRC.

Use RabbitMQ if you have messages (20k+/sec) that need to be routed in complex ways to consumers, you want per-message delivery guarantees, you don't care about ordered delivery, you need HA at the cluster-node level now, and/or you need 24x7 paid support in addition to forums/IRC.

An Express + based chat app that uses Redis as session store & RabbitMQ for PubSub

AMQP resources

Servers: RabbitMQ (Rabbit Technologies, Erlang/OTP, MPL) - ZeroMQ (iMatix/FastMQ/Intel, C++, GPL3) - OpenAMQ (iMatix, C, GPL2) - ActiveMQ (Apache Foundation, Java, apache2)

Steve Vinoski explains AMQP in his column, Towards Integration

John O'Hara on the history of AMQP

Dmitriy's presentation on RabbitMQ/AMQP

ZeroMQ's analysis of the messaging technology market

Pieter Hintjens's background to AMQP

Barry Pederson's py-amqplib

Ben Hood on writing an AMQP client

Dmitriy Samovskiy introduces Ruby + QPid + RabbitMQ

Ben Hood's as3-amqp

RabbitMQ's protocol code generator

Erlang Exchange presentation on the implementation of RabbitMQ

Jonathan Conway's series on RabbitMQ and using it with Ruby/Merb

Open Enterprise's series on messaging middleware and AMQP

Messaging and distributed systems resources:

A Critique of the Remote Procedure Call Paradigm

A Note on Distributed Computing

Convenience Over Correctness

Metaprotocol Taxonomy and Communications Patterns

Joe Armstrong on Erlang messaging vs RPC

SEDA: scalable internet services using message queues

A Node.js app that shows the power of RabbitMQ's work queues



Data NLP

Natural Language Toolkit for python

The World's Best Grammar Checker

Open Data

Our aim is to track every government financial transaction across the world

Digital Public Library of America

Europe : think culture

FORMA Forest Monitoring for Action project in cascalog

GDELT Global Data on Events, Location and Tone : data for historians

Thoughts on GDELT

Data tells you whether to use A or B. Science tells you what A and B should be in the first place.

Politis Data : Militarized Interstate Disputes


financial, economic and social datasets

The Free Wiki World Map

The MNIST database of handwritten digits

The Harvard Dataverse Network social science research data

The dataset contains 1,362,109 reviews of Amazon products.

try:
    import joblib
except ImportError:
    from sklearn.externals import joblib

data = joblib.load("amazon7.pkl")
X = data["X"]
y = data["y"]
print(X.shape)
print(y.shape)
print(data["categories"])

Data Mining Community's Top Resource kdnuggets

the industry's online resource for big data practitioners

5-part video series: Exploring the @IBMbigdata #BigData Accelerator for Machine Data #Analytics

data analytics stories blog

linked data RDF book

LDB: The BigData In-Memory database built with Erlang, C and LISP

Fogus references about events and history ariadne

Out of the Tarpit by Marks and Moseley

Fundamental concepts of plugin infrastructures by Eli Bendersky

Jess in Action by Ernest Friedman-Hill

innovative data companies

  • Operations-improver: Splunk
  • Tech-trend tracker: Quid
  • Data scientist tournament host: Kaggle
  • Credit rating revolutionary: ZestFinance
  • Electronic medical record streamliner: Apixio
  • Business intelligence visualizer: Datameer
  • Marketing modeler: BlueKai
  • Enterprise social media simplifier: Gnip
  • Brick-and-mortar customer analyzer: RetailNext
  • Compliance catalyst: Recommind

Supersonic is intended to be used as a back-end for various data warehousing projects

Supersonic is an ultra-fast, column oriented query engine library written in C++. It provides a set of data transformation primitives which make heavy use of cache-aware algorithms, SIMD instructions and vectorised execution, allowing it to exploit the capabilities and resources of modern, hyper pipelined CPUs. It is designed to work in a single process.

financial dataset

search dataset Open data @CTIC

Data Analysis

Transportation optimization starts with math –> understanding human behavior.

A Statistical Analysis of Nerf Blasters and Darts By Shawn O'Neil

videos from datagotham conference

The Dangers of Overfitting or How to Drop 50 spots in 1 minute

implementation for a Restricted Boltzmann Machine and a Deep Belief Network

Mobile Phone Data Proves Humans Are Predictable During Chaos

inclass challenge

Data API

Data Computing

Play Framework Grid Deployment with Mesos


How to write a crawler by Emanuele Minotto

Quick tour of Hive and Pig, data scientists' tools, via Hortonworks

Evolutionary Computing with Push

ETL tools

  • AMPLab – Mesos, plus BDAS Berkeley Data Analytics Stack
  • Cascading/Cascalog/Scalding, not limited to Hadoop since other topologies are possible;
  • Twitter – Summingbird, Storm, etc.;
  • Facebook – Presto;
  • Anaconda/IPython/Pandas;
  • Actian/ParAccel/Knime.

Mesos framework for long running services


History, patterns and future of Scalding by P. Oscar Boykin

Why all this interest in Spark? by Denny Lee

Python library for dealing with messy tabular data in several formats, guessing types and detecting headers.

Stream summarizer and cardinality estimator in java

hRaven collects run time data and statistics from MapReduce jobs in an easily queryable format

Open Platform for Visual Analytics

cascading Paco Nathan

"That workflow abstraction is important. For example, PMML has excellent features for ensembles and other complex patterns encountered in the more competitive areas of industry."

Introduction to Data Processing with Python

Building a Classification Framework with Hive and Python

how twitter uses nosql : FlockDB pig

DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas.

Big Data Cloud Classes by Bill Howe

mrjob : Run MapReduce jobs on Hadoop or Amazon Web Service

A set of tutorial codes about matrix methods in Hadoop with mrjob

Implementation of some deep learning algorithms (Python, C) built on top of cudamat

Trident-ML is a realtime online machine learning library built on top of Storm

map-reduce algorithms explained slides

Heka, a tool for high performance data gathering, analysis, monitoring, and reporting

Hacking Redis (data structure server): Adding Interval Sets

R integration in Storm

HP research : Presto Distributed R for big data

Serengeti to enable the rapid deployment of Hadoop clusters on a virtual platform.

Large Scale Math with Hadoop MapReduce @hortonworks

Twitter search use case : storm + kafka + Mechanical Turk

Hadoop and the Data Warehouse: When to Use Which

LinkedIn architecture : kafka , hadoop , voldemort , nodejs

Saddle is a data manipulation library for Scala

Data Science

Text Feature Extraction (tf-idf) part-2 by Christian S. Perone

Estimating User Lifetimes : pyMCMC by Cam Davidson-Pilon @cmrndp

Towards Linked Statistical Data Analysis

alternating direction method of multipliers is well suited to distributed convex optimization

3 Big Data Tech Talks You Can’t Miss by Christos Faloutsos Deepak Agarwal Jay Kreps

Block Coordinate Descent Algorithms for Large-scale Sparse Multiclass Classification by Mathieu Blondel

Machine Learning in python : blog

The World’s Top 7 Data Scientists before there was Data Science

The Multi-Armed Bandit Problem with examples and visualization

Recommendation System

myrrix: successor of mahout?

java -Dmodel.features=100 -Dmodel.als.lambda=2 -Xmx512m -jar myrrix-serving-1.0.1.jar --port 8080

How the Hacker News ranking algorithm works, in Paul Graham's Lisp

Deconstructing Recommender Systems : Amazon and Netflix use cases

Deep Learning

Recent Developments in Deep Learning

Deep Neural Networks for Speech and Image Processing

Deep Learning tutorial

Graph / Network

Apache Giraph : scalable iterative graph processing system open-source counterpart to Pregel

Probabilistic Data Structures for Web Analytics and Data Mining

  • LogLog counting
  • Frequency Estimation: Count-Min Sketch
  • Heavy Hitters: Stream-Summary
  • Range Query: Array of Count-Min Sketches
  • Membership Query: Bloom Filter
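Of the structures listed above, the Count-Min Sketch is short enough to sketch in Python. This is an illustrative version, not the code from the linked article: the class layout, the default width and depth, and the SHA-256 based row hashes are assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using depth rows of width counters."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # One hash function per row, derived by seeding SHA-256 with the row.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item):
        # Collisions only ever inflate a counter, so the minimum over the
        # rows is the tightest estimate and never undercounts.
        return min(
            self.table[row][self._column(row, item)]
            for row in range(self.depth)
        )
```

With non-negative counts the estimate is never below the true frequency and never above the total number of items added; memory use is fixed by width and depth regardless of how many distinct items are seen.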

Why every statistician should know about cross-validation

Truthy is a research project that helps you understand how communication spreads on Twitter

AI web-site about agent, neural network, genetic algo

extensive list of SVM tutorials

clustering with Neural Networks : Kohonen's Self Organizing Feature Maps

Andrew Ng - Machine Learning via Large-scale Brain Simulations

masters of machine learning "The Large Scale Learning class"

  • Introduction; online linear learning
  • Lecture 2: 2nd order methods and analysis of convergence; demos in Torch; BFGS and Limited Storage BFGS
  • Lecture 3: Online learning for non-linear/non-convex models; boosted decision trees (guest lecture by Tong Zhang); example code in R
  • Lecture 4: Hadoop All-Reduce
  • Lecture 5: Torch tutorial; torch basics; machine learning tutorial; video; CUDA tutorial (by Matthew Zeiler); PDF part 1; PDF part 2; video; video for Torch 7 CUDA demo
  • Lecture 6: Feature learning, representation learning
  • Lecture 7: Feature learning, deep learning
  • Lecture 8: Inverted indices and predictive indexing, hashing; project ideas: description, video, John Langford's projects, Xiang Zhang's projects, Yann LeCun's projects
  • Lecture 9: The ad problem, advertising placement and such (guest lecturer: Leon Bottou, Microsoft Research)
  • Lecture 10: Classic and advanced bandits (John Langford)
  • Lecture 11: Counterfactual reasoning (Leon Bottou); advanced topics (John Langford)
  • Lecture 12: Active Learning, Indexing (John Langford); slides: PDF; video
  • Lecture 13: Deep Learning in Text and Speech Recognition
  • Lecture 14: Many Classes, Logarithmic-Time Prediction

Analyze Text Similarity with R: Latent Semantic Analysis and Multidimensional Scaling

Project: Supervised Classification for Sentiment Analysis

Le Macroscope by Joël de Rosnay

What are the Top 10 Problems in Machine Learning for 2013?

Churn Prediction, Sentiment Analysis, Truth Veracity, Recommendations, Online Ads, News Aggregation, Scalability, Content Discovery/Search, Intelligent Learning, Medicine

Classifying Websites with Neural Networks

Numerical optimizers for Logistic Regression in python : Trust Region better than BFGS

Great introduction of macho

CART explained with R as an alternative to logistic regression

real experiment using conditional probabilities

Concordance and Discordance in Logistic Regression

Machine learning in-depth tutorial based on scikit-learn

Naive Bayes for sentiment analysis

Support Vector Machine in PHP

Deep Unsupervised learning with sparse filtering applied to Kaggle : Black Box

Job salary prediction at Kaggle resolved with logistic regression

Best Open Source Data Mining Software : Weka Orange RapidMiner Knime JHepWork

Data Mining: Practical Machine Learning Tools and Techniques by Hall Witten Frank

Data science blog @CmrnDP

Neuro Science @coursera based on :

Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems by Peter Dayan and Larry Abbott

Temporal Networks by Petter Holme

netsci conference :

Scalable Machine Learning by Alex Smola

gradient descent blog from Daniel Duckworth

Yurii Nesterov established the Accelerated Gradient Method

Microsoft Focus in France on Machine Learning

berkeley intro data science course : material

30 Most Influential Data Scientists on Twitter

Conditional (Partitioned) Probability — A Primer

de Bruijn Graphs for Genome Assembly

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Random topics on optimization, probability, and statistics. By Sébastien Bubeck

Slides: The Evolution of Regression [Part 1] from @salfordsystems

data mining book in python

blog Neal Richter

Understanding the Bias-Variance Tradeoff

Accurately Measuring Model Prediction Error

Top-down particle filtering for Bayesian decision trees

Paul Lam cascalog data scientist incanter for the future

get dataset

data tools as unix tools

Thoughts on Statistics and Machine Learning

overview of cascading (Paco Nathan) scalding from Chicago Hadoop User Group


finance use case : Minimal variance asset allocation for Stocks ISA


RethinkDB to store JSON documents

MongoDB basics for everyone – Part 5 Using find() and findOne()

Distributed Algorithms in NoSQL Databases

data consistency , data placement, system coordination

Inside HyperLevelDB : makes LevelDB faster

Building Cloud Storage Services with Riak

atomic commit explained :

Next Generation Databases

Google F1: The Fault-Tolerant Distributed RDBMS

Data Visualization

Sparpaket des Kantons Bern visualisiert by Thomas Preusse und Oleg Lavrovsky

Python interactive visualization library for large dataset based on

Twitter hashtags viz by Qatar Computing Research Institute

JS data projects from okfnlabs

Recline.js : relax with your data

GED VIZ is a new online-tool for visualizing complex economic relations

Jason Davies's blog

financial map viz : between map and flowchart


GDAL - Geospatial Data Abstraction Library : translator library for raster geospatial data formats

Best maps tools

  • leafletjs
  • mapbox
  • polymaps
  • maptales
  • modestmaps
  • INTERACTIVE WORLD MAPS WordPress plugin
  • zeemaps
  • Kartograph

Compare Urban Life Around the Globe With New Side-by-Side City Maps

Satellite raster

But remember this started with vector tiles. And the vector tiles are in the Mercator projection. It’s much harder to take Mercator tiles and reproject them to a different projection because you don’t know which tiles are visible.

Hard problems like this are Jason Davies’ bread and butter. Jason saw the above examples and set out to determine which tiles would be visible in an arbitrary projection. He then created the above visual demonstration of his algorithm. The red tiles are the ones that are visible, and as you zoom in and out, you can see it recalculate the set of needed tiles instantly.

How to Map Where You've Mapped in OpenStreetMap with tilemill

data.stories on maps with Mike Migurski

modestmaps js library for maps made by stamen team

Stamen a design and technology studio in San Francisco maps and data visualization. The next most obvious thing.

help maps to be better by Michal Migurski

Convert Address to long,lat

satellite maps explained by MapBox

Jerome Cukier's blog : communicating with data

the prefuse visualization toolkit

Bret Victor videos

NBA stats vizu

  • @wardnyt: Sports Graphics Editor
  • Matthew Ericson (@mericson): Deputy Graphics Director at The New York Times, New York, NY
  • Jeremy White (@blueshirt): Graphics editor for The New York Times, while also pursuing a PhD in geography with an emphasis on interactive cartography, New York City

The Art of Data Visualization by Edward Tufte. The history of data visualization goes together with the history of science (maps, Galileo …)

Creating a hexagonal cartogram by Ralph Straumann

A (personal) blog of data sketches from the New York Times Graphics Department

Visualize big graph data by mathieu-bastian

Viz example : Location of Every Photo From the InternationalSpaceStation

Vega is a visualization grammar, a declarative format for creating, saving and sharing visualization designs.

Nathan Yau Data Points Visualization that Means Something

Functional Art : An introduction to information graphics and visualization by Alberto Cairo

Languages usage in github


A streaming parser for the ESRI Shapefile spatial data format

simple console for learning and experimenting with d3.js data nesting.


Climbing the d3.js Visualisation Stack : rCharts cubism …

D3 gallery with description

UTM zones with D3.js

plotting the sensors in my Android phone with d3.js and three.js

Binify + topoJSON + D3 = How to create awesome binned hexagon maps

online book "Interactive Data Visualization for the Web"

Handbook of Graph Drawing and Visualization , Roberto Tamassia


install gdal to be able to convert a shapefile into GeoJSON:

~/dev/misc/gdal-1.9.2/apps/ogr2ogr -f GeoJSON -where "isoa2 = 'CH' AND SCALERANK < 8" chplaces.json ~/tmp/ne10m/ne10mpopulatedplaces.shp

to get shapefiles :

constraint programming

Using JuMP to Solve a TSP with Lazy Constraints

medium-level constraint modelling language student job :

Pierre Schaus operational research in scala

programmation par contraintes rencontres


persistent database in core.logic

Path expressions through graphlike structures for clojure using core.logic

Applicative logic meta-programming using Clojure's core.logic against an Eclipse workspace


top tuples per group by Nathan Marz

Usage of name-vars

(?- (stdout) (c/first-n (name-vars age ["?person" "?age"]) 10 :sort "?age" :reverse true))

The name-vars portion is necessary because the age dataset is just a vector without named fields.

clj practice

Prismatic's Engineering Practices

clj programming

[ANN] riddley: code-walking without caveats

riddley.walk> (walk-exprs number? inc '(let [n 1] (+ n 1)))
(let* [n 2] (. clojure.lang.Numbers (add n 2)))

how to write a correct macroexpand-all (which requires a code walker) in Common Lisp:

clj math

clj matrix with 2 implementations and native BLAS

clj API

fold unfold : deep-merge

I think functions like this become pretty clear if you pull out 'unfold' and 'fold' utilities.

Their 'flatten' generates a seq of [path value] pairs, and 'unflatten' turns that back into a map. With these, you can write your functions

(defn to-map [kv-seq] (into {} kv-seq)) ;; utility

(defn flatten-map [m kf vf] (->> m flatten (map (fn ks v [(kf ks) (vf v)])) to-map))

(defn mapf [m f & args] (->> m flatten (map (fn ks v [ks (apply f v args)])) unflatten))

(defn deep-merge-with [f & ms] (->> ms (map flatten) (map to-map) (reduce (fn [res m] (merge-with f res m))) ;; could use 'partial' unflatten))

(defn deep-merge [a b] (deep-merge-with (fn [x y] y) a b))

;; bonus: also useful for fns that don't return a map (defn max-depth [m] (->> m flatten (map (comp count first)) (apply max 0)))
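The 'flatten' and 'unflatten' helpers referred to above are not shown in these notes. A minimal sketch of what such a pair could look like follows; the names flatten-paths and unflatten-paths are hypothetical (chosen to avoid shadowing clojure.core/flatten), and the sketch assumes every non-map value is a leaf.

```clojure
;; Hypothetical stand-ins for the 'flatten' / 'unflatten' utilities.

(defn flatten-paths
  "Unfold: nested map -> seq of [path value] pairs. Non-map values are leaves.
  Note that empty nested maps produce no pairs and are therefore dropped."
  ([m] (flatten-paths [] m))
  ([prefix m]
   (mapcat (fn [[k v]]
             (if (map? v)
               (flatten-paths (conj prefix k) v)
               [[(conj prefix k) v]]))
           m)))

(defn unflatten-paths
  "Fold: seq of [path value] pairs -> nested map."
  [pairs]
  (reduce (fn [m [path v]] (assoc-in m path v)) {} pairs))

;; deep-merge-with written against the two helpers: merge the flat
;; path->value maps, then fold the result back into a nested map.
(defn deep-merge-with [f & ms]
  (->> ms
       (map #(into {} (flatten-paths %)))
       (apply merge-with f)
       unflatten-paths))
```

With these definitions, (deep-merge-with + {:a {:b 1}} {:a {:b 2 :c 3}}) evaluates to {:a {:b 3, :c 3}}.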

deep merge!topic/clojure/UdFLYjLvNRs

(defn deep-merge "Recursively merges maps. If keys are not maps, the last value wins." [& vals] (if (every? map? vals) (apply merge-with deep-merge vals) (last vals)))

Destructuring can use expressions as keys

(let [{x (+ 1 1)} {2 "two"}] x)

arrows to compare with new threading macros as-> some-> cond->


(into {} (for …))

(defn keywordize-keys
  "Recursively transforms all map keys from strings to keywords."
  {:added "1.1"}
  [m]
  (let [f (fn [[k v]] (if (string? k) [(keyword k) v] [k v]))]
    ;; only apply to maps
    (postwalk (fn [x] (if (map? x) (into {} (map f x)) x)) m)))

some clj patterns


(set (mapcat #(… …)

monadic bind in the set monad ?

(set (apply concat (for […] […])))

(defn union-of [colls] (reduce into #{} colls))


(into {} (map #(vector …)))

fmap in the hash-map functor ?

remove empty?

(filter seq …)
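The pattern skeletons above leave their arguments elided. As a concrete illustration only, here is one way they might be instantiated; the users data and all var names other than union-of are made up for this sketch.

```clojure
;; Hypothetical sample data, used only to give the patterns something to chew on.
(def users [{:name "ann" :tags [:clj :emacs]}
            {:name "bob" :tags [:clj :vim]}
            {:name "cy"  :tags []}])

;; Three equivalent ways to collect a set from nested collections.
(def tags-a (set (mapcat :tags users)))
(def tags-b (set (apply concat (for [u users] (:tags u)))))
(defn union-of [colls] (reduce into #{} colls))
(def tags-c (union-of (map :tags users)))

;; (into {} (map #(vector …))): build a map from key/value pairs.
(def tag-count-by-name
  (into {} (map #(vector (:name %) (count (:tags %))) users)))

;; (filter seq …): keep only the non-empty collections.
(def non-empty-tag-lists (filter seq (map :tags users)))
```

All three tags-* vars hold #{:clj :emacs :vim}, and non-empty-tag-lists drops the empty vector for "cy".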

load optional dependency

(require 'cheshire.core)
(apply (ns-resolve (symbol "cheshire.core") (symbol "parse-stream")) args)
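A small sketch of how the optional-dependency idea could be wrapped so that a missing library falls back to something else. The helper names resolve-optional and call-optional are hypothetical, and the sketch assumes a failed require surfaces as java.io.FileNotFoundException.

```clojure
(defn resolve-optional
  "Returns the var ns-sym/fn-sym if the namespace can be loaded, else nil.
  Assumption: a missing namespace makes require throw FileNotFoundException."
  [ns-sym fn-sym]
  (try
    (require ns-sym)
    (ns-resolve ns-sym fn-sym)
    (catch java.io.FileNotFoundException _ nil)))

(defn call-optional
  "Calls ns-sym/fn-sym with args when available, otherwise calls fallback."
  [ns-sym fn-sym fallback & args]
  (if-let [f (resolve-optional ns-sym fn-sym)]
    (apply f args)
    (apply fallback args)))
```

For example, (call-optional 'clojure.string 'upper-case identity "abc") uses the real function, while a namespace that is not on the classpath silently routes to the fallback.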

multimethod usage on config file

(memfn getPath) instead of #(.getPath %)

(defmulti load-config (comp second (partial re-find #"\.([^..]*?)$") (memfn getPath)))

(defmethod load-config "clj" [resource])
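To make the dispatch concrete, here is a hedged sketch that fills in method bodies around the defmulti from the notes. The bodies are assumptions, not the original author's code: the "clj" method reads the file as EDN, the "properties" method reuses the Properties idiom, and the :default method is added here to reject unknown extensions.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])
(import '(java.io File) '(java.util Properties))

;; Dispatch value is the file extension (text after the last dot), or nil.
(defmulti load-config
  (comp second (partial re-find #"\.([^..]*?)$") (memfn getPath)))

;; Assumed body: treat .clj config files as plain EDN data.
(defmethod load-config "clj" [^File resource]
  (edn/read-string (slurp resource)))

;; Assumed body: load a Java properties file into a Clojure map of strings.
(defmethod load-config "properties" [^File resource]
  (with-open [in (io/input-stream resource)]
    (into {} (doto (Properties.) (.load in)))))

;; Anything else, including files without an extension, ends up here.
(defmethod load-config :default [^File resource]
  (throw (ex-info "Unsupported config format" {:path (.getPath resource)})))
```

Supporting a new format then only requires one more defmethod keyed on its extension.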

load properties file

(into {} (doto (java.util.Properties.) (.load (-> (Thread/currentThread) (.getContextClassLoader) (.getResourceAsStream "")))))

reduce + lazy seq : blow up ?!topic/clojure/0pcSxK9reSc

user> (defn test1 [coll] (reduce + coll))
user> (test1 (take 10000000 (iterate inc 0)))
49999995000000
user>

Now if we do:

user> (defn test2 [coll] [(reduce + coll) (reduce + coll)])
user> (test2 (take 10000000 (iterate inc 0)))
OutOfMemoryError Java heap space [trace missing]

Clojure has a feature called locals clearing, which sets 'coll to nil before calling reduce in test1, because the compiler can prove it won't be used afterwards. In test2, coll has to be retained, because reduce is called a second time on it.
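Two possible ways around holding the head, sketched under the assumption that the sequence is cheap to rebuild or that both results can be computed in one pass. The names test3 and test4 are hypothetical continuations of the REPL session above.

```clojure
;; Variant 1: accept a zero-argument function that builds the sequence, so
;; each reduce walks its own fresh copy and nothing retains the head.
(defn test3 [make-coll]
  [(reduce + (make-coll)) (reduce + (make-coll))])

;; Variant 2: traverse once and accumulate both results (here sum and count),
;; so coll is used a single time and can be cleared as in test1.
(defn test4 [coll]
  (reduce (fn [[sum n] x] [(+ sum x) (inc n)]) [0 0] coll))
```

A call would look like (test3 #(take 10000000 (iterate inc 0))), passing the recipe for the sequence rather than the sequence itself.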


Like merge-with, but merges maps recursively, applying the given fn only when there's a non-map at a particular level.

(deepmerge + {:a {:b {:c 1 :d {:x 1 :y 2}} :e 3} :f 4} {:a {:b {:c 2 :d {:z 9} :z 3} :e 100}}) -> {:a {:b {:z 3, :c 3, :d {:z 9, :x 1, :y 2}}, :e 103}, :f 4}
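A recursive sketch that matches the description above: recurse while every value at a level is a map, and apply the given fn once a non-map shows up. It is written from memory in the spirit of the old clojure.contrib helper and should be read as an illustration, not as the library source.

```clojure
(defn deep-merge-with
  "Like merge-with, but merges maps recursively, applying f only when
  there is a non-map at a particular level."
  [f & maps]
  (apply
   (fn m [& maps]
     (if (every? map? maps)
       (apply merge-with m maps)   ; all maps: go one level deeper
       (apply f maps)))            ; at least one leaf: combine with f
   maps))
```

Applied to the two maps in the example above with +, this returns the nested result shown there.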


Improving your Clojure code with core.reducers

reducers

(defn reverse-conses
  ([s tail]
     (if (identical? (rest s) tail)
       s
       (reverse-conses s tail tail)))
  ([s from-tail to-tail]
     (loop [f s b to-tail]
       (if (identical? f from-tail)
         b
         (recur (rest f) (cons (first f) b))))))

(defn seq-seq [f s] (let [f1 (reduce #(cons %2 %1) nil (f (reify clojure.core.protocols.CollReduce (coll-reduce [this f1 init] f1))))] ((fn this [s] (lazy-seq (when-let [s (seq s)] (let [more (this (rest s)) x (f1 more (first s))] (if (reduced? x) (reverse-conses @x more nil) (reverse-conses x more)))))) s)))

(defmacro seq->> [s & forms] `(seq-seq (fn [n#] (->> n# ~@forms)) ~s))

(take 2 (seq->> (range) (r/map #(str (doto % prn))) (r/take 25) (r/drop 5)))


clj concurrency

promise future agent channels by tbc++ Timothy Baldridge!topic/clojure/e6Tg4wXLcug

promise - creates an object that can be deref'd. The result of the promise can be delivered once, and deref-ing an undelivered promise will cause the deref-ing thread to block. A single producer can give a single value to multiple threads.

future - just like a promise, but the delivering code is given to the future and the future will go off and execute that code in a different thread. A single producer delivers a single value, produced in an undefined thread, to multiple consumers.

agents - couple an unbounded queue of functions with a single mutable value. Mutating that value is accomplished by enqueueing functions to be executed against that mutable state. Multiple producers use functions to modify a mutable ref. Can be deref-ed by many different consumers.

channels - allow multiple producers to provide data to multiple consumers on a one-to-one basis. That is to say, a single value put into a channel can only be taken by a single consumer. However, multiple values can be in flight at a single time. This is all delivered by a bounded queue (notice the difference with unbounded agents). This allows for back-pressure, where slow consumers can block faster producers. So perhaps the best way to think about channels is as a bounded mutable queue of promises.
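The first three primitives can be tried with clojure.core alone; a minimal sketch follows. Channels need the core.async library, so the last part uses java.util.concurrent.ArrayBlockingQueue purely as a stand-in to show what a bounded queue does when it is full. It is not a channel, and the var names are hypothetical.

```clojure
(import '(java.util.concurrent ArrayBlockingQueue))

;; promise: delivered once, readable by any number of threads.
(def p (promise))
(deliver p :first)
(deliver p :second)            ; no effect, the promise is already delivered

;; future: the body runs on another thread; deref blocks until it is done.
(def f (future (reduce + (range 1000))))

;; agent: functions are queued (unbounded) against one mutable value.
(def counter (agent 0))
(dotimes [_ 100] (send counter inc))
(await counter)                ; wait for the actions sent from this thread

;; Stand-in for a bounded buffer: a full queue refuses further items,
;; which is the hook that makes back-pressure possible.
(def ^ArrayBlockingQueue q (ArrayBlockingQueue. 1))
(def accepted-1 (.offer q :a)) ; true, there was room
(def accepted-2 (.offer q :b)) ; false, the queue is full
```

In a standalone script, finish with (shutdown-agents) so the thread pools behind future and send do not keep the JVM alive.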

What is a "state monad binding plan" (referring to code in core.async)!searchin/clojure/core.async/clojure/soewFCS8dAI/kaJ09e_eA7gJ


CSP is Responsive Design by David Nolen

100k independent go blocks all running at the same time

Hoare examples implemented with core.async

core.async: communicating termination!topic/clojure/_KzEoq0XcHQ

clj image

Image analysis with Clojure and OpenCV: A face detection example


coursera class on music technology

clj devops

use leiningen for scala project scalding

clj data computation

Experimental combination of core.logic and core.matrix to allow reasoning with vectors / mathematical expressions

client cassandra thrift

clj java

java.nio2 wrapper

(ns test.nio2.test.tail (:use nio2.files))

(defn tail
  "Print the last n lines of path p to stdout"
  [n p]
  (with-open [rdr (reader p)]
    (doseq [l (take-last n (line-seq rdr))]
      (println l))
    (doseq [e (watch-seq (parent (real-path p)) :modify)]
      (when (= (real-path (:path e)) (real-path p))
        (while (.ready rdr)
          (println (.readLine rdr)))))))

clj libraries

A macro-based refactoring library for Clojure

Utility libraries and dependency hygiene

Parallel universes for namespaces

Twitter-api [twitter-api "0.7.4"]

Geohash library for clojure by @sunng

misc clj

detect language with com.cybozu.labs.langdetect.DetectorFactory

ssierra lib on namespace

slamhound to install on emacs to write require/import for you

fast idiomatic pretty-printer

display vectors and hashes as an ASCII table (clojure.pprint/print-table is for maps only)

clojure table layout

clj machine learning

review code on levenshtein algo and memoization!topic/clojure/w6SRYE4n6pc

clojure wrapper on top various nlp libs

clj server

clj perf

Proteus: local mutable variables for the masses by Zach Tellman!topic/clojure/7HNNiJJTte4

A simple IO library for using Clojure's reducers

clj webdev

webframework à la django

websockets with http-kit

JSON on steroids inspired by EDN

Building an iOS weather app with Angular and ClojureScript

clj GUI


Purnam - AngularJs Language Extensions for Clojurescript Inspired by lispyscript, coffeescript and clang

cljs properties access

good: (.-MAXNUMBER js/Math) and (.ceil js/Math 3.14)

not Clojure compatible: js/Math.MAXNUMBER and (js/Math.ceil 3.14)


Functional parsing library from chapter 8 of Programming in Haskell


Cursive is the Clojure IDE that understands your code

lein faster

Fast JVM launching without the hassle of persistent JVMs.

lein startup

> I take this to mean that there's no widely accepted solution.

The widely-accepted solution is to leave a single process running. It certainly has limitations, but it's the way most people deal with the problem.

> Really, I just want `lein run` to be faster. Can someone explain where all
> this time is spent?

Basically it comes from having to load two JVMs, one for Leiningen itself and one for the project. Leiningen itself is fairly optimized for this (fully AOTed, bytecode verification is turned off, fancy warm-up JIT techniques disabled) which is why it's possible to get `lein version` to return in under a second in some cases. But there are various compatibility issues that prevent us from being able to perform the same optimizations on project JVMs. These are documented on the "Faster" page of the Leiningen wiki, and you can do some testing to determine whether or not they affect your project in particular; if not then they should provide a good boost. But nothing will ever come close to the speed of keeping the JVM resident, which is why I'm working on `:eval-in :nrepl` and lein.el. For people who don't use Emacs, Jark is the only tool I'm aware of that is working towards this in a way that's decoupled from the editor. They could probably use some help both testing and implementing it.

> I hear a lot of talk of compiling, but why would we re-compile things where
> none of the dependencies have changed?

Performing a full AOT of all your dependencies will help if you have a large project with lots of dependencies that get loaded at application boot. But that effect would be something along the lines of bringing boot down from 20s to 12s rather than bringing it from 5s to <1s.

org-mode tips

(with-out-str (print-table [{:a 1 :b 2 :c 3} {:b 5 :a 7 :c "dog"}]))

(Using with-out-str is needed because print-table of course returns nil)
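A small sketch of the capture step on its own, independent of org-mode: print-table writes to *out* and returns nil, so the table text has to be collected with with-out-str. The helper name table-string is hypothetical.

```clojure
(require '[clojure.pprint :as pp])

(defn table-string
  "Returns the text that print-table would have written to *out*."
  [rows]
  (with-out-str (pp/print-table rows)))

(def table
  (table-string [{:a 1 :b 2 :c 3} {:b 5 :a 7 :c "dog"}]))
```

The var table now holds an ordinary string that can be returned from a source block or post-processed further.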

But what I get when generating HTML (via "C-c C-e b") is not a table, but the literal text of the table markup. I.e. compiling the above source block yields:

Tech stuff

Creating network and connecting from anywhere where people are interested.

Prism All your data, in one place

A [work-in-progress] self-hosted, anti-social RSS reader

Best Content Discovery Application ?

Flipboard Instapaper Pinterest Prismatic Tumblr



programming school

experiments: how to teach kids to program with Scratch

Potential projects to be completed:

  • hacking and reverse-engineering projects (TBD)
  • web crawling projects: how many Facebook accounts are duplicate or dead? Or categorize Tweets
  • taxonomy creation or improving an existing taxonomy
  • optimal pricing for bid keywords on Google
  • create a web app that provides (in real time) better-than-average trading signals
  • find low-frequency and Botnet fraud cases in a sea of data
  • internship in computational marketing with a data science start-up
  • automated plagiarism detection
  • use web crawlers to assess whether Google Search
    • (1) favors its products over competitors [is this an unfair business practice?],
    • (2) favors local over non-local results, and
    • (3) returns different results to web robots and humans. Identify other bias and patterns in Google search results.

Classroom flipping means assigning lectures as homework, leaving actual classroom time for hands-on instruction and group work. Ng told me his class at Stanford is already doing this, and he’s encouraging other professors to adopt the approach for their Coursera classes as well.


Douglas C. Engelbart A research for augmenting human Intellect




Author: Maximilien
