On the web

Work

Easier Decision-Making: Conduct Experiments By Leo Babauta http://zenhabits.net/test

The science of time perception: stop it slipping away by doing new things http://blog.bufferapp.com/the-science-of-time-perception-how-to-make-your-days-longer

Buffer: a blog about productivity, life hacks, writing, user experience, customer happiness and business. http://blog.bufferapp.com/

series of articles about the challenges of growing an organization http://mdzlog.alcor.net/2013/06/20/scaling-human-systems-organizational-design-and-growth/

Record Your Terminal Share it with no fuss http://ascii.io/

The Importance of Scheduling Nothing by Jeff Weiner CEO at LinkedIn http://www.linkedin.com/today/post/article/20130403215758-22330283-the-importance-of-scheduling-nothing

The tech industry is a meritocracy. We hire people based on their skills alone. http://aphyr.com/posts/275-meritocracy-is-short-sighted

Your life is too short and too valuable to fritter away in work. http://www.brainpickings.org/index.php/2012/12/14/how-to-avoid-work/

how to increase productivity with evernote http://trunk.evernote.com/en

http://blog.zololabs.com/2012/10/16/the-10-secrets-that-make-networking-easy-fun-and-ridiculously-effective/

You have to earn the right to be heard about what you do and what you want to accomplish. People really don’t care about what you do until they know that you care about what they do.

  • Secret #1: Assume the burden of other people’s discomfort
  • Secret #2: Give and expect nothing in return
  • Secret #3: Be proud of who you are
  • Secret #4: Compliment early and often
  • Secret #5: Look for common ground immediately
  • Secret #6: Tap your sphere of influence cautiously
  • Secret #7: Do not keep your personal and professional lives separate
  • Secret #8: Pull—never push
  • Secret #9: Include social media into your networking
  • Secret #10: Lose control of your marketing

Tips and tricks for conference attendees

If a hallway conversation stalls, ask what they're working on. Discover the project they're passionate about. @chrishouser

  1. take notes in Emacs/Vim/etc.
  2. meet people and go to the evening events and such to have good conversations.

@Baranosky

It helps tremendously to read up on the topics before a presentation, so that you aren't entering cold. @AustinTHaas

take notes. revisit the notes. read them again. then write about them. @darevay

My two rules: leave the laptop at hotel and introduce yourself to everyone. It opens so many doors to learning. @jackdanger

the talks will be great, but the hallway conversations are better. be aggressive about meeting people. Take advantage of the face time. @jimduey

Talk to people about the presentations during the breaks. Meeting people is the most rewarding part. @ericnormand

Agile

The easiest way to teach yourself C++ in 21 days http://abstrusegoose.com/249

Estimates in Software Development. New Frontiers. http://agile.dzone.com/articles/estimates-software-development

mumble Low-latency, high-quality voice communication for gamers http://mumble.sourceforge.net/

property-based testing http://blog.jessitron.com/2013/04/property-based-testing-what-is-it.html

Writing tests first forces you to think about the problem you're solving. Writing property-based tests forces you to think way harder.

Gerrit web based code review system https://code.google.com/p/gerrit/

devops

Marelle logic programming (prolog) for devops http://quietlyamused.org/blog/2013/11/09/marelle-for-devops/

openstack thoughts by Alex Gaynor http://alexgaynor.net/2013/jul/11/thoughts-openstack/

cloudmonkey : command line interface for Apache CloudStack http://rohityadav.in/logs/cloudmonkey/

The Cloudcast: From DevOps to Private PaaS http://architects.dzone.com/articles/cloudcast-devops-private-paas

Ansible is the easiest way to deploy, manage, and orchestrate computer systems you've ever seen http://ansible.cc/

Monitoring setup of amara (riemann , graphite better than nagios) http://labs.amara.org/2012-07-16-metrics.html

configuration with clj https://github.com/sonian/carica https://github.com/weavejester/environ

Two ways: one is to use .clj data files on the classpath and take advantage of the fact that different profiles put different resources directories on the classpath. This is the approach taken by Carica (https://github.com/sonian/carica) and works great if you have complex config with nested values.

The other approach is to use environment variables; the best tool for that is Environ: https://github.com/weavejester/environ
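
The environment-variable approach can be sketched in a few lines of Python. This is only an illustration of the idea, not Environ's API: the setting names `DATABASE_URL` and `HTTP_PORT`, their defaults, and the `load_config` helper are all hypothetical.

```python
import os

# Hypothetical setting names and defaults, chosen only for illustration.
DEFAULTS = {"DATABASE_URL": "jdbc:h2:mem:dev", "HTTP_PORT": "8080"}


def load_config(environ=None):
    """Return settings, letting environment variables override the defaults."""
    environ = os.environ if environ is None else environ
    config = {key: environ.get(key, default) for key, default in DEFAULTS.items()}
    config["HTTP_PORT"] = int(config["HTTP_PORT"])  # env values are always strings
    return config


if __name__ == "__main__":
    print(load_config())
```

Passing the environment in as a plain mapping keeps the function easy to test without touching the real process environment.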

bug tracker https://github.com/ragnard/clj-squash with squash

http://logstash.net/ better than splunk? send data to graphite, librato, ganglia or graylog?

innovation

Big Data: a new stage in the computerization of the world (article in French) http://www.internetactu.net/2013/05/14/big-data-nouvelle-etape/

conversation with Alan Kay http://queue.acm.org/detail.cfm?id=1039523

design

hardware

[Kultpfunzel: Kult=cult. Funzel=dim light.] gives light and is hackable http://kultpfunzel.ch/

school for poetic computation http://sfpc.io/index.html

A Hardware Accelerated Regular Expression Matcher http://bkase.github.io/CUDA-grep/finalreport.html

How-to build your own GPS Receiver http://www.holmea.demon.co.uk/GPS/Main.htm

ASM Embedded CPU FORTH Verilog Spartan 3 FPGA C++ Raspberry Pi

Dual Boot Windows/Android 2.2 Tablet Straight Out Of Shenzhen http://www.gizchina.com/2011/03/18/dual-boot-windowsandroid-22-tablet-straight-shenzhen/

Tech price comparison (German) http://www.heise.de/preisvergleich

linux

delete from line 3 up to and including first blank line: sed '3,/^$/d' filename

Major Linux Vs UNIX Kernel Differences http://www.thegeekstuff.com/2012/01/linux-unix-kernel/

Read-only Guest tmux Sessions http://brianmckenna.org/blog/guest_tmux

Festival: build your own voice http://festvox.org/

http://everythingsysadmin.com/2012/09/unorthodoxunix.html

  • grep . *.txt
  • more * | cat
  • "fmt -1" (split lines into individual words)
  • gnt-job list | egrep --color=always 'running|waiting'

networks

Source Multiplayer Networking for multi-players games https://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking

Programming Distributed Computing Systems A Foundational Approach By Carlos A. Varela https://mitpress.mit.edu/books/programming-distributed-computing-systems

The network is reliable by aphyr http://aphyr.com/posts/288-the-network-is-reliable

network partition http://damienkatz.net/2013/05/dynamo_sure_works_hard.html

Amazon Dynamo Paper. It has some very interesting concepts, but ultimately fails to provide a good balance of reliability, performance and cost.

Web Dev

Static sites are fast, secure, easy to deploy, and manageable using version control http://jaspervdj.be/hakyll/

check elm-lang presentation

erlang web framework : ezwebframe

aloha https://github.com/ztellman/aloha webserver on top of netty

slides in HTML5 http://www.htmlfivewow.com/

complete web site example clojars https://github.com/ato/clojars-web

headless web testing with Gecko https://github.com/laurentj/slimerjs

search

distributed programming

Call me maybe series by Aphyr: ZooKeeper, Kafka, Cassandra, NuoDB http://aphyr.com/posts/291-call-me-maybe-zookeeper

programming

Learn [clojure|elixir|go|haskell|…] in Y minutes http://learnxinyminutes.com/

Free books on Computer Science by @okalotieno http://hackershelf.com/browse/

The Original 'Lambda Papers' by Guy Steele and Gerald Sussman http://library.readscheme.org/page1.html

The Anti-Human Consequences of Static Typing by Jay McCarthy http://jeapostrophe.github.io/2013-08-12-types-post.html

Teach Yourself Programming in Ten Years http://norvig.com/21-days.html

Name your arguments by Jamie Wong http://jamie-wong.com/2011/11/28/name-your-arguments/

Learnable Programming: Designing a programming system for understanding programs, by Bret Victor http://worrydream.com/LearnableProgramming/

Part of a series exploring Concepts, Techniques, and Models of Computer Programming. http://michaelrbernste.in/2013/06/20/what-is-declarative-programming.html

The Definitive Reference To Why Maybe Is Better Than Null (error handling) http://nickknowlson.com/blog/2013/04/16/why-maybe-is-better-than-null/

Bertrand Meyer's blog http://bertrandmeyer.com/

tern : editor-independent static analysis engine in javascript http://ternjs.net/ http://marijnhaverbeke.nl/blog/tern.html

Go and Rust — objects without class http://lwn.net/Articles/548560/

Bloom language http://www.bloom-lang.net/ Testing distributed systems by Neil Conway

Design patterns -> theorems -> Language & tool support. "When I see patterns in my programs, I consider it a sign of trouble… a sign that I'm using abstractions that aren't powerful enough" (Paul Graham). Consistency As Logical Monotonicity

Brian McKenna blog http://brianmckenna.org/blog/

Turing complete http://en.wikipedia.org/wiki/Turing_completeness

The notion of Turing-completeness does not apply to languages such as XML, JSON, YAML and S-expressions, because they are typically used to represent structured data, not describe computation.

The fruits of misunderstanding by prof.dr.Edsger W.Dijkstra http://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD854.html

How anthropomorphism and analogies make concepts in computer programming harder to understand.

http://www.infoq.com/interviews/erik-meijer-programming-language-design-effects-purity

Erlang has very cheap threads; now you can use concurrency as a control structure, very close to object-oriented programming and dynamic dispatch. What the Reactive framework is: it’s just the continuation monad … the observer/observable pair is the dual of enumerable/enumerator.

Joe Armstrong on languages http://www.codewiz51.com/blog/post/2013/01/24/Post-from-John-Armstrong-inventor-of-Erlang.aspx

What would I recommend learning?

  • C
  • Prolog
  • Erlang (I'm biased)
  • Smalltalk
  • Javascript
  • Haskell / ML / OCaml
  • LISP/Scheme/Clojure

A couple of years should be enough (PER LANGUAGE).

Notice there is no quick fix here - if you want a quick fix go buy "learn PHP in ten minutes" and spend the next twenty years googling for "how do I compute the length of a string"

The crazy thing is we still are extremely bad at fitting things together - still the best way of fitting things together is the unix pipe

find … | grep | uniq | sort | …

and the fundamental reason for this is that components should be separated by well-defined protocols in a universal intermediate language.

Fitting things together by message passing is the way to go - this is basis of OO programming - but done badly in most programming languages.

If ALL applications in the world were interfaced by (say) sockets + lisp S expressions and had the semantics of the protocol written down in a formal notation - then we could reuse things (more) easily.

Today there is an unhealthy concentration on language and efficiency and NOT on how things fit together and protocols - teach protocols and not languages.

And teach ALGORITHMS.

rust http://www.rust-lang.org/ http://static.rust-lang.org/doc/tutorial.html

Rust is a curly-brace, block-structured expression language. It visually resembles the C language family, but differs significantly in syntactic and semantic details. Its design is oriented toward concerns of “programming in the large”, that is, of creating and maintaining boundaries – both abstract and operational – that preserve large-system integrity, availability and concurrency. It supports a mixture of imperative procedural, concurrent actor, object-oriented and pure functional styles. Rust also supports generic programming and metaprogramming, in both static and dynamic styles.

Learning How To Learn Programming http://michaelrbernste.in/2013/02/23/notes-on-teaching-with-the-kernel-language-approach.html

from Van Roy and Haridi's book

data parallelization: incremental datalog computation http://research.microsoft.com/en-us/projects/naiad/

http://channel9.msdn.com/posts/Frank-McSherry-Introduction-to-Naiad-and-Differential-Dataflow Naiad is an investigation of data-parallel dataflow computation in the spirit of Dryad and DryadLINQ, but with a focus on incremental computation. Naiad introduces a new computational model, differential dataflow, operating over collections of differences rather than collections of records, and resulting in very efficient implementations of programming patterns that are expensive in existing systems.

text.SelectMany(x => x.Split(' ')) .Count(y => y, (k, c) => k + " : " + c) .Subscribe(l => { foreach (var element in l) Console.WriteLine(element); });
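
To make "collections of differences" concrete, here is a minimal Python sketch of a word count that is updated from `(line, +1/-1)` differences instead of being recomputed from all records. It is not Naiad's API; the `IncrementalWordCount` class and its `apply` method are hypothetical names used only for illustration.

```python
from collections import Counter


class IncrementalWordCount:
    """Maintains word counts from (line, +1/-1) differences."""

    def __init__(self):
        self.counts = Counter()

    def apply(self, differences):
        """Apply (line, delta) pairs and return the words whose count changed."""
        changed = Counter()
        for line, delta in differences:
            for word in line.split():
                changed[word] += delta
        output = {}
        for word, delta in changed.items():
            if delta == 0:
                continue  # additions and removals cancelled out
            self.counts[word] += delta
            if self.counts[word] == 0:
                del self.counts[word]
            output[word] = self.counts.get(word, 0)
        return output


if __name__ == "__main__":
    wc = IncrementalWordCount()
    print(wc.apply([("a b a", +1)]))                # first batch of lines
    print(wc.apply([("a b a", -1), ("a c", +1)]))   # one line replaced by another
```

Only the words touched by a batch of differences are visited, which is the property that makes this style cheap when inputs change a little at a time.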

concurrency / parallelism http://www.maymounkov.org/clash-concurrency-parallelism-practice

Concurrency is a property of the algorithm that you are designing. It determines which parts of your data-processing logic are intrinsically independent (under all inputs and circumstances).

Parallelism is a property of the realization of your algorithm. This is not your source code, but the final executable or — even more abstractly — the behavior of your program when executed.

creative coding

Julia Buntaine’s artwork provides conceptual footholds for issues in neuroscience http://thebeautifulbrain.com/2013/07/interview-julia-buntaine/

phenomenon of creative computing http://10print.org/

creator of Processing : Casey Reas http://reas.com/

Processing

hardware: Kinect (detects human motion, Windows-based), Arduino, TouchOSC, Monome, Leap Motion

Generative Art Matt Pearson. / Learning Processing: A Beginner's Guide to Programming Images, Animation, and Interaction Daniel Shiffman.

Algorithms for Visual Design Using the Processing Language Kostas Terzidis

concurrency

Adopting Ideas from Erlang and Clojure for a Highly Concurrent, Simple and Maintainable Application http://blog.paralleluniverse.co/post/64210769930/spaceships2

RiconEast distributed system http://www.jkemp.net/blog/review-ricon-east/

great explanation of concurrency concepts in clojure http://www.youtube.com/watch?v=wASCH_gPnDw at the End

  • CAS semantics : Atom
  • Coordinated change inside a transaction : ref
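
The compare-and-set (CAS) semantics of an atom can be sketched in Python as a retry loop. This is a rough illustration, not Clojure's implementation: the `Atom` class and its method names are hypothetical, a lock stands in for the hardware CAS instruction, and refs with transactions are not covered.

```python
import threading


class Atom:
    """Holds one value; every update goes through compare-and-set."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for a hardware CAS instruction

    def deref(self):
        return self._value

    def compare_and_set(self, expected, new):
        """Set to `new` only if the current value is still `expected`."""
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

    def swap(self, fn, *args):
        """Retry fn(current, *args) until no other thread interfered."""
        while True:
            old = self.deref()
            new = fn(old, *args)
            if self.compare_and_set(old, new):
                return new


if __name__ == "__main__":
    counter = Atom(0)

    def work():
        for _ in range(1000):
            counter.swap(lambda n: n + 1)

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.deref())
```

Because `swap` may call the update function several times, that function should be free of side effects.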

probabilistic programming

probabilistic programming in clojure by Nils Bertschinger bertschi@mis.mpg.de https://github.com/bertschi/ProbClojureNice

haskell

Haskell from C: Where are the for Loops? https://www.fpcomplete.com/blog/2013/06/haskell-from-c

School of Haskell https://www.fpcomplete.com/

anatomy of programming language http://www.cs.utexas.edu/~wcook/anatomy/anatomy.pdf

Programming in Haskell, Graham Hutton http://www.cs.nott.ac.uk/~gmh/book.html

c#

go

Go on App Engine: tools, tests, and concurrency by The Go Blog http://blog.golang.org/appengine-dec2013

The examples from Tony Hoare's seminal 1978 paper "Communicating sequential processes" implemented in Go. http://godoc.org/github.com/thomas11/csp

python

Learn Python The Hard Way http://learnpythonthehardway.org/book

recognizing numbers http://www.johndcook.com/blog/2013/04/30/recognizing-numbers/

>>> from sympy import *
>>> nsimplify(4.242640687119286)
3*sqrt(2)

redo: a top-down software build system https://github.com/apenwarr/redo

Writing clean, testable, high quality code in Python http://www.ibm.com/developerworks/aix/library/au-cleancode/

scala

Applicatives are too restrictive, breaking Applicatives and introducing Functional Builders http://sadache.tumblr.com/post/30955704987/applicatives-are-too-restrictive-breaking-applicatives

Designing scala libraries (slides) http://scalapenos.com/2013/04/26/scala-presentation.html

Ztream is POC P2P-assisted Web music streaming built with WebRTC, Media Source API, AngularJS, Play, ReactiveMongo http://ztream.atamborrino.cloudbees.net/

easy to write MapReduce jobs in Hadoop on top of cascading https://github.com/twitter/scalding/wiki

Gabbler, a Reactive Chat App – part 2 by hseeberger http://hseeberger.github.io/blog/2013/07/10/gabbler-part2/

Abstract Algebra for Scala https://github.com/twitter/algebird

approximate set size (in much less memory with HyperLogLog), approximate item counting (using CountMinSketch)
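
As a rough idea of how approximate item counting works, here is a minimal Count-Min Sketch in Python. It is not Algebird's implementation (Algebird is a Scala library); the class name, the default `width` and `depth`, and the SHA-256 based hashing are assumptions made for this sketch.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters; estimates never undercount."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        """Yield one (row, column) pair per row, using a row-salted hash."""
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        """Take the minimum over rows; collisions can only inflate a counter."""
        return min(self.table[row][col] for row, col in self._cells(item))


if __name__ == "__main__":
    sketch = CountMinSketch()
    for word in "to be or not to be".split():
        sketch.add(word)
    print(sketch.estimate("to"), sketch.estimate("question"))
```

Memory use depends only on `width * depth`, not on the number of distinct items, which is the trade-off against exact counting.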

Jscala blog

Programmer Fast Track in Atomic Scala book http://www.atomicscala.com/

javascript

Roundup of HTML-Based Slide Deck Toolkits http://www.impressivewebs.com/html-slidedeck-toolkits/

json editor http://jsonlint.com/

JavaScript Library for Mobile-Friendly Interactive Maps http://leafletjs.com/

Algo

Dijkstra's Algorithm as a Sequence (clojure implementation) http://hueypetersen.com/posts/2013/07/09/dijkstra-as-a-sequence/

Create perfect maze : Eller's Algorithm http://www.neocomputer.org/projects/eller.html

Implementations of Monoids for interesting approximation algorithms, such as Bloom filter, HyperLogLog and CountMinSketch https://github.com/twitter/algebird

Multivariate Change of Variables in Integration Theorem (MCVIT, that’s a mouthful) http://onehappybird.com/2012/12/03/whats-the-most-important-theorem/

Math ∩ Programming A place for elegant solutions http://jeremykun.com/2013/01/22/depth-and-breadth-first-search/

http://awelonblue.wordpress.com/2013/01/24/exponential-decay-of-history-improved/ @cgrand implementation https://gist.github.com/cgrand/4722914

Exponential decay of history is a pattern that competes with ring-buffers, least-recently-used heuristics, and other techniques that represent historical information in a limited space.

Math

Data Driven: The New Big Science https://www.simonsfoundation.org/quanta/20131004-the-mathematical-shape-of-things-to-come/

Topological Data Analysis, NBA example (Ayasdi)

Probability (Theory) Tutorials by Noel Vaillant http://www.probability.net/

Classical Mechanics: A Computational Approach by Jack Wisdom Gerald Jay Sussman http://groups.csail.mit.edu/mac/users/gjs/6946/

Counting selections with replacement ((n k)) http://www.johndcook.com/select_with_replacement.html
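
The count ((n k)) of selections of k items from n kinds with replacement, order ignored, equals the binomial coefficient C(n + k - 1, k). A small Python check, with a hypothetical helper name, compares the formula against brute-force enumeration:

```python
from itertools import combinations_with_replacement
from math import comb


def selections_with_replacement(n, k):
    """Number of ways to choose k items from n kinds, order ignored."""
    return comb(n + k - 1, k)


if __name__ == "__main__":
    n, k = 4, 3
    brute_force = len(list(combinations_with_replacement(range(n), k)))
    print(selections_with_replacement(n, k), brute_force)
```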

The theorems of Frobenius and Suzuki on finite groups by Terence Tao http://terrytao.wordpress.com/2013/04/12/the-theorems-of-frobenius-and-suzuki-on-finite-groups/

The Probabilistic Method : How many lights can you turn on? http://www.johndcook.com/blog/2013/06/04/how-many-lights-can-you-turn-on/

Blog: I wasn't prepared to work http://symbo1ics.com/blog/?p=1803

Math Primer for programmers http://jeremykun.com/primers/

Math with Bad Drawings : blog http://mathwithbaddrawings.com/

Graph Partitioning and Expanders http://venture-lab.stanford.edu/expanders

algorithms for graph partitioning and clustering, constructions of expander graphs, and analysis of random walks

blog Norman Wildberger http://njwildberger.wordpress.com

The Life and Times of the Central Limit Theorem (History of Mathematics) William J. Adams

divine proportion http://web.maths.unsw.edu.au/~norman/

The new form of trigonometry developed here is called rational trigonometry, to distinguish it from classical trigonometry, the latter involving cos θ, sin θ and the many trigonometric relations currently taught to students. An essential point of rational trigonometry is that quadrance and spread, not distance and angle, are the right concepts for metrical geometry (i.e. a geometry in which measurement is involved).
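
A short Python sketch of the two quantities, using exact rational arithmetic. The function names are hypothetical; the definitions assumed here are quadrance as squared distance between two points, and spread between two lines as 1 - (u·v)² / (Q(u) Q(v)) for direction vectors u and v, which corresponds to sin² of the angle in classical terms.

```python
from fractions import Fraction


def quadrance(p, q):
    """Squared distance between points p and q."""
    return sum((Fraction(a) - Fraction(b)) ** 2 for a, b in zip(p, q))


def spread(u, v):
    """Spread between two lines with direction vectors u and v."""
    dot = sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))
    qu = sum(Fraction(a) ** 2 for a in u)
    qv = sum(Fraction(b) ** 2 for b in v)
    return 1 - dot ** 2 / (qu * qv)


if __name__ == "__main__":
    print(quadrance((0, 0), (3, 4)))   # squared length of the 3-4-5 hypotenuse
    print(spread((1, 0), (1, 1)))      # lines meeting at 45 degrees
```

With rational coordinates both results stay rational, so no square roots or transcendental functions are needed.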

AI

OSCON 2013: Carin Meier "The Joy of Flying Robots with Clojure" http://www.youtube.com/watch?v=Ty9QDqV-_Ak with roomba, drone https://github.com/gigasquid/clj-drone

UI

APIs: The Future Is Now http://www.uie.com/articles/api_future/

Category theory in practice

Of Algebirds, Monoids, Monads, and Other Bestiary for Large-Scale Data Analytics http://www.michael-noll.com/blog/2013/12/02/twitter-algebird-monoid-monad-for-large-scala-data-analytics/

Algebra for Analytics by P. Oscar Boykin https://speakerdeck.com/johnynek/algebra-for-analytics

category theoretic approach to optimizing MapReduce-like pipelines http://blog.ezyang.com/2013/05/category-theory-for-loop-optimizations/

You Could Have Invented Monads! (And Maybe You Already Have.) http://blog.sigfpe.com/2006/08/you-could-have-invented-monads-and.html?m=1

Distributed System

The Raft Consensus Algorithm http://raftconsensus.github.io/

Distributed Systems Archaeology: Works Cited by Michael R. Bernstein http://michaelrbernste.in/2013/11/06/distributed-systems-archaeology-works-cited.html

Messaging

event sourcing for functional programmers http://danielwestheide.com/talks/flatmap2013/slides/index.html#/

RabbitMQ on the cloud AWS http://www.cloudamqp.com/

Rabbit farms is a standalone service for publishing RabbitMQ messages https://github.com/erlang-china/rabbit_farms

Rabbitmq vs. kafka http://www.quora.com/RabbitMQ/RabbitMQ-vs-Kafka-which-one-for-durable-messaging-with-good-query-features

  • "but clearly large amounts of persistent messages sitting in the broker was not the main design case for AMQP in general."
  • (It's contrasted with Kafka, which is "designed for holding and distributing large volumes of messages".)
  • longer-lived work queues are really more of a Hadoop thing, not an in-memory queue thing

Use Kafka if you have a fire hose of events (100k+/sec) you need delivered in partitioned order 'at least once' with a mix of online and batch consumers, you want to be able to re-read messages, you can deal with current limitations around node-level HA (or can use trunk code), and/or you don't mind supporting incubator-level software yourself via forums/IRC.

Use RabbitMQ if you have messages (20k+/sec) that need to be routed in complex ways to consumers, you want per-message delivery guarantees, you don't care about ordered delivery, you need HA at the cluster-node level now, and/or you need 24x7 paid support in addition to forums/IRC.

An Express + Socket.io based chat app that uses Redis as session store & RabbitMQ for PubSub https://github.com/rajaraodv/rabbitpubsub

AMQP resources

Servers:

  • RabbitMQ (Rabbit Technologies, Erlang/OTP, MPL) - http://rabbitmq.com
  • ZeroMQ (iMatix/FastMQ/Intel, C++, GPL3) - http://www.zeromq.org
  • OpenAMQ (iMatix, C, GPL2) - http://openamq.org
  • ActiveMQ (Apache Foundation, Java, apache2) - http://activemq.apache.org

Steve Vinoski explains AMQP in his column, Towards Integration http://steve.vinoski.net/pdf/IEEE-Advanced_Message_Queuing_Protocol.pdf

John O'Hara on the history of AMQP http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=485

Dmitriy's presentation on RabbitMQ/AMQP http://somic-org.homelinux.org/blog/2008/07/31/slides-for-my-amqprabbitmq-talk/

ZeroMQ's analysis of the messaging technology market http://www.zeromq.org/whitepapers:market-analysis

Pieter Hintjens's background to AMQP http://www.openamq.org/doc:amqp-background

Barry Pederson's py-amqplib http://barryp.org/software/py-amqplib/

Ben Hood on writing an AMQP client http://hopper.squarespace.com/blog/2008/6/21/build-your-own-amqp-client.html

Dmitriy Samovskiy introduces Ruby + QPid + RabbitMQ http://somic-org.homelinux.org/blog/2008/06/24/ruby-amqp-rabbitmq-example/

Ben Hood's as3-amqp http://github.com/0x6e6562/as3-amqp http://hopper.squarespace.com/blog/2008/7/4/server-side-as3.html http://hopper.squarespace.com/blog/2008/3/24/as3-amqp-client-first-cut.html

RabbitMQ's protocol code generator http://hg.rabbitmq.com/rabbitmq-codegen/

Erlang Exchange presentation on the implementation of RabbitMQ http://skillsmatter.com/podcast/erlang/presenting-rabbitmq-an-erlang-based-implementatio-nof-amqp http://www.lshift.net/blog/2008/07/01/slides-from-our-erlang-exchange-talk

Jonathan Conway's series on RabbitMQ and using it with Ruby/Merb http://jaikoo.com/2008/3/20/daemonize-rabbitmq http://jaikoo.com/2008/3/14/oh-hai-rabbitmq http://jaikoo.com/2008/2/29/friday-round-up-2008-02-29 http://jaikoo.com/2007/9/4/didn-t-you-get-the-memo

Open Enterprise's series on messaging middleware and AMQP http://www1.interopsystems.com/analysis/can-amqp-break-ibms-mom-monopoly-part-1.html http://www1.interopsystems.com/analysis/can-amqp-break-ibms-mom-monopoly-part-2.html http://www1.interopsystems.com/analysis/can-amqp-break-ibms-mom-monopoly-part-3.html

Messaging and distributed systems resources:

A Critique of the Remote Procedure Call Paradigm http://www.cs.vu.nl/~ast/publications/euteco-1988.pdf

A Note on Distributed Computing http://research.sun.com/techrep/1994/smli_tr-94-29.pdf

Convenience Over Correctness http://steve.vinoski.net/pdf/IEEE-Convenience_Over_Correctness.pdf

Metaprotocol Taxonomy and Communications Patterns http://hessian.caucho.com/doc/metaprotocol-taxonomy.xtp

Joe Armstrong on Erlang messaging vs RPC http://armstrongonsoftware.blogspot.com/2008/05/road-we-didnt-go-down.html

SEDA: scalable internet services using message queues http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf

A Node.js app that shows the power of RabbitMQ's work queues https://github.com/rajaraodv/rabbitworkers

Erlang

Java

Data NLP

Natural Language Toolkit for python http://nltk.org/

The World's Best Grammar Checker http://www.grammarly.com/

Open Data

Our aim is to track every government financial transaction across the world http://openspending.org/

Digital Public Library of America http://dp.la

Europe : think culture http://europeana.eu/

FORMA Forest Monitoring for Action project in cascalog https://github.com/reddmetrics/forma-clj

GDELT Global Data on Events, Location and Tone : data for historians http://eventdata.psu.edu/data.dir/GDELT.html

Thoughts on GDELT http://johnbeieler.org/blog/2013/04/12/gdelt/ http://badhessian.org/2013/04/gdelt-and-social-movements/

Data tells you whether to use A or B. Science tells you what A and B should be in the first place.

Political data: Militarized Interstate Disputes http://www.correlatesofwar.org/COW2%20Data/MIDs/MID310.html

Data

financial, economic and social datasets http://www.quandl.com/

The Free Wiki World Map http://www.openstreetmap.org/

The MNIST database of handwritten digits http://yann.lecun.com/exdb/mnist/

The Harvard Dataverse Network social science research data http://dvn.iq.harvard.edu/dvn/

dataset containing 1,362,109 reviews of Amazon products http://www.mblondel.org/data/

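http://www.mblondel.org/data/amazon7.pkl.tar.bz2

    # joblib is a standalone package; older scikit-learn releases bundled it
    try:
        import joblib
    except ImportError:
        from sklearn.externals import joblib

    # load the unpacked archive and inspect its contents
    data = joblib.load("amazon7.pkl")
    X = data["X"]
    y = data["y"]
    print(X.shape)
    print(y.shape)
    print(data["categories"])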

Data Mining Community's Top Resource kdnuggets http://www.kdnuggets.com/2013/05/added-to-kdnuggets-in-april.html

the industry's online resource for big data practitioners http://www.datasciencecentral.com/

5-part video series: Exploring the @IBMbigdata #BigData Accelerator for Machine Data #Analytics http://www.youtube.com/watch?v=qnCtMKpYt3E

data analytics stories blog http://www.analyticstory.com/kovas-boguta/

linked data RDF book http://www.manning.com/dwood/

LDB: The BigData In-Memory database built with Erlang, C and LISP http://www.erlang-factory.com/conference/SFBay2013/speakers/JohnVlachoyiannis

Fogus references about events and history ariadne

[Out of the Tarpit](http://lambda-the-ultimate.org/node/1446) by Marks and Moseley

[Fundamental concepts of plugin infrastructures](http://eli.thegreenplace.net/2012/08/07/fundamental-concepts-of-plugin-infrastructures/) by Eli Bendersky

[Jess in Action](http://www.jessrules.com/jesswiki/view?JessInAction) by Ernest Friedman-Hill

innovative data companies http://www.zdnet.com/are-these-the-worlds-most-innovative-big-data-companies-7000011135/

  • Operations-improver: Splunk
  • Tech-trend tracker: Quid
  • Data scientist tournament host: Kaggle
  • Credit rating revolutionary: ZestFinance
  • Electronic medical record streamliner: Apixio
  • Business intelligence visualizer: Datameer
  • Marketing modeler: BlueKai
  • Enterprise social media simplifier: Gnip
  • Brick-and-mortar customer analyzer: RetailNext
  • Compliance catalyst: Recommind

Supersonic is intended to be used as a back-end for various data warehousing projects https://code.google.com/p/supersonic/

Supersonic is an ultra-fast, column oriented query engine library written in C++. It provides a set of data transformation primitives which make heavy use of cache-aware algorithms, SIMD instructions and vectorised execution, allowing it to exploit the capabilities and resources of modern, hyper pipelined CPUs. It is designed to work in a single process.

financial dataset http://www.quandl.com

search dataset http://www.zanran.com/q/ Open data @CTIC

Data Analysis

Transportation optimization starts with math –> understanding human behavior. http://nautil.us/issue/3/in-transit/unhappy-truckers-and-other-algorithmic-problems

A Statistical Analysis of Nerf Blasters and Darts By Shawn O'Neil http://shawntoneil.com/index.php/pages/nerftest1

videos from datagotham conference http://www.datagotham.com/videos/

The Dangers of Overfitting or How to Drop 50 spots in 1 minute http://blog.kaggle.com/2012/07/06/the-dangers-of-overfitting-psychopathy-post-mortem/

implementation for a Restricted Boltzmann Machine and a Deep Belief Network http://tjake.github.io/blog/2013/02/18/resurgence-in-artificial-intelligence/

Mobile Phone Data Proves Humans Are Predictable During Chaos http://www.fastcolabs.com/3009706/mobile-phone-data-proves-humans-are-predictable-during-chaos

inclass challenge https://inclass.kaggle.com/

Data API

Data Computing

Play Framework Grid Deployment with Mesos http://typesafe.com/blog/play-framework-grid-deployment-with-mesos

GO BEYOND "DEBUG": WIRE TAP YOUR APP FOR KNOWLEDGE WITH HADOOP by Oleg Zhurakousky http://oredev.org/2013/wed-fri-conference/go-beyond-debug-wire-tap-your-app-for-knowledge-with-hadoop

How to write a crawler by Emanuele Minotto http://www.emanueleminotto.it/how-to-write-a-crawler

Quick tour of Hive, Pig and other data scientist tools via Hortonworks http://hortonworks.com/get-started/analyze/

Evolutionary Computing with Push http://faculty.hampshire.edu/lspector/push.html

ETL tools

  • AMPLab – Mesos, plus BDAS Berkeley Data Analytics Stack
  • Cascading/Cascalog/Scalding, not limited to Hadoop since other topologies are possible;
  • Twitter – Summingbird, Storm, etc.;
  • Facebook – Presto;
  • Anaconda/IPython/Pandas;
  • Actian/ParAccel/Knime.

Mesos framework for long running services https://github.com/mesosphere/marathon

cascading

History, patterns and future of Scalding by P. Oscar Boykin https://speakerdeck.com/johnynek/history-patterns-and-future-of-scalding

Why all this interest in Spark? by Denny Lee http://dennyglee.com/2013/08/19/why-all-this-interest-in-spark/

Python library for dealing with messy tabular data in several formats, guessing types and detecting headers. https://messytables.readthedocs.org/en/latest/

Stream summarizer and cardinality estimator in java https://github.com/clearspring/stream-lib

hRaven collects run time data and statistics from MapReduce jobs in an easily queryable format https://github.com/twitter/hraven

Open Platform for Visual Analytics http://www.datapad.io/

cascading Paco Nathan http://hadoopsummit.org/san-jose-blog/speaker-interview-paco-nathan/

"That workflow abstraction is important. For example, PMML has excellent features for ensembles and other complex patterns encountered in the more competitive areas of industry."

Introduction to Data Processing with Python http://opentechschool.github.io/python-data-intro/

Building a Classification Framework with Hive and Python http://www.impermium.com/blog/building-a-classification-network-with-hive-python/

how twitter uses nosql : FlockDB pig http://readwrite.com/2011/01/02/how-twitter-uses-nosql

DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. https://code.google.com/p/deap/

Big Data Cloud Classes by Bill Howe http://homes.cs.washington.edu/~billhowe/bigdatacloud/

mrjob : Run MapReduce jobs on Hadoop or Amazon Web Service https://github.com/Yelp/mrjob

A set of tutorial codes about matrix methods in Hadoop with mrjob https://github.com/dgleich/matrix-hadoop-tutorial

Implementation of some deep learning algorithms (Python, C) built on top of cudamat https://github.com/nitishsrivastava/deepnet

Trident-ML is a realtime online machine learning library built on top of Storm https://github.com/pmerienne/trident-ml

map-reduce algorithms explained slides http://de.slideshare.net/amundtveit/mapreduce-algorithms

Heka, a tool for high performance data gathering, analysis, monitoring, and reporting http://blog.mozilla.org/services/2013/04/30/introducing-heka

Hacking Redis (data structure server): Adding Interval Sets http://www.starkiller.net/2013/05/03/hacking-redis-adding-interval-sets

R integration in Storm https://github.com/quintona/storm-r

HP research : Presto Distributed R for big data http://www.hpl.hp.com/research/documentation.htm

Serengeti to enable the rapid deployment of Hadoop clusters on a virtual platform. http://serengeti.cloudfoundry.com/

Large Scale Math with Hadoop MapReduce @hortonworks http://de.slideshare.net/hortonworks/large-scale-math-with-hadoop-mapreduce

Twitter search use case : storm + kafka + Mechanical Turk http://engineering.twitter.com/2013/01/improving-twitter-search-with-real-time.html

Hadoop and the Data Warehouse: When to Use Which http://hortonworks.com/blog/hadoop-and-the-data-warehouse-when-to-use-which/

LinkedIn architecture: Kafka, Hadoop, Voldemort, Node.js http://engineering.linkedin.com/mobile/linkedin-mobile-introducing-personalized-navigation

Saddle is a data manipulation library for Scala http://saddle.github.com/doc/index.html

Data Science

Text Feature Extraction (tf-idf) part-2 by Christian S. Perone http://pyevolve.sourceforge.net/wordpress/?p=1747

Estimating User Lifetimes with PyMC by Cam Davidson-Pilon @cmrndp http://blog.yhathq.com/posts/estimating-user-lifetimes-with-pymc.html

Towards Linked Statistical Data Analysis http://csarven.ca/linked-statistical-data-analysis

alternating direction method of multipliers is well suited to distributed convex optimization http://www.stanford.edu/~boyd/papers/admm_distr_stats.html

3 Big Data Tech Talks You Can’t Miss by Christos Faloutsos Deepak Agarwal Jay Kreps http://engineering.linkedin.com/event/video-3-big-data-tech-talks-you-can%E2%80%99t-miss

Block Coordinate Descent Algorithms for Large-scale Sparse Multiclass Classification by Mathieu Blondel http://www.mblondel.org/code/mlj2013/

Machine Learning in python : blog http://www.mblondel.org/

The World’s Top 7 Data Scientists before there was Data Science http://conductrics.com/the-worlds-7-top-data-scientists-before-there-was-datascience/

The Multi-Armed Bandit Problem with examples and visualization http://camdp.com/blogs/multi-armed-bandits

Recommendation System

Myrrix: successor of Mahout? http://myrrix.com/quick-start/

java -Dmodel.features=100 -Dmodel.als.lambda=2 -Xmx512m -jar myrrix-serving-1.0.1.jar --port 8080

How the Hacker News ranking algorithm works, in Paul Graham's Lisp http://amix.dk/blog/post/19574

Deconstructing Recommender Systems : Amazon and Netflix use cases http://spectrum.ieee.org/computing/software/deconstructing-recommender-systems

Deep Learning

Recent Developments in Deep Learning http://www.youtube.com/watch?v=VdIURAu1-aU

Deep Neural Networks for Speech and Image Processing http://www.youtube.com/watch?v=DYu9D1M5rII

Deep Learning tutorial http://deeplearning.net/tutorial/

Graph / Network

Apache Giraph : scalable iterative graph processing system open-source counterpart to Pregel http://giraph.apache.org/

Probabilistic Data Structures for Web Analytics and Data Mining http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/

Cardinality estimation: LogLog counting; Frequency estimation: Count-Min Sketch; Heavy hitters: Stream-Summary; Range query: array of Count-Min Sketches; Membership query: Bloom filter
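Two of the structures named above are small enough to sketch. The following is a minimal, illustrative Python version of a Count-Min Sketch and a Bloom filter; the class names, default sizes, and the salted SHA-256 hashing helper are assumptions made for this sketch and are not taken from the linked article or from stream-lib.

```python
import hashlib


def _hashes(item, count, modulus):
    """Derive `count` bucket indices in [0, modulus) from salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % modulus
        for i in range(count)
    ]


class CountMinSketch:
    """Frequency estimation: never under-counts, may over-count on collisions."""

    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, item, n=1):
        # Increment one counter per row, each chosen by a different hash.
        for row, col in zip(self.rows, _hashes(item, self.depth, self.width)):
            row[col] += n

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        cols = _hashes(item, self.depth, self.width)
        return min(row[col] for row, col in zip(self.rows, cols))


class BloomFilter:
    """Membership query: no false negatives, some false positives."""

    def __init__(self, size=8192, hash_count=4):
        self.size, self.hash_count = size, hash_count
        self.bits = [False] * size

    def add(self, item):
        for idx in _hashes(item, self.hash_count, self.size):
            self.bits[idx] = True

    def __contains__(self, item):
        return all(self.bits[idx] for idx in _hashes(item, self.hash_count, self.size))
```

Both trade exactness for fixed memory: the sketch answers "how often, at most?" and the filter answers "possibly seen, or definitely not?".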

Why every statistician should know about cross-validation http://robjhyndman.com/hyndsight/crossvalidation/

Truthy is a research project that helps you understand how communication spreads on Twitter http://truthy.indiana.edu/

AI web-site about agent, neural network, genetic algo http://ai-junkie.com/

extensive list of SVM tutorials http://svms.org/tutorials/

clustering with Neural Networks : Kohonen's Self Organizing Feature Maps http://ai-junkie.com/ann/som/som1.html

Andrew Ng - Machine Learning via Large-scale Brain Simulations http://www.youtube.com/watch?v=5elcmFNRCWk

masters of machine learning "The Large Scale Learning class" http://cilvr.cs.nyu.edu/doku.php?id=courses:bigdata:slides:start

  • Introduction: online linear learning
  • Lecture 2: 2nd order methods and analysis of convergence; demos in Torch; BFGS and Limited Storage BFGS
  • Lecture 3: Online learning for non-linear/non-convex models; boosted decision trees (guest lecture by Tong Zhang); example code in R
  • Lecture 4: Hadoop All-Reduce
  • Lecture 5: Torch tutorial; torch basics; machine learning tutorial; video CUDA tutorial (by Matthew Zeiler); PDF part 1; PDF part 2; video; video for Torch 7 CUDA demo
  • Lecture 6: Feature learning, representation learning
  • Lecture 7: Feature learning, deep learning
  • Lecture 8: Inverted indices and predictive indexing, hashing; project ideas (description, video): John Langford's projects, Xiang Zhang's projects, Yann LeCun's projects
  • Lecture 9: The ad problem, advertising placement and such (guest lecturer: Leon Bottou, Microsoft Research)
  • Lecture 10: Classic and advanced bandits (John Langford)
  • Lecture 11: Counterfactual reasoning (Leon Bottou); advanced topics (John Langford)
  • Lecture 12: Active learning, indexing (John Langford); slides: PDF, video
  • Lecture 13: Deep learning in text and speech recognition
  • Lecture 14: Many classes, logarithmic-time prediction

Analyze Text Similarity with R: Latent Semantic Analysis and Multidimensional Scaling http://bodongchen.com/blog/?p=301

Project: Supervised Classification for Sentiment Analysis http://www.umiacs.umd.edu/~resnik/ling773_sp2009/project/sentiment_project.html

Le Macroscope by Joël de Rosnay http://fr.wikipedia.org/wiki/Le_Macroscope

What are the Top 10 Problems in Machine Learning for 2013? http://www.quora.com/Machine-Learning/What-are-the-Top-10-Problems-in-Machine-Learning-for-2013

Churn prediction, sentiment analysis, truth veracity, recommendations, online ads, news aggregation, scalability, content discovery/search, intelligent learning, medicine

Classifying Websites with Neural Networks http://blog.datafiniti.net/?p=34

Numerical optimizers for Logistic Regression in python : Trust Region better than BFGS http://fa.bianp.net/blog/2013/numerical-optimizers-for-logistic-regression/

Great introduction of macho

CART explained with R as an alternative to logistic regression http://statistical-research.com/a-brief-tour-of-the-trees-and-forests/

real experiment using conditional probabilities http://nerds.airbnb.com/location-relevance/

Concordance and Discordance in Logistic Regression http://statour.blogspot.ch/2012/12/concordance-and-discordance-in-logistic.html

Machine learning in-depth tutorial based on scikit-learn http://scikit-learn.org/dev/user_guide.html

Naive Bayes for sentiment analysis http://phpir.com/bayesian-opinion-mining

Support Vector Machine in PHP http://phpir.com/support-vector-machines-in-php

Deep Unsupervised learning with sparse filtering applied to Kaggle : Black Box http://fastml.com/deep-learning-made-easy/

Job salary prediction at Kaggle resolved with logistic regression http://fastml.com/regression-as-classification/

Best Open Source Data Mining Software : Weka Orange RapidMiner Knime JHepWork http://www.junauza.com/2010/11/free-data-mining-software.html

Data Mining: Practical Machine Learning Tools and Techniques by Hall Witten Frank http://www.cs.waikato.ac.nz/ml/weka/book.html

Data science blog @CmrnDP http://camdp.com/blogs/

Neuro Science @coursera https://coursera.org/compneuro based on :

Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems by Peter Dayan and Larry Abbott http://www.gatsby.ucl.ac.uk/~dayan/book/

Temporal Networks by Petter Holme http://arxiv.org/abs/1108.1780

netsci conference : http://tdn2013.wix.com/tdn2013

Scalable Machine Learning by Alex Smola http://alex.smola.org/teaching/berkeley2012/index.html

gradient descent blog from Daniel Duckworth http://stronglyconvex.com/blog.html

Yurii Nesterov established the Accelerated Gradient Method http://stronglyconvex.com/blog/accelerated-gradient-descent.html

Microsoft Focus in France on Machine Learning http://research.microsoft.com/en-us/news/features/mlsqa-042213.aspx

berkeley intro data science course : material http://datascienc.es/schedule/

30 Most Influential Data Scientists on Twitter http://storify.com/Kalido/most-influential-data-scientists-on-twitter

Conditional (Partitioned) Probability — A Primer http://jeremykun.com/2013/03/28/conditional-partitioned-probability-a-primer/

de Bruijn Graphs for Genome Assembly http://www.homolog.us/Tutorials/index.php?p=1.1

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems http://www.princeton.edu/~sbubeck/book.html

Random topics on optimization, probability, and statistics. By Sébastien Bubeck https://blogs.princeton.edu/imabandit/

Slides: The Evolution of Regression [Part 1] from @salfordsystems http://1.salford-systems.com/blog/bid/273493/Video-The-Evolution-of-Regression-Part-1

data mining book in python http://guidetodatamining.com/

blog http://aicoder.blogspot.ch/ Neal Richter

Understanding the Bias-Variance Tradeoff http://scott.fortmann-roe.com/docs/BiasVariance.html

Accurately Measuring Model Prediction Error http://scott.fortmann-roe.com/docs/MeasuringError.html

Top-down particle filtering for Bayesian decision trees http://arxiv.org/abs/1303.0561

Paul Lam cascalog data scientist http://www.quantisan.com/ incanter for the future

get dataset http://datakind.org/

data tools as unix tools http://www.cascading.org/multitool/

Thoughts on Statistics and Machine Learning http://normaldeviate.wordpress.com/

overview of cascading http://vimeo.com/59610496 (Paco Nathan) scalding http://vimeo.com/59610497 from Chicago Hadoop User Group

R

finance use case : Minimal variance asset allocation for Stocks ISA http://www.quantisan.com/minimal-variance-asset-allocation-for-stocks-isa/

database

RethinkDB to store JSON documents http://rethinkdb.com/docs/architecture/

MongoDB basics for everyone – Part 5 Using find() and findOne() http://paulscott.co.za/blog/mongodb-basics-for-everyone-part-5-using-find-and-findone/

Distributed Algorithms in NoSQL Databases http://highlyscalable.wordpress.com/2012/09/18/distributed-algorithms-in-nosql-databases/

data consistency , data placement, system coordination

Inside HyperLevelDB : makes LevelDB faster http://hackingdistributed.com/2013/06/17/hyperleveldb/

Building Cloud Storage Services with Riak http://architects.dzone.com/articles/building-cloud-storage

atomic commit explained : http://www.sqlite.org/atomiccommit.html

Next Generation Databases http://nosql-database.org/

Google F1: The Fault-Tolerant Distributed RDBMS http://research.google.com/pubs/pub38125.html

Data Visualization

The Canton of Bern's austerity package visualized, by Thomas Preusse and Oleg Lavrovsky http://www.stuermer.ch/maemst/2013/07/asp-2014/

Python interactive visualization library for large dataset https://github.com/ContinuumIO/Bokeh based on https://github.com/JosephCottam/Stencil

Twitter hashtags viz by Qatar Computing Research Institute http://scd1.qcri.org/tca/

JS data projects from okfnlabs http://okfnlabs.org/projects/

Recline.js : relax with your data http://okfnlabs.org/recline/docs/

GED VIZ is a new online-tool for visualizing complex economic relations http://viz.ged-project.de/?lang=en

Jason Davies's blog http://www.jasondavies.com/

financial map viz : between map and flowchart http://opencorporates.com/viz/financial/index.html

maps

GDAL - Geospatial Data Abstraction Library http://www.gdal.org/ : translator library for raster geospatial data formats

Best maps tools http://bashooka.com/freebie/great-tools-for-building-interactive-maps/

  • leafletjs
  • mapbox
  • polymaps
  • maptales
  • modestmaps
  • INTERACTIVE WORLD MAPS WordPress plugin
  • JQUERY INTERACTIVE SVG MAP PLUGIN
  • POINT OF INTEREST (POI) AUTO MAP
  • zeemaps
  • MAPS.STAMEN.COM
  • Kartograph

Compare Urban Life Around the Globe With New Side-by-Side City Maps http://www.wired.com/wiredscience/2013/07/urban-observatory/

Satellite raster http://www.jasondavies.com/maps/raster/satellite/

But remember this started with vector tiles. And the vector tiles are in the Mercator projection. It’s much harder to take Mercator tiles and reproject them to a different projection because you don’t know which tiles are visible.

Hard problems like this are Jason Davies’ bread and butter. Jason saw the above examples and set out to determine which tiles would be visible in an arbitrary projection. He then created the above visual demonstration of his algorithm. The red tiles are the ones that are visible, and as you zoom in and out, you can see it recalculate the set of needed tiles instantly.

How to Map Where You've Mapped in OpenStreetMap with tilemill http://www.mapbox.com/blog/how-to-map-contributions-openstreetmap/

data.stories on maps with Mike Migurski http://datastori.es/data-stories-20-maps-migurski/

modestmaps js library for maps made by stamen team http://modestmaps.com/

Stamen a design and technology studio in San Francisco maps and data visualization. The next most obvious thing. http://stamen.com/

help maps to be better http://walking-papers.org/ by Michal Migurski http://mike.teczno.com/

Convert Address to long,lat http://www.gpsvisualizer.com/geocoder/

satellite maps explained by MapBox http://www.wired.com/design/2013/05/a-cloudless-atlas/

Jerome Cukier's blog : communicating with data http://www.jeromecukier.net/

the prefuse visualization toolkit http://prefuse.org/

Bret Victor videos http://worrydream.com/May2013/

NBA stats vizu http://www.nytimes.com/interactive/2012/06/11/sports/basketball/nba-shot-analysis.html?ref=sports&_r=0

@wardnyt Sports Graphics Editor http://nytimes.com Matthew Ericson @mericson Deputy Graphics Director at The New York Times New York, NY · ericson.net Jeremy White @blueshirt Graphics editor for The New York Times, while also pursuing a PhD in geography with an emphasis on interactive cartography New York City · blueshirt.com

The Art of Data Visualization by Edward Tufte http://datascientistinsights.com/2013/05/10/the-art-of-data-visualization/

http://www.openculture.com/2013/05/the_art_of_data_visualization_.html Data Visualization History goes together with Science history ( Maps, Galileo …)

Creating a hexagonal cartogram by Ralph Straumann http://www.ralphstraumann.ch/blog/2013/05/creating-a-hexagonal-cartogram/

A (personal) blog of data sketches from the New York Times Graphics Department http://chartsnthings.tumblr.com/

Visualize big graph data by mathieu-bastian http://de.slideshare.net/mathieu-bastian/visualize-big-graph-data

Viz example : Location of Every Photo From the InternationalSpaceStation http://natronics.github.io/ISS-photo-locations/

Vega is a visualization grammar, a declarative format for creating, saving and sharing visualization designs. https://github.com/trifacta/vega/wiki

Nathan Yau Data Points Visualization that Means Something http://flowingdata.com/book/

Functional Art : An introduction to information graphics and visualization by Alberto Cairo http://www.thefunctionalart.com/

Languages usage in github http://langpop.corger.nl/

javascript

A streaming parser for the ESRI Shapefile spatial data format https://github.com/mbostock/shapefile

simple console for learning and experimenting with d3.js data nesting. http://bl.ocks.org/d/4748131/

D3

Climbing the d3.js Visualisation Stack : rCharts cubism … http://schoolofdata.org/2013/08/12/climbing-the-d3-js-visualisation-stack/

D3 gallery with description http://visualizing.org/galleries/made-d3js

UTM zones with D3.js http://bl.ocks.org/turban/5866872

plotting the sensors in my Android phone with d3.js and three.js http://enja.org/2012/12/08/plotting-the-sensors-in-my-android-phone-with-d3-js-and-three-js/

Binify + topoJSON + D3 = How to create awesome binned hexagon maps http://mechanicalscribe.com/notes/binify-d3-topojson-tutorial/

online book "Interactive Data Visualization for the Web" http://ofps.oreilly.com/titles/9781449339739/

Handbook of Graph Drawing and Visualization , Roberto Tamassia http://cs.brown.edu/~rt/gdhandbook/

topoJSON https://github.com/mbostock/us-atlas http://bost.ocks.org/mike/map/

Install GDAL to be able to convert a shapefile into GeoJSON:

~/dev/misc/gdal-1.9.2/apps/ogr2ogr -f GeoJSON -where "isoa2 = 'CH' AND SCALERANK < 8" chplaces.json ~/tmp/ne10m/ne10mpopulatedplaces.shp

To get shapefiles: http://gadm.org/download
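The `-where` clause in the ogr2ogr command keeps only the features whose attributes match. As a rough sketch, the same kind of filter can be applied in plain Python to a GeoJSON FeatureCollection that has already been converted. The property names `isoa2` and `SCALERANK` are copied from the command as written; the function name `filter_features`, the `NAME` property, and the sample data are made up for illustration.

```python
def filter_features(collection, predicate):
    """Return a new FeatureCollection keeping features whose properties satisfy predicate."""
    kept = [f for f in collection["features"] if predicate(f.get("properties") or {})]
    return {"type": "FeatureCollection", "features": kept}


# Made-up sample data, only to exercise the filter (values are illustrative).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"NAME": "Bern", "isoa2": "CH", "SCALERANK": 3},
         "geometry": {"type": "Point", "coordinates": [7.45, 46.95]}},
        {"type": "Feature",
         "properties": {"NAME": "Small village", "isoa2": "CH", "SCALERANK": 9},
         "geometry": {"type": "Point", "coordinates": [8.0, 47.0]}},
        {"type": "Feature",
         "properties": {"NAME": "Lyon", "isoa2": "FR", "SCALERANK": 3},
         "geometry": {"type": "Point", "coordinates": [4.83, 45.76]}},
    ],
}

# Equivalent of: -where "isoa2 = 'CH' AND SCALERANK < 8"
swiss = filter_features(
    sample, lambda p: p.get("isoa2") == "CH" and p.get("SCALERANK", 99) < 8
)
names = [f["properties"]["NAME"] for f in swiss["features"]]
```

Filtering at conversion time with ogr2ogr is usually preferable for large shapefiles, since the unwanted features never reach the output file.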

constraint programming

Using JuMP to Solve a TSP with Lazy Constraints http://iaindunning.com/2013/mip-callback.html

medium-level constraint modelling language http://www.minizinc.org/ student job : http://t.co/UntuMZ1sRG

Pierre Schaus operational research in scala https://bitbucket.org/oscarlib/oscar/wiki/Home

constraint programming meetings http://www.lsis.org/jfpc-jiaf2013/jfpc/

core.logic

persistent database in core.logic https://github.com/threatgrid/pldb

Path expressions through graphlike structures for clojure using core.logic https://github.com/ReinoutStevens/damp.qwal

Applicative logic meta-programming using Clojure's core.logic against an Eclipse workspace https://github.com/cderoove/damp.ekeko

cascalog

top tuples per group by Nathan Marz https://groups.google.com/forum/#!msg/cascalog-user/ih8yqyCqiT4/SqSeez15TBsJ

Usage of name-vars

(?- (stdout) (c/first-n (name-vars age ["?person" "?age"]) 10 :sort "?age" :reverse true))

The name-vars portion is necessary because the age dataset is just a vector without named fields.

clj practice

Prismatic's Engineering Practices https://github.com/Prismatic/eng-practices

clj programming

[ANN] riddley: code-walking without caveats https://groups.google.com/forum/#!topic/clojure/a68aThpvP4o

https://github.com/ztellman/riddley

riddley.walk> (walk-exprs number? inc '(let [n 1] (+ n 1)))
(let* [n 2] (. clojure.lang.Numbers (add n 2)))

how to write a correct macroexpand-all (which requires a code walker) in Common Lisp: http://www.merl.com/publications/TR1993-017/

clj math

clj matrix https://github.com/mikera/matrix-api with 2 implementations https://github.com/mikera/vectorz-clj and native BLAS

clj API

fold unfold : deep-merge

I think functions like this become pretty clear if you pull out 'unfold' and 'fold' utilities, like: https://github.com/Prismatic/plumbing/blob/master/src/plumbing/map.clj#L42

Their 'flatten' generates a seq of [path value] pairs, and 'unflatten' turns that back into a map. With these, you can write your functions:

(defn to-map [kv-seq] (into {} kv-seq)) ;; utility

(defn flatten-map [m kf vf]
  (->> m flatten (map (fn [[ks v]] [(kf ks) (vf v)])) to-map))

(defn mapf [m f & args]
  (->> m flatten (map (fn [[ks v]] [ks (apply f v args)])) unflatten))

(defn deep-merge-with [f & ms]
  (->> ms
       (map flatten)
       (map to-map)
       (reduce (fn [res m] (merge-with f res m))) ;; could use 'partial'
       unflatten))

(defn deep-merge [a b] (deep-merge-with (fn [x y] y) a b))

;; bonus: also useful for fns that don't return a map
(defn max-depth [m] (->> m flatten (map (comp count first)) (apply max 0)))

deep merge https://groups.google.com/forum/?fromgroups=#!topic/clojure/UdFLYjLvNRs

(defn deep-merge "Recursively merges maps. If keys are not maps, the last value wins." [& vals] (if (every? map? vals) (apply merge-with deep-merge vals) (last vals)))

Destructuring can use expressions as keys

(let [{x (+ 1 1)} {2 "two"}] x)

arrows https://github.com/rplevy/swiss-arrows to compare with new threading macros as-> some-> cond->

keywordize

(into {} (for …))

(defn keywordize-keys
  "Recursively transforms all map keys from strings to keywords."
  {:added "1.1"}
  [m]
  (let [f (fn [[k v]] (if (string? k) [(keyword k) v] [k v]))]
    ;; only apply to maps
    (postwalk (fn [x] (if (map? x) (into {} (map f x)) x)) m)))

some clj patterns

Union

(set (mapcat #(… …)

monadic bind in the set monad ?

(set (apply concat (for […] […])))

(defn union-of [colls] (reduce into #{} colls))

zipmap

(into {} (map #(vector …)))

fmap in the hash-map functor ?

remove empty?

(filter seq …)

load optional dependency

https://github.com/sonian/carica/commit/eae079f4bfd1a0d50a75b11cd0f23ca73ec81797

(require 'cheshire.core)
(apply (ns-resolve (symbol "cheshire.core") (symbol "parse-stream")) args)

multimethod usage on config file

(memfn getPath) instead of #(.getPath %)

(defmulti load-config (comp second (partial re-find #"\.([^..]*?)$") (memfn getPath)))

(defmethod load-config "clj" [resource])

load properties file

(into {} (doto (java.util.Properties.) (.load (-> (Thread/currentThread) (.getContextClassLoader) (.getResourceAsStream "log4j.properties")))))

reduce + lazy seq : blow up ? https://groups.google.com/forum/?fromgroups=#!topic/clojure/0pcSxK9reSc

user> (defn test1 [coll] (reduce + coll))
user> (test1 (take 10000000 (iterate inc 0)))
49999995000000
user>

Now if we do:

user> (defn test2 [coll] [(reduce + coll) (reduce + coll)])
user> (test2 (take 10000000 (iterate inc 0)))
OutOfMemoryError Java heap space [trace missing]

Clojure has a feature called locals clearing, which sets 'coll to nil before calling reduce in test1, because the compiler can prove it won't be used afterwards. In test2, coll has to be retained, because reduce is called a second time on it. https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L3458

deep-merge-with http://clojuredocs.org/clojure_contrib/clojure.contrib.map-utils/deep-merge-with

Like merge-with, but merges maps recursively, applying the given fn only when there's a non-map at a particular level.

(deep-merge-with + {:a {:b {:c 1 :d {:x 1 :y 2}} :e 3} :f 4} {:a {:b {:c 2 :d {:z 9} :z 3} :e 100}})
-> {:a {:b {:z 3, :c 3, :d {:z 9, :x 1, :y 2}}, :e 103}, :f 4}
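For comparison outside Clojure, the recursive merge described above can be sketched in Python on plain dicts. This is only an illustration of the idea, not a port of the clojure.contrib code; the names `deep_merge_with` and `merge_two` are chosen for this sketch.

```python
import operator
from functools import reduce


def deep_merge_with(f, *maps):
    """Merge dicts recursively; call f(left, right) only where a non-dict value is met."""

    def merge_two(a, b):
        if isinstance(a, dict) and isinstance(b, dict):
            out = dict(a)  # shallow copy so the inputs are left untouched
            for key, value in b.items():
                out[key] = merge_two(out[key], value) if key in out else value
            return out
        # At least one side is a leaf: let the caller's function decide.
        return f(a, b)

    return reduce(merge_two, maps, {})


merged = deep_merge_with(
    operator.add,
    {"a": {"b": {"c": 1, "d": {"x": 1, "y": 2}}, "e": 3}, "f": 4},
    {"a": {"b": {"c": 2, "d": {"z": 9}, "z": 3}, "e": 100}},
)
```

Passing `lambda left, right: right` as `f` gives the "last value wins" behaviour of a plain deep merge.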

core.reducers

Improving your Clojure code with core.reducers http://adambard.com/blog/clojure-reducers-for-mortals/

reducers https://github.com/cgrand/berlin-profiling/blob/master/src/berlin_profiling/life.clj

http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html http://clj-me.cgrand.net/2013/02/11/from-lazy-seqs-to-reducers-and-back/

(defn reverse-conses
  ([s tail]
    (if (identical? (rest s) tail)
      s
      (reverse-conses s tail tail)))
  ([s from-tail to-tail]
    (loop [f s b to-tail]
      (if (identical? f from-tail)
        b
        (recur (rest f) (cons (first f) b))))))

(defn seq-seq [f s] (let [f1 (reduce #(cons %2 %1) nil (f (reify clojure.core.protocols.CollReduce (coll-reduce [this f1 init] f1))))] ((fn this [s] (lazy-seq (when-let [s (seq s)] (let [more (this (rest s)) x (f1 more (first s))] (if (reduced? x) (reverse-conses @x more nil) (reverse-conses x more)))))) s)))

(defmacro seq->> [s & forms] `(seq-seq (fn [n#] (->> n# ~@forms)) ~s))

(take 2 (seq->> (range) (r/map #(str (doto % prn))) (r/take 25) (r/drop 5)))

core.typed

clj concurrency

promise future agent channels by tbc++ Timothy Baldridge https://groups.google.com/forum/#!topic/clojure/e6Tg4wXLcug

promise - creates an object that can be deref'd. The result of the promise can be delivered once, and deref-ing an undelivered promise will cause the deref-ing thread to block. A single producer can give a single value to multiple threads.

future - just like a promise, but the delivering code is given to the future, and the future will go off and execute that code in a different thread. A single producer delivers a single value, produced in an undefined thread, to multiple consumers.

agents - couple an unbounded queue of functions with a single mutable value. Mutating that value is accomplished by enqueueing functions to be executed against that mutable state. Multiple producers use functions to modify a mutable ref. Can be deref-ed by many different consumers.

channels - allow multiple producers to provide data to multiple consumers on a one-to-one basis. That is to say, a single value put into a channel can only be taken by a single consumer. However, multiple values can be in flight at a single time. This is all delivered by a bounded queue (notice the difference with unbounded agents). This allows for back-pressure, where slow consumers can block faster producers. So perhaps the best way to think about channels is as a bounded mutable queue of promises.
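The four primitives have rough analogues in Python's standard library, which may help when comparing them. The sketch below is an approximation under stated assumptions: `concurrent.futures.Future` stands in for a promise, `ThreadPoolExecutor.submit` for a future, a bounded `queue.Queue` for a channel, and the `Agent` class is a hypothetical helper written for this example, not an existing API.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# promise: delivered once, readable by many; result() blocks until delivery.
promise = Future()
promise.set_result(42)
promise_value = promise.result()

# future: the producing code is handed over and runs on another thread.
with ThreadPoolExecutor(max_workers=1) as pool:
    future_value = pool.submit(lambda: 6 * 7).result()


class Agent:
    """One mutable value plus an unbounded queue of functions applied in order."""

    def __init__(self, state):
        self.state = state
        self._actions = queue.Queue()  # unbounded: send() never blocks
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            fn = self._actions.get()
            self.state = fn(self.state)
            self._actions.task_done()

    def send(self, fn):
        self._actions.put(fn)

    def await_all(self):
        # Wait until every queued function has been applied.
        self._actions.join()


agent = Agent(0)
for _ in range(10):
    agent.send(lambda n: n + 1)
agent.await_all()

# channel: bounded queue; put() blocks when full, so a slow consumer
# holds back the producer. Each value is taken by exactly one consumer.
channel = queue.Queue(maxsize=2)
received = []


def consumer():
    while True:
        item = channel.get()
        if item is None:  # None marks the end of the stream in this sketch
            break
        received.append(item)


worker = threading.Thread(target=consumer)
worker.start()
for i in range(5):
    channel.put(i)  # blocks while two items are already waiting
channel.put(None)
worker.join()
```

The difference between the agent's unbounded queue and the channel's `maxsize=2` is the point of the comparison: only the bounded queue can push back on whoever is producing.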

What is a "state monad binding plan" (referring to code in core.async) https://groups.google.com/forum/#!searchin/clojure/core.async/clojure/soewFCS8dAI/kaJ09e_eA7gJ

In-depth article : CLOJURESCRIPT CORE.ASYNC DOTS GAME http://rigsomelight.com/2013/08/12/clojurescript-core-async-dots-game.html

CSP is Responsive Design by David Nolen http://swannodette.github.io/2013/07/31/extracting-processes/

100k independent go blocks all running at the same time http://swannodette.github.io/2013/08/02/100000-processes/

Hoare examples implemented with core.async https://github.com/nodename/async-plgd

core.async: communicating termination https://groups.google.com/forum/#!topic/clojure/_KzEoq0XcHQ

clj image

Image analysis with Clojure and OpenCV: A face detection example http://nils-blum-oeste.net/image-analysis-with-clojure-up-and-running-with-opencv/#.UafoMfH2kuk

Music

coursera class on music technology https://www.coursera.org/course/musictech

clj devops

use leiningen for scala project scalding https://github.com/masverba/scalding-on-leiningen

clj data computation

Experimental combination of core.logic and core.matrix to allow reasoning with vectors / mathematical expressions https://github.com/clojure-numerics/expresso

client cassandra thrift https://gist.github.com/daveray/5464943

clj java

Java NIO.2 wrapper https://github.com/juergenhoetzel/clj-nio2

(ns test.nio2.test.tail (:use clojure.java.io nio2.io nio2.watch nio2.files))

(defn tail
  "Print the last n lines of path p to stdout"
  [n p]
  (with-open [rdr (reader p)]
    (doseq [l (take-last n (line-seq rdr))]
      (println l))
    (doseq [e (watch-seq (parent (real-path p)) :modify)]
      (when (= (real-path (:path e)) (real-path p))
        (while (.ready rdr)
          (println (.readLine rdr)))))))

clj libraries

A macro-based refactoring library for Clojure https://github.com/ctford/poker

Utility libraries and dependency hygiene https://groups.google.com/group/clojure/browse_frm/thread/5ae4b7d514a2cff0

Parallel universes for namespaces https://github.com/technomancy/metaverse

Twitter-api [twitter-api "0.7.4"] https://github.com/adamwynne/twitter-api

Geohash library for clojure by @sunng https://bitbucket.org/sunng/clojure-geohash

misc clj

detect language with com.cybozu.labs.langdetect.DetectorFactory https://gist.github.com/cemerick/5457242

ssierra lib on namespace

slamhound to install on emacs to write require/import for you http://www.lispplusplus.com/2012/12/slamhound-130-cleaning-up-all-your.html

fast idiomatic pretty-printer https://github.com/brandonbloom/fipp

display vector , hash as ASCII table https://github.com/owainlewis/tabular clojure.pprint/print-table is for maps only

clojure table layout https://github.com/joegallo/doric

clj machine learning

review code on levenshtein algo and memoization https://groups.google.com/forum/#!topic/clojure/w6SRYE4n6pc

clojure wrapper on top various nlp libs https://github.com/jimpil/hotel-nlp

clj server

clj perf

Proteus: local mutable variables for the masses by Zach Tellman https://github.com/ztellman/proteus https://groups.google.com/forum/#!topic/clojure/7HNNiJJTte4

A simple IO library for using Clojure's reducers https://github.com/thebusby/iota/

clj webdev

webframework à la django https://github.com/caribou

websockets with http-kit https://github.com/cgmartin/clj-wamp

JSON on steroids inspired by EDN https://github.com/lynaghk/json-tagged-literals

Building an iOS weather app with Angular and ClojureScript http://keminglabs.com/blog/angular-cljs-mobile-weather-app/

clj GUI

cljs

Purnam - AngularJs Language Extensions for Clojurescript Inspired by lispyscript, coffeescript and clang https://github.com/zcaudate/purnam

cljs properties access http://dev.clojure.org/display/design/Unified+ClojureScript+and+Clojure+field+access+syntax

good: (.-MAXNUMBER js/Math) and (.ceil js/Math 3.14)

not Clojure-compatible: js/Math.MAXNUMBER and (js/Math.ceil 3.14)

parser

Functional parsing library from chapter 8 of Programming in Haskell http://www.cs.nott.ac.uk/~gmh/Parsing.lhs

IDE

Cursive is the Clojure IDE that understands your code http://cursiveclojure.com/

lein faster https://github.com/technomancy/leiningen/wiki/Faster

Fast JVM launching without the hassle of persistent JVMs. https://github.com/flatland/drip/

lein startup https://groups.google.com/group/clojure/browse_thread/thread/7b96718933962f35

> I take this to mean that there's no widely accepted solution.

The widely-accepted solution is to leave a single process running. It certainly has limitations, but it's the way most people deal with the problem.

> Really, I just want `lein run` to be faster. Can someone explain where all this time is spent?

Basically it comes from having to load two JVMs, one for Leiningen itself and one for the project. Leiningen itself is fairly optimized for this (fully AOTed, bytecode verification is turned off, fancy warm-up JIT techniques disabled) which is why it's possible to get `lein version` to return in under a second in some cases. But there are various compatibility issues that prevent us from being able to perform the same optimizations on project JVMs. These are documented on the "Faster" page of the Leiningen wiki, and you can do some testing to determine whether or not they affect your project in particular; if not then they should provide a good boost. But nothing will ever come close to the speed of keeping the JVM resident, which is why I'm working on `:eval-in :nrepl` and lein.el. For people who don't use Emacs, Jark is the only tool I'm aware of that is working towards this in a way that's decoupled from the editor. They could probably use some help both testing and implementing it.

> I hear a lot of talk of compiling, but why would we re-compile things where none of the dependencies have changed?

Performing a full AOT of all your dependencies will help if you have a large project with lots of dependencies that get loaded at application boot. But that effect would be something along the lines of bringing boot down from 20s to 12s rather than bringing it from 5s to <1s.

org-mode tips

(with-out-str (print-table [{:a 1 :b 2 :c 3} {:b 5 :a 7 :c "dog"}]))

(Using with-out-str is needed because print-table of course returns nil)

But what I get when generating HTML (via "C-c C-e b") is not a table, but the literal text of the table markup. I.e. compiling the above source block yields:

Tech stuff

Creating network and connecting from anywhere where people are interested. http://guifi.net/en/node/38392

Prism All your data, in one place http://prism.andrevv.com/

A [work-in-progress] self-hosted, anti-social RSS reader https://github.com/swanson/stringer

Best Content Discovery Application ?

Flipboard Instapaper Pinterest Prismatic Tumblr

Fablab

Education

programming school http://codeclub.org.uk/

experiments on how to teach kids to program (Scratch) http://snapcircuits.net/ http://technomancy.us/167

Potential projects to be completed: http://www.datasciencecentral.com/profiles/blogs/proposal-for-an-apprenticeship-in-data-science

  • hacking and reverse-engineering projects (TBD)
  • web crawling projects: how many Facebook accounts are duplicate or dead? Or categorize Tweets
  • taxonomy creation or improving an existing taxonomy
  • optimal pricing for bid keywords on Google
  • create a web app that provides (in real time) better-than-average trading signals
  • find low-frequency and Botnet fraud cases in a sea of data
  • internship in computational marketing with a data science start-up
  • automated plagiarism detection
  • use web crawlers, assess whether Google Search favors
    • (1) its products over competitors [is this an unfair business practice?],
    • (2) local over non-local results and
    • (3) returns different results to web robots and humans. Identify other bias and patterns in Google search results.

http://www.slate.com/articles/technology/top_right/2011/08/flipping_the_classroom.html

Classroom flipping means assigning lectures as homework, leaving actual classroom time for hands-on instruction and group work. Ng told me his class at Stanford is already doing this, and he’s encouraging other professors to adopt the approach for their Coursera classes as well.

Research

Douglas C. Engelbart: A research center for augmenting human intellect http://sloan.stanford.edu/mousesite/1968Demo.html

Marketing

Security

Jobs

Author: Maximilien
