Posts Tagged ‘statistics’
The Mathematics of Changing Your Mind
Drupal contributor statistics
I recently extracted some data from the Drupal project's CVS and Git logs to see how the number of code contributors and total contributions have changed over time. If there was any doubt of our continual growth, the resulting charts demolish it.
Aggregated results from core and contributed modules.
The graphs show a pronounced spike in commit activity after the Git migration.
jStat – A Statistical Library With JavaScript
jStat is a JavaScript library that lets you perform advanced statistical operations without needing a dedicated statistical language.
In short, it aims to be a real JavaScript-based alternative to languages like R and MATLAB.
The library is standalone; its plotting functionality, however, requires jQuery, jQuery UI and the jQuery-flot plugin.
Twitter API Calls Doubled Since April: Now Serving 70,000 Every Second
Have you ever wondered how much traffic Twitter handles in a given day, or what software sits behind the curtain of the popular service? A recent presentation reveals some of the answers. Twitter’s incredible growth becomes obvious when you compare the recent numbers to those announced at Chirp.
On September 9th, Twitter’s university recruiting team stopped by UC Berkeley to talk about the company and what it does. The slides from platform engineer Raffi Krikorian’s talk, Twitter by the Numbers, are now online, and they disclose some fascinating technical details about the social media giant’s operations.
Twitter serves over 70 million tweets per day, totaling over 12GB of tweet text alone. Many of those messages are delivered to client apps and web sites through the Twitter API to the tune of six billion API calls per day (double what was announced at Chirp in April), or about 70,000 API calls per second. All told, the service generates 8TB of data every day, which is eight times more than the New York Stock Exchange.
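As a quick sanity check on the figures above, the per-second rates can be derived from the daily totals. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and it assumes traffic is spread evenly over the day, which real traffic is not.

```python
# Convert the daily totals quoted in the article into average per-second rates.
# Assumption: load is spread evenly across 24 hours (real traffic has peaks).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

api_calls_per_day = 6_000_000_000  # "six billion API calls per day"
tweets_per_day = 70_000_000        # "over 70 million tweets per day"

api_calls_per_second = api_calls_per_day / SECONDS_PER_DAY
tweets_per_second = tweets_per_day / SECONDS_PER_DAY

print(f"API calls per second: {api_calls_per_second:,.0f}")
print(f"Tweets per second:    {tweets_per_second:,.0f}")
```

The first figure comes out just under the 70,000 calls per second quoted in the headline, so the rounded number is consistent with the daily total.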
Pop quiz, engineers: Your web service needs to deliver real-time message traffic to an asymmetric digraph of over 150 million users. What database do you use? WHAT DATABASE DO YOU USE?
1. Shoot the hostage
2. Oracle
3. MySQL
4. Write your own database
If you’re Twitter, the correct answer is #4: Create your own database software, call it FlockDB, and release it on GitHub. (By the way, if you picked option 1, maybe software engineering isn’t the right career choice for you.)
FlockDB is just one of the home-grown, high-performance software systems Twitter uses to support its tremendous growth. Others include:
- hosebird, a “near real-time” streaming API back-end (instead of REST, which is only “pseudo real-time”); and
- snowflake (also on GitHub), a network service to generate unique IDs at high scale (MySQL couldn’t keep up, and was a single point of failure).
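To make the idea of generating unique IDs without a central database more concrete, here is a minimal sketch of a snowflake-style generator. The article does not describe how snowflake works internally, so everything here is an assumption for illustration: the bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence), the epoch, and the class and parameter names are all hypothetical, and this is not Twitter's code.

```python
import threading
import time

# Assumed layout, not taken from the article:
# [41-bit ms timestamp][10-bit worker id][12-bit per-ms sequence]
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
EPOCH_MS = 1_262_304_000_000  # 2010-01-01 UTC, an arbitrary epoch for this sketch


class SnowflakeLikeGenerator:
    """Hypothetical ID generator; each worker needs its own distinct worker_id."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        # The clock is injectable so the generator can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                # Same millisecond: bump the sequence counter.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted; spin until the clock advances.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )


generator = SnowflakeLikeGenerator(worker_id=3)
print(generator.next_id())
```

Because each worker combines its own ID with a local clock and counter, no two workers need to coordinate per request, which is the general property that removes a single database as the bottleneck and single point of failure.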
With a stated goal of supporting “half the world and all its devices,” Twitter faces many engineering challenges. This peek under the hood (full slides are embedded below) shows that they’re aware of the potential problems, and are working hard to steer clear of the fail whale.