
Monday, October 17, 2011

Java Async IO Package

We need an Async IO package for sheer performance and scalability!
Performance and scalability are key attributes of the IO system for IO-intensive applications. IO-intensive applications are typically, although not exclusively, server-side applications. Server-side applications need to handle many network connections to many clients, and they need to access many files to serve requests from those clients. The existing standard Java facilities for handling network connections and files do not serve these needs adequately. The java.io and java.net packages provide synchronous IO only, which forces a one-thread-per-IO-connection style of design. That design limits scalability, since running thousands of threads on a server imposes significant overhead on the operating system. The New IO package, java.nio, addresses the scalability issue of the one-thread-per-IO-connection design, but the New IO select() mechanism limits performance.
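
To make the contrast with blocking IO concrete, here is a minimal sketch of a callback-style read. It assumes the standard java.nio.channels.AsynchronousFileChannel API (Java 7 NIO.2) plus CompletableFuture (Java 8), which is not necessarily the Async IO package this post has in mind. The class name AsyncReadSketch and the helper readAsync are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

class AsyncReadSketch {

    // Hypothetical helper: starts a read at offset 0 and returns immediately.
    // The calling thread is not blocked; the future completes later, on a
    // thread owned by the channel, when the read finishes or fails.
    static CompletableFuture<String> readAsync(Path file, int maxBytes) throws IOException {
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        ByteBuffer buffer = ByteBuffer.allocate(maxBytes);
        CompletableFuture<String> result = new CompletableFuture<>();

        channel.read(buffer, 0, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                buffer.flip(); // switch the buffer from writing to reading
                result.complete(StandardCharsets.UTF_8.decode(buffer).toString());
                closeQuietly(channel);
            }

            @Override
            public void failed(Throwable cause, Void attachment) {
                result.completeExceptionally(cause);
                closeQuietly(channel);
            }
        });
        return result;
    }

    private static void closeQuietly(AsynchronousFileChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("aio-sketch", ".txt");
        Files.write(file, "hello async".getBytes(StandardCharsets.UTF_8));
        // get() blocks only here, for the demo; a server would chain callbacks instead.
        System.out.println(readAsync(file, 64).get());
    }
}
```

The point of the design is that no thread is parked per connection or per file while the read is in flight, so a small pool of threads can service many outstanding operations.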

SQL Vs NoSQL


The advantage of a relational database is the ability to relate and index information. Most key-value systems don't provide that.
Does switching to NoSQL really make sense for the intended use case?
You have kind of missed the point. The point is, you don't have an index. You don't have a centralized list of records, or the ability to relate them to each other in any easy way. What makes NoSQL key-value stores so quick is that you store and retrieve what you need by name. You need that blurb on someone's profile page? Just go fetch it. No need to maintain a table with everything in it. That said, NoSQL systems offer a number of novel structures that make many use cases trivially easy. For example, Redis is a data-structure-oriented DB well suited to rapidly building anything with queues, and MongoDB is a freeform document database that stores JSON-like documents.
Not everything really needs to be tabular.
There are advantages and disadvantages. Sometimes using a mix of both can also make sense: SQL for most things, and something along the lines of CouchDB for random things that have no need to be clogging up an SQL table.
You can liken a key-value system to an SQL table with two columns, a unique key and a value. This is quite fast. You have no need to do any relations, correlations, or collation of data. Just find the value and return it. This is an oversimplification; NoSQL databases do have a lot of nifty functions beyond simple K,V stores.
You'll find a simple K,V store is also fast in SQL databases. I've used it in place of actual key-value systems before NoSQL databases matured a bit.
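
The two-column analogy can be sketched in a few lines. This is an in-memory illustration only, not how any particular NoSQL product is implemented. The class name KeyValueSketch and the key format "profile:42:blurb" are assumptions made up for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class KeyValueSketch {
    // Conceptually a two-column table: unique key -> value. No joins, no schema.
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    void put(String key, String value) {
        rows.put(key, value);
    }

    // A lookup is a single fetch by name; there is nothing to relate or collate.
    Optional<String> get(String key) {
        return Optional.ofNullable(rows.get(key));
    }

    public static void main(String[] args) {
        KeyValueSketch store = new KeyValueSketch();
        // Hypothetical key naming scheme: <entity>:<id>:<field>
        store.put("profile:42:blurb", "Hello, I like databases.");
        System.out.println(store.get("profile:42:blurb").orElse("(missing)"));
        System.out.println(store.get("profile:99:blurb").orElse("(missing)"));
    }
}
```

Anything that would be a join in SQL, such as listing every profile that mentions a given word, has to be handled by the application or by a second, separately maintained key.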
I do not think scientific data is generally well suited to a NoSQL implementation, though if you look at HBase, it may well suit a scientist's needs.
The efficiency comes from the following areas:
1. The database has far fewer functions: there is no concept of a join, and transactional integrity requirements are reduced or absent. Fewer functions mean less work, which means faster responses, on the server side at least.
2. Another design principle is that the data store lives in a cloud of servers so your request may have multiple respondents. These systems also claim the multi-server system improves fault tolerance through replication.
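
The second point can be illustrated with a toy model in which every write goes to all reachable servers and any live server may answer a read. It is a sketch under strong assumptions: replicas live in one process, there is no networking, and consistency after a replica recovers is ignored. ReplicationSketch, Replica, and the up flag are hypothetical names, not part of any real system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ReplicationSketch {

    // Hypothetical stand-in for one server holding a full copy of the data.
    static class Replica {
        private final Map<String, String> data = new HashMap<>();
        boolean up = true; // flip to false to simulate a failed server

        void put(String key, String value) {
            if (!up) throw new IllegalStateException("replica down");
            data.put(key, value);
        }

        String get(String key) {
            if (!up) throw new IllegalStateException("replica down");
            return data.get(key);
        }
    }

    private final List<Replica> replicas = new ArrayList<>();

    ReplicationSketch(int count) {
        for (int i = 0; i < count; i++) {
            replicas.add(new Replica());
        }
    }

    Replica replica(int index) {
        return replicas.get(index);
    }

    // Write to every reachable replica; returns how many accepted the write.
    int put(String key, String value) {
        int acks = 0;
        for (Replica r : replicas) {
            try {
                r.put(key, value);
                acks++;
            } catch (IllegalStateException down) {
                // skip the failed replica; the others still hold the value
            }
        }
        return acks;
    }

    // Any live replica may respond; the first successful answer wins.
    // Returns empty if the key is absent or if no replica is reachable.
    Optional<String> get(String key) {
        for (Replica r : replicas) {
            try {
                return Optional.ofNullable(r.get(key));
            } catch (IllegalStateException down) {
                // try the next replica
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ReplicationSketch cluster = new ReplicationSketch(3);
        cluster.put("user:1", "alice");
        cluster.replica(0).up = false; // one server fails
        System.out.println(cluster.get("user:1").orElse("(unavailable)"));
    }
}
```

In the usage above the read still succeeds after one server fails, which is the fault-tolerance claim in miniature. A real system also has to decide what happens when a replica that missed writes comes back, which this sketch does not attempt.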