Comments on Matt On Stuff: Hadoop For The Rest Of Us

Matt Kapilevich (2013-08-10):
Shash, great question. 150 files will be stored, and 150 Mappers will be launched. Small files are a problem for Hadoop; it is not efficient at processing them. Possible workarounds are to combine the files into bigger files before processing them, or to use CombineFileRecordReader when reading in these files. If you google "hadoop small file problem", the first few hits have a lot of great info on the topic.

Shash (2013-07-23):
Nice article. Just a quick question: suppose I have 150 files of 1 MB each, and my HDFS block size is 64 MB. In this case, how will the files be stored, and how many mappers will be launched?

Matt Kapilevich (2012-06-19):
You're very welcome, glad you found it useful.

vaibhav (2012-06-19):
First, thanks for such a nice article. I am a newbie to Hadoop, and I wanted to learn about the small-file problem in Hadoop and why the Hadoop block size is 64 MB. This article helped me solve my problem. Thanks again for this informative article.

Matt Kapilevich (2012-05-05):
CouchDB is a NoSQL database, primarily used for storing documents. Hadoop is a data-processing framework. It's not a database. Within the Hadoop ecosystem, HBase is most similar to CouchDB. Hope this helps.

Tester (2012-05-04):
Nice article! Can you also explain how Hadoop is different from a datastore like CouchDB? They also seem to support map/reduce for data processing. I read somewhere that Hadoop is better for processing large data sets and CouchDB is more oriented towards web applications.
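
The arithmetic behind the small-files exchange above (150 files of 1 MB, 64 MB blocks, 150 Mappers) can be sketched roughly as follows. This is a simplified model, not Hadoop's actual split logic: the function names are hypothetical, the per-file count ignores the slack Hadoop allows on the last split, and the combined count assumes simple greedy packing with no regard for node or rack locality, which Hadoop's CombineFileInputFormat does take into account.

```python
import math

MB = 1024 * 1024


def default_split_count(file_sizes, block_size):
    """Approximate split count when every file is split on its own.

    A split never spans two files, so even a tiny file costs one split
    (and therefore one Mapper). Simplified model, not Hadoop's exact rule.
    """
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


def combined_split_count(file_sizes, max_split_size):
    """Approximate split count when whole files are packed greedily into
    splits of at most max_split_size bytes (hypothetical packing rule)."""
    splits = 0
    current = 0
    for size in file_sizes:
        # Close the current split if adding this file would overflow it.
        if current > 0 and current + size > max_split_size:
            splits += 1
            current = 0
        current += size
    if current > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    files = [1 * MB] * 150  # the scenario from Shash's question
    print("one split per file:", default_split_count(files, 64 * MB))
    print("combined splits:   ", combined_split_count(files, 64 * MB))
```

Under these assumptions the 150 small files need 150 splits when handled one by one, but only 3 when packed into 64 MB splits, which is why combining files is the usual workaround.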