Tuesday, August 17, 2004
    
	
	
        
           
        
	
Abstract: The web harbors a large number of communities -- groups of content-creators sharing a common interest -- each of which manifests itself as a set of interlinked web pages. Newsgroups and commercial web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities -- those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a web crawl; we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
 
        
	