www.dataworldwide.org

Data in "Deep Web" Sites

Updated: 7 July 2014

When you use one of the popular search engines, such as Google, Bing or Yahoo, the search results reflect only a very thin layer of the information available on the Internet. It is like fishing the shallow surface layer of the ocean. There is an enormous volume of information available in the "deep waters" of the Internet. There are several reasons for this:

1. Search engines usually cannot index web pages that are dynamically created from a database depending on specific user requests. What is in the database is invisible to a search engine crawler.

2. Search engines typically scan and index only a few levels of hierarchically organized websites. What is hidden away deep down in a large website is not necessarily indexed.

3. Search engines cannot keep pace with content that is constantly updated. Their crawlers visit web pages only from time to time - so their index is always outdated to some extent.

4. Search engines usually do not index web pages for which the web developer has set up crawler restrictions (for example, through a robots.txt file). Websites of government agencies, the military, intelligence organizations, certain large corporations, and research centers rigorously control what the crawlers of search engines can "see" and index.

5. Finally, there are also special technologies available that can make websites and certain activities on the Internet "invisible" or at least extremely hard to trace (Tor, IP hiding, etc.).
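Reasons 2 and 4 above can be illustrated with a toy crawler. The following is only a minimal sketch, not how any real search engine works: the site map, the robots.txt rules, the crawler name and the depth limit are all made up for illustration. Only Python's standard urllib.robotparser module is real.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical website: each path maps to the paths it links to.
SITE = {
    "/": ["/about", "/data/"],
    "/about": [],
    "/data/": ["/data/health/", "/internal/reports"],
    "/data/health/": ["/data/health/tables/"],
    "/data/health/tables/": ["/data/health/tables/2014"],
    "/data/health/tables/2014": [],
    "/internal/reports": [],
}

# Hypothetical crawler restrictions set up by the site owner (reason 4).
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""


def crawl(site, robots_txt, max_depth, agent="toy-crawler"):
    """Breadth-first crawl from "/" that obeys robots.txt and a depth limit."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    indexed = set()
    seen = {"/"}
    queue = deque([("/", 0)])
    while queue:
        path, depth = queue.popleft()
        if not rules.can_fetch(agent, path):
            continue  # blocked by the site owner
        indexed.add(path)
        if depth == max_depth:
            continue  # reason 2: links below this level are not followed
        for link in site.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return indexed


indexed = crawl(SITE, ROBOTS_TXT, max_depth=2)
hidden = sorted(set(SITE) - indexed)
print("indexed:    ", sorted(indexed))
print("not indexed:", hidden)
```

With a depth limit of 2, the crawler never reaches the two pages under /data/health/tables/, and it skips /internal/reports because of the Disallow rule. These three pages remain in the "deep" part of this toy website.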

One group of websites in the Deep Web consists of the on-line shopping, real-estate and auction sites, such as eBay, REALTOR, or Amazon, which have huge amounts of content (such as product information, ads, reviews etc.) stored in databases. Of course, one can find these websites in search engines, but not necessarily the specific content they publish from their databases. Fortunately, these commercial sites are not relevant in our context.

Another group of websites and web activities that usually cannot be found with the common search engines are those that intentionally obscure or hide their existence. This "Dark Web" consists of the sites and services of criminals, terrorist groups or Internet users who utilize technologies such as Tor to hide or obscure their activities.

Secret activities of intelligence agencies on the Internet, military communications and the protected systems used by law enforcement are also part of the "Dark Web". None of these can be found through a simple Google search.

However, there are also websites of libraries, registers, government agencies, research networks or international organizations that have large amounts of information in specialized databases. Deep down in these public websites there are vast amounts of textual, numerical, visual or acoustic data - hidden away in databases or deeply layered link collections. The information in these on-line databases can be most interesting and relevant to all kinds of research activities. Unfortunately, the data are hard to find, because they can only be accessed through their respective websites - which are often rather obscure, badly organized, or excessively hard to use. Some governments, international organizations and even private companies have therefore tried to consolidate their various databases and make them more easily accessible through "open data" platforms. For instance, the World Bank is accumulating all kinds of data from United Nations agencies to provide a unified open data platform. But there are numerous such initiatives from governments, organizations and private businesses, so that in the end a data analyst has to know and visit hundreds of such sites for a complete picture of what data are available.
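To see why such database content stays invisible to a crawler, it helps to look at how it is requested: the data only appear in response to a specific query. The sketch below assumes the World Bank's public Indicators API (version 2) and its JSON layout of a two-element list, metadata followed by records. This layout should be checked against the current World Bank documentation before relying on it. The function names are illustrative and not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the World Bank Indicators API (assumed to be version 2).
BASE = "https://api.worldbank.org/v2"


def build_url(country, indicator, start, end):
    """Build the query URL for one country, one indicator and a year range."""
    query = urlencode({"format": "json", "date": f"{start}:{end}", "per_page": 100})
    return f"{BASE}/country/{country}/indicator/{indicator}?{query}"


def parse_response(payload):
    """Turn the assumed [metadata, records] payload into (country, year, value) rows."""
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        return []  # error message or empty result
    return [
        (rec["country"]["value"], rec["date"], rec["value"])
        for rec in payload[1]
        if rec.get("value") is not None  # skip years without data
    ]


def fetch(country, indicator, start, end):
    """Send the query over the network and return the parsed rows."""
    with urlopen(build_url(country, indicator, start, end), timeout=30) as resp:
        return parse_response(json.load(resp))


# Example call (needs network access); SP.POP.TOTL is total population:
# rows = fetch("BR", "SP.POP.TOTL", 2010, 2012)
```

Every combination of country, indicator and year range yields a different page, and none of them exists until someone asks for it. A crawler that only follows links will therefore never see most of the database.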

Below is a preliminary list of some of these websites that contain particularly large amounts of data and other kinds of information, which cannot be found through the usual search engines.

Open data platforms

The World Bank: Open Data project

United Nations, Statistics Division: UN Data (Unified data service for the global community)

World Health Organization (WHO) of the United Nations: Global Health Observatory

U.S. Government Open Data Site

U.S. Census Bureau

U.S. Library of Congress Online Catalog

U.S. Government Printing Office: Official Information Products

U.S. PTO - Trademarks + Patents

U.S. National Cancer Institute: Cancer Statistics

InfoUSA: Business & Consumer Data

U.S. National Oceanographic (combined with Geophysical) Data Center (NOAA)

U.S. National Climatic Data Center (NOAA)

NASA Image Exchange: Technical Reports Server

NASA: Earth Observing System Data and Information System (EOSDIS)

NASA: High Energy Astrophysics Science Archive Research Center (HEASARC)

U.S. National Library of Medicine (National Institutes of Health) PubMed / MEDLINE

U.S. National Library of Medicine: Digital Collections

U.S. National Institutes of Health: GenBank (National Center for Biotechnology Information)

U.S. Centers for Disease Control and Prevention (CDC): Wide-ranging Online Data for Epidemiological Research

U.S. Securities and Exchange Commission (SEC)

Australia: Government Open Data Site

United Kingdom: Government Open Data Site

Integrum World Wide (Russia)

Informedia (Carnegie Mellon Univ.)

Astronomer's Bazaar

 


 

Comment & Suggestion:

Unfortunately, many governments, non-governmental organizations and private businesses are developing their own "open data" platforms - which are, of course, not compatible with each other. Data organization, query forms, documentation, delivery and data visualization are completely different from platform to platform.

As usual, it is profit orientation, excessive patriotism, technological obstinacy or plain arrogance that motivates the various data providers to develop their own "open data" solutions. A globalized world, however, would need one global platform for (economic, social, environmental, and other) data that could be of benefit to all people worldwide.

The right place for developing an international "Open Data" platform would be the United Nations. It already has systems in place to collect, organize and publish statistical (and other) data from all countries of the world. Usually, these data are reported voluntarily by the member countries and can be used free of charge by everyone.

 

Copyright 2014, 2015 by Gerhard K. Heilig. All rights reserved.

Updated: 20 May 2015