Sunday, 30 June 2013

What Happens When Municipalities Use Rich Data Mining Against Home Businesses to Collect Tax?

Do you realize how many Americans run a small home business? The number is staggering. Did you know that 10% of our population is self-employed, which is something like 30 million Americans? Those same 30 million Americans also represent a group that employs over 65% of our population in their small businesses. People who start these little firms may expand, eventually hire someone, grow larger, and turn the business into a real company. I'd say that's a good thing, and it shows that the entrepreneurial spirit in the US is alive and well.

Many people don't seem to be aware of these figures or how important they are. Did you know that 10% of our population is self-employed? Don't worry, you're not the only one who hasn't figured this out. Even the President of the United States doesn't understand, or he obviously wouldn't have made that political faux pas of telling small business people that they didn't build that, or that they couldn't have built their businesses had it not been for the government providing such a wonderful civilization and society for them to participate in.

Yes, I was a little miffed when he said that as well, because it isn't true, and I've been self-employed my entire life and I've loved my country my entire adult life as well, as have you. Now then, many municipalities are stretched thin with their budgets. Often they owe 60% of all the money they take in to legacy costs, that is to say pensions, retirement, and health care for people who have already retired from city employment. That means only 40% of all the money they collect in taxes actually goes to current city services.

How can any business, much less a government, operate on 40% of its income? It can't, and perhaps that's why three cities in California have filed for bankruptcy, along with a couple of other big municipal bankruptcy cases: Birmingham, Alabama and Harrisburg, Pennsylvania. With city budgets stretched thin, they have no choice but to collect more money, and that means finding more ways to tax more people. Most cities require that you get a business license if you start a business, and it is considered a tax.

In some cities these taxes are only $100 or less depending on the type of business you run, but in other cities they can run as much as $500. Most people who start a small business, especially a little home-based business, don't bother to register for a business license. They don't make enough money to even afford that when they first start. But guess what? I am almost positive that all these municipalities will soon be running rich data mining programs, and/or paying other companies to give them information about anyone who resides in their city and is running a business.

They will then of course check this data and all these names against all their business licenses. If you run a business and you don't have a business license but you are doing business online, or it is mentioned on your Facebook page, you will not only have to pay the business license registration fee, you will also be charged a penalty which could be two or three times that amount. The cities will then have more revenue to spend by attacking small businesses just barely getting off the ground. Welcome to the future of data mining and your government. Please consider all this and think on it.


Source: http://ezinearticles.com/?What-Happens-When-Municipalities-Use-Rich-Data-Mining-Against-Home-Businesses-to-Collect-Tax?&id=7277878

Friday, 28 June 2013

Using Forensic Social Media Data Mining to Discover Work Comp Fraud

As an employer, most of your employee-based work comp claims are completely legitimate and should be handled in the best interest of the injured employee. Unfortunately, some are also fraudulent, driving up costs with baseless claims. Historically, it's been challenging to contest some of these claims, though recently a new road has opened, allowing enlightened employers to travel more rapidly to a truthful outcome.

Employers should now consider the usefulness of Facebook, YouTube, LinkedIn and other social media sites, which can contain posts negating the claims of allegedly injured workers by showing them participating in activities beyond the restrictions placed by the treating physician. These posts can happen on any given day, clearly exposing a fraudulent claim. For example, let's say that an employee is out of work based on a "work comp" (workers compensation) claim which has restrictions, yet they post links, discussions, comments and photos that are clearly incriminating. This is an area that companies should be mining regularly to protect themselves against out-of-scope workers compensation claims.

Recently, a transportation attorney based in central Pennsylvania was a guest speaker at an insurance transportation web seminar. His topic was the aggressive defense of trucking lawsuits, and he elaborated extensively on an example of forensic social media investigation used to assist in the aggressive defense of frivolous lawsuits. I recall that the metrics were impressive; one example noted a $250,000 claim that was reduced to $2,500 when forensic social media data mining found New Year's Eve photos proving the claimant's mobility to be much greater than stipulated in the lawsuit.

Social media offers a surprising if not inadvertent glimpse into the nuances of the lifestyles of anyone using it, and in certain cases, it also offers important evidence into the veracity of work comp and work comp lawsuit based claims. Employers should be aware of this avenue and investigate accordingly.

The D'Camera Group http://www.dcameragroup.com partners with businesses, creating long range plans to manage their Total Cost of Risk. D'Camera Group's proprietary approach bridges the gap between the client and the insurance marketplace through a series of engagements in which we Discover, Design and Implement a Risk Reduction Plan™. Our specialized business insurance services include risk discovery and assessment, risk reduction strategies, insurance loss management, workers compensation, experience mod factors and improving marketplace competitiveness.


Source: http://ezinearticles.com/?Using-Forensic-Social-Media-Data-Mining-to-Discover-Work-Comp-Fraud&id=5027122

Wednesday, 26 June 2013

Data Mining Process - Why Outsource Data Mining Service?

Overview of Data Mining and Process:
Data mining is a set of techniques for investigating information to extract data patterns and decide the outcome of existing requirements. Data mining is widely used in client research, services analysis, market research and so on. It relies on mathematical algorithms and analytical skills to derive the desired results from huge database collections.

Information mining is mostly used by financial analysts and by business and professional organizations, and a growing number of businesses, from small to large, are getting the maximum advantage from data extraction through the use of data warehouses.

The main functions used in the information-collection process are as follows:

* Retrieving Data

* Analyzing Data

* Extracting Data

* Transforming Data

* Loading Data

* Managing Databases

Most small, medium and large businesses collect huge amounts of data or information for analysis and research in order to develop the business. Having such a large store of information on hand becomes very important whenever particular information or data is required.
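The functions listed above can be illustrated with a very small sketch. It is only an assumption of how such a pipeline might look, not a description of any particular provider's process: the sample records, the `sales` table and the function names are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or a feed.
RAW = """customer,amount
alice, 120.50
bob,80
alice,30
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace and convert amounts to numbers."""
    return [(row["customer"].strip(), float(row["amount"])) for row in rows]

def load(records):
    """Load: store the cleaned records in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    return conn

conn = load(transform(extract(RAW)))

# Analyze: total sales per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping each step in its own function makes it easy to swap the source or the storage without touching the rest.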

Why Outsource Online Data Mining Services?

Outsourcing advantages of data mining services:
o Save almost 60% of operating costs
o High-quality analysis processes ensuring accuracy levels of almost 99.98%
o Guaranteed risk-free outsourcing experience ensured by strict information security policies and practices
o Get your project done within a quick turnaround time
o Evaluate the provider's skills and expertise by taking advantage of a free trial program
o Get the gathered information presented in a simple and easy-to-access format

Thus, data or information mining is a very important part of web research services and a most useful process. By outsourcing data extraction and mining services, you can concentrate on your core business and grow as fast as you desire.

Outsourcing Web Research is a trusted and well-known Internet market research organization with years of experience in the BPO (business process outsourcing) field.

If you want more information about data mining services and related web research services, then contact us.


Source: http://ezinearticles.com/?Data-Mining-Process---Why-Outsource-Data-Mining-Service?&id=3789102

Monday, 24 June 2013

One of the Main Differences Between Statistical Analysis and Data Mining

Two methods of analyzing data that are common in both academic and commercial fields are statistical analysis and data mining. While statistical analysis has a long scientific history, data mining is a more recent method of data analysis that has arisen from Computer Science. In this article I want to give an introduction to these methods and outline what I believe is one of the main differences between the two fields of analysis.

Statistical analysis commonly involves an analyst formulating a hypothesis and then testing the validity of this hypothesis by running statistical tests on data that may have been collected for the purpose. For example, if an analyst were studying the relationship between income level and the ability to get a loan, the analyst might hypothesize that there will be a correlation between income level and the amount of credit someone may qualify for.

The analyst could then test this hypothesis with the use of a data set that contains a number of people along with their income levels and the credit available to them. A test could be run that indicates, for example, that there may be a high degree of confidence that there is indeed a correlation between income and available credit. The main point here is that the analyst has formulated a hypothesis and then used a statistical test along with a data set to provide evidence in support of or against that hypothesis.
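As a rough illustration of the test described above, the sketch below computes a Pearson correlation coefficient and the matching t statistic for a small data set. The income and credit figures are invented for the example, and the Pearson test is only one of several tests an analyst might choose.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for the null hypothesis of no correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical sample: income and available credit, both in thousands.
incomes = [30, 45, 50, 62, 80, 95]
credit = [5, 9, 10, 14, 18, 24]

r = pearson_r(incomes, credit)
t = t_statistic(r, len(incomes))
print(f"r = {r:.3f}, t = {t:.1f}")
```

With 4 degrees of freedom, a t value above roughly 2.776 would let the analyst reject "no correlation" at the 5% level (two-sided), which is the kind of evidence for the hypothesis the paragraph describes.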

Data mining is another area of data analysis, one that has arisen more recently from computer science and that differs from traditional statistical analysis in a number of ways. Firstly, many data mining techniques are designed to be applied to very large data sets, while statistical analysis techniques are often designed to form evidence in support of or against a hypothesis from a more limited set of data.

Probably the most significant difference here, however, is that data mining techniques are not used so much to form confidence in a hypothesis, but rather to extract unknown relationships that may be present in the data set. This is probably best illustrated with an example. Rather than the above case, where a statistician may form a hypothesis about income levels and an applicant's ability to get a loan, in data mining there is not typically an initial hypothesis. A data mining analyst may have a large data set on loans that have been given to people, along with demographic information on these people such as their income level, their age, any existing debts they have and whether they have ever defaulted on a loan before.

A data mining technique may then search through this large data set and extract a previously unknown relationship between income levels, people's existing debt and their ability to get a loan.
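A deliberately simple stand-in for such a search is to score every pair of columns and rank the pairs, with no hypothesis chosen in advance. The sketch below does this with plain pairwise correlation; real data mining tools use far richer algorithms, and the loan records and column names here are made up for illustration.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical loan records, one list per column.
loans = {
    "income": [30, 45, 50, 62, 80, 95],
    "age": [41, 23, 35, 52, 29, 38],
    "existing_debt": [12, 9, 15, 4, 6, 2],
    "loan_amount": [5, 9, 8, 14, 18, 24],
}

def rank_relationships(table):
    """Score every pair of columns and return them strongest first."""
    scored = []
    for a, b in combinations(table, 2):
        scored.append((abs(pearson_r(table[a], table[b])), a, b))
    return sorted(scored, reverse=True)

ranked = rank_relationships(loans)
for strength, a, b in ranked[:3]:
    print(f"{a} ~ {b}: {strength:.2f}")
```

The analyst never states which columns should be related; the ranking surfaces the candidates, which can then be examined more carefully.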

While there are quite a few differences between statistical analysis and data mining, I believe this difference is at the heart of the issue. A lot of statistical analysis is about analyzing data to either form confidence for or against a stated hypothesis while data mining is often more about applying an algorithm to a data set to extract previously unforeseen relationships.


Source: http://ezinearticles.com/?One-of-the-Main-Differences-Between-Statistical-Analysis-and-Data-Mining&id=4578250

Friday, 21 June 2013

Data Recovery 101

Almost all computer users have experienced this at least once: the need to get back a deleted or lost data file. This could happen as a result of a software failure, hardware failure, human error, power-related problems, damage caused by flood or water, vandalism, virus damage, damage by fire, heat or smoke, or sabotage. Whatever the reason you need data recovery, there is no reason to panic, for help is at hand. The need and urgency to recover data has resulted in a plethora of data recovery software to rescue you from a crisis-like situation.

Unless the hard disk is not working normally, the need for professional service is almost rendered unnecessary. If the hard disk is not making any weird noise like scratching, scraping or ticking (which means it is in good condition) data recovery can be done with the use of proper data recovery software, without the help of any technical personnel. The data recovery software that is available can be used for Mac, NT/2000/XP and RAID data recovery. The data recovery software is also FAT and MFT compliant.

Hard drive data recovery is possible from small hard drives of 2 GB to big hard drives of 120 GB. Hard drive data recovery requires the presence of technicians if there is a hard drive crash.

Data recovery software used for NT data recovery provides recovery of deleted files from the recycle bin, partition recovery from deleted partition or formatted logical drives, from lost folders and performs data recovery even if MFT is severely corrupted. NT data recovery software also recovers emails and all forms of files. Mac data recovery software recovers HFS and HFS+ File System Data. Mac data recovery software also recovers partition if partitions are deleted or formatted, files from Lost or Missing Mac folders. Mac data recovery software recognizes and preserves long file names when recovering Mac files and folders as well as provides full support for IDE, EIDE, SCSI and SATA drives.

'Redundant Array of Inexpensive Disks', or RAID, offers better data recovery chances as long as the drives are cloned. RAID is a collection of hard disks that act as a single hard disk that is better than the individual ones. The hard disks of a RAID operate independently of each other. In a redundant configuration, a single drive failure is absorbed by the RAID and does not result in loss of data. However, when RAID fails, it fails big time, and then RAID data recovery software is used to retrieve data. RAID data recovery software recovers data from both software and hardware RAID.
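One way a RAID can absorb a single drive failure is parity, as used in RAID 5 style layouts. The article does not say which RAID level it has in mind, so the sketch below is an assumption for illustration only: three tiny hypothetical "drives" plus an XOR parity block, with one drive then rebuilt from the others.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical contents of three data drives (one stripe each).
data_drives = [b"ACCT", b"2013", b"DATA"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_drives)

# Simulate losing the second drive, then rebuild it from the
# surviving drives plus the parity block.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_drives[1])
```

The same arithmetic shows why a second failure is so serious: with two blocks missing, a single parity block is no longer enough to rebuild either one.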

Natalie Aranda writes about Internet [http://www.rectonet.com/Internet-24/], information technology and computers.


Source: http://ezinearticles.com/?Data-Recovery-101&id=149174

Thursday, 20 June 2013

Online Data Entry and Data Mining Services

A data entry job involves transcribing a particular type of data into some other form. It can be done either online or offline. The input data may include printed documents such as application forms, survey forms, registration forms, handwritten documents etc.

Data entry is an inevitable part of the work of any organization. One way or another, every organization needs data entry. Data entry skills vary depending on the nature of the job: in some cases data is entered from hard copy, and in others it is entered directly into a web portal. An online data entry job generally requires the data to be entered into an online database.

In a supermarket, a data associate might be required to enter the goods sold on a particular day and the new goods received that day in order to keep the stock records in order. This also gives the concerned authorities the sales particulars of each commodity whenever they require them. In another example, an account executive in an office might be required to enter day-to-day expenses into an online accounting database in order to keep the accounts in order.

The aim of the data mining process is to collect information from reliable online sources as per the requirements of the customer and convert it to a structured format for further use. The major sources for data mining are internet search engines such as Google, Yahoo, Bing, AOL, MSN etc. Many search engines, such as Google and Bing, provide customized results based on the user's activity history. Based on our keyword search, the search engine lists the websites from which we can gather the details we require.

Data such as company name, contact person, company profile, contact phone number or email ID is collected from online sources for marketing activities. Once the data has been gathered into a structured format, the marketing team starts its promotions by calling or emailing the people concerned, which may result in new customers. So data mining plays a vital role in today's business expansion. By outsourcing data entry and related work, you can save the cost of setting up the necessary infrastructure as well as employee costs.


Source: http://ezinearticles.com/?Online-Data-Entry-and-Data-Mining-Services&id=7713395

Tuesday, 18 June 2013

Data Mining for Dollars


The more you know, the more you're aware you could be saving. And the deeper you dig, the richer the reward.

That's today's data mining capsulation of your realization: awareness of cost-saving options amid logistical obligations.

According to global trade group Association for Information and Image Management (AIIM), fewer than 25% of organizations in North America and Europe are currently utilizing captured data as part of their business process. With high ease and low cost associated with utilization of their information, this unawareness is shocking. And costly.

Shippers - you're in prime position to benefit the most by data mining and assessing your electronically-captured billing records, by utilizing a freight bill processing provider, to realize and receive significant savings.

Whatever your volume, the more you know about your transportation options, throughout all modes, the easier it is to ship smarter and save. A freight bill processor is able to offer insight capable of saving you 5% - 15% annually on your transportation expenditures.

The University of California - Los Angeles states that data mining is the process of analyzing data from different perspectives and summarizing it into useful information - knowledge that can be used to increase revenue, cut costs, or both. Data mining software is an analytical tool that allows users to investigate data from many different dimensions, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations among dozens of fields in large relational databases. Practically, it leads you to noticeable shipping savings.

Data mining and subsequent reporting of shipping activity will yield timely, actionable information that empowers you to make the best logistics decisions based on carrier options, along with associated routes, rates and fees. This function also provides a deeper understanding of trends, opportunities, weaknesses and threats. Exploration of pertinent data, in any combination over any time period, gives you an operational and financial view of your functional flow, ultimately providing you significant cost savings.

With data mining, you can create a report based on a radius from a ship point, or identify opportunities for service or modal shifts, providing insight regarding carrier usage by lane, volume, average cost per pound, shipment size and service type. Performance can be measured based on overall shipping expenditures, variances from trends in costs, volumes and accessorial charges.
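A report of this kind can be sketched in a few lines. The example below computes average cost per pound by carrier and lane; the freight bill records, carrier names and lane codes are hypothetical, and a real freight bill processor would of course work from your actual billing data.

```python
from collections import defaultdict

# Hypothetical freight bill records: (carrier, lane, weight in lb, cost in USD).
bills = [
    ("CarrierA", "CHI-DAL", 1000, 250.0),
    ("CarrierA", "CHI-DAL", 500, 150.0),
    ("CarrierB", "CHI-DAL", 1500, 300.0),
    ("CarrierB", "ATL-MIA", 800, 240.0),
]

def cost_per_pound(records):
    """Average cost per pound for each (carrier, lane) pair."""
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [total weight, total cost]
    for carrier, lane, weight, cost in records:
        totals[(carrier, lane)][0] += weight
        totals[(carrier, lane)][1] += cost
    return {key: round(cost / weight, 4) for key, (weight, cost) in totals.items()}

report = cost_per_pound(bills)
for (carrier, lane), rate in sorted(report.items()):
    print(f"{carrier} {lane}: ${rate}/lb")
```

Comparing carriers on the same lane (CHI-DAL in this made-up data) is the sort of side-by-side view that points to a possible service or modal shift.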

The easiest way to get into data mining of your transportation information is to form an alliance with a freight bill processor that provides this independent analytical tool, and utilize their unbiased technologies and related abilities to make shipping decisions that'll enable you to ship smarter and save.


Source: http://ezinearticles.com/?Data-Mining-for-Dollars&id=7061178

Sunday, 16 June 2013

Data Extraction Services For Better Outputs in Your Business

Data extraction can be defined as the process of retrieving data from an unstructured source in order to process it further or store it. It is very useful for large organizations that deal with large amounts of data on a daily basis that needs to be processed into meaningful information and stored for later use. Data extraction is a systematic way to extract and structure data from scattered and semi-structured electronic documents, as found on the web and in various data warehouses.
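To make the idea concrete, here is a minimal sketch that pulls structured fields out of a line of free text with regular expressions. The patterns are simplified assumptions (real email and price formats are more varied), and the sample sentence is invented.

```python
import re

# Simplified patterns, good enough for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PRICE = re.compile(r"\$(\d+(?:\.\d{2})?)")

def extract_fields(text):
    """Return the email addresses and dollar prices found in free text."""
    return {
        "emails": EMAIL.findall(text),
        "prices": [float(p) for p in PRICE.findall(text)],
    }

sample = "Contact sales@example.com for the widget ($19.99) or the gadget ($5)."
fields = extract_fields(sample)
print(fields)
```

The result is a plain dictionary, which can then be stored in a database or spreadsheet for later use.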

In today's highly competitive business world, vital business information such as customer statistics, competitors' operational figures and inter-company sales figures plays an important role in making strategic decisions. By signing on with a service provider, you will get access to critical data from various sources like websites, databases, images and documents.

It can help you make strategic business decisions that can shape your business goals. Whether you need customer information, insight into your competitors' operations or a picture of your organization's performance, it is highly critical to have data at your fingertips as and when you want it. Your company may be burdened with tons of data, and it may prove a headache to control and convert the data into useful information. Data extraction services enable you to get data quickly and in the right format.

A few areas where Data Extraction can help you are:

    Capturing financial data
    Generating better sales leads
    Conducting market research, survey and analysis
    Conducting product research and analysis
    Tracking, extracting and harvesting product pricing data
    Searching for specific job postings
    Duplicating an online database
    Acquiring real estate data
    Processing auction information
    Searching online newspapers for latest pricing information
    Extracting and summarizing news stories from online news sources

Outsourcing companies provide data extraction services custom-made to the client's requirements. The different types of data extraction services are:

    Web extraction
    Database extraction

Outsourcing is a beneficial option for large organizations seeking to manage large amounts of information. Outsourcing these services helps businesses manage their data effectively, which in turn enables them to increase profits. By outsourcing, you can certainly increase your competitive edge and save costs too!


Source: http://ezinearticles.com/?Data-Extraction-Services-For-Better-Outputs-in-Your-Business&id=2760257

Thursday, 13 June 2013

Outsource Your Work To Data Entry Services To Convert Your Paperwork To An Electronic Format

Among the many services that are outsourced, data entry services are much in demand. While the job profile might seem simple, it does in fact require a certain degree of exactness and an eye for detail. Maintaining client confidentiality is also very important. Data needs to be processed, and the first step is always entering the information in the system. An operator needs to be careful while entering information in the system, as this data is often used for collation and statistical reports and is also the foundation for all the information on the company. These services include much more than just basic information entry in this technology-driven age. An operator today has projects that require image entry, card entry, legal document entry, medical claim entry, entry for online survey forms, online indexing, copying, pasting and sorting of data etc.

A data entry operator is competent at handling online as well as offline data, and even data in Excel. Specialized services like image editing, image clipping and cropping are also available with this service. BPO companies offer these services at very cost-effective rates, and the work is processed 24x7, ensuring that it is constantly actioned. Many data-sensitive projects are even completed within 24 hours. There are many online services to choose from, and each specializes in various features with ample industry experience. These services use the latest technology to ensure that paperwork is processed in the shortest possible time and is converted into electronic data that is easier to store.

A professional service must be able to offer features such as data conversion and storage, effective management of databases, adherence to turnaround times, 100% accuracy of the data entered, 24x7 web and phone support, secure and accurate data capture, data extraction and data processing and, importantly, a cost-effective solution for quality data services. A professional company will also ensure that there is a Quality Assurance department monitoring the quality of the work being handled, with relevant feedback to both the client and the operator.

Before deciding to outsource your work to a data entry service, ensure that the company is known for its reliability and quality. A company that offers data backup is also a good option, as it will take care of all the paperwork while forwarding the converted electronic data back. This paperwork could be retrieved in the case of a claim or any legal requirement. There are many BPO companies advertising their services online; browse through their features and find one that suits your requirements.

The writer is a data entry service provider who specializes as a data entry operator. Inquire for a free quote if you want data entry operators or data entry services for your organization. We are able to provide data entry services at an affordable, low cost.



Source: http://ezinearticles.com/?Outsource-Your-Work-To-Data-Entry-Services-To-Convert-Your-Paperwork-To-An-Electronic-Format&id=7270797

Tuesday, 11 June 2013

Business Intelligence Data Mining

Data mining can be technically defined as the automated extraction of hidden information from large databases for predictive analysis. In other words, it is the retrieval of useful information from large masses of data, which is also presented in an analyzed form for specific decision-making.

Data mining requires the use of mathematical algorithms and statistical techniques integrated with software tools. The final product is an easy-to-use software package that can be used even by non-mathematicians to effectively analyze the data they have. Data Mining is used in several applications like market research, consumer behavior, direct marketing, bioinformatics, genetics, text analysis, fraud detection, web site personalization, e-commerce, healthcare, customer relationship management, financial services and telecommunications.

Business intelligence data mining is used in market research, industry research, and for competitor analysis. It has applications in major industries like direct marketing, e-commerce, customer relationship management, healthcare, the oil and gas industry, scientific tests, genetics, telecommunications, financial services and utilities. BI uses various technologies like data mining, scorecarding, data warehouses, text mining, decision support systems, executive information systems, management information systems and geographic information systems for analyzing useful information for business decision making.

Business intelligence is a broader arena of decision-making that uses data mining as one of the tools. In fact, the use of data mining in BI makes the data more relevant in application. There are several kinds of data mining: text mining, web mining, social networks data mining, relational databases, pictorial data mining, audio data mining and video data mining, that are all used in business intelligence applications.

Some data mining tools used in BI are: decision trees, information gain, probability, probability density functions, Gaussians, maximum likelihood estimation, Gaussian Bayes classification, cross-validation, neural networks, instance-based (case-based, memory-based, non-parametric) learning, regression algorithms, Bayesian networks, Gaussian mixture models, K-means and hierarchical clustering, Markov models and so on.
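To give a feel for one of the tools named above, here is a minimal K-means sketch on one-dimensional data. It is a toy version under stated assumptions: the customer spend figures are invented, the starting centers are picked by hand, and production BI packages use more robust implementations.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small K-means for one-dimensional data with given starting centers."""
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 90, 95, 88]
centers, clusters = kmeans_1d(spend, centers=[0.0, 100.0])
print(centers, clusters)
```

On this data the algorithm separates low spenders from high spenders without being told where the boundary lies, which is the kind of grouping used for customer segmentation.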



Source: http://ezinearticles.com/?Business-Intelligence-Data-Mining&id=196648

Saturday, 8 June 2013

Unleash the Hidden Potential of Your Business Data With Data Mining and Extraction Services


Every business, small or large, is continuously amassing data about customers, employees and nearly every process in their business cycle. Although all management staff utilize data collected from their business as a basis for decision making in areas such as marketing, forecasting, planning and trouble-shooting, very often they are just barely scratching the surface. Manual data analysis is time-consuming and error-prone, and its limited functions result in the overlooking of valuable information that could improve the bottom line. Often, the sheer quantity of data prevents accurate and useful analysis by those without the necessary technology and experience. It is an unfortunate reality that much of this data goes to waste and companies often never realize that a valuable resource is being left untapped.

Automated data mining services allow your company to tap into the latent potential of large volumes of raw data and convert it into information that can be used in decision-making. While the use of the latest software makes data mining and data extraction fast and affordable, experienced professional data analysts are a key part of the data mining services offered by our company. Making the most of your data involves more than automatically generated reports from statistical software. It takes analysis and interpretation skills that can only be performed by experienced data analysis experts to ensure that your business databases are translated into information that you can easily comprehend and use in almost every aspect of your business.

Who Can Benefit From Data Mining Services?

If you are wondering what types of companies can benefit from data extraction services, the answer is virtually every type of business. This includes organizations dealing in customer service, sales and marketing, financial products, research and insurance.

How is Raw Data Converted to Useful Information?

There are several steps in data mining and extraction, but the most important thing for you as a business owner is to be assured that, throughout the process, the confidentiality of your data is our primary concern. Once we receive your data, it is converted into the necessary format so that it can be entered into a data warehouse system. Next, it is compiled into a database, which is then sifted through by data mining experts to identify relevant data. Our trained and experienced staff then scan and analyze your data using a variety of methods to identify associations or relationships between variables; clusters and classes, to identify correlations and groups within your data; and patterns, which allow trends to be identified and predictions to be made. Finally, the results are compiled in the form of written reports, visual data and spreadsheets, according to the needs of your business.

Our team of data mining, extraction and analysis experts has already helped a great number of businesses to tap into the potential of their raw data with our speedy, cost-efficient and confidential services. Contact us today for more information on how our data mining and extraction services can help your business.


Source: http://ezinearticles.com/?Unleash-the-Hidden-Potential-of-Your-Business-Data-With-Data-Mining-and-Extraction-Services&id=4642076

Thursday, 6 June 2013

Beneficial Data Collection Services

The Internet is becoming the biggest source for information gathering. A variety of search engines are available on the World Wide Web, and they help in finding any kind of information easily and quickly. Every business needs relevant data for its decision making, and market research plays a crucial role in supplying it. One of the fastest-growing services is data collection. This data mining service helps in gathering the relevant data needed for your business or personal use.

Traditionally, data collection has been done manually, which is not feasible when data is required in bulk. People still copy and paste data manually from Web pages or download a complete Web site, which is a sheer waste of time and effort. A more reliable and convenient method is automated data collection. Web scraping techniques crawl through thousands of web pages on a specified topic and simultaneously incorporate the information into a database, XML file, CSV file, or other custom format for future reference. A few of the most commonly used web data extraction processes are: scraping websites that provide information about competitors' pricing and featured data; spidering a government portal to extract the names of citizens for an investigation; and scraping websites that offer a variety of downloadable images.
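As a rough illustration of the CSV output mentioned above, here is a minimal JavaScript sketch that turns already-scraped records into CSV text. The record fields (`product`, `price`, `source`) and the helper names `csvField` and `toCsv` are hypothetical, chosen only for this example; a real scraper would fill the records from the pages it fetched.

```javascript
// Quote a field for CSV: wrap it in double quotes when it contains
// a comma, a double quote, or a line break, doubling any inner quotes.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Convert an array of flat records into CSV text with a header row.
function toCsv(records, columns) {
  const lines = [columns.map(csvField).join(",")];
  for (const record of records) {
    const cells = columns.map(function (col) {
      // Missing fields become empty cells.
      return csvField(record[col] === undefined ? "" : record[col]);
    });
    lines.push(cells.join(","));
  }
  return lines.join("\r\n");
}

// Hypothetical scraped records (illustrative data only).
const scraped = [
  { product: "Widget", price: "9.99", source: "shop-a.example" },
  { product: 'Gadget, 2" model', price: "24.50", source: "shop-b.example" },
];
const csv = toCsv(scraped, ["product", "price", "source"]);
console.log(csv);
```

Quoting fields that contain commas or quotes keeps the file readable by spreadsheet software, which matters once product names or descriptions come straight from web pages.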

Besides this, there is a more sophisticated method of automated data collection. Here, you can scrape web site information automatically on a daily basis. This method greatly helps in discovering the latest market trends, customer behavior and future trends. A few major examples of automated data collection solutions are price monitoring; daily collection of data from various financial institutions; and continual verification of different reports, which can then be used to make better and more progressive business decisions.
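To make the price monitoring example a little more concrete, the sketch below compares two daily snapshots and lists the products whose price changed. It assumes each snapshot is a plain object mapping a product name to a price; the function name `priceChanges` and the sample data are hypothetical, and the scraping and scheduling steps are left out.

```javascript
// Compare two daily snapshots ({ productName: price }) and
// list the products that appear in both with a different price.
function priceChanges(previous, current) {
  const changes = [];
  for (const [product, price] of Object.entries(current)) {
    const hadPrice = Object.prototype.hasOwnProperty.call(previous, product);
    if (hadPrice && previous[product] !== price) {
      changes.push({ product: product, from: previous[product], to: price });
    }
  }
  return changes;
}

// Hypothetical snapshots; a scheduled scraper would produce one per day.
const yesterday = { Widget: 9.99, Gadget: 24.5 };
const today = { Widget: 8.99, Gadget: 24.5, Gizmo: 3.0 };
const changes = priceChanges(yesterday, today);
console.log(changes);
```

Products that are new in the current snapshot are skipped here because there is no earlier price to compare against; whether to report them is a design choice.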

While using these services, make sure you use the right procedure. For example, when you retrieve data, download it into a spreadsheet so that the analysts can do the comparison and analysis properly. This will also help in getting accurate results in a faster and more refined manner.


Source: http://ezinearticles.com/?Beneficial-Data-Collection-Services&id=5879822

Tuesday, 4 June 2013

Data scraping with YQL and jQuery

For a project that I’m currently working on I need a list of all the US National Parks in XML format. Google didn’t come up with anything so I decided that I would need to somehow grab the data from this list on Wikipedia. The problem is that the list is in messy HTML but I want some nice clean XML ready for parsing with E4X in Flash.

There are a number of ways I could parse the data. If I knew Ruby and had an environment set up I’d probably use hpricot. Or I could get my hands dirty again with PHP and DOMDocument. Or if the Wikipedia page was XML or could be converted into XML easily then I could use an XSL transform. Or I’m sure there are hundreds of other approaches… But in this case I just wanted to very quickly and easily write a script which would grab and translate the data so I could get on with the rest of the project.

That’s when I thought of using jQuery to parse the data – it is the perfect tool for navigating an HTML DOM and extracting information from it. So I wrote a script which would use AJAX to load the page from Wikipedia. And that’s where I hit the first hurdle: “Access to restricted URI denied” – you can’t make cross-domain AJAX calls because of security restrictions in the browser :(

At this point I had at least a couple of ways to proceed with my jQuery approach:

    Copy the HTML file from Wikipedia to my server thus avoiding the cross domain issues.
    Write a quick serverside redirect script to live on my server and grab the page from Wikipedia and echo it back out.

I didn’t like the idea of either of those options but luckily at this point I remembered reading about YQL:

    The YQL platform provides a single endpoint service that enables developers to query, filter and combine data across Yahoo! and beyond.

After a quick flick through the documentation and some testing in the YQL Console I put together a script which would grab the relevant page from Wikipedia and convert it into a JSONP call which allows us to get around the cross-domain AJAX issues. As an added extra it’s really easy to add some XPath to your YQL so I’m grabbing only the relevant table from the Wikipedia document which cuts down on the complexity of my JavaScript. Here’s what I ended up with:

    SELECT * FROM html WHERE url="http://en.wikipedia.org/wiki/List_of_United_States_National_Parks_by_state" AND xpath="//table[@class='wikitable sortable']"

If you run this code in the console you’ll see that it grabs the relevant table from Wikipedia and returns it as XML or JSON. From here it’s easy to make the AJAX call from jQuery and then loop over the JSON returned creating an XML document. If you are interested in the details of that you can check out the complete example.
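For readers who want a feel for that last step, here is a minimal sketch. It is not the complete example referred to above. The endpoint URL is an assumption (the public YQL address as commonly documented at the time, which may no longer respond), the helper names `buildYqlUrl`, `escapeXml` and `parksToXml` are hypothetical, and the `{ name, state }` row shape is a simplification, since the JSON that YQL returned for an HTML table was more deeply nested.

```javascript
// Assumed public YQL endpoint; not taken from the post.
const YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// Build a request URL for a YQL statement. The trailing "callback=?"
// is the placeholder jQuery replaces to make a JSONP request.
function buildYqlUrl(query) {
  return YQL_ENDPOINT + "?q=" + encodeURIComponent(query) + "&format=json&callback=?";
}

// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Turn simplified rows ({ name, state }) into a small XML document.
function parksToXml(rows) {
  const items = rows.map(function (row) {
    return '  <park state="' + escapeXml(row.state) + '">' + escapeXml(row.name) + "</park>";
  });
  return ["<parks>"].concat(items, ["</parks>"]).join("\n");
}

const query =
  'SELECT * FROM html WHERE url="http://en.wikipedia.org/wiki/List_of_United_States_National_Parks_by_state"' +
  " AND xpath=\"//table[@class='wikitable sortable']\"";
const url = buildYqlUrl(query);

// In a browser with jQuery loaded, the request would look roughly like:
//   $.getJSON(url, function (data) { /* walk data.query.results and collect rows */ });

// Hypothetical rows standing in for what that callback would collect.
const xml = parksToXml([
  { name: "Acadia", state: "Maine" },
  { name: "Hawai'i Volcanoes", state: "Hawaii" },
]);
console.log(xml);
```

Escaping the values before building the markup keeps the output well-formed even when a park name contains an apostrophe or an ampersand.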

I was really impressed with how easy it was to quickly figure out YQL and I think it’s a really useful service. Even if you just use it to convert an HTML page to a valid XML document then it is still invaluable for all sorts of screen scraping purposes (it’s always much easier to parse XML than HTML tag soup). One improvement I’d love to see is the addition of a CSS style selection engine as well as the XPath one. And the documentation could maybe be clearer (I figured out the above script by checking examples on other blogs rather than by reading the docs). But overall I give Yahoo! a big thumbs up for YQL and look forward to using it again soon…


Source: http://www.kelvinluck.com/2009/02/data-scraping-with-yql-and-jquery/