Imagine you are peeking at a neighbour while he's making love to his wife, and videotaping the whole thing at the same time. Then you offer to sell the video to another neighbour, or you upload it to YouTube. That's roughly what parsing looks like.
Except that you are peeking at the data on a site. And in the case of social networks, at personal data and at the network's "intimate contact" with its users.
If services, sites, social networks or programs want to share data, they create an API and open access to it. An API is an interface through which programs interact with each other. For example, airline ticket aggregators collect data from the sites that sell those tickets, by mutual agreement and to mutual benefit.
The most ardent fighter against parsers is LinkedIn. It sued anonymous parsers, accusing them of fraud, abuse, violating the criminal code and copyright law, trespassing, and even theft.
Parsers also have indirect victims, apart from the platforms themselves. For example, the bloggers or accounts whose subscribers you want to parse.
The network spends money to attract users, maintain capacity and pay employees. A blogger spends money on converting users into subscribers. When you "steal" a subscriber database, you are taking not just a set of data but the money that was spent on every person in it.
So what can I do? You can collect data from your own website for further use, under a personal data agreement.
For example, you might want to segment your audience and export only those who recently got married, celebrate their birthday this month, or published posts with keywords ... and then use that list for retargeting. But there's a Facebook pixel for that.
How do you check if the site allows parsing?
Enter url/robots.txt in your browser's address bar. You will see a file listing what automated scanning and indexing of the site is permitted and what is banned. You can collect data from the site if you get written permission from the site owners.
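The same check can be done in code. Below is a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt content, the "my-bot" user agent name and the example.com addresses are all made up for illustration, and the helper names build_parser and is_allowed are hypothetical. Note that this only tells you what the file says about automated access; it does not replace the written permission mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# A real file lives at <site>/robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-bot" is an assumed user agent name.
print(is_allowed(parser, "my-bot", "https://example.com/blog/post-1"))
print(is_allowed(parser, "my-bot", "https://example.com/private/data"))
```

With the sample rules, the first address falls outside the banned /private/ section and the second falls inside it. To read a live file instead of a string, RobotFileParser also offers set_url() followed by read(), which downloads the file for you.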
To some extent, monitoring the media and social networks for mentions of your name or brand is also parsing. But it is best to collect such data through monitoring services that have received all the necessary permissions from the resources where the information was published.